- Two minor fixes for recent changes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAma3pFoACgkQM2qzM29m
f5f4dA//SiPOS7v60jQubfV3f031E+W3a6YxvB5tqByA23hOavjYowGREyvhtaS5
lgqRO25BXz7uNOjSWUXaV7SsR1GKElWtk/gGVsfYy03A91Qvafb24oapJRwRXeJ1
5n6X5AScxJeoDGEIUMoGU3O87v4T6UoTj5K6bjUDqr5kTRTL6rPFj3jYvy0YVLe3
EsA1P+BLctB1XiboMYNfZyEJbxNF5gpO7OaeS8VPQIeNYhaF1cwsa7A9wDJ22UpB
BQEguTjPqgzkthOdkc3jNcoXtDUEzAXxQqACKR8CU6cCoyr159KupSlci4LqA2o4
8CwcVd01L79JjGcrZqo+4vmbuY6rfhN7U/EYcNNJPZKUIo7Q5h4dq/pCaBADK2Ke
UJVPsmdYXKMwn/pPJI9kjoellqSrjOxqFDXXqBb/e8Z/xXuXmiYRl6E+dtGeBlty
j4qm12d1BGKs7zRaLLThbFTH6s3kUCjVhEr+uphse6CVF6HV/NNK79jooHrrgCiS
e1vgRaYIgzMREOyq8fZwvINDuk/8Pa1Z4wlR65orwVzWlZMmk9bLm+wwplcyj34x
una8RrQhxlicY5nAGUa+aRCkKUtO9oSPs6hukoT+DB06+mhQ7u3ro96ojueMfMdQ
pKux8zrCdJe7jZUlnIW30/QsrgWhHW9KsrdUA9eKQ1FoZIOdS9I=
=YX9E
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Two minor fixes for recent changes
* tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't set SVC_SOCK_ANONYMOUS when creating nfsd sockets
sunrpc: avoid -Wformat-security warning
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.
This patch has been generated by the following coccinelle script:
```
virtual patch
@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }
@r3@
identifier func;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int , void *, size_t *, loff_t *);
@r4@
identifier func, ctl;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int , void *, size_t *, loff_t *);
@r5@
identifier func, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code
conventions. The xfs_stats_clear_proc_handler,
xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified.
This is called from a proc_handler itself and is calling back into
another proc_handler, making it necessary to change it as part of the
proc_handler migration.
Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
Using a non-constant string as an sprintf-style is potentially dangerous:
net/sunrpc/svc.c: In function 'param_get_pool_mode':
net/sunrpc/svc.c:164:32: error: format not a string literal and no format arguments [-Werror=format-security]
Use a literal "%s" format instead.
Fixes: 5f71f3c325 ("sunrpc: refactor pool_mode setting code")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
New Features:
* Add support for large folios
* Implement rpcrdma generic device removal notification
* Add client support for attribute delegations
* Use a LAYOUTRETURN during reboot recovery to report layoutstats and errors
* Improve throughput for random buffered writes
* Add NVMe support to pnfs/blocklayout
Bugfixes:
* Fix rpcrdma_reqs_reset()
* Avoid soft lockups when using UDP
* Fix an nfs/blocklayout premature PR key unregestration
* Another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server
* Do not extend writes to the entire folio
* Pass explicit offset and count values to tracepoints
* Fix a race to wake up sleeping SUNRPC sync tasks
* Fix gss_status tracepoint output
Cleanups:
* Add missing MODULE_DESCRIPTION() macros
* Add blocklayout / SCSI layout tracepoints
* Remove asm-generic headers from xprtrdma verbs.c
* Remove unused 'struct mnt_fhstatus'
* Other delegation related cleanups
* Other folio related cleanups
* Other pNFS related cleanups
* Other xprtrdma cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmaZgr0ACgkQ18tUv7Cl
QOv8FxAAnUyYG7Kdbv+5Ko/SFv0imxCb5DQh2XC/hSHNrlKBlDnqe2PANXR9XocL
mS0Wry5tZf/T+o+QoKv0HQUdWFlnqKzwclggrekf/lkioU1feWsLe2RzDl1iUh0V
6fwcCyWXW1mYX2CtCaDe+/ZFcoZOMD+bItNHt/RdDScSnS9Jd8GSyocsVKsqaBx6
3wub0FJ4UBgYNoX2T3YyK2JwvO9GLaKIQRJV74rjgPJKjcjhptbcb5MKBmOZrF95
UCcpl4CwvD9RTsSEp0B98UbAFFpk8Nw1tmHF3GmyG/nsrJomDuLKFvbsiq23eHUf
XeULZIbjMEzU56vjoTglZA4s7JYx17D0vzdPGUqU4mLN3LPm5LtGLBg2uQoPw/xW
50euLU+ol36mfnQlBsuM/tAXgtoAcT63aNeNRNp8aOL47xA+PC6kWTBK9OaR5+x6
w+d22Dpy+riMk1TRaAVt0ANcENKELsWRFvxkuWCpQhVoQ1h8LigQJzeggEEK7Sa6
5u9H6wCTee2wz746uwA43koj1utuyrLq/5S+qEtCY1pbP3U0A+Gh0Xh00OXiYuzL
TgRdksmiAL8cA51WjSrq6HhGLOUJAYLfbdKaVhW+fULxUVwzWhFFaFbbdiq/e4OR
0pfqls8UZWICE51GeTfalEidpKZgV/LxU3QOuVoalWBULyj/TeI=
=avTW
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Add support for large folios
- Implement rpcrdma generic device removal notification
- Add client support for attribute delegations
- Use a LAYOUTRETURN during reboot recovery to report layoutstats
and errors
- Improve throughput for random buffered writes
- Add NVMe support to pnfs/blocklayout
Bugfixes:
- Fix rpcrdma_reqs_reset()
- Avoid soft lockups when using UDP
- Fix an nfs/blocklayout premature PR key unregestration
- Another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server
- Do not extend writes to the entire folio
- Pass explicit offset and count values to tracepoints
- Fix a race to wake up sleeping SUNRPC sync tasks
- Fix gss_status tracepoint output
Cleanups:
- Add missing MODULE_DESCRIPTION() macros
- Add blocklayout / SCSI layout tracepoints
- Remove asm-generic headers from xprtrdma verbs.c
- Remove unused 'struct mnt_fhstatus'
- Other delegation related cleanups
- Other folio related cleanups
- Other pNFS related cleanups
- Other xprtrdma cleanups"
* tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits)
SUNRPC: Fixup gss_status tracepoint error output
SUNRPC: Fix a race to wake a sync task
nfs: split nfs_read_folio
nfs: pass explicit offset/count to trace events
nfs: do not extend writes to the entire folio
nfs/blocklayout: add support for NVMe
nfs: remove nfs_page_length
nfs: remove the unused max_deviceinfo_size field from struct pnfs_layoutdriver_type
nfs: don't reuse partially completed requests in nfs_lock_and_join_requests
nfs: move nfs_wait_on_request to write.c
nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests
nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests
nfs: simplify nfs_folio_find_and_lock_request
nfs: remove nfs_folio_private_request
nfs: remove dead code for the old swap over NFS implementation
NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server
nfs: Block on write congestion
nfs: Properly initialize server->writeback
nfs: Drop pointless check from nfs_commit_release_pages()
nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg
...
We've observed NFS clients with sync tasks sleeping in __rpc_execute
waiting on RPC_TASK_QUEUED that have not responded to a wake-up from
rpc_make_runnable(). I suspect this problem usually goes unnoticed,
because on a busy client the task will eventually be re-awoken by another
task completion or xprt event. However, if the state manager is draining
the slot table, a sync task missing a wake-up can result in a hung client.
We've been able to prove that the waker in rpc_make_runnable() successfully
calls wake_up_bit() (ie- there's no race to tk_runstate), but the
wake_up_bit() call fails to wake the waiter. I suspect the waker is
missing the load of the bit's wait_queue_head, so waitqueue_active() is
false. There are some very helpful comments about this problem above
wake_up_bit(), prepare_to_wait(), and waitqueue_active().
Fix this by inserting smp_mb__after_atomic() before the wake_up_bit(),
which pairs with prepare_to_wait() calling set_current_state().
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This is a light release containing optimizations, code clean-ups,
and minor bug fixes. This development cycle focused on work outside
of upstream kernel development:
1. Continuing to build upstream CI for NFSD based on kdevops
2. Continuing to focus on the quality of NFSD in LTS kernels
3. Participation in IETF nfsv4 WG discussions about NFSv4 ACLs,
directory delegation, and NFSv4.2 COPY offload
Notable features in v6.11 that were not pulled through the NFSD tree
include NFS server-side support for the new pNFS NVMe layout type
[RFC9561]. Functional testing for pNFS block layouts like this one
has been introduced to our kdevops CI harness. Work on improving
the resolution of file attribute time stamps in local filesystems
is also ongoing tree-wide.
As always I am grateful to NFSD contributors, reviewers, testers,
and bug reporters who participated during this cycle.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmaVM0cACgkQM2qzM29m
f5fzOQ//c5CXIF3zCLIUofm5eZSP2zIszmHR75rVTEnf0Ehm2BJRF6VZiTvWXRpz
bOuswxfV1Bds+TofbPIP8jqDcMp8NIXemdb6+QMwh4FDY4M8t1v6TRYt35L6Ulrq
bSV81aRS622ofQ35sRzwxpGX6rB6YbB+5L4EKuxdEqRKSB8rCxQcjPy2qypcWlRC
hEAGDe3IiVxTz4VQBpASRqbH9Udw/XEqIhv5c8aLtPvl8i+yWyV5m2G5FMRdBj49
u8rCLoPi/mON8TDs2U4pbhcdgfBWWvGS6woFp6qrqM0wzXIPLalWsPGK3DUtuFUg
onxsClJXMWUvW4k4hbjiqosduLGY/kMeX62Lx1dCj/gktrJpU0GDNR/XbBhHU+hj
UT2CL8AfedC4FQekdyJri/rDgPiTMsf8UE0lgtchHMUXH0ztrjaRxMGiIFMm5vCl
dJBMGJfCkKR/+U1YrGRQI0tPL8CJKYI8klOEhLoOsCr/WC9p4nvvAzSg4W9mNK5P
ni4+KU4f/bj8U0Ap2bUacTpXj6W8VcwJWeuHahVA1Slo+eqXO401hj4W88dQmm9O
ZDR5h+6PI6KoL/KL6I4EyOv+sIEtW3s18cEWbSSu3N/CPuhSGTx8d2J201shJXRN
uDdMkvbwv48x20pgD2oTkPrZbJHOL3BK5/WPBg7pwpfkoRrBAhY=
=Xd5e
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"This is a light release containing optimizations, code clean-ups, and
minor bug fixes.
This development cycle focused on work outside of upstream kernel
development:
- Continuing to build upstream CI for NFSD based on kdevops
- Continuing to focus on the quality of NFSD in LTS kernels
- Participation in IETF nfsv4 WG discussions about NFSv4 ACLs,
directory delegation, and NFSv4.2 COPY offload
Notable features for v6.11 that do not come through the NFSD tree
include NFS server-side support for the new pNFS NVMe layout type
[RFC9561]. Functional testing for pNFS block layouts like this one has
been introduced to our kdevops CI harness. Work on improving the
resolution of file attribute time stamps in local filesystems is also
ongoing tree-wide.
As always I am grateful to NFSD contributors, reviewers, testers, and
bug reporters who participated during this cycle"
* tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argument
gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkey
MAINTAINERS: Add a bugzilla link for NFSD
nfsd: new netlink ops to get/set server pool_mode
sunrpc: refactor pool_mode setting code
nfsd: allow passing in array of thread counts via netlink
nfsd: make nfsd_svc take an array of thread counts
sunrpc: fix up the special handling of sv_nrpools == 1
SUNRPC: Add a trace point in svc_xprt_deferred_close
NFSD: Support write delegations in LAYOUTGET
lockd: Use *-y instead of *-objs in Makefile
NFSD: Fix nfsdcld warning
svcrdma: Handle ADDR_CHANGE CM event properly
svcrdma: Refactor the creation of listener CMA ID
NFSD: remove unused structs 'nfsd3_voidargs'
NFSD: harden svcxdr_dupstr() and svcxdr_tmpalloc() against integer overflows
When using a BPF program on kernel_connect(), the call can return -EPERM. This
causes xs_tcp_setup_socket() to loop forever, filling up the syslog and causing
the kernel to potentially freeze up.
Neil suggested:
This will propagate -EPERM up into other layers which might not be ready
to handle it. It might be safer to map EPERM to an error we would be more
likely to expect from the network system - such as ECONNREFUSED or ENETDOWN.
ECONNREFUSED as error seems reasonable. For programs setting a different error
can be out of reach (see handling in 4fbac77d2d) in particular on kernels
which do not have f10d059661 ("bpf: Make BPF_PROG_RUN_ARRAY return -err
instead of allow boolean"), thus given that it is better to simply remap for
consistent behavior. UDP does handle EPERM in xs_udp_send_request().
Fixes: d74bad4e74 ("bpf: Hooks for sys_connect")
Fixes: 4fbac77d2d ("bpf: Hooks for sys_bind")
Co-developed-by: Lex Siegel <usiegl00@gmail.com>
Signed-off-by: Lex Siegel <usiegl00@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trondmy@kernel.org>
Cc: Anna Schumaker <anna@kernel.org>
Link: https://github.com/cilium/cilium/issues/33395
Link: https://lore.kernel.org/bpf/171374175513.12877.8993642908082014881@noble.neil.brown.name
Link: https://patch.msgid.link/9069ec1d59e4b2129fc23433349fd5580ad43921.1720075070.git.daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If we fail to call crypto_sync_skcipher_setkey, we should free the
memory allocation for cipher, replace err_return with err_free_cipher
to free the memory of cipher.
Fixes: 4891f2d008 ("gss_krb5: import functionality to derive keys into the kernel")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Allow the pool_mode setting code to be called from internal callers
so we can call it from a new netlink op. Add a new svc_pool_map_get
function to return the current setting. Change the existing module
parameter handling to use the new interfaces under the hood.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Only pooled services take a reference to the svc_pool_map. The sunrpc
code has always used the sv_nrpools value to detect whether the service
is pooled.
The problem there is that nfsd is a pooled service, but when it's
running in "global" pool_mode, it doesn't take a reference to the pool
map because it has a sv_nrpools value of 1. This means that we have
two separate codepaths for starting the server, depending on whether
it's pooled or not.
Fix this by adding a new flag to the svc_serv, that indicates whether
the serv is pooled. With this we can have the nfsd service
unconditionally take a reference, regardless of pool_mode.
Note that this is a behavior change for
/sys/module/sunrpc/parameters/pool_mode. Usually this file does not
allow you to change the pool-mode while there are nfsd threads running,
but if the pool-mode is "global" it's allowed. My assumption is that
this is a bug, since it probably should never have worked this way.
This patch changes the behavior such that you get back EBUSY even
when nfsd is running in global mode. I think this is more reasonable
behavior, and given that most people set this today using the module
parameter, it's doubtful anyone will notice.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The trace point in svc_xprt_close() reports only some local close
requests. Try to capture more local close requests.
Note that "trace-cmd record -T -e sunrpc:svc_xprt_close" will
neatly capture the identity of the caller requesting the close.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Sagi tells me that when a bonded device reports an address change,
the consumer must destroy its listener IDs and create new ones.
See commit a032e4f6d6 ("nvmet-rdma: fix bonding failover possible
NULL deref").
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
In a moment, I will add a second consumer of CMA ID creation in
svcrdma. Refactor so this code can be reused.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Prior to the commit identified below, call_transmit_status() would
handle -EPERM and other errors related to an unreachable server by
falling through to call_status() which added a 3-second delay and
handled the failure as a timeout.
Since that commit, call_transmit_status() falls through to
handle_bind(). For UDP this moves straight on to handle_connect() and
handle_transmit() so we immediately retransmit - and likely get the same
error.
This results in an indefinite loop in __rpc_execute() which triggers a
soft-lockup warning.
For the errors that indicate an unreachable server,
call_transmit_status() should fall back to call_status() as it did
before. This cannot cause the thundering herd that the previous patch
was avoiding, as the call_status() will insert a delay.
Fixes: ed7dc973bd ("SUNRPC: Prevent thundering herd when the socket is not connected")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The original code was designed so that most calls to
rpcrdma_rep_create() would occur on the NUMA node that the device
preferred. There are a few cases where that's not possible, so
those reps are marked as temporary.
However, we have the device (and its preferred node) already in
rpcrdma_rep_create(), so let's use that to guarantee the memory
is allocated from the correct node.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit 7a03aeb66c ("xprtrdma: Micro-optimize MR DMA-unmapping")
removed the last use of the @r_xprt parameter in this function, but
neglected to remove the parameter itself.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Wait for all disconnects to complete to ensure the transport has
divested all of its hardware resources before the underlying RDMA
device can be removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit e87a911fed ("nvme-rdma: use ib_client API to detect device
removal") explains the benefits of handling device removal outside
of the CM event handler.
Sketch in an IB device removal notification mechanism that can be
used by both the client and server side RPC-over-RDMA transport
implementations.
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Avoid FastReg operations getting MW_BIND_ERR after a reconnect.
rpcrdma_reqs_reset() is called on transport tear-down to get each
rpcrdma_req back into a clean state.
MRs on req->rl_registered are waiting for a FastReg, are already
registered, or are waiting for invalidation. If the transport is
being torn down when reqs_reset() is called, the matching LocalInv
might never be posted. That leaves these MR registered /and/ on
req->rl_free_mrs, where they can be re-used for the next
connection.
Since xprtrdma does not keep specific track of the MR state, it's
not possible to know what state these MRs are in, so the only safe
thing to do is release them immediately.
Fixes: 5de55ce951 ("xprtrdma: Release in-flight MRs on disconnect")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
asm-generic/barrier.h and asm/bitops.h are already brought into the
file through the header linux/sunrpc/svc_rdma.h and the file can
still be built with their removal. They have been replaced with the
preferred linux/bitops.h and asm/barrier.h to remove the need for the
asm-generic header.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tanzir Hasan <tanzirh@google.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
I still see "RPC: Could not send backchannel reply error: -110"
quite often, along with slow-running tests. Debugging shows that the
backchannel is still stumbling when it has to queue a callback reply
on a busy transport.
Note that every one of these timeouts causes a connection loss by
virtue of the xprt_conditional_disconnect() call in that arm of
call_cb_transmit_status().
I found that setting to_maxval is necessary to get the RPC timeout
logic to behave whenever to_exponential is not set.
Fixes: 57331a59ac ("NFSv4.1: Use the nfs_client's rpc timeouts for backchannel")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Bugfixes:
- NFSv4.2: Fix a memory leak in nfs4_set_security_label
- NFSv2/v3: abort nfs_atomic_open_v23 if the name is too long.
- NFS: Add appropriate memory barriers to the sillyrename code
- Propagate readlink errors in nfs_symlink_filler
- NFS: don't invalidate dentries on transient errors
- NFS: fix unnecessary synchronous writes in random write workloads
- NFSv4.1: enforce rootpath check when deciding whether or not to trunk
Other:
- Change email address for Trond Myklebust due to email server concerns
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmZrD7kACgkQZwvnipYK
APJo/RAAlg5uU8kTzXCbUbe9LImF5nh+9XFtg6nnQ+rxQqCU17noT0bazghvBcDP
N6v4evWJVeZhnqspVZkdMQWeyNEqsew5uPRoC+gy0sh4RdwT+BHsMwLMWtNTzXoc
GW7DOJ7LePzxmh0bksIFn6vmsuhxyI7hKkDx8XuG0YjmHQDcl2TeyHLfii7TFIMP
hFEVw63ZFb+HKV0oInyP27iiM1HstvZ8MbxLcu1PoA3IaiNUYXUgBViWF2c5P6uY
KV7KynUMgkWQc289aaR7QE5Yz2f4vsYF4oD72+Z3v65W+5HunYut/HUnUGgjHPGq
dI5EwSgxW5YKoo/kIvto3yF+ppkppl2gUYFlN3+O/IEXwh+FTXBF2b/tUWFkKQPH
7X3YoosWV/WN1eWqa55znF5IzrG5gdR5z6Et1elTmn4xG4hwoC5U5lOP34DohS4Q
N/MMxzVcL348j2teN+dFNXM3WkEVaMJudavJ7A7KehZKSTZAuFNHDxsMjvyGBbGI
Za1DanWSCWBD8Bawt9hB8z+k4eN7dGfeWUgMHa2zxOowfeq0MYTjAlr1A8SuXADv
E+1tjL7CL7HeHReOg0cbP11BxLOlXYiObsbivFcbYGglbWwtPqR4q4+CIA6AuhCF
LSpxF/6uKCXUHCuuGdAgZ5nNApHvC+HoUN0gCBcpIALcazLq0TQ=
=DeAG
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
"Bugfixes:
- NFSv4.2: Fix a memory leak in nfs4_set_security_label
- NFSv2/v3: abort nfs_atomic_open_v23 if the name is too long.
- NFS: Add appropriate memory barriers to the sillyrename code
- Propagate readlink errors in nfs_symlink_filler
- NFS: don't invalidate dentries on transient errors
- NFS: fix unnecessary synchronous writes in random write workloads
- NFSv4.1: enforce rootpath check when deciding whether or not to trunk
Other:
- Change email address for Trond Myklebust due to email server concerns"
* tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: add barriers when testing for NFS_FSDATA_BLOCKED
SUNRPC: return proper error from gss_wrap_req_priv
NFSv4.1 enforce rootpath check in fs_location query
NFS: abort nfs_atomic_open_v23 if name is too long.
nfs: don't invalidate dentries on transient errors
nfs: Avoid flushing many pages with NFS_FILE_SYNC
nfs: propagate readlink errors in nfs_symlink_filler
MAINTAINERS: Change email address for Trond Myklebust
NFSv4: Fix memory leak in nfs4_set_security_label
- Fix an occasional memory overwrite caused by a fix added in 6.10
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmZjB6EACgkQM2qzM29m
f5cPXBAAstE7c7QAYCQtAwK/C/rsbr8xjAxcjivwAEOFl8znKc4ZTJ3u2tVSgAqI
DSbjXK/JVOW10ahDLawqpnWuqTUHtGlK4CMVRdZkmfEqokulhny+FmelJ4B0ZWMQ
fUSSdYO4ONuMRcytnFperqI0LL7XuEnSsKnE1cQ4HIybQchJyilj9c/TERIVGWzn
eHcVke07JF+lMvHWWJoDCcHhHHtl82znDPpmrsyzOLKS+jtiaP41r2ZPtohhoPDm
9VQmTqpSXWyexXpzgQzyu4TOWz6PPakoa701x7pW/2wZgzabdRaJhkYea64Bc8GK
RTfr7TpSKkcnttQSb2jbb2+qG15g+f7nDbWtejqWuhEppwdj/EznuyrHi80N5Klf
pw4+XFU8HoC15jcdltMb2Jecbv6uulQ0AcfCOLUh61E86tiy7YDChmhRLgDhzRsS
jDvq8HxRe1CNmqoxfqIuWuWiMwyCRSSd06U7u3/eqBZeLXOxQb26bxG8GXitUPgY
G4SCCSvExs2YqnegsZNQGB0L5YKd/zhhnkoc/Y29CU9ZhoItgHlblKODsix9kVJl
ckN/qSZS5/rrV7IsvzsB0ywNb5VkfnkspHu7X8tlYBuwgnTHJCp25AuWVt31/6jH
E+tW4wdvd1oBTLqpWRewclTwUrmvJYVYirS01LiTVehx/mnJdw8=
=NvAo
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix an occasional memory overwrite caused by a fix added in 6.10
* tag 'nfsd-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Fix loop termination condition in gss_free_in_token_pages()
The in_token->pages[] array is not NULL terminated. This results in
the following KASAN splat:
KASAN: maybe wild-memory-access in range [0x04a2013400000008-0x04a201340000000f]
Fixes: bafa6b4d95 ("SUNRPC: Fix gss_free_in_token_pages()")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Highlights include:
Stable fixes:
- nfs: fix undefined behavior in nfs_block_bits()
- NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS
Bugfixes:
- Fix mixing of the lock/nolock and local_lock mount options
- NFSv4: Fixup smatch warning for ambiguous return
- NFSv3: Fix remount when using the legacy binary mount api
- SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
- SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
- rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
Features and cleanups:
- NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
- pNFS/filelayout: S layout segment range in LAYOUTGET
- pNFS: rework pnfs_generic_pg_check_layout to check IO range
- NFSv2: Turn off enabling of NFS v2 by default
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmZPpYMACgkQZwvnipYK
APITOw//acjE9YTZcST9kgkf2bfwuHFcdxvMZAr4MV0YsfqMesU2MYmaK/5YMLyo
iNCHjLmlfE2iLAUqvFtakc1F3guACJqqFfMdnMHa1MwPznrL3yNNClGnBamovbPd
XK2MBgpQBXb+xLxqH0A2TtOK2ofk0CFzb3x9eaziox8omBM2j3v6ZARsDHYehuhM
Hig8IxW/kZ7kx5jxqSVktrgW3gDKqIuLssF6fJVINzh45jHC5QO98cuSwetx6Mi1
Aw04HbOE6B66ORrzC1wyGN3PwOkTW2kgAiyB6UNNt+Hnvr0RD5TEqf3s3mzmhP9N
7LJ3H1Okxdcpn0G/bR4LBUg26r5BWxhfPiTYG/l9vAQk65yt2LO1kFzXbECBEfaG
ctGG7/7mMLVPs05kIFYm5S0cIYW2dYNuE20JY50LMaCIopjThdfruQj3yR4xibSt
bHrAbG9wW4qg/cgx860t5h7nbZnD5OOYIqKOCDRNrUfP7P0mK/tD49HggLjDo47M
vIMlYS3bTNSF7uEPTrv6bFr8XOD1I3BVXDQwGaJMZ8zyhkUIQtKO70+i4xM1E/Wl
Jw5Z6NpM8saDD449ZqX4IRUPDAhvz4v00QqD3Tqr4MHEc5sWi898S7XcJgL3bEai
QMJmBkAK8aDAP/suPw8VQc9wqplFNlB+QEh87p2WO+yRoEucn+A=
=HMSC
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Stable fixes:
- nfs: fix undefined behavior in nfs_block_bits()
- NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS
Bugfixes:
- Fix mixing of the lock/nolock and local_lock mount options
- NFSv4: Fixup smatch warning for ambiguous return
- NFSv3: Fix remount when using the legacy binary mount api
- SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
- SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
- rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
Features and cleanups:
- NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
- pNFS/filelayout: S layout segment range in LAYOUTGET
- pNFS: rework pnfs_generic_pg_check_layout to check IO range
- NFSv2: Turn off enabling of NFS v2 by default"
* tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix undefined behavior in nfs_block_bits()
pNFS: rework pnfs_generic_pg_check_layout to check IO range
pNFS/filelayout: check layout segment range
pNFS/filelayout: fixup pNfs allocation modes
rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
NFS: Don't enable NFS v2 by default
NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
sunrpc: fix NFSACL RPC retry on soft mount
SUNRPC: fix handling expired GSS context
nfs: keep server info for remounts
NFSv4: Fixup smatch warning for ambiguous return
NFS: make sure lock/nolock overriding local_lock mount option
NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.
pNFS/filelayout: Specify the layout segment range in LAYOUTGET
pNFS/filelayout: Remove the whole file layout requirement
Under the scenario of IB device bonding, when bringing down one of the
ports, or all ports, we saw xprtrdma entering a non-recoverable state
where it is not even possible to complete the disconnect and shut it
down the mount, requiring a reboot. Following debug, we saw that
transport connect never ended after receiving the
RDMA_CM_EVENT_DEVICE_REMOVAL callback.
The DEVICE_REMOVAL callback is irrespective of whether the CM_ID is
connected, and ESTABLISHED may not have happened. So need to work with
each of these states accordingly.
Fixes: 2acc5cae29 ('xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed')
Cc: Sagi Grimberg <sagi.grimberg@vastdata.com>
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
It used to be quite awhile ago since 1b63a75180 ('SUNRPC: Refactor
rpc_clone_client()'), in 2012, that `cl_timeout` was copied in so that
all mount parameters propagate to NFSACL clients. However since that
change, if mount options as follows are given:
soft,timeo=50,retrans=16,vers=3
The resultant NFSACL client receives:
cl_softrtry: 1
cl_timeout: to_initval=60000, to_maxval=60000, to_increment=0, to_retries=2, to_exponential=0
These values lead to NFSACL operations not being retried under the
condition of transient network outages with soft mount. Instead, getacl
call fails after 60 seconds with EIO.
The simple fix is to pass the existing client's `cl_timeout` as the new
client timeout.
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/all/20231105154857.ryakhmgaptq3hb6b@gmail.com/T/
Fixes: 1b63a75180 ('SUNRPC: Refactor rpc_clone_client()')
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In the case where we have received a successful reply to an RPC request,
but while processing the reply the client in rpc_decode_header() finds
an expired context, the code ends up propagating the error to the caller
instead of getting a new context and retrying the request.
To give more details, in rpc_decode_header() we call rpcauth_checkverf()
will call into the gss and internally will at some point call
gss_validate() which has a check if the current’s context lifetime
expired, and it would fail. The reason for the failure gets ‘scrubbed’
and translated to EACCES so when we get back to rpc_decode_header() we
just go to “out_verifier” which for that error would get converted to
“out_garbage” (ie it’s treated as garballed reply) and the next
action is call_encode. Which (1) doesn’t reencode or re-send (not to
mention no upcall happens because context expires as that reason just
not known) and it again fails in the same decoding process. After
re-trying it 3 times the error is propagated back to the caller
(ie nfs4_write_done_cb() in the case a failing write).
To fix this, instead we need to look to the case where the server
decides that context has expired and replies with an RPC auth error.
In that case, the rpc_decode_header() goes to "out_msg_denied" in that
we return EKEYREJECTED which in call_decode() is sent to “call_reserve”
which triggers an upcalls and a re-try of the operation.
The proposed fix is in case of a failed rpc_decode_header() to check
if credentials were set to be invalid and use that as a proxy for
deciding that context has expired and then treat is same way as
receiving an auth error.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
documented (hopefully adequately) in the respective changelogs. Notable
series include:
- Lucas Stach has provided some page-mapping
cleanup/consolidation/maintainability work in the series "mm/treewide:
Remove pXd_huge() API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being allocated:
number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in largely
similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
Weiner has fixed the page allocator's handling of migratetype requests,
with resulting improvements in compaction efficiency.
- In the series "make the hugetlb migration strategy consistent" Baolin
Wang has fixed a hugetlb migration issue, which should improve hugetlb
allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when memory
almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting" Kairui
Song has optimized pagecache insertion, yielding ~10% performance
improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various page->flags
cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
"mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs. This
is a simple first-cut implementation for now. The series is "support
multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in the
series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts in
the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes
the initialization code so that migration between different memory types
works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant driver
in the series "mm: follow_pte() improvements and acrn follow_pte()
fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to folio
in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size THP's
in the series "mm: add per-order mTHP alloc and swpout counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His series
"mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback instrumentation
in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series "Fix
and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
series "Improve anon_vma scalability for anon VMAs". Intel's test bot
reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
=V3R/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
This is a light release containing mostly optimizations, code clean-
ups, and minor bug fixes. This development cycle has focused on non-
upstream kernel work:
1. Continuing to build upstream CI for NFSD, based on kdevops
2. Backporting NFSD filecache-related fixes to selected LTS kernels
One notable new feature in v6.10 NFSD is the addition of a new
netlink protocol dedicated to configuring NFSD. A new user space
tool, nfsdctl, is to be added to nfs-utils. Lots more to come here.
As always I am very grateful to NFSD contributors, reviewers,
testers, and bug reporters who participated during this cycle.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmZHdB8ACgkQM2qzM29m
f5cSMhAApukZZQCSR9lcppVrv48vsTKHFup4qFG5upOtdHR8yuSI4HOfSb5F9gsO
fHJABFFtvlsTWVFotwUY7ljjtin00PK0bfn9kZnekvcZ1A5Yoly8SJmxK+jjnmAw
jaXT7XzGYWShYiRkIXb3NYE9uiC1VZOYURYTVbMwklg3jbsyp2M7ylnRKIqUO2Qt
bin6tdqnDx2H4Hou9k4csMX4sZJlXQZjQxzxhWuL1XrjEMlXREnklfppLzIlnJJt
eHFxTRhwPdcJ9CbGVsae7GNQeGUdgq7P/AIFuHWIruvxaknY7ZOp2Z/xnxaifeU+
O2Psh/9G7zmqFkeH01QwItita8rUdBwgTv0r7QPw8/lCd0xMieqFynNGtTGwWv0Q
1DC8RssM3axeHHfpTgXtkqfwFvKIyE6xKrvTCBZ8Pd8hsrWzbYI4d/oTe8rwXLZ6
sMD5wgsfagl6fd6G+4/9adFniOgpUi2xHmqJ5yyALyzUDeHiiqsOmxM2Rb0FN5YR
ixlNj7s9lmYbbMwQshNRhV/fOPQRvKvicHAyKO7Yko/seDf8NxwQfPX6M2j2esUG
Ld8lW1hGpBDWpF1YnA6AsC+Jr12+A4c2Lg95155R9Svumk6Fv/4MIftiWpO8qf/g
d66Q35eGr3BSSypP9KFEa7aegZdcJAlUpLhsd0Wj2rbei7gh0kU=
=tpVD
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"This is a light release containing mostly optimizations, code clean-
ups, and minor bug fixes. This development cycle has focused on non-
upstream kernel work:
1. Continuing to build upstream CI for NFSD, based on kdevops
2. Backporting NFSD filecache-related fixes to selected LTS kernels
One notable new feature in v6.10 NFSD is the addition of a new netlink
protocol dedicated to configuring NFSD. A new user space tool,
nfsdctl, is to be added to nfs-utils. Lots more to come here.
As always I am very grateful to NFSD contributors, reviewers, testers,
and bug reporters who participated during this cycle"
* tag 'nfsd-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (29 commits)
NFSD: Force all NFSv4.2 COPY requests to be synchronous
SUNRPC: Fix gss_free_in_token_pages()
NFS/knfsd: Remove the invalid NFS error 'NFSERR_OPNOTSUPP'
knfsd: LOOKUP can return an illegal error value
nfsd: set security label during create operations
NFSD: Add COPY status code to OFFLOAD_STATUS response
NFSD: Record status of async copy operation in struct nfsd4_copy
SUNRPC: Remove comment for sp_lock
NFSD: add listener-{set,get} netlink command
SUNRPC: add a new svc_find_listener helper
SUNRPC: introduce svc_xprt_create_from_sa utility routine
NFSD: add write_version to netlink command
NFSD: convert write_threads to netlink command
NFSD: allow callers to pass in scope string to nfsd_svc
NFSD: move nfsd_mutex handling into nfsd_svc callers
lockd: host: Remove unnecessary statements'host = NULL;'
nfsd: don't create nfsv4recoverydir in nfsdfs when not used.
nfsd: optimise recalculate_deny_mode() for a common case
nfsd: add tracepoint in mark_client_expired_locked
nfsd: new tracepoint for check_slot_seqid
...
Dan Carpenter says:
> Commit 5866efa8cb ("SUNRPC: Fix svcauth_gss_proxy_init()") from Oct
> 24, 2019 (linux-next), leads to the following Smatch static checker
> warning:
>
> net/sunrpc/auth_gss/svcauth_gss.c:1039 gss_free_in_token_pages()
> warn: iterator 'i' not incremented
>
> net/sunrpc/auth_gss/svcauth_gss.c
> 1034 static void gss_free_in_token_pages(struct gssp_in_token *in_token)
> 1035 {
> 1036 u32 inlen;
> 1037 int i;
> 1038
> --> 1039 i = 0;
> 1040 inlen = in_token->page_len;
> 1041 while (inlen) {
> 1042 if (in_token->pages[i])
> 1043 put_page(in_token->pages[i]);
> ^
> This puts page zero over and over.
>
> 1044 inlen -= inlen > PAGE_SIZE ? PAGE_SIZE : inlen;
> 1045 }
> 1046
> 1047 kfree(in_token->pages);
> 1048 in_token->pages = NULL;
> 1049 }
Based on the way that the ->pages[] array is constructed in
gss_read_proxy_verf(), we know that once the loop encounters a NULL
page pointer, the remaining array elements must also be NULL.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Fixes: 5866efa8cb ("SUNRPC: Fix svcauth_gss_proxy_init()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
It is obsolete since sp_lock was discarded in commit 580a25756a
("SUNRPC: discard sp_lock").
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
svc_find_listener will return the transport instance pointer for the
endpoint accepting connections/peer traffic from the specified transport
class and matching sockaddr.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Add svc_xprt_create_from_sa utility routine and refactor
svc_xprt_create() codebase in order to introduce the capability to
create a svc port from socket address.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
since vs_proc pointer is dereferenced before getting it's address there's
no need to check for NULL.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 8e5b67731d ("SUNRPC: Add a callback to initialise server requests")
Signed-off-by: Aleksandr Aprelkov <aaprelkov@usergate.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
* Remove sentinel element from ctl_table structs.
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bugfixes:
- Fix an Oops in xs_tcp_tls_setup_socket
- Fix an Oops due to missing error handling in nfs_net_init()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmYv7PEACgkQZwvnipYK
APKcgA//XE7iII6qQU9IC6jiv44qO7NtB6Zy3PQFCWO2ssMSqXbc4lO2eCmR/1nA
3Mlf1RwPxp+M+iOqFgANZV7voQ/r6djEyM1ycr+J2G/mfoxmKMVmnvg3lcyAfYNj
3fm6n0t8ZCkb3URoO4K0ejw007QfN2zpCL2psKucBdahOX7OYHT45o02liKN8ge5
0h7XKUjDStKsId4y5UVNB+QUeaQaWzKKMCzTzX4CxfHXZpIjbDjkdJ9WxAue2+Th
5yvNtPKkTi9EYwHjFAgN7ZKAC7Gu+jZXs9ewdqdyaSfYlioGk7ALz2ZZSptKb5zr
+nwuR+SC4Yzg5uBdjOLXS7/6Z/CVyp2bmgoFAzrP8cC0zfB7wJMNwUTZbUx3d823
Q7xYwecj1F9PE5CjHJlYpFZiKkiMHB242EFmBrcUR+yoRPHRgEg32UpI1/YWYnVO
pG+Hto0O8JnlGzkqslKy/qN7OMFgNTli+nrFnJT7TxDk27GpOY3gv271164TgeUt
MOk7iY6QjDqk6Zpzbg5AMcq6UhB+QUe76XAAFtDvjXLLBbsRckmbNTBbkFv6/O3p
a3bOd7oeugNIaRJJaR/lQ/EVAoBSeNUUj5G1ivoXDWsNXE+ZKNNdYyrCociVgI6S
j31k3XtbGVMk8M/7Lu5slPOGL4xqDtJAddF0eE7buQhRZNX7pPA=
=OtQc
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
- Fix an Oops in xs_tcp_tls_setup_socket
- Fix an Oops due to missing error handling in nfs_net_init()
* tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: Handle error of rpc_proc_register() in nfs_net_init().
SUNRPC: add a missing rpc_stat for TCP TLS
Main goal of memory allocation profiling patchset is to provide accounting
that is cheap enough to run in production. To achieve that we inject
counters using codetags at the allocation call sites to account every time
allocation is made. This injection allows us to perform accounting
efficiently because injected counters are immediately available as opposed
to the alternative methods, such as using _RET_IP_, which would require
counter lookup and appropriate locking that makes accounting much more
expensive. This method requires all allocation functions to inject
separate counters at their call sites so that their callers can be
individually accounted. Counter injection is implemented by allocation
hooks which should wrap all allocation functions.
Inlined functions which perform allocations but do not use allocation
hooks are directly charged for the allocations they perform. In most
cases these functions are just specialized allocation wrappers used from
multiple places to allocate objects of a specific type. It would be more
useful to do the accounting at their call sites instead. Instrument these
helpers to do accounting at the call site. Simple inlined allocation
wrappers are converted directly into macros. More complex allocators or
allocators with documentation are converted into _noprof versions and
allocation hooks are added. This allows memory allocation profiling
mechanism to charge allocations to the callers of these functions.
Link: https://lkml.kernel.org/r/20240415020731.1152108-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Jan Kara <jack@suse.cz> [jbd2]
Cc: Anna Schumaker <anna@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Performance regression reported with NFS/RDMA using Omnipath,
bisected to commit e084ee673c ("svcrdma: Add Write chunk WRs to
the RPC's Send WR chain").
Tracing on the server reports:
nfsd-7771 [060] 1758.891809: svcrdma_sq_post_err:
cq.id=205 cid=226 sc_sq_avail=13643/851 status=-12
sq_post_err reports ENOMEM, and the rdma->sc_sq_avail (13643) is
larger than rdma->sc_sq_depth (851). The number of available Send
Queue entries is always supposed to be smaller than the Send Queue
depth. That seems like a Send Queue accounting bug in svcrdma.
As it's getting to be late in the 6.9-rc cycle, revert this commit.
It can be revisited in a subsequent kernel release.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218743
Fixes: e084ee673c ("svcrdma: Add Write chunk WRs to the RPC's Send WR chain")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
- Address a slow memory leak with RPC-over-TCP
- Prevent another NFS4ERR_DELAY loop during CREATE_SESSION
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmYReWEACgkQM2qzM29m
f5fsfxAAhVkcd5Om9iBI7/Ib2QtJdeyn9+Q6hOJi9ITDPpdbSrd1Fmd8ufyKNuxH
dwGLyV0+ELbUl1RRNfdnl+TkzYHMTURuvDEgUyhYA28GOJVd9GWXwX2KZR7J+AP5
HtpSGLXt+XvuO7uB+SFS85wwF0DJL39Qy4jCVYCOuN2Z8zqfTg5TwstOQ8X794QN
b5JzLkUlxQfd6kGRvU+BZHNf7R/yBfjUQWVybyhqzdjnCbbnPH+cl0hTlEIQTYJH
G31Gty1J/RGt1ZeURuF4OG4lFocRJW/SqoruneweBAOksN9PVcwsoMf6m16l3+AD
ZMnBt7FInQc/mAqRqIoLTsmYT8OyDa3a6qjubqWCYicCXvj1FxxOd7IaYytXxv/2
Z8ZvKSSvyXRwM3mUt+3E5DTM8NnsxPxnO9iSGIMUeH7n96LU0X39b/Ll6in6+eu2
/go8cLe59uuYDF9n2srX/LLWHj5wAWxVi+OgiSsAbsDFYTtJXK+syT2CpsEFXiUZ
5AYUbfGVqQ8uNtfGaaJd71CNCuEKC5qYpeC5cS2nnruV6SArfG69DMRAO0pxJYAC
6X7gm9Se1zyI8r9gR0rKjJ5ojeTPQBLfk6oVavum6CCwHzkKQTLG2jHBq8cdpwoL
KxXc37fhW9m9c2B3g2dikclM2+XrMyUzJ5Ync9SSiwFJN/956I0=
=dGcu
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Address a slow memory leak with RPC-over-TCP
- Prevent another NFS4ERR_DELAY loop during CREATE_SESSION
* tag 'nfsd-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP
Jan Schunk reports that his small NFS servers suffer from memory
exhaustion after just a few days. A bisect shows that commit
e18e157bb5 ("SUNRPC: Send RPC message on TCP with a single
sock_sendmsg() call") is the first bad commit.
That commit assumed that sock_sendmsg() releases all the pages in
the underlying bio_vec array, but the reality is that it doesn't.
svc_xprt_release() releases the rqst's response pages, but the
record marker page fragment isn't one of those, so it is never
released.
This is a narrow fix that can be applied to stable kernels. A
more extensive fix is in the works.
Reported-by: Jan Schunk <scpcom@gmx.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218671
Fixes: e18e157bb5 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call")
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Scott reports an occasional scatterlist BUG that is triggered by the
RFC 8009 Kunit test, then says:
> Looking through the git history of the auth_gss code, there are various
> places where static buffers were replaced by dynamically allocated ones
> because they're being used with scatterlists.
Reported-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 561141dd49 ("SUNRPC: Use a static buffer for the checksum initialization vector")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Highlights include:
Bugfixes:
- Fix for an Oops in the NFSv4.2 listxattr handler
- Correct an incorrect buffer size in listxattr
- Fix for an Oops in the pNFS flexfiles layout
- Fix a refcount leak in NFS O_DIRECT writes
- Fix missing locking in NFS O_DIRECT
- Avoid an infinite loop in pnfs_update_layout
- Fix an overflow in the RPC waitqueue queue length counter
- Ensure that pNFS I/O is also protected by TLS when xprtsec
is specified by the mount options
- Fix a leaked folio lock in the netfs read code
- Fix a potential deadlock in fscache
- Allow setting the fscache uniquifier in NFSv4
- Fix an off by one in root_nfs_cat()
- Fix another off by one in rpc_sockaddr2uaddr()
- nfs4_do_open() can incorrectly trigger state recovery.
- Various fixes for connection shutdown
Features and cleanups:
- Ensure that containers only see their own RPC and NFS stats
- Enable nconnect for RDMA
- Remove dead code from nfs_writepage_locked()
- Various tracepoint additions to track EXCHANGE_ID, GETDEVICEINFO, and
mount options.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmX14K0ACgkQZwvnipYK
APLCeg/7Bdah7158TdNxSQAHPo3jzDqZmc933eZC0H8C9whNlu6XIa9fyT6ZrsQr
qkQ/ztSwsB6yp6vLPSnVdDh5KsndwrInTB874H8y6+8x+KwwuhSQ7Uy8epg5wrO0
kgiaRYSH7HB7EgUdNY14fHNXkA/DMLHz1F1aw2NVGCYmVCMg7kGV4wYCOH6bI2Ea
Wu8amZce6D1AbktbdSZcEz2ricR3lGXjCUPMnzRCaSpUmdd2t7d/rsnjTeKU1gb4
p9zLlOZs9Xe2vMT0ZQI8SEI+Scze82LBy7ykSKyhOjOt4AurVpzQFAvK+3dFZoIq
lzIHJwabBGNui26CR1k90ZqERLkkk+24i3ccT28HwhTqe5eM/qDCKOVQmuP0F1F8
QYsnIM+NnmPZveSGAMdOQwlGFQTyJbT5Na1blHTW2R2rjqBzgvfn8fR0vV4L5P7B
0J8ShmZKVkvb7mtJJhaaI4LF41ciCF8+I5zwpnYQi0tsX370XPNNFbzS3BmPUVFL
k0uEMVfNy69PkaH4DJWQT9GoE3qiAamkO+EdAlPad6b8QMdJJZxXOmaUzL8YsCHV
sX5ugsih/Hf5/+QFBCbHEy7G3oeeHsT80yO8nvGT+yy94bv4F+WcM/tviyRbKrls
t5audBDNRfrAeUlqAQkXfFmAyqP2CGNr29oL62cXL2muFG7d7ys=
=5n+X
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Bugfixes:
- Fix for an Oops in the NFSv4.2 listxattr handler
- Correct an incorrect buffer size in listxattr
- Fix for an Oops in the pNFS flexfiles layout
- Fix a refcount leak in NFS O_DIRECT writes
- Fix missing locking in NFS O_DIRECT
- Avoid an infinite loop in pnfs_update_layout
- Fix an overflow in the RPC waitqueue queue length counter
- Ensure that pNFS I/O is also protected by TLS when xprtsec is
specified by the mount options
- Fix a leaked folio lock in the netfs read code
- Fix a potential deadlock in fscache
- Allow setting the fscache uniquifier in NFSv4
- Fix an off by one in root_nfs_cat()
- Fix another off by one in rpc_sockaddr2uaddr()
- nfs4_do_open() can incorrectly trigger state recovery
- Various fixes for connection shutdown
Features and cleanups:
- Ensure that containers only see their own RPC and NFS stats
- Enable nconnect for RDMA
- Remove dead code from nfs_writepage_locked()
- Various tracepoint additions to track EXCHANGE_ID, GETDEVICEINFO,
and mount options"
* tag 'nfs-for-6.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (29 commits)
nfs: fix panic when nfs4_ff_layout_prepare_ds() fails
NFS: trace the uniquifier of fscache
NFS: Read unlock folio on nfs_page_create_from_folio() error
NFS: remove unused variable nfs_rpcstat
nfs: fix UAF in direct writes
nfs: properly protect nfs_direct_req fields
NFS: enable nconnect for RDMA
NFSv4: nfs4_do_open() is incorrectly triggering state recovery
NFS: avoid infinite loop in pnfs_update_layout.
NFS: remove sync_mode test from nfs_writepage_locked()
NFSv4.1/pnfs: fix NFS with TLS in pnfs
NFS: Fix an off by one in root_nfs_cat()
nfs: make the rpc_stat per net namespace
nfs: expose /proc/net/sunrpc/nfs in net namespaces
sunrpc: add a struct rpc_stats arg to rpc_create_args
nfs: remove unused NFS_CALL macro
NFSv4.1: add tracepoint to trunked nfs4_exchange_id calls
NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt
SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int
nfs: fix regression in handling of fsc= option in NFSv4
...
Yes, yes, I know the slab people were planning on going slow and letting
every subsystem fight this thing on their own. But let's just rip off
the band-aid and get it over and done with. I don't want to see a
number of unnecessary pull requests just to get rid of a flag that no
longer has any meaning.
This was mainly done with a couple of 'sed' scripts and then some manual
cleanup of the end result.
Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We want to be able to have our rpc stats handled in a per network
namespace manner, so add an option to rpc_create_args to specify a
different rpc_stats struct instead of using the one on the rpc_program.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Chain RDMA Writes that convey Write chunks onto the local Send
chain. This means all WRs for an RPC Reply are now posted with a
single ib_post_send() call, and there is a single Send completion
when all of these are done. That reduces both the per-transport
doorbell rate and completion rate.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>