korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-12 05:48:39 +08:00

Author	SHA1	Message	Date
J. Bruce Fields	792a5112aa	nfsd: COPY with length 0 should copy to end of file From https://tools.ietf.org/html/rfc7862#page-65 A count of 0 (zero) requests that all bytes from ca_src_offset through EOF be copied to the destination. Reported-by: <radchenkoy@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:04 -04:00
Ricardo Ribalda	34a624931b	nfsd: Fix typo "accesible" Trivial fix. Cc: linux-nfs@vger.kernel.org Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:03 -04:00
Trond Myklebust	c6c7f2a84d	nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmounted In order to ensure that knfsd threads don't linger once the nfsd pseudofs is unmounted (e.g. when the container is killed) we let nfsd_umount() shut down those threads and wait for them to exit. This also should ensure that we don't need to do a kernel mount of the pseudofs, since the thread lifetime is now limited by the lifetime of the filesystem. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:03 -04:00
Paul Menzel	f988a7b71d	nfsd: Log client tracking type log message as info instead of warning `printk()`, by default, uses the log level warning, which leaves the user reading NFSD: Using UMH upcall client tracking operations. wondering what to do about it (`dmesg --level=warn`). Several client tracking methods are tried, and expected to fail. That’s why a message is printed only on success. It might be interesting for users to know the chosen method, so use info-level instead of debug-level. Cc: linux-nfs@vger.kernel.org Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:03 -04:00
J. Bruce Fields	7f7e7a4006	nfsd: helper for laundromat expiry calculations We do this same logic repeatedly, and it's easy to get the sense of the comparison wrong. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:03 -04:00
Chuck Lever	219a170502	NFSD: Clean up NFSDDBG_FACILITY macro These are no longer needed because there are no dprintk() call sites in these files. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:02 -04:00
Chuck Lever	6019ce0742	NFSD: Add a tracepoint to record directory entry encoding Enable watching the progress of directory encoding to capture the timing of any issues with reading or encoding a directory. The new tracepoint captures dirent encoding for all NFS versions. For example, here's what a few NFSv4 directory entries might look like: nfsd-989 [002] 468.596265: nfsd_dirent: fh_hash=0x5d162594 ino=2 name=. nfsd-989 [002] 468.596267: nfsd_dirent: fh_hash=0x5d162594 ino=1 name=.. nfsd-989 [002] 468.596299: nfsd_dirent: fh_hash=0x5d162594 ino=3827 name=zlib.c nfsd-989 [002] 468.596325: nfsd_dirent: fh_hash=0x5d162594 ino=3811 name=xdiff nfsd-989 [002] 468.596351: nfsd_dirent: fh_hash=0x5d162594 ino=3810 name=xdiff-interface.h nfsd-989 [002] 468.596377: nfsd_dirent: fh_hash=0x5d162594 ino=3809 name=xdiff-interface.c Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:02 -04:00
Chuck Lever	1416f43530	NFSD: Clean up after updating NFSv3 ACL encoders Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:02 -04:00
Chuck Lever	15e432bf0c	NFSD: Update the NFSv3 SETACL result encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:02 -04:00
Chuck Lever	20798dfe24	NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:01 -04:00
Chuck Lever	83d0b84572	NFSD: Clean up after updating NFSv2 ACL encoders Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:01 -04:00
Chuck Lever	07f5c2963c	NFSD: Update the NFSv2 ACL ACCESS result encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:01 -04:00
Chuck Lever	8d2009a10b	NFSD: Update the NFSv2 ACL GETATTR result encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:01 -04:00
Chuck Lever	778f068fa0	NFSD: Update the NFSv2 SETACL result encoder to use struct xdr_stream The SETACL result encoder is exactly the same as the NFSv2 attrstatres decoder. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:00 -04:00
Chuck Lever	f8cba47344	NFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:19:00 -04:00
Chuck Lever	8a2cf9f570	NFSD: Remove unused NFSv2 directory entry encoders Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:59 -04:00
Chuck Lever	f5dcccd647	NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:59 -04:00
Chuck Lever	94c8f8c682	NFSD: Update the NFSv2 READDIR result encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:59 -04:00
Chuck Lever	8141d6a2bb	NFSD: Count bytes instead of pages in the NFSv2 READDIR encoder Clean up: Counting the bytes used by each returned directory entry seems less brittle to me than trying to measure consumed pages after the fact. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:59 -04:00
Chuck Lever	d52532002f	NFSD: Add a helper that encodes NFSv3 directory offset cookies Refactor: Add helper function similar to nfs3svc_encode_cookie3(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:58 -04:00
Chuck Lever	bf15229f2c	NFSD: Update the NFSv2 STATFS result encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:58 -04:00
Chuck Lever	a6f8d9dc9e	NFSD: Update the NFSv2 READ result encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:58 -04:00
Chuck Lever	d9014b0f8f	NFSD: Update the NFSv2 READLINK result encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:58 -04:00
Chuck Lever	e3b4ef221a	NFSD: Update the NFSv2 diropres encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:57 -04:00
Chuck Lever	92b54a4fa4	NFSD: Update the NFSv2 attrstat encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:57 -04:00
Chuck Lever	a887eaed2a	NFSD: Update the NFSv2 stat encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:57 -04:00
Chuck Lever	76ed0dd96e	NFSD: Reduce svc_rqst::rq_pages churn during READDIR operations During NFSv2 and NFSv3 READDIR/PLUS operations, NFSD advances rq_next_page to the full size of the client-requested buffer, then releases all those pages at the end of the request. The next request to use that nfsd thread has to refill the pages. NFSD does this even when the dirlist in the reply is small. With NFSv3 clients that send READDIR operations with large buffer sizes, that can be 256 put_page/alloc_page pairs per READDIR request, even though those pages often remain unused. We can save some work by not releasing dirlist buffer pages that were not used to form the READDIR Reply. I've left the NFSv2 code alone since there are never more than three pages involved in an NFSv2 READDIR Reply. Eventually we should nail down why these pages need to be released at all in order to avoid allocating and releasing pages unnecessarily. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:56 -04:00
Chuck Lever	1411934627	NFSD: Remove unused NFSv3 directory entry encoders Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:56 -04:00
Chuck Lever	7f87fc2d34	NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream The benefit of the xdr_stream helpers is that they transparently handle encoding an XDR data item that crosses page boundaries. Most of the open-coded logic to do that here can be eliminated. A sub-buffer and sub-stream are set up as a sink buffer for the directory entry encoder. As an entry is encoded, it is added to the end of the content in this buffer/stream. The total length of the directory list is tracked in the buffer's @len field. When it comes time to encode the Reply, the sub-buffer is merged into rq_res's page array at the correct place using xdr_write_pages(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:56 -04:00
Chuck Lever	e4ccfe3014	NFSD: Update the NFSv3 READDIR3res encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:56 -04:00
Chuck Lever	a1409e2de4	NFSD: Count bytes instead of pages in the NFSv3 READDIR encoder Clean up: Counting the bytes used by each returned directory entry seems less brittle to me than trying to measure consumed pages after the fact. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:55 -04:00
Chuck Lever	a161e6c76a	NFSD: Add a helper that encodes NFSv3 directory offset cookies Refactor: De-duplicate identical code that handles encoding of directory offset cookies across page boundaries. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:55 -04:00
Chuck Lever	5ef2826c76	NFSD: Update the NFSv3 COMMIT3res encoder to use struct xdr_stream As an additional clean up, encode_wcc_data() is removed because it is now no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:55 -04:00
Chuck Lever	ded04a587f	NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:55 -04:00
Chuck Lever	0a139d1b7f	NFSD: Update the NFSv3 FSINFO3res encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:54 -04:00
Chuck Lever	8b7044984f	NFSD: Update the NFSv3 FSSTAT3res encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:54 -04:00
Chuck Lever	4d74380a44	NFSD: Update the NFSv3 LINK3res encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:54 -04:00
Chuck Lever	89d79e9672	NFSD: Update the NFSv3 RENAMEv3res encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:54 -04:00
Chuck Lever	78315b3678	NFSD: Update the NFSv3 CREATE family of encoders to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:53 -04:00
Chuck Lever	ecb7a085ac	NFSD: Update the NFSv3 WRITE3res encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:53 -04:00
Chuck Lever	cc9bcdad77	NFSD: Update the NFSv3 READ3res encode to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:53 -04:00
Chuck Lever	9a9c8923b3	NFSD: Update the NFSv3 READLINK3res encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:52 -04:00
Chuck Lever	70f8e83985	NFSD: Update the NFSv3 wccstat result encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:52 -04:00
Chuck Lever	5cf353354a	NFSD: Update the NFSv3 LOOKUP3res encoder to use struct xdr_stream Also, clean up: Rename the encoder function to match the name of the result structure in RFC 1813, consistent with other encoder function names in nfs3xdr.c. "diropres" is an NFSv2 thingie. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:52 -04:00
Chuck Lever	907c38227f	NFSD: Update the NFSv3 ACCESS3res encoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:52 -04:00
Chuck Lever	2c42f804d3	NFSD: Update the GETATTR3res encoder to use struct xdr_stream As an additional clean up, some renaming is done to more closely reflect the data type and variable names used in the NFSv3 XDR definition provided in RFC 1813. "attrstat" is an NFSv2 thingie. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:51 -04:00
Chuck Lever	bddfdbcddb	NFSD: Extract the svcxdr_init_encode() helper NFSD initializes an encode xdr_stream only after the RPC layer has already inserted the RPC Reply header. Thus it behaves differently than xdr_init_encode does, which assumes the passed-in xdr_buf is entirely devoid of content. nfs4proc.c has this server-side stream initialization helper, but it is visible only to the NFSv4 code. Move this helper to a place that can be accessed by NFSv2 and NFSv3 server XDR functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-22 10:18:51 -04:00
Olga Kornievskaia	b4250dd868	NFSD: fix error handling in NFSv4.0 callbacks When the server tries to do a callback and a client fails it due to authentication problems, we need the server to set callback down flag in RENEW so that client can recover. Suggested-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/linux-nfs/FB84E90A-1A03-48B3-8BF7-D9D10AC2C9FE@oracle.com/T/#t	2021-03-11 10:58:49 -05:00
Olga Kornievskaia	614c975017	NFSD: fix dest to src mount in inter-server COPY A cleanup of the inter SSC copy needs to call fput() of the source file handle to make sure that file structure is freed as well as drop the reference on the superblock to unmount the source server. Fixes: `36e1e5ba90` ("NFSD: Fix use-after-free warning when doing inter-server copy") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Dai Ngo <dai.ngo@oracle.com>	2021-03-09 13:26:59 -05:00
J. Bruce Fields	6ee65a7730	Revert "nfsd4: a client's own opens needn't prevent delegations" This reverts commit `94415b06eb`. That commit claimed to allow a client to get a read delegation when it was the only writer. Actually it allowed a client to get a read delegation when any client has a write open! The main problem is that it's depending on nfs4_clnt_odstate structures that are actually only maintained for pnfs exports. This causes clients to miss writes performed by other clients, even when there have been intervening closes and opens, violating close-to-open cache consistency. We can do this a different way, but first we should just revert this. I've added pynfs 4.1 test DELEG19 to test for this, as I should have done originally! Cc: stable@vger.kernel.org Reported-by: Timo Rothenpieler <timo@rothenpieler.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-09 10:37:34 -05:00
J. Bruce Fields	4aa5e00203	Revert "nfsd4: remove check_conflicting_opens warning" This reverts commit `50747dd5e4` "nfsd4: remove check_conflicting_opens warning", as a prerequisite for reverting `94415b06eb`, which has a serious bug. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-09 10:37:34 -05:00
Al Viro	6e3e2c4362	new helper: inode_wrong_type() inode_wrong_type(inode, mode) returns true if setting inode->i_mode to given value would've changed the inode type. We have enough of those checks open-coded to make a helper worthwhile. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-03-08 10:19:35 -05:00
J. Bruce Fields	bfdd89f232	nfsd: don't abort copies early The typical result of the backwards comparison here is that the source server in a server-to-server copy will return BAD_STATEID within a few seconds of the copy starting, instead of giving the copy a full lease period, so the copy_file_range() call will end up unnecessarily returning a short read. Fixes: `624322f1ad` "NFSD add COPY_NOTIFY operation" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-06 16:41:48 -05:00
Julian Braha	7005227369	fs: nfsd: fix kconfig dependency warning for NFSD_V4 When NFSD_V4 is enabled and CRYPTO is disabled, Kbuild gives the following warning: WARNING: unmet direct dependencies detected for CRYPTO_SHA256 Depends on [n]: CRYPTO [=n] Selected by [y]: - NFSD_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFSD [=y] && PROC_FS [=y] WARNING: unmet direct dependencies detected for CRYPTO_MD5 Depends on [n]: CRYPTO [=n] Selected by [y]: - NFSD_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFSD [=y] && PROC_FS [=y] This is because NFSD_V4 selects CRYPTO_MD5 and CRYPTO_SHA256, without depending on or selecting CRYPTO, despite those config options being subordinate to CRYPTO. Signed-off-by: Julian Braha <julianbraha@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-06 16:41:48 -05:00
Trond Myklebust	d30881f573	nfsd: Don't keep looking up unhashed files in the nfsd file cache If a file is unhashed, then we're going to reject it anyway and retry, so make sure we skip it when we're doing the RCU lockless lookup. This avoids a number of unnecessary nfserr_jukebox returns from nfsd_file_acquire() Fixes: `65294c1f2c` ("nfsd: add a new struct file caching facility to nfsd") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-03-06 16:41:47 -05:00
Linus Torvalds	7d6beb71da	idmapped-mounts-v5.12 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f 4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg= =yPaw -----END PGP SIGNATURE----- Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. 
- They allow users to efficiently change ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown(2), changing ownership of large sets of files is instantaneous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is common with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow changing ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD: To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. 
I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. 
The user namespace the mount will be marked with can be specified by passing a file descriptor referring to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: `1d7b902e28` In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystems are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. 
Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. 
-rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...	
2021-02-23 13:39:45 -08:00
Linus Torvalds	7c70f3a748	Optimization: - Cork the socket while there are queued replies Fixes: - DRC shutdown ordering - svc_rdma_accept() lockdep splat -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmAsA80ACgkQM2qzM29m f5erXA/+MrR3ZtwK2eaTITu13TzzTrMURbp/n0wCCW/Ls1YMb6bn9ggtBwu2W5Cn Vb0RO9OLcmoI6CjqPh0CTUvvZspMYOAX4W1jQecKt2ml075APdlqUcv9YWPUQqVJ qTg8HxDymvHvY3I3FcBxhzofmGzF8AOmQZJw9uI5Wt/ivBfqGWcAGlxyRmB3mdsm cJRK0Sy7QMn2LefMcpMEeSbPA049/NZNRp6fcXnpPQFer42thoosYsNhTlAJfCXC C5S0z3/T6rpuJucV9la/WkpUA0YhWbPEHWNdAB5tzSqmoEo4LpzJzjv7uyQU4oue QlmChIz9qasgTI/BnCkBIzPD99S4UQcXjX0BnNinkQ77e6+b/vdAR+T+NLHJdkAf +7Xz6T9aZNaz2R49CjYl6/kG0rlNkjUzyURRYs/9zEBhogMPH/N4T7Z2M+ljCkeb tc3OaFDXZ2rfr7EKBGsfnEKINM1gpYipzILkr8GSHUMZLzOB/64upKySaJVjCGXj 7Sf1w+vJUWwYc+FqFvbaR4ybr01VIfdsecpn1TtY870zG1JzimzAHVZk1/xC9+CX J+lVOXbjawDl1Et3V3fWq6Y7mhAWves/NKPcbSug9sFc4qRHEmPbAq/RRtlsjQcn foMr5R8qd8OwEamVypZ2nIFxq4q3b742AS8lZhaK+DyZKq3oLac= =+R4U -----END PGP SIGNATURE----- Merge tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd updates from Chuck Lever: "Here are a few additional NFSD commits for the merge window: Optimization: - Cork the socket while there are queued replies Fixes: - DRC shutdown ordering - svc_rdma_accept() lockdep splat" * tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Further clean up svc_tcp_sendmsg() SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg() SUNRPC: Use TCP_CORK to optimise send performance on the server svcrdma: Hold private mutex while invoking rdma_accept() nfsd: register pernet ops last, unregister first	2021-02-22 13:29:55 -08:00
Linus Torvalds	582cd91f69	for-5.12/block-2021-02-17 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmAtmIwQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgplzLEAC5O+3rBM8QuiJdo39Yppmuw4hDJ6hOKynP EJQLKQQi0VfXgU+MprGvcbpFYmNbgICvUICQkEzJuk++kPCu/BJtJz0yErQeLgS+ RdXiPV6enbF7iRML5TVRTr1q/z7sJMXcIIJ8Pz/rU/JNfGYExVd0WfnEY9mp1jOt Bl9V+qyTazdP+Ma4+uEPatSayqcdi1rxB5I+7v/sLiOvKZZWkaRZjUZ/mxAjUfvK dBOOPjMygEo3tCLkIyyA6lpLvr1r+SUZhLuebRLEKa3To3TW6RtoG0qwpKmI2iKw ylLeVLB60nM9RUxjflVOfBsHxz1bDg5Ve86y5nCjQd4Jo8x1c4DnecyGE5/Tu8Rg rgbsfD6nFWzhDCvcZT0XrfQ4ZAjIL2IfT+ypQiQ6UlRd3hvIKRmzWMkjuH2svr0u ey9Kq+lYerI4cM0F3W73gzUKdIQOuCzBCYxQuSQQomscBa7FCInyU192dAI9Aj6l Yd06mgKu6qCx6zLv6JfpBqaBHZMwyGE4dmZgPQFuuwO+b4N+Ck3Jm5fzEzw/xIxQ wdo/DlsAl60BXentB6FByGBJaCjVdSymRqN/xNCAbFKCjmr6TLBuXPfg1gYYO7xC VOcVjWe8iN3wWHZab3t2mxMKH9B9B/KKzIhu6TNHSmgtQ5paZPRCBx995pDyRw26 WC22RGC2MA== =os1E -----END PGP SIGNATURE----- Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block Pull core block updates from Jens Axboe: "Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups. 
This pull request contains: - Two series of BFQ improvements (Paolo, Jan, Jia) - Block iov_iter improvements (Pavel) - bsg error path fix (Pan) - blk-mq scheduler improvements (Jan) - -EBUSY discard fix (Jan) - bvec allocation improvements (Ming, Christoph) - bio allocation and init improvements (Christoph) - Store bdev pointer in bio instead of gendisk + partno (Christoph) - Block trace point cleanups (Christoph) - hard read-only vs read-only split (Christoph) - Block based swap cleanups (Christoph) - Zoned write granularity support (Damien) - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)" * tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits) mm: simplify swapdev_block sd_zbc: clear zone resources for non-zoned case block: introduce blk_queue_clear_zone_settings() zonefs: use zone write granularity as block size block: introduce zone_write_granularity limit block: use blk_queue_set_zoned in add_partition() nullb: use blk_queue_set_zoned() to setup zoned devices nvme: cleanup zone information initialization block: document zone_append_max_bytes attribute block: use bi_max_vecs to find the bvec pool md/raid10: remove dead code in reshape_request block: mark the bio as cloned in bio_iov_bvec_set block: set BIO_NO_PAGE_REF in bio_iov_bvec_set block: remove a layer of indentation in bio_iov_iter_get_pages block: turn the nr_iovecs argument to bio_alloc* into an unsigned short block: remove the 1 and 4 vec bvec_slabs entries block: streamline bvec_alloc block: factor out a bvec_alloc_gfp helper block: move struct biovec_slab to bio.c block: reuse BIO_INLINE_VECS for integrity bvecs ...	2021-02-21 11:02:48 -08:00
J. Bruce Fields	bd5ae9288d	nfsd: register pernet ops last, unregister first These pernet operations may depend on stuff set up or torn down in the module init/exit functions. And they may be called at any time in between. So it makes more sense for them to be the last to be registered in the init function, and the first to be unregistered in the exit function. In particular, without this, the drc slab is being destroyed before all the per-net drcs are shut down, resulting in an "Objects remaining in nfsd_drc on __kmem_cache_shutdown()" warning in exit_nfsd. Reported-by: Zhi Li <yieli@redhat.com> Fixes: `3ba75830ce` "nfsd4: drc containerization" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-02-15 10:45:00 -05:00
J. Bruce Fields	428a23d2bf	nfsd: skip some unnecessary stats in the v4 case In the typical case of v4 and an i_version-supporting filesystem, we can skip a stat which is only required to fake up a change attribute from ctime. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-30 11:47:21 -05:00
J. Bruce Fields	3cc55f4434	nfs: use change attribute for NFS re-exports When exporting NFS, we may as well use the real change attribute returned by the original server instead of faking up a change attribute from the ctime. Note we can't do that by setting I_VERSION--that would also turn on the logic in iversion.h which treats the lower bit specially, and that doesn't make sense for NFS. So instead we define a new export operation for filesystems like NFS that want to manage the change attribute themselves. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-30 11:47:12 -05:00
Dai Ngo	02591f9feb	NFSv4_2: SSC helper should use its own config. Currently NFSv4_2 SSC helper, nfs_ssc, incorrectly uses GRACE_PERIOD as its config. Fix by adding new config NFS_V4_2_SSC_HELPER which depends on NFS_V4_2 and is automatically selected when NFSD_V4 is enabled. Also removed the file name from a comment in nfs_ssc.c. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-28 10:55:37 -05:00
J. Bruce Fields	ec59659b49	nfsd: cstate->session->se_client -> cstate->clp I'm not sure why we're writing this out the hard way in so many places. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-28 10:55:37 -05:00
J. Bruce Fields	1722b04624	nfsd: simplify nfsd4_check_open_reclaim The set_client() was already taken care of by process_open1(). The comments here are mostly redundant with the code. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-28 10:55:37 -05:00
J. Bruce Fields	f71475ba8c	nfsd: remove unused set_client argument Every caller is setting this argument to false, so we don't need it. Also cut this comment a bit and remove an unnecessary warning. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-28 10:55:37 -05:00
J. Bruce Fields	47fdb22dac	nfsd: find_cpntf_state cleanup I think this unusual use of struct compound_state could cause confusion. It's not that much more complicated just to open-code this stateid lookup. The only change in behavior should be a different error return in the case the copy is using a source stateid that is a revoked delegation, but I doubt that matters. Signed-off-by: J. Bruce Fields <bfields@redhat.com> [ cel: squashed in fix reported by Coverity ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
J. Bruce Fields	7950b5316e	nfsd: refactor set_client This'll be useful elsewhere. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
J. Bruce Fields	460d27091a	nfsd: rename lookup_clientid->set_client I think this is a better name, and I'm going to reuse elsewhere the code that does the lookup itself. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
J. Bruce Fields	b4587eb2cf	nfsd: simplify nfsd_renew You can take the single-exit thing too far, I think. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
J. Bruce Fields	a9d53a75cf	nfsd: simplify process_lock Similarly, this STALE_CLIENTID check is already handled by: nfs4_preprocess_confirmed_seqid_op()-> nfs4_preprocess_seqid_op()-> nfsd4_lookup_stateid()-> set_client()-> STALE_CLIENTID() (This may cause it to return a different error in some cases where there are multiple things wrong; pynfs test SEQ10 regressed on this commit because of that, but I think that's the test's fault, and I've fixed it separately.) Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
J. Bruce Fields	33311873ad	nfsd4: simplify process_lookup1 This STALE_CLIENTID check is redundant with the one in lookup_clientid(). There's a difference in behavior in case of memory allocation failure, which I think isn't a big deal. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:29 -05:00
Amir Goldstein	20ad856e47	nfsd: report per-export stats Collect some nfsd stats per export in addition to the global stats. A new nfsdfs export_stats file is created. It uses the same ops as the exports file to iterate the export entries and we use the file's name to determine the reported info per export. For example: $ cat /proc/fs/nfsd/export_stats # Version 1.1 # Path Client Start-time # Stats /test localhost 92 fh_stale: 0 io_read: 9 io_write: 1 Every export entry reports the start time when stats collection started, so stats collecting scripts can know if stats were reset between samples. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:28 -05:00
Amir Goldstein	e567b98ce9	nfsd: protect concurrent access to nfsd stats counters nfsd stats counters can be updated by concurrent nfsd threads without any protection. Convert some nfsd_stats and nfsd_net struct members to use percpu counters. The longest_chain* members of struct nfsd_net remain unprotected. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:27 -05:00
Amir Goldstein	1b76d1df1a	nfsd: remove unused stats counters Commit `501cb1849f` ("nfsd: rip out the raparms cache") removed the code that updates read-ahead cache stats counters, commit `8bbfa9f388` ("knfsd: remove the nfsd thread busy histogram") removed code that updates the thread busy stats counters back in 2009 and code that updated filehandle cache stats was removed back in 2002. Remove the unused stats counters from nfsd_stats struct and print hardcoded zeros in /proc/net/rpc/nfsd. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:27 -05:00
Chuck Lever	9cee763ee6	NFSD: Clean up after updating NFSv3 ACL decoders Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:27 -05:00
Chuck Lever	68519ff2a1	NFSD: Update the NFSv2 SETACL argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:27 -05:00
Chuck Lever	05027eafc2	NFSD: Update the NFSv3 GETACL argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:27 -05:00
Chuck Lever	baadce65d6	NFSD: Clean up after updating NFSv2 ACL decoders Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:27 -05:00
Chuck Lever	64063892ef	NFSD: Update the NFSv2 ACL ACCESS argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:27 -05:00
Chuck Lever	571d31f37a	NFSD: Update the NFSv2 ACL GETATTR argument decoder to use struct xdr_stream Since the ACL GETATTR procedure is the same as the normal GETATTR procedure, simply re-use nfssvc_decode_fhandleargs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:27 -05:00
Chuck Lever	427eab3ba2	NFSD: Update the NFSv2 SETACL argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:27 -05:00
Chuck Lever	635a45d347	NFSD: Update the NFSv2 GETACL argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:26 -05:00
Chuck Lever	5650682e16	NFSD: Remove argument length checking in nfsd_dispatch() Now that the argument decoders for NFSv2 and NFSv3 use the xdr_stream mechanism, the version-specific length checking logic in nfsd_dispatch() is no longer necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:26 -05:00
Chuck Lever	09f75a5375	NFSD: Update the NFSv2 SYMLINK argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:26 -05:00
Chuck Lever	7dcf65b91e	NFSD: Update the NFSv2 CREATE argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:26 -05:00
Chuck Lever	2fdd6bd293	NFSD: Update the NFSv2 SETATTR argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:26 -05:00
Chuck Lever	77edcdf91f	NFSD: Update the NFSv2 LINK argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:26 -05:00
Chuck Lever	62aa557efb	NFSD: Update the NFSv2 RENAME argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:26 -05:00
Chuck Lever	6d742c1864	NFSD: Update NFSv2 diropargs decoding to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:26 -05:00
Chuck Lever	8688361ae2	NFSD: Update the NFSv2 READDIR argument decoder to use struct xdr_stream As an additional clean up, move code not related to XDR decoding into readdir's .pc_func call out. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:26 -05:00
Chuck Lever	788cd46ecf	NFSD: Add helper to set up the pages where the dirlist is encoded Add a helper similar to nfsd3_init_dirlist_pages(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:25 -05:00
Chuck Lever	1fcbd1c945	NFSD: Update the NFSv2 READLINK argument decoder to use struct xdr_stream If the code that sets up the sink buffer for nfsd_readlink() is moved adjacent to the nfsd_readlink() call site that uses it, then the only argument is a file handle, and the fhandle decoder can be used instead. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:25 -05:00
Chuck Lever	a51b5b737a	NFSD: Update the NFSv2 WRITE argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:25 -05:00
Chuck Lever	8c293ef993	NFSD: Update the NFSv2 READ argument decoder to use struct xdr_stream The code that sets up rq_vec is refactored so that it is now adjacent to the nfsd_read() call site where it is used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:25 -05:00
Chuck Lever	ebcd8e8b28	NFSD: Update the NFSv2 GETATTR argument decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:25 -05:00
Chuck Lever	f8a38e2d6c	NFSD: Update the MKNOD3args decoder to use struct xdr_stream This commit removes the last usage of the original decode_sattr3(), so it is removed as a clean-up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:25 -05:00
Chuck Lever	da39201637	NFSD: Update the SYMLINK3args decoder to use struct xdr_stream Similar to the WRITE decoder, code that checks the sanity of the payload size is re-wired to work with xdr_stream infrastructure. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:25 -05:00
Chuck Lever	83374c278d	NFSD: Update the MKDIR3args decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:25 -05:00
Chuck Lever	6b3a11960d	NFSD: Update the CREATE3args decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:25 -05:00
Chuck Lever	9cde9360d1	NFSD: Update the SETATTR3args decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:25 -05:00
Chuck Lever	efaa1e7c2c	NFSD: Update the LINK3args decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:24 -05:00
Chuck Lever	d181e0a4be	NFSD: Update the RENAME3args decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:24 -05:00
Chuck Lever	54d1d43dc7	NFSD: Update the NFSv3 DIROPargs decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:24 -05:00
Chuck Lever	c8d26a0acf	NFSD: Update COMMIT3arg decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:24 -05:00
Chuck Lever	9cedc2e64c	NFSD: Update READDIR3args decoders to use struct xdr_stream As an additional clean up, neither nfsd3_proc_readdir() nor nfsd3_proc_readdirplus() make use of the dircount argument, so remove it from struct nfsd3_readdirargs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:24 -05:00
Chuck Lever	40116ebd09	NFSD: Add helper to set up the pages where the dirlist is encoded De-duplicate some code that is used by both READDIR and READDIRPLUS to build the dirlist in the Reply. Because this code is not related to decoding READ arguments, it is moved to a more appropriate spot. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:24 -05:00
Chuck Lever	0a8f37fb34	NFSD: Fix returned READDIR offset cookie Code inspection shows that the server's NFSv3 READDIR implementation handles offset cookies slightly differently than the NFSv2 READDIR, NFSv3 READDIRPLUS, and NFSv4 READDIR implementations, and there doesn't seem to be any need for this difference. As a clean up, I copied the logic from nfsd3_proc_readdirplus(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:24 -05:00
Chuck Lever	224c1c894e	NFSD: Update READLINK3arg decoder to use struct xdr_stream The NFSv3 READLINK request takes a single filehandle, so it can re-use GETATTR's decoder. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:24 -05:00
Chuck Lever	c43b2f229a	NFSD: Update WRITE3arg decoder to use struct xdr_stream As part of the update, open code that sanity-checks the size of the data payload against the length of the RPC Call message has to be re-implemented to use xdr_stream infrastructure. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:24 -05:00
Chuck Lever	be63bd2ac6	NFSD: Update READ3arg decoder to use struct xdr_stream The code that sets up rq_vec is refactored so that it is now adjacent to the nfsd_read() call site where it is used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:24 -05:00
Chuck Lever	3b921a2b14	NFSD: Update ACCESS3arg decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:23 -05:00
Chuck Lever	9575363a9e	NFSD: Update GETATTR3args decoder to use struct xdr_stream Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:23 -05:00
Chuck Lever	2289e87b59	SUNRPC: Make trace_svc_process() display the RPC procedure symbolically The next few patches will employ these strings to help make server- side trace logs more human-readable. A similar technique is already in use in kernel RPC client code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-25 09:36:23 -05:00
Guoqing Jiang	684da7628d	block: remove unnecessary argument from blk_execute_rq We can remove 'q' from blk_execute_rq as well after the previous change in blk_execute_rq_nowait. And more importantly it never really was needed to start with given that we can trivially derive it from struct request. Cc: linux-scsi@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: linux-ide@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvme@lists.infradead.org Cc: linux-nfs@vger.kernel.org Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-24 21:52:39 -07:00
Christian Brauner	899bf2ceb3	nfs: do not export idmapped mounts Prevent nfs from exporting idmapped mounts until we have ported it to support exporting idmapped mounts. Link: https://lore.kernel.org/linux-api/20210123130958.3t6kvgkl634njpsm@wittgenstein Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: "J. Bruce Fields" <bfields@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-01-24 14:29:33 +01:00
Christian Brauner	6521f89170	namei: prepare for idmapped mounts The various vfs_*() helpers are called by filesystems or by the vfs itself to perform core operations such as create, link, mkdir, mknod, rename, rmdir, tmpfile and unlink. Enable them to handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace and pass it down. Afterwards the checks and operations are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-15-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-01-24 14:27:18 +01:00
Christian Brauner	9fe6145097	namei: introduce struct renamedata In order to handle idmapped mounts we will extend the vfs rename helper to take two new arguments in follow-up patches. Since this operation already takes a bunch of arguments, add a simple struct renamedata and make the current helper use it before we extend it. Link: https://lore.kernel.org/r/20210121131959.646623-14-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-01-24 14:27:18 +01:00
Tycho Andersen	c7c7a1a18a	xattr: handle idmapped mounts When interacting with extended attributes the vfs verifies that the caller is privileged over the inode with which the extended attribute is associated. For posix access and posix default extended attributes a uid or gid can be stored on-disk. Let the functions handle posix extended attributes on idmapped mounts. If the inode is accessed through an idmapped mount we need to map it according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. This has no effect for e.g. security xattrs since they don't store uids or gids and don't perform permission checks on them like posix acls do. Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Tycho Andersen <tycho@tycho.pizza> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-01-24 14:27:17 +01:00
Christian Brauner	e65ce2a50c	acl: handle idmapped mounts The posix acl permission checking helpers determine whether a caller is privileged over an inode according to the acls associated with the inode. Add helpers that make it possible to handle acls on idmapped mounts. The vfs and the filesystems targeted by this first iteration make use of posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to translate basic posix access and default permissions such as the ACL_USER and ACL_GROUP type according to the initial user namespace (or the superblock's user namespace) to and from the caller's current user namespace. Adapt these two helpers to handle idmapped mounts whereby we either map from or into the mount's user namespace depending on in which direction we're translating. Similarly, cap_convert_nscap() is used by the vfs to translate user namespace and non-user namespace aware filesystem capabilities from the superblock's user namespace to the caller's user namespace. Enable it to handle idmapped mounts by accounting for the mount's user namespace. In addition, the filesystems targeted in the first iteration of this patch series make use of the posix_acl_chmod() and posix_acl_update_mode() helpers. Both helpers perform permission checks on the target inode. Let them handle idmapped mounts. These two helpers are called when posix acls are set by the respective filesystems. To handle this case we extend the ->set() method to take an additional user namespace argument to pass the mount's user namespace down. Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-01-24 14:27:17 +01:00
Christian Brauner	2f221d6f7b	attr: handle idmapped mounts When file attributes are changed most filesystems rely on the setattr_prepare(), setattr_copy(), and notify_change() helpers for initialization and permission checking. Let them handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Helpers that perform checks on the ia_uid and ia_gid fields in struct iattr assume that ia_uid and ia_gid are intended values and have already been mapped correctly at the userspace-kernelspace boundary as we already do today. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-01-24 14:27:16 +01:00
Christian Brauner	47291baa8d	namei: make permission helpers idmapped mount aware The two helpers inode_permission() and generic_permission() are used by the vfs to perform basic permission checking by verifying that the caller is privileged over an inode. In order to handle idmapped mounts we extend the two helpers with an additional user namespace argument. On idmapped mounts the two helpers will make sure to map the inode according to the mount's user namespace and then perform identical permission checks to inode_permission() and generic_permission(). If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-01-24 14:27:16 +01:00
J. Bruce Fields	51b2ee7d00	nfsd4: readdirplus shouldn't return parent of export If you export a subdirectory of a filesystem, a READDIRPLUS on the root of that export will return the filehandle of the parent with the ".." entry. The filehandle is optional, so let's just not return the filehandle for ".." if we're at the root of an export. Note that once the client learns one filehandle outside of the export, they can trivially access the rest of the export using further lookups. However, it is also not very difficult to guess filehandles outside of the export. So exporting a subdirectory of a filesystem should be considered equivalent to providing access to the entire filesystem. To avoid confusion, we recommend only exporting entire filesystems. Reported-by: Youjipeng <wangzhibei1999@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-01-12 08:54:14 -05:00
Linus Torvalds	c912fd05fa	Merge tag 'nfsd-5.11-1' of git://git.linux-nfs.org/projects/cel/cel-2.6 Pull nfsd fixes from Chuck Lever: - Fix major TCP performance regression - Get NFSv4.2 READ_PLUS regression tests to pass - Improve NFSv4 COMPOUND memory allocation - Fix sparse warning * tag 'nfsd-5.11-1' of git://git.linux-nfs.org/projects/cel/cel-2.6: NFSD: Restore NFSv4 decoding's SAVEMEM functionality SUNRPC: Handle TCP socket sends with kernel_sendpage() again NFSD: Fix sparse warning in nfssvc.c nfsd: Don't set eof on a truncated READ_PLUS nfsd: Fixes for nfsd4_encode_read_plus_data()	2021-01-11 11:35:46 -08:00
Chuck Lever	7b723008f9	NFSD: Restore NFSv4 decoding's SAVEMEM functionality While converting the NFSv4 decoder to use xdr_stream-based XDR processing, I removed the old SAVEMEM() macro. This macro wrapped a bit of logic that avoided a memory allocation by recognizing when the decoded item resides in a linear section of the Receive buffer. In that case, it returned a pointer into that buffer instead of allocating a bounce buffer. The bounce buffer is necessary only when xdr_inline_decode() has placed the decoded item in the xdr_stream's scratch buffer, which disappears the next time xdr_inline_decode() is called with that xdr_stream. That happens only if the data item crosses a page boundary in the receive buffer, an exceedingly rare occurrence. Allocating a bounce buffer every time results in a minor performance regression that was introduced by the recent NFSv4 decoder overhaul. Let's restore the previous behavior. On average, it saves about 1.5 kmalloc() calls per COMPOUND. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-18 12:28:58 -05:00
Chuck Lever	d6c9e4368c	NFSD: Fix sparse warning in nfssvc.c fs/nfsd/nfssvc.c:36:6: warning: symbol 'inter_copy_offload_enable' was not declared. Should it be static? The parameter was added by commit `ce0887ac96` ("NFSD add nfs4 inter ssc to nfsd4_copy"). Relocate it into the source file that uses it, and make it static. This approach is similar to the nfs4_disable_idmapping, cltrack_prog, and cltrack_legacy_disable module parameters. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-18 12:28:23 -05:00
Trond Myklebust	b68f0cbd3f	nfsd: Don't set eof on a truncated READ_PLUS If the READ_PLUS operation was truncated due to an error, then ensure we clear the 'eof' flag. Fixes: `9f0b5792f0` ("NFSD: Encode a full READ_PLUS reply") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-18 12:28:00 -05:00
Trond Myklebust	72d78717c6	nfsd: Fixes for nfsd4_encode_read_plus_data() Ensure that we encode the data payload + padding, and that we truncate the preallocated buffer to the actual read size. Fixes: `528b84934e` ("NFSD: Add READ_PLUS data support") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-18 12:27:55 -05:00
Linus Torvalds	14bd41e418	Merge tag 'fsnotify_for_v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "A few fsnotify fixes from Amir fixing fallout from big fsnotify overhaul a few months back and an improvement of defaults limiting maximum number of inotify watches from Waiman" * tag 'fsnotify_for_v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: fix events reported to watching parent and child inotify: convert to handle_inode_event() interface fsnotify: generalize handle_inode_event() inotify: Increase default inotify.max_user_watches limit to 1048576	2020-12-17 10:56:27 -08:00
Trond Myklebust	716a8bc7f7	nfsd: Record NFSv4 pre/post-op attributes as non-atomic For the case of NFSv4, specify to the client that the pre/post-op attributes were not recorded atomically with the main operation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
Trond Myklebust	01cbf38539	nfsd: Set PF_LOCAL_THROTTLE on local filesystems only Don't set PF_LOCAL_THROTTLE on remote filesystems like NFS, since they aren't expected to ever be subject to double buffering. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
Trond Myklebust	2e19d10c14	nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE If the underlying filesystem times out, then we want knfsd to return NFSERR_JUKEBOX/DELAY rather than NFSERR_STALE. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
Jeff Layton	7f84b488f9	nfsd: close cached files prior to a REMOVE or RENAME that would replace target It's not uncommon for some workloads to do a bunch of I/O to a file and delete it just afterward. If knfsd has a cached open file however, then the file may still be open when the dentry is unlinked. If the underlying filesystem is nfs, then that could trigger it to do a sillyrename. On a REMOVE or RENAME scan the nfsd_file cache for open files that correspond to the inode, and proactively unhash and put their references. This should prevent any delete-on-last-close activity from occurring, solely due to knfsd's open file cache. This must be done synchronously though so we use the variants that call flush_delayed_fput. There are deadlock possibilities if you call flush_delayed_fput while holding locks, however. In the case of nfsd_rename, we don't even do the lookups of the dentries to be renamed until we've locked for rename. Once we've figured out what the target dentry is for a rename, check to see whether there are cached open files associated with it. If there are, then unwind all of the locking, close them all, and then reattempt the rename. None of this is really necessary for "typical" filesystems though. It's mostly of use for NFS, so declare a new export op flag and use that to determine whether to close the files beforehand. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
Jeff Layton	ba5e8187c5	nfsd: allow filesystems to opt out of subtree checking When we start allowing NFS to be reexported, then we have some problems when it comes to subtree checking. In principle, we could allow it, but it would mean encoding parent info in the filehandles and there may not be enough space for that in a NFSv3 filehandle. To enforce this at export upcall time, we add a new export_ops flag that declares the filesystem ineligible for subtree checking. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
Jeff Layton	daab110e47	nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations With NFSv3 nfsd will always attempt to send along WCC data to the client. This generally involves saving off the in-core inode information prior to doing the operation on the given filehandle, and then issuing a vfs_getattr to it after the op. Some filesystems (particularly clustered or networked ones) have an expensive ->getattr inode operation. Atomicity is also often difficult or impossible to guarantee on such filesystems. For those, we're best off not trying to provide WCC information to the client at all, and to simply allow it to poll for that information as needed with a GETATTR RPC. This patch adds a new flags field to struct export_operations, and defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate that nfsd should not attempt to provide WCC info in NFSv3 replies. It also adds a blurb about the new flags field and flag to the exporting documentation. The server will also now skip collecting this information for NFSv2 as well, since that info is never used there anyway. Note that this patch does not add this flag to any filesystem export_operations structures. This was originally developed to allow reexporting nfs via nfsd. Other filesystems may want to consider enabling this flag too. It's hard to tell however which ones have export operations to enable export via knfsd and which ones mostly rely on them for open-by-filehandle support, so I'm leaving that up to the individual maintainers to decide. I am cc'ing the relevant lists for those filesystems that I think may want to consider adding this though. 
Cc: HPDD-discuss@lists.01.org Cc: ceph-devel@vger.kernel.org Cc: cluster-devel@redhat.com Cc: fuse-devel@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
J. Bruce Fields	1631087ba8	Revert "nfsd4: support change_attr_type attribute" This reverts commit `a85857633b`. We're still factoring ctime into our change attribute even in the IS_I_VERSION case. If someone sets the system time backwards, a client could see the change attribute go backwards. Maybe we can just say "well, don't do that", but there's some question whether that's good enough, or whether we need a better guarantee. Also, the client still isn't actually using the attribute. While we're still figuring this out, let's just stop returning this attribute. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
J. Bruce Fields	942b20dc24	nfsd4: don't query change attribute in v2/v3 case inode_query_iversion() has side effects, and there's no point calling it when we're not even going to use it. We check whether we're currently processing a v4 request by checking fh_maxsize, which is arguably a little hacky; we could add a flag to svc_fh instead. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:38 -05:00
J. Bruce Fields	4b03d99794	nfsd: minor nfsd4_change_attribute cleanup Minor cleanup, no change in behavior. Also pull out a common helper that'll be useful elsewhere. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:37 -05:00
J. Bruce Fields	b2140338d8	nfsd: simplify nfsd4_change_info It doesn't make sense to carry all these extra fields around. Just make everything into change attribute from the start. This is just cleanup, there should be no change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:37 -05:00
J. Bruce Fields	70b87f7729	nfsd: only call inode_query_iversion in the I_VERSION case inode_query_iversion() can modify i_version. Depending on the exported filesystem, that may not be safe. For example, if you're re-exporting NFS, NFS stores the server's change attribute in i_version and does not expect it to be modified locally. This has been observed causing unnecessary cache invalidations. The way a filesystem indicates that it's OK to call inode_query_iversion() is by setting SB_I_VERSION. So, move the I_VERSION check out of encode_change(), where it's used only in GETATTR responses, to nfsd4_change_attribute(), which is also called for pre- and post- operation attributes. (Note we could also pull the NFSEXP_V4ROOT case into nfsd4_change_attribute() as well. That would actually be a no-op, since pre/post attrs are only used for metadata-modifying operations, and V4ROOT exports are read-only. But we might make the change in the future just for simplicity.) Reported-by: Daire Byrne <daire@dneg.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:39:37 -05:00
Dai Ngo	ca9364dde5	NFSD: Fix 5 seconds delay when doing inter server copy Since commit `b4868b44c5` ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5 seconds delay regardless of the size of the copy. The delay is from nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential fails because the seqid in both nfs4_state and nfs4_stateid are 0. Fix by modifying nfs4_init_cp_state to return the stateid with seqid 1 instead of 0. This is also to conform with section 4.8 of RFC 7862. Here is the relevant paragraph from section 4.8 of RFC 7862: A copy offload stateid's seqid MUST NOT be zero. In the context of a copy offload operation, it is inappropriate to indicate "the most recent copy offload operation" using a stateid with a seqid of zero (see Section 8.2.2 of [RFC5661]). It is inappropriate because the stateid refers to internal state in the server and there may be several asynchronous COPY operations being performed in parallel on the same file by the server. Therefore, a copy offload stateid with a seqid of zero MUST be considered invalid. Fixes: `ce0887ac96` ("NFSD add nfs4 inter ssc to nfsd4_copy") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:38:34 -05:00
Chuck Lever	eb162e1772	NFSD: Fix sparse warning in nfs4proc.c linux/fs/nfsd/nfs4proc.c:1542:24: warning: incorrect type in assignment (different base types) linux/fs/nfsd/nfs4proc.c:1542:24: expected restricted __be32 [assigned] [usertype] status linux/fs/nfsd/nfs4proc.c:1542:24: got int Clean-up: The dup_copy_fields() function returns only zero, so make it return void for now, and get rid of the return code check. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:38:34 -05:00
kazuo ito	4420440c57	nfsd: Fix message level for normal termination The warning message from nfsd terminating normally can confuse system administrators or monitoring software. Though it's not exactly fair to pin-point a commit where it originated, the current form in the current place started to appear in: Fixes: `e096bbc648` ("knfsd: remove special handling for SIGHUP") Signed-off-by: kazuo ito <kzpn200@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-12-09 09:38:33 -05:00
Amir Goldstein	950cc0d2be	fsnotify: generalize handle_inode_event() The handle_inode_event() interface was added as (quoting comment): "a simple variant of handle_event() for groups that only have inode marks and don't have ignore mask". In other words, all backends except fanotify. The inotify backend also falls under this category, but because it required extra arguments it was left out of the initial pass of backends conversion to the simple interface. This results in code duplication between the generic helper fsnotify_handle_event() and the inotify_handle_event() callback which also happen to be buggy code. Generalize the handle_inode_event() arguments and add the check for FS_EXCL_UNLINK flag to the generic helper, so inotify backend could be converted to use the simple interface. Link: https://lore.kernel.org/r/20201202120713.702387-2-amir73il@gmail.com CC: stable@vger.kernel.org Fixes: `b9a1b97725` ("fsnotify: create method handle_inode_event() in fsnotify_operations") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-12-03 14:58:35 +01:00
Chuck Lever	5cfc822f3e	NFSD: Remove macros that are no longer used Now that all the NFSv4 decoder functions have been converted to make direct calls to the xdr helpers, remove the unused C macros. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:44 -05:00
Chuck Lever	d9b74bdac6	NFSD: Replace READ* macros in nfsd4_decode_compound() And clean-up: Now that we have removed the DECODE_TAIL macro from nfsd4_decode_compound(), we observe that there's no benefit for nfsd4_decode_compound() to return nfs_ok or nfserr_bad_xdr only to have its sole caller convert those values to one or zero, respectively. Have nfsd4_decode_compound() return 1/0 instead. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:44 -05:00
Chuck Lever	3a237b4af5	NFSD: Make nfsd4_ops::opnum a u32 Avoid passing a "pointer to int" argument to xdr_stream_decode_u32. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:43 -05:00
Chuck Lever	2212036cad	NFSD: Replace READ* macros in nfsd4_decode_listxattrs() Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:43 -05:00
Chuck Lever	403366a7e8	NFSD: Replace READ* macros in nfsd4_decode_setxattr() Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:43 -05:00
Chuck Lever	830c71502a	NFSD: Replace READ* macros in nfsd4_decode_xattr_name() Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:43 -05:00
Chuck Lever	3dfd0b0e15	NFSD: Replace READ* macros in nfsd4_decode_clone() Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2020-11-30 14:46:43 -05:00

3273 Commits