Commit Graph

78864 Commits

Author SHA1 Message Date
Chuck Lever
dcf3f80965 NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"
This reverts commit 5e138c4a75.

That commit attempted to make files available to other users as soon
as all NFSv4 clients were done with them, rather than waiting until
the filecache LRU had garbage collected them.

It gets the reference counting wrong, for one thing.

But it also misses that DELEGRETURN should release a file in the
same fashion. In fact, any nfsd_file_put() on an file held open
by an NFSv4 client needs potentially to release the file
immediately...

Clear the way for implementing that idea.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
2022-11-28 12:54:45 -05:00
Chuck Lever
c252849082 NFSD: Pass the target nfsd_file to nfsd_commit()
In a moment I'm going to introduce separate nfsd_file types, one of
which is garbage-collected; the other, not. The garbage-collected
variety is to be used by NFSv2 and v3, and the non-garbage-collected
variety is to be used by NFSv4.

nfsd_commit() is invoked by both NFSv3 and NFSv4 consumers. We want
nfsd_commit() to find and use the correct variety of cached
nfsd_file object for the NFS version that is in use.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
2022-11-28 12:54:45 -05:00
Jeff Layton
e0aa651068 nfsd: don't call nfsd_file_put from client states seqfile display
We had a report of this:

    BUG: sleeping function called from invalid context at fs/nfsd/filecache.c:440

...with a stack trace showing nfsd_file_put being called from
nfs4_show_open. This code has always tried to call fput while holding a
spinlock, but we recently changed this to use the filecache, and that
started triggering the might_sleep() in nfsd_file_put.

states_start takes and holds the cl_lock while iterating over the
client's states, and we can't sleep with that held.

Have the various nfs4_show_* functions instead hold the fi_lock instead
of taking a nfsd_file reference.

Fixes: 78599c42ae ("nfsd4: add file to display list of client's opens")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2138357
Reported-by: Zhi Li <yieli@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:45 -05:00
David Disseldorp
427505ffea exportfs: use pr_debug for unreachable debug statements
expfs.c has a bunch of dprintk statements which are unusable due to:
 #define dprintk(fmt, args...) do{}while(0)
Use pr_debug so that they can be enabled dynamically.
Also make some minor changes to the debug statements to fix some
incorrect types, and remove __func__ which can be handled by dynamic
debug separately.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:45 -05:00
Jeff Layton
2f3a4b2ac2 nfsd: allow disabling NFSv2 at compile time
rpc.nfsd stopped supporting NFSv2 a year ago. Take the next logical
step toward deprecating it and allow NFSv2 support to be compiled out.

Add a new CONFIG_NFSD_V2 option that can be turned off and rework the
CONFIG_NFSD_V?_ACL option dependencies. Add a description that
discourages enabling it.

Also, change the description of CONFIG_NFSD to state that the always-on
version is now 3 instead of 2.

Finally, add an #ifdef around "case 2:" in __write_versions. When NFSv2
is disabled at compile time, this should make the kernel ignore attempts
to disable it at runtime, but still error out when trying to enable it.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:45 -05:00
Jeff Layton
cb12fae1c3 nfsd: move nfserrno() to vfs.c
nfserrno() is common to all nfs versions, but nfsproc.c is specifically
for NFSv2. Move it to vfs.c, and the prototype to vfs.h.

While we're in here, remove the #ifdef EDQUOT check in this function.
It's apparently a holdover from the initial merge of the nfsd code in
1997. No other place in the kernel checks that that symbol is defined
before using it, so I think we can dispense with it here.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:44 -05:00
Jeff Layton
8e823bafff nfsd: ignore requests to disable unsupported versions
The kernel currently errors out if you attempt to enable or disable a
version that it doesn't recognize. Change it to ignore attempts to
disable an unrecognized version. If we don't support it, then there is
no harm in doing so.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:44 -05:00
Chuck Lever
841fd0a3cb NFSD: Finish converting the NFSv3 GETACL result encoder
For some reason, the NFSv2 GETACL result encoder was fully converted
to use the new nfs_stream_encode_acl(), but the NFSv3 equivalent was
not similarly converted.

Fixes: 20798dfe24 ("NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:44 -05:00
Chuck Lever
ea5021e911 NFSD: Finish converting the NFSv2 GETACL result encoder
The xdr_stream conversion inadvertently left some code that set the
page_len of the send buffer. The XDR stream encoders should handle
this automatically now.

This oversight adds garbage past the end of the Reply message.
Clients typically ignore the garbage, but NFSD does not need to send
it, as it leaks stale memory contents onto the wire.

Fixes: f8cba47344 ("NFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:44 -05:00
Colin Ian King
69eed23baf NFSD: Remove redundant assignment to variable host_err
Variable host_err is assigned a value that is never read, it is being
re-assigned a value in every different execution path in the following
switch statement. The assignment is redundant and can be removed.

Cleans up clang-scan warning:
warning: Value stored to 'host_err' is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:44 -05:00
Anna Schumaker
eeadcb7579 NFSD: Simplify READ_PLUS
Chuck had suggested reverting READ_PLUS so it returns a single DATA
segment covering the requested read range. This prepares the server for
a future "sparse read" function so support can easily be added without
needing to rip out the old READ_PLUS code at the same time.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:43 -05:00
Linus Torvalds
cf562a45a0 Amir's copy_file_range() fix
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY4OtEwAKCRBZ7Krx/gZQ
 66LvAP9tMMKsXoZY5dNjkAeQo/I5PHx81iLYu5GyigqTsf0g8gD+MeM2qxQE9QTt
 6gngWpnNif7Pe5Jj5yuwl4IGbjDG9AQ=
 =Tx7P
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
 "Amir's copy_file_range() fix"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: fix copy_file_range() averts filesystem freeze protection
2022-11-27 12:40:06 -08:00
Linus Torvalds
30a853c1bd Two small cifs/smb3 client fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmOC6QYACgkQiiy9cAdy
 T1EaUgv/eEFpIzkRXuJHuPBlxWM7Gz6DiFkWd83ADcB15+4ADzPVdc1HqbnSw0/0
 yPsxfhn95Ydo+zORPJ5puGUpveqNIen8xPeGaORGCCEMCYw5A58K2I8bkSYcMJUd
 63hiLmw9ZpaXxkLXX/Y04hSYBwPIhwLSrT1kfPy7jru78DjvmOya1WHaBzQZQ0ej
 374cDrG9yhWl+udFWJi1HxQt+fnJecaMNJy51BDeOeN3mM4j/kqHgYw2q5GfcbhL
 nyr56UTxHvBpyciguKzqNIUVcFiANr7Il5Yz++NSjRcFUnhgoxPZ+eHMU1X6fpFB
 fgLhcDsbBeWhPZnOAsb4X3HEhb41f61i5s2pYzwU7bhXYla0UrqEIS9RNjh3dgPx
 WnRErVPiOwPP6mRl+064KITWcv3riDb8QbwEtqJQQT1TUPXM0b190wq59lHqIpJk
 rV41bV7s3z839CulF+2Y9fuQVe5SL90kDSxF3gOue4SCzBfbl0Tww5gn3IIHvo+5
 cyZZ3wQU
 =89M+
 -----END PGP SIGNATURE-----

Merge tag '6.1-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Two small cifs/smb3 client fixes:

   - an unlock missing in an error path in copychunk_range found by
     xfstest 476

   - a fix for a use after free in a debug code path"

* tag '6.1-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix missing unlock in cifs_file_copychunk_range()
  cifs: Use after free in debug code
2022-11-27 08:48:26 -08:00
Linus Torvalds
e5f3ec38c8 Fixes:
- Fix rare data corruption on READ operations
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmOCPhcACgkQM2qzM29m
 f5ewTRAAuAjJvtnsRxZBILs76U3KpAnn3ghtYKtzCmanq1UEJAlTz5vLLCqJRTXw
 OajN4f/T/kD3bY5XxYcsbSMhVJhgd7JMx/DyPjp3rY+nS1cS/CB+TxN3VqhOZJDl
 MfEJTN0MxWrq0pXKYxuPECmwk/C7jAU4vjWU+sDWhu4Anig564fdPxGGWw9vRfh5
 YP8U8cuM5LIFCM2ZSjPsqWEXgd7pbYpYdOx6JPA74ftia8RBU1YVbIqZ6uFXiTnV
 tpKAuQYem2ixV6UfsrVbeHDfsSkQcX2iaTGlcLB5y12nprV0wTOQM6lONTD878vh
 37oc8zG9cdpWfcRzkyB8a58jHfSk74pMkV300C38CeV1KtajFVoRkoTvLYJdxbf8
 WPcRcbsXbotb4+A1D2H6QvPKUbrlK+HIHe8POU67tKq5BI8xRw+ojTsJ2ANYr9Jt
 OviLUraUnKFvCK2LN03+Op4l+UBTIdOGzGMqveILfxPD6zlTYJbQQ64Nih4JLfJG
 uDN2892H4He3he5oD9Dzoh2xSWfLV5PJEAaN9wW7pyv8D5HfYBXWPan6JeaMHnfW
 +NcoFQ3cSewyMZ1Rw0aCsKfXRGwikJrbAsrlvNe+uFRD6ueeWULp5fnTlOn+0tW1
 tp+EzLOMoNkL9NW/rLeCIPN1MNXzgabgd4Diq8U3iTDRxf0FahU=
 =zxjk
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix rare data corruption on READ operations

* tag 'nfsd-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Fix reads with a non-zero offset that don't end on a page boundary
2022-11-26 12:25:49 -08:00
Linus Torvalds
3e0d88f911 zonefs fixes for 6.1-rc7
- Fix a race between zonefs module initialization of sysfs attribute
    directory and mounting a drive (from Xiaoxu).
 
  - Fix active zone accounting in the rare case of an IO error due to a
    zone transition to offline or read-only state (from me).
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCY4FHbQAKCRDdoc3SxdoY
 dtVJAQDQPKQW7GlxgmfsxNZz4bR9CT2DHqoprN1g1bbKM9hYKwEApq0xtpBjyqX2
 EaePIFvlYa67f5kWnW4XEd5vetbh0AI=
 =imdG
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:

 - Fix a race between zonefs module initialization of sysfs attribute
   directory and mounting a drive (from Xiaoxu).

 - Fix active zone accounting in the rare case of an IO error due to a
   zone transition to offline or read-only state (from me).

* tag 'zonefs-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Fix active zone accounting
  zonefs: Fix race between modprobe and mount
2022-11-25 16:34:39 -08:00
Linus Torvalds
3eaea0db25 for-6.1-rc6-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmOBCJEACgkQxWXV+ddt
 WDu5Nw/+P59ARfAm/4HRId4iL6UKozSMc+blWLeP9KkjcytdAfek0oGe3gZ7NJVK
 8VYa93yNneCTkNFLIEpqEduGQjN04dr0odRUXD/kIR8EEtjbgDrH9ZmL47An5wVH
 qE8ILlh2+DXk/QLTpjo8n4mm+MJDJYzfz/jVV9vl8ehMahjj1M0/KmO/vNvDbP2s
 owWU1FBjX7TV6kHa+SQGqd1HfXS1YUx203I4SDmPj8vSXtysvSOWClT3HO6i6O5S
 MSS3Me+rx9eMFMISNghL8I466+lPlGxK14DmLUE4l0kfoKyd4eHQw+ft76D6Twuz
 JqjegAGA1nzqDO0XDXb4WPjrPKG8r8Ven2eInF3kncku9GyeEafL+L+nmj7PHsE7
 dixWo2TQ9z1Wm/n1NWlU02ZSLdbetUtYTvZczUhevtNzuYUtILihcFZO3+Cp7V4p
 R2WwJ5XXdfS8g8Q9kJCOuVd9fZ+3hQvEF1IwWCP9ZZfmIC6/4/uGGFB6TJu7HmZC
 trpQYn9l5aP9L9Uq8t+9j+XoDEzQW0tZGpiYI9ypAa5Q5xbw3Ez2JNTbF7YVqQE2
 iFDwuuy/X1iNvifniQgdodKVQLK/PcNrlcNb/gPG6cGCWjlTj3SKT9SlrwAgSDZW
 pFWFb9NtN3ORjLeCiONo/ZGpZzM9/XQplub+4WuXQXGNJasRIoE=
 =Q4JA
 -----END PGP SIGNATURE-----

Merge tag 'for-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix a regression in nowait + buffered write

 - in zoned mode fix endianness when comparing super block generation

 - locking and lockdep fixes:
     - fix potential sleeping under spinlock when setting qgroup limit
     - lockdep warning fixes when btrfs_path is freed after copy_to_user
     - do not modify log tree while holding a leaf from fs tree locked

 - fix freeing of sysfs files of static features on error

 - use kv.alloc for zone map allocation as a fallback to avoid warnings
   due to high order allocation

 - send, avoid unaligned encoded writes when attempting to clone range

* tag 'for-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: sysfs: normalize the error handling branch in btrfs_init_sysfs()
  btrfs: do not modify log tree while holding a leaf from fs tree locked
  btrfs: use kvcalloc in btrfs_get_dev_zone_info
  btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()
  btrfs: send: avoid unaligned encoded writes when attempting to clone range
  btrfs: zoned: fix missing endianness conversion in sb_write_pointer
  btrfs: free btrfs_path before copying subvol info to userspace
  btrfs: free btrfs_path before copying fspath to userspace
  btrfs: free btrfs_path before copying inodes to userspace
  btrfs: free btrfs_path before copying root refs to userspace
  btrfs: fix assertion failure and blocking during nowait buffered write
2022-11-25 13:24:05 -08:00
Linus Torvalds
0b1dcc2cf5 24 hotfixes. 8 marked cc:stable and 16 for post-6.0 issues.
There have been a lot of hotfixes this cycle, and this is quite a large
 batch given how far we are into the -rc cycle.  Presumably a reflection of
 the unusually large amount of MM material which went into 6.1-rc1.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY4Bd6gAKCRDdBJ7gKXxA
 jvX6AQCsG1ld24kMpdD+70XXUyC29g/6/jribgtZApHyDYjxSwD/WmLNpPlUPRax
 WB071Y5w65vjSTUKvwU0OLGbHwyxgAw=
 =swD5
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2022-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "24 MM and non-MM hotfixes. 8 marked cc:stable and 16 for post-6.0
  issues.

  There have been a lot of hotfixes this cycle, and this is quite a
  large batch given how far we are into the -rc cycle. Presumably a
  reflection of the unusually large amount of MM material which went
  into 6.1-rc1"

* tag 'mm-hotfixes-stable-2022-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits)
  test_kprobes: fix implicit declaration error of test_kprobes
  nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty
  mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1
  mm: fix unexpected changes to {failslab|fail_page_alloc}.attr
  swapfile: fix soft lockup in scan_swap_map_slots
  hugetlb: fix __prep_compound_gigantic_page page flag setting
  kfence: fix stack trace pruning
  proc/meminfo: fix spacing in SecPageTables
  mm: multi-gen LRU: retry folios written back while isolated
  mailmap: update email address for Satya Priya
  mm/migrate_device: return number of migrating pages in args->cpages
  kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible
  MAINTAINERS: update Alex Hung's email address
  mailmap: update Alex Hung's email address
  mm: mmap: fix documentation for vma_mas_szero
  mm/damon/sysfs-schemes: skip stats update if the scheme directory is removed
  mm/memory: return vm_fault_t result from migrate_to_ram() callback
  mm: correctly charge compressed memory to its memcg
  ipc/shm: call underlying open/close vm_ops
  gcov: clang: fix the buffer overflow issue
  ...
2022-11-25 10:18:25 -08:00
Linus Torvalds
b308570957 a couple of fixes, one of them for this cycle regression...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY4A7qAAKCRBZ7Krx/gZQ
 61SoAP9bWYuW44RNEkitOIPCi/bmWy5D9alVEjWinmxzsT94NgEA5rfc4uuSCLp1
 se+Qgcu5iNKpHTXIOAOX7ozCCQJzQgo=
 =GK/t
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
 "A couple of fixes, one of them for this cycle regression..."

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: vfs_tmpfile: ensure O_EXCL flag is enforced
  fs: use acquire ordering in __fget_light()
2022-11-25 10:12:43 -08:00
Damien Le Moal
db58653ce0 zonefs: Fix active zone accounting
If a file zone transitions to the offline or readonly state from an
active state, we must clear the zone active flag and decrement the
active seq file counter. Do so in zonefs_account_active() using the new
zonefs inode flags ZONEFS_ZONE_OFFLINE and ZONEFS_ZONE_READONLY. These
flags are set if necessary in zonefs_check_zone_condition() based on the
result of report zones operation after an IO error.

Fixes: 87c9ce3ffe ("zonefs: Add active seq file accounting")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2022-11-25 17:01:22 +09:00
Amir Goldstein
10bc8e4af6 vfs: fix copy_file_range() averts filesystem freeze protection
Commit 868f9f2f8e ("vfs: fix copy_file_range() regression in cross-fs
copies") removed fallback to generic_copy_file_range() for cross-fs
cases inside vfs_copy_file_range().

To preserve behavior of nfsd and ksmbd server-side-copy, the fallback to
generic_copy_file_range() was added in nfsd and ksmbd code, but that
call is missing sb_start_write(), fsnotify hooks and more.

Ideally, nfsd and ksmbd would pass a flag to vfs_copy_file_range() that
will take care of the fallback, but that code would be subtle and we got
vfs_copy_file_range() logic wrong too many times already.

Instead, add a flag to explicitly request vfs_copy_file_range() to
perform only generic_copy_file_range() and let nfsd and ksmbd use this
flag only in the fallback path.

This choise keeps the logic changes to minimum in the non-nfsd/ksmbd code
paths to reduce the risk of further regressions.

Fixes: 868f9f2f8e ("vfs: fix copy_file_range() regression in cross-fs copies")
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Tested-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25 00:52:28 -05:00
Linus Torvalds
6fd2152fd1 Fix a regression in the lazytime code that was introduced in v6.1-rc1,
and a use-after-free that can be triggered by a maliciously corrupted
 file system.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmN+1NoACgkQ8vlZVpUN
 gaNY8Qf/ZJyI8JZTEyrd7KxM1S6eDwUth1kYsQQtqQojd5tTqYOsslSiTp2Yw+LI
 4Gw75tDiMA6CYt+BuvbbgOk36YXaPW69sB+uParL/hgK005R50raQZ7oMdjQba4Z
 ODa+x27r5SVZIrcEAcHX15+BcjeCDZ/e5RMsV37ww+LsKNlnWNYn4o3S6eIv1ERo
 0iqgasbaaATCy37gStRvtnbsyPWDdwL5XlJg0XnqFLzo6Yz1NMcXaxEZehyCaCSc
 SixkjSR1gafQu/0ZTdVaH1vXQPmFwCEMOGONdbb+FrQGnIBFv/kvgXeTKTkjOP1+
 2GV7UgdPXeUNECTrMBieEpxsLqpcZw==
 =8ya6
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix a regression in the lazytime code that was introduced in v6.1-rc1,
  and a use-after-free that can be triggered by a maliciously corrupted
  file system"

* tag 'ext4_for_linus_stable2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  fs: do not update freeing inode i_io_list
  ext4: fix use-after-free in ext4_ext_shift_extents
2022-11-24 10:22:42 -08:00
Chuck Lever
ac8db824ea NFSD: Fix reads with a non-zero offset that don't end on a page boundary
This was found when virtual machines with nfs-mounted qcow2 disks
failed to boot properly.

Reported-by: Anders Blomdell <anders.blomdell@control.lth.se>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2142132
Fixes: bfbfb6182a ("nfsd_splice_actor(): handle compound pages")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-23 14:32:35 -05:00
David Howells
9f0933ac02 fscache: fix OOB Read in __fscache_acquire_volume
The type of a->key[0] is char in fscache_volume_same().  If the length
of cache volume key is greater than 127, the value of a->key[0] is less
than 0.  In this case, klen becomes much larger than 255 after type
conversion, because the type of klen is size_t.  As a result, memcmp()
is read out of bounds.

This causes a slab-out-of-bounds Read in __fscache_acquire_volume(), as
reported by Syzbot.

Fix this by changing the type of the stored key to "u8 *" rather than
"char *" (it isn't a simple string anyway).  Also put in a check that
the volume name doesn't exceed NAME_MAX.

  BUG: KASAN: slab-out-of-bounds in memcmp+0x16f/0x1c0 lib/string.c:757
  Read of size 8 at addr ffff888016f3aa90 by task syz-executor344/3613
  Call Trace:
   memcmp+0x16f/0x1c0 lib/string.c:757
   memcmp include/linux/fortify-string.h:420 [inline]
   fscache_volume_same fs/fscache/volume.c:133 [inline]
   fscache_hash_volume fs/fscache/volume.c:171 [inline]
   __fscache_acquire_volume+0x76c/0x1080 fs/fscache/volume.c:328
   fscache_acquire_volume include/linux/fscache.h:204 [inline]
   v9fs_cache_session_get_cookie+0x143/0x240 fs/9p/cache.c:34
   v9fs_session_init+0x1166/0x1810 fs/9p/v9fs.c:473
   v9fs_mount+0xba/0xc90 fs/9p/vfs_super.c:126
   legacy_get_tree+0x105/0x220 fs/fs_context.c:610
   vfs_get_tree+0x89/0x2f0 fs/super.c:1530
   do_new_mount fs/namespace.c:3040 [inline]
   path_mount+0x1326/0x1e20 fs/namespace.c:3370
   do_mount fs/namespace.c:3383 [inline]
   __do_sys_mount fs/namespace.c:3591 [inline]
   __se_sys_mount fs/namespace.c:3568 [inline]
   __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568

Fixes: 62ab633523 ("fscache: Implement volume registration")
Reported-by: syzbot+a76f6a6e524cf2080aa3@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Zhang Peng <zhangpeng362@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: v9fs-developer@lists.sourceforge.net
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/Y3OH+Dmi0QIOK18n@codewreck.org/ # Zhang Peng's v1 fix
Link: https://lore.kernel.org/r/20221115140447.2971680-1-zhangpeng362@huawei.com/ # Zhang Peng's v2 fix
Link: https://lore.kernel.org/r/166869954095.3793579.8500020902371015443.stgit@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-11-23 10:31:13 -08:00
Zhen Lei
ffdbb44f2f btrfs: sysfs: normalize the error handling branch in btrfs_init_sysfs()
Although kset_unregister() can eventually remove all attribute files,
explicitly rolling back with the matching function makes the code logic
look clearer.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-23 16:52:22 +01:00
Filipe Manana
796787c978 btrfs: do not modify log tree while holding a leaf from fs tree locked
When logging an inode in full mode, or when logging xattrs or when logging
the dir index items of a directory, we are modifying the log tree while
holding a read lock on a leaf from the fs/subvolume tree. This can lead to
a deadlock in rare circumstances, but it is a real possibility, and it was
recently reported by syzbot with the following trace from lockdep:

   WARNING: possible circular locking dependency detected
   6.1.0-rc5-next-20221116-syzkaller #0 Not tainted
   ------------------------------------------------------
   syz-executor.1/16154 is trying to acquire lock:
   ffff88807e3084a0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0xa1/0xf30 fs/btrfs/delayed-inode.c:256

   but task is already holding lock:
   ffff88807df33078 (btrfs-log-00){++++}-{3:3}, at: __btrfs_tree_lock+0x32/0x3d0 fs/btrfs/locking.c:197

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #2 (btrfs-log-00){++++}-{3:3}:
          down_read_nested+0x9e/0x450 kernel/locking/rwsem.c:1634
          __btrfs_tree_read_lock+0x32/0x350 fs/btrfs/locking.c:135
          btrfs_tree_read_lock fs/btrfs/locking.c:141 [inline]
          btrfs_read_lock_root_node+0x82/0x3a0 fs/btrfs/locking.c:280
          btrfs_search_slot_get_root fs/btrfs/ctree.c:1678 [inline]
          btrfs_search_slot+0x3ca/0x2c70 fs/btrfs/ctree.c:1998
          btrfs_lookup_csum+0x116/0x3f0 fs/btrfs/file-item.c:209
          btrfs_csum_file_blocks+0x40e/0x1370 fs/btrfs/file-item.c:1021
          log_csums.isra.0+0x244/0x2d0 fs/btrfs/tree-log.c:4258
          copy_items.isra.0+0xbfb/0xed0 fs/btrfs/tree-log.c:4403
          copy_inode_items_to_log+0x13d6/0x1d90 fs/btrfs/tree-log.c:5873
          btrfs_log_inode+0xb19/0x4680 fs/btrfs/tree-log.c:6495
          btrfs_log_inode_parent+0x890/0x2a20 fs/btrfs/tree-log.c:6982
          btrfs_log_dentry_safe+0x59/0x80 fs/btrfs/tree-log.c:7083
          btrfs_sync_file+0xa41/0x13c0 fs/btrfs/file.c:1921
          vfs_fsync_range+0x13e/0x230 fs/sync.c:188
          generic_write_sync include/linux/fs.h:2856 [inline]
          iomap_dio_complete+0x73a/0x920 fs/iomap/direct-io.c:128
          btrfs_direct_write fs/btrfs/file.c:1536 [inline]
          btrfs_do_write_iter+0xba2/0x1470 fs/btrfs/file.c:1668
          call_write_iter include/linux/fs.h:2160 [inline]
          do_iter_readv_writev+0x20b/0x3b0 fs/read_write.c:735
          do_iter_write+0x182/0x700 fs/read_write.c:861
          vfs_iter_write+0x74/0xa0 fs/read_write.c:902
          iter_file_splice_write+0x745/0xc90 fs/splice.c:686
          do_splice_from fs/splice.c:764 [inline]
          direct_splice_actor+0x114/0x180 fs/splice.c:931
          splice_direct_to_actor+0x335/0x8a0 fs/splice.c:886
          do_splice_direct+0x1ab/0x280 fs/splice.c:974
          do_sendfile+0xb19/0x1270 fs/read_write.c:1255
          __do_sys_sendfile64 fs/read_write.c:1323 [inline]
          __se_sys_sendfile64 fs/read_write.c:1309 [inline]
          __x64_sys_sendfile64+0x259/0x2c0 fs/read_write.c:1309
          do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
          entry_SYSCALL_64_after_hwframe+0x63/0xcd

   -> #1 (btrfs-tree-00){++++}-{3:3}:
          __lock_release kernel/locking/lockdep.c:5382 [inline]
          lock_release+0x371/0x810 kernel/locking/lockdep.c:5688
          up_write+0x2a/0x520 kernel/locking/rwsem.c:1614
          btrfs_tree_unlock_rw fs/btrfs/locking.h:189 [inline]
          btrfs_unlock_up_safe+0x1e3/0x290 fs/btrfs/locking.c:238
          search_leaf fs/btrfs/ctree.c:1832 [inline]
          btrfs_search_slot+0x265e/0x2c70 fs/btrfs/ctree.c:2074
          btrfs_insert_empty_items+0xbd/0x1c0 fs/btrfs/ctree.c:4133
          btrfs_insert_delayed_item+0x826/0xfa0 fs/btrfs/delayed-inode.c:746
          btrfs_insert_delayed_items fs/btrfs/delayed-inode.c:824 [inline]
          __btrfs_commit_inode_delayed_items fs/btrfs/delayed-inode.c:1111 [inline]
          __btrfs_run_delayed_items+0x280/0x590 fs/btrfs/delayed-inode.c:1153
          flush_space+0x147/0xe90 fs/btrfs/space-info.c:728
          btrfs_async_reclaim_metadata_space+0x541/0xc10 fs/btrfs/space-info.c:1086
          process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
          worker_thread+0x669/0x1090 kernel/workqueue.c:2436
          kthread+0x2e8/0x3a0 kernel/kthread.c:376
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

   -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
          check_prev_add kernel/locking/lockdep.c:3097 [inline]
          check_prevs_add kernel/locking/lockdep.c:3216 [inline]
          validate_chain kernel/locking/lockdep.c:3831 [inline]
          __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5055
          lock_acquire kernel/locking/lockdep.c:5668 [inline]
          lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
          __mutex_lock_common kernel/locking/mutex.c:603 [inline]
          __mutex_lock+0x12f/0x1360 kernel/locking/mutex.c:747
          __btrfs_release_delayed_node.part.0+0xa1/0xf30 fs/btrfs/delayed-inode.c:256
          __btrfs_release_delayed_node fs/btrfs/delayed-inode.c:251 [inline]
          btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline]
          btrfs_remove_delayed_node+0x52/0x60 fs/btrfs/delayed-inode.c:1285
          btrfs_evict_inode+0x511/0xf30 fs/btrfs/inode.c:5554
          evict+0x2ed/0x6b0 fs/inode.c:664
          dispose_list+0x117/0x1e0 fs/inode.c:697
          prune_icache_sb+0xeb/0x150 fs/inode.c:896
          super_cache_scan+0x391/0x590 fs/super.c:106
          do_shrink_slab+0x464/0xce0 mm/vmscan.c:843
          shrink_slab_memcg mm/vmscan.c:912 [inline]
          shrink_slab+0x388/0x660 mm/vmscan.c:991
          shrink_node_memcgs mm/vmscan.c:6088 [inline]
          shrink_node+0x93d/0x1f30 mm/vmscan.c:6117
          shrink_zones mm/vmscan.c:6355 [inline]
          do_try_to_free_pages+0x3b4/0x17a0 mm/vmscan.c:6417
          try_to_free_mem_cgroup_pages+0x3a4/0xa70 mm/vmscan.c:6732
          reclaim_high.constprop.0+0x182/0x230 mm/memcontrol.c:2393
          mem_cgroup_handle_over_high+0x190/0x520 mm/memcontrol.c:2578
          try_charge_memcg+0xe0c/0x12f0 mm/memcontrol.c:2816
          try_charge mm/memcontrol.c:2827 [inline]
          charge_memcg+0x90/0x3b0 mm/memcontrol.c:6889
          __mem_cgroup_charge+0x2b/0x90 mm/memcontrol.c:6910
          mem_cgroup_charge include/linux/memcontrol.h:667 [inline]
          __filemap_add_folio+0x615/0xf80 mm/filemap.c:852
          filemap_add_folio+0xaf/0x1e0 mm/filemap.c:934
          __filemap_get_folio+0x389/0xd80 mm/filemap.c:1976
          pagecache_get_page+0x2e/0x280 mm/folio-compat.c:104
          find_or_create_page include/linux/pagemap.h:612 [inline]
          alloc_extent_buffer+0x2b9/0x1580 fs/btrfs/extent_io.c:4588
          btrfs_init_new_buffer fs/btrfs/extent-tree.c:4869 [inline]
          btrfs_alloc_tree_block+0x2e1/0x1320 fs/btrfs/extent-tree.c:4988
          __btrfs_cow_block+0x3b2/0x1420 fs/btrfs/ctree.c:440
          btrfs_cow_block+0x2fa/0x950 fs/btrfs/ctree.c:595
          btrfs_search_slot+0x11b0/0x2c70 fs/btrfs/ctree.c:2038
          btrfs_update_root+0xdb/0x630 fs/btrfs/root-tree.c:137
          update_log_root fs/btrfs/tree-log.c:2841 [inline]
          btrfs_sync_log+0xbfb/0x2870 fs/btrfs/tree-log.c:3064
          btrfs_sync_file+0xdb9/0x13c0 fs/btrfs/file.c:1947
          vfs_fsync_range+0x13e/0x230 fs/sync.c:188
          generic_write_sync include/linux/fs.h:2856 [inline]
          iomap_dio_complete+0x73a/0x920 fs/iomap/direct-io.c:128
          btrfs_direct_write fs/btrfs/file.c:1536 [inline]
          btrfs_do_write_iter+0xba2/0x1470 fs/btrfs/file.c:1668
          call_write_iter include/linux/fs.h:2160 [inline]
          do_iter_readv_writev+0x20b/0x3b0 fs/read_write.c:735
          do_iter_write+0x182/0x700 fs/read_write.c:861
          vfs_iter_write+0x74/0xa0 fs/read_write.c:902
          iter_file_splice_write+0x745/0xc90 fs/splice.c:686
          do_splice_from fs/splice.c:764 [inline]
          direct_splice_actor+0x114/0x180 fs/splice.c:931
          splice_direct_to_actor+0x335/0x8a0 fs/splice.c:886
          do_splice_direct+0x1ab/0x280 fs/splice.c:974
          do_sendfile+0xb19/0x1270 fs/read_write.c:1255
          __do_sys_sendfile64 fs/read_write.c:1323 [inline]
          __se_sys_sendfile64 fs/read_write.c:1309 [inline]
          __x64_sys_sendfile64+0x259/0x2c0 fs/read_write.c:1309
          do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
          entry_SYSCALL_64_after_hwframe+0x63/0xcd

   other info that might help us debug this:

   Chain exists of:
     &delayed_node->mutex --> btrfs-tree-00 --> btrfs-log-00

   Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(btrfs-log-00);
                                  lock(btrfs-tree-00);
                                  lock(btrfs-log-00);
     lock(&delayed_node->mutex);

Holding a read lock on a leaf from a fs/subvolume tree creates a nasty
lock dependency when we are COWing extent buffers for the log tree and we
have two tasks modifying the log tree, with each one in one of the
following 2 scenarios:

1) Modifying the log tree triggers an extent buffer allocation while
   holding a write lock on a parent extent buffer from the log tree.
   Allocating the pages for an extent buffer, or the extent buffer
   struct, can trigger inode eviction and finally the inode eviction
   will trigger a release/remove of a delayed node, which requires
   taking the delayed node's mutex;

2) Allocating a metadata extent for a log tree can trigger the async
   reclaim thread and make us wait for it to release enough space and
   unblock our reservation ticket. The reclaim thread can start flushing
   delayed items, and that in turn results in the need to lock delayed
   node mutexes and in the need to write lock extent buffers of a
   subvolume tree - all this while holding a write lock on the parent
   extent buffer in the log tree.

So one task in scenario 1) running in parallel with another task in
scenario 2) could lead to a deadlock, one wanting to lock a delayed node
mutex while having a read lock on a leaf from the subvolume, while the
other is holding the delayed node's mutex and wants to write lock the same
subvolume leaf for flushing delayed items.

Fix this by cloning the leaf of the fs/subvolume tree, release/unlock the
fs/subvolume leaf and use the clone leaf instead.

Reported-by: syzbot+9b7c21f486f5e7f8d029@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/000000000000ccc93c05edc4d8cf@google.com/
CC: stable@vger.kernel.org # 6.0+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-23 16:52:15 +01:00
Christoph Hellwig
8fe97d47b5 btrfs: use kvcalloc in btrfs_get_dev_zone_info
Otherwise the kernel memory allocator seems to be unhappy about failing
order 6 allocations for the zones array, that cause 100% reproducible
mount failures in my qemu setup:

  [26.078981] mount: page allocation failure: order:6, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
  [26.079741] CPU: 0 PID: 2965 Comm: mount Not tainted 6.1.0-rc5+ #185
  [26.080181] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [26.080950] Call Trace:
  [26.081132]  <TASK>
  [26.081291]  dump_stack_lvl+0x56/0x6f
  [26.081554]  warn_alloc+0x117/0x140
  [26.081808]  ? __alloc_pages_direct_compact+0x1b5/0x300
  [26.082174]  __alloc_pages_slowpath.constprop.0+0xd0e/0xde0
  [26.082569]  __alloc_pages+0x32a/0x340
  [26.082836]  __kmalloc_large_node+0x4d/0xa0
  [26.083133]  ? trace_kmalloc+0x29/0xd0
  [26.083399]  kmalloc_large+0x14/0x60
  [26.083654]  btrfs_get_dev_zone_info+0x1b9/0xc00
  [26.083980]  ? _raw_spin_unlock_irqrestore+0x28/0x50
  [26.084328]  btrfs_get_dev_zone_info_all_devices+0x54/0x80
  [26.084708]  open_ctree+0xed4/0x1654
  [26.084974]  btrfs_mount_root.cold+0x12/0xde
  [26.085288]  ? lock_is_held_type+0xe2/0x140
  [26.085603]  legacy_get_tree+0x28/0x50
  [26.085876]  vfs_get_tree+0x1d/0xb0
  [26.086139]  vfs_kern_mount.part.0+0x6c/0xb0
  [26.086456]  btrfs_mount+0x118/0x3a0
  [26.086728]  ? lock_is_held_type+0xe2/0x140
  [26.087043]  legacy_get_tree+0x28/0x50
  [26.087323]  vfs_get_tree+0x1d/0xb0
  [26.087587]  path_mount+0x2ba/0xbe0
  [26.087850]  ? _raw_spin_unlock_irqrestore+0x38/0x50
  [26.088217]  __x64_sys_mount+0xfe/0x140
  [26.088506]  do_syscall_64+0x35/0x80
  [26.088776]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 5b31646898 ("btrfs: get zone information of zoned block devices")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-23 16:51:50 +01:00
Chen Zhongjin
512c5ca01a nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty
When extending segments, nilfs_sufile_alloc() is called to get an
unassigned segment, then mark it as dirty to avoid accidentally allocating
the same segment in the future.

But for some special cases such as a corrupted image it can be unreliable.
If such corruption of the dirty state of the segment occurs, nilfs2 may
reallocate a segment that is in use and pick the same segment for writing
twice at the same time.

This will cause the problem reported by syzkaller:
https://syzkaller.appspot.com/bug?id=c7c4748e11ffcc367cef04f76e02e931833cbd24

This case started with segbuf1.segnum = 3, nextnum = 4 when constructed. 
It supposed segment 4 has already been allocated and marked as dirty.

However the dirty state was corrupted and segment 4 usage was not dirty. 
For the first time nilfs_segctor_extend_segments() segment 4 was allocated
again, which made segbuf2 and next segbuf3 had same segment 4.

sb_getblk() will get same bh for segbuf2 and segbuf3, and this bh is added
to both buffer lists of two segbuf.  It makes the lists broken which
causes NULL pointer dereference.

Fix the problem by setting usage as dirty every time in
nilfs_sufile_mark_dirty(), which is called during constructing current
segment to be written out and before allocating next segment.

[chenzhongjin@huawei.com: add lock protection per Ryusuke]
  Link: https://lkml.kernel.org/r/20221121091141.214703-1-chenzhongjin@huawei.com
Link: https://lkml.kernel.org/r/20221118063304.140187-1-chenzhongjin@huawei.com
Fixes: 9ff05123e3 ("nilfs2: segment constructor")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reported-by: <syzbot+77e4f0...@syzkaller.appspotmail.com>
Reported-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-22 18:50:45 -08:00
Yosry Ahmed
f850c84948 proc/meminfo: fix spacing in SecPageTables
SecPageTables has a tab after it instead of a space, this can break
fragile parsers that depend on spaces after the stat names.

Link: https://lkml.kernel.org/r/20221117043247.133294-1-yosryahmed@google.com
Fixes: ebc97a52b5 ("mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-22 18:50:44 -08:00
Svyatoslav Feldsherov
4e3c51f4e8 fs: do not update freeing inode i_io_list
After commit cbfecb927f ("fs: record I_DIRTY_TIME even if inode
already has I_DIRTY_INODE") writeback_single_inode can push inode with
I_DIRTY_TIME set to b_dirty_time list. In case of freeing inode with
I_DIRTY_TIME set this can happen after deletion of inode from i_io_list
at evict. Stack trace is following.

evict
fat_evict_inode
fat_truncate_blocks
fat_flush_inodes
writeback_inode
sync_inode_metadata(inode, sync=0)
writeback_single_inode(inode, wbc) <- wbc->sync_mode == WB_SYNC_NONE

This will lead to use after free in flusher thread.

Similar issue can be triggered if writeback_single_inode in the
stack trace update inode->i_io_list. Add explicit check to avoid it.

Fixes: cbfecb927f ("fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE")
Reported-by: syzbot+6ba92bd00d5093f7e371@syzkaller.appspotmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Svyatoslav Feldsherov <feldsherov@google.com>
Link: https://lore.kernel.org/r/20221115202001.324188-1-feldsherov@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-22 17:00:00 -05:00
Zhang Xiaoxu
4e45886956 zonefs: Fix race between modprobe and mount
There is a race between modprobe and mount as below:

 modprobe zonefs                | mount -t zonefs
--------------------------------|-------------------------
 zonefs_init                    |
  register_filesystem       [1] |
                                | zonefs_fill_super    [2]
  zonefs_sysfs_init         [3] |

1. register zonefs suceess, then
2. user can mount the zonefs
3. if sysfs initialize failed, the module initialize failed.

Then the mount process maybe some error happened since the module
initialize failed.

Let's register zonefs after all dependency resource ready. And
reorder the dependency resource release in module exit.

Fixes: 9277a6d4fb ("zonefs: Export open zone resource information through sysfs")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-22 14:18:32 +09:00
ChenXiaoSong
5024878477 cifs: fix missing unlock in cifs_file_copychunk_range()
xfstests generic/013 and generic/476 reported WARNING as follows:

  WARNING: lock held when returning to user space!
  6.1.0-rc5+ #4 Not tainted
  ------------------------------------------------
  fsstress/504233 is leaving the kernel with locks still held!
  2 locks held by fsstress/504233:
   #0: ffff888054c38850 (&sb->s_type->i_mutex_key#21){+.+.}-{3:3}, at:
                        lock_two_nondirectories+0xcf/0xf0
   #1: ffff8880b8fec750 (&sb->s_type->i_mutex_key#21/4){+.+.}-{3:3}, at:
                        lock_two_nondirectories+0xb7/0xf0

This will lead to deadlock and hungtask.

Fix this by releasing locks when failed to write out on a file range in
cifs_file_copychunk_range().

Fixes: 3e3761f1ec ("smb3: use filemap_write_and_wait_range instead of filemap_write_and_wait")
Cc: stable@vger.kernel.org # 6.0
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-21 10:27:03 -06:00
ChenXiaoSong
f7e942b5bb btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()
Syzkaller reported BUG as follows:

  BUG: sleeping function called from invalid context at
       include/linux/sched/mm.h:274
  Call Trace:
   <TASK>
   dump_stack_lvl+0xcd/0x134
   __might_resched.cold+0x222/0x26b
   kmem_cache_alloc+0x2e7/0x3c0
   update_qgroup_limit_item+0xe1/0x390
   btrfs_qgroup_inherit+0x147b/0x1ee0
   create_subvol+0x4eb/0x1710
   btrfs_mksubvol+0xfe5/0x13f0
   __btrfs_ioctl_snap_create+0x2b0/0x430
   btrfs_ioctl_snap_create_v2+0x25a/0x520
   btrfs_ioctl+0x2a1c/0x5ce0
   __x64_sys_ioctl+0x193/0x200
   do_syscall_64+0x35/0x80

Fix this by calling qgroup_dirty() on @dstqgroup, and update limit item in
btrfs_run_qgroups() later outside of the spinlock context.

CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-21 14:57:52 +01:00
Filipe Manana
a11452a370 btrfs: send: avoid unaligned encoded writes when attempting to clone range
When trying to see if we can clone a file range, there are cases where we
end up sending two write operations in case the inode from the source root
has an i_size that is not sector size aligned and the length from the
current offset to its i_size is less than the remaining length we are
trying to clone.

Issuing two write operations when we could instead issue a single write
operation is not incorrect. However it is not optimal, specially if the
extents are compressed and the flag BTRFS_SEND_FLAG_COMPRESSED was passed
to the send ioctl. In that case we can end up sending an encoded write
with an offset that is not sector size aligned, which makes the receiver
fallback to decompressing the data and writing it using regular buffered
IO (so re-compressing the data in case the fs is mounted with compression
enabled), because encoded writes fail with -EINVAL when an offset is not
sector size aligned.

The following example, which triggered a bug in the receiver code for the
fallback logic of decompressing + regular buffer IO and is fixed by the
patchset referred in a Link at the bottom of this changelog, is an example
where we have the non-optimal behaviour due to an unaligned encoded write:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/sdj
   MNT=/mnt/sdj

   mkfs.btrfs -f $DEV > /dev/null
   mount -o compress $DEV $MNT

   # File foo has a size of 33K, not aligned to the sector size.
   xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo

   xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar

   # Now clone the first 32K of file bar into foo at offset 0.
   xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo

   # Snapshot the default subvolume and create a full send stream (v2).
   btrfs subvolume snapshot -r $MNT $MNT/snap

   btrfs send --compressed-data -f /tmp/test.send $MNT/snap

   echo -e "\nFile bar in the original filesystem:"
   od -A d -t x1 $MNT/snap/bar

   umount $MNT
   mkfs.btrfs -f $DEV > /dev/null
   mount $DEV $MNT

   echo -e "\nReceiving stream in a new filesystem..."
   btrfs receive -f /tmp/test.send $MNT

   echo -e "\nFile bar in the new filesystem:"
   od -A d -t x1 $MNT/snap/bar

   umount $MNT

Before this patch, the send stream included one regular write and one
encoded write for file 'bar', with the later being not sector size aligned
and causing the receiver to fallback to decompression + buffered writes.
The output of the btrfs receive command in verbose mode (-vvv):

   (...)
   mkfile o258-7-0
   rename o258-7-0 -> bar
   utimes
   clone bar - source=foo source offset=0 offset=0 length=32768
   write bar - offset=32768 length=1024
   encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
   encoded_write bar - falling back to decompress and write due to errno 22 ("Invalid argument")
   (...)

This patch avoids the regular write followed by an unaligned encoded write
so that we end up sending a single encoded write that is aligned. So after
this patch the stream content is (output of btrfs receive -vvv):

   (...)
   mkfile o258-7-0
   rename o258-7-0 -> bar
   utimes
   clone bar - source=foo source offset=0 offset=0 length=32768
   encoded_write bar - offset=32768, len=4096, unencoded_offset=32768, unencoded_file_len=32768, unencoded_len=65536, compression=1, encryption=0
   (...)

So we get more optimal behaviour and avoid the silent data loss bug in
versions of btrfs-progs affected by the bug referred by the Link tag
below (btrfs-progs v5.19, v5.19.1, v6.0 and v6.0.1).

Link: https://lore.kernel.org/linux-btrfs/cover.1668529099.git.fdmanana@suse.com/
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-21 14:41:41 +01:00
Christoph Hellwig
c51f0e6a12 btrfs: zoned: fix missing endianness conversion in sb_write_pointer
generation is an on-disk __le64 value, so use btrfs_super_generation to
convert it to host endian before comparing it.

Fixes: 12659251ca ("btrfs: implement log-structured superblock for ZONED mode")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-21 14:40:40 +01:00
Dan Carpenter
f391d6ee00 cifs: Use after free in debug code
This debug code dereferences "old_iface" after it was already freed by
the call to release_iface().  Re-order the debugging to avoid this
issue.

Fixes: b54034a73b ("cifs: during reconnect, update interface if necessary")
Cc: stable@vger.kernel.org # 5.19+
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-19 14:27:37 -06:00
Linus Torvalds
926028aaa3 4 cifs/smb3 client fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmN4VVMACgkQiiy9cAdy
 T1F3AQv+N3jzbBdJobg19mr0tVrpz0Itw8BaPTAi2SmY5UFCXg8AEhT9bVj1XoSC
 lfTZJ7qhV6v1qu8HPyHY8sfW6Pna20FLsL8GyMD2dOU9NwfscecWJSeZX36OkaQX
 8Q12cz+J6DgM2NJw0VW6NUCeY6DilbdYG23ZXO0i1NSL8rCMpr7mhK8j5SYgL3Bn
 yTpd0DZ30fHdcAzt6bAvsekFPaqqbaBtSJl2Z7ibg5wiq/gt0WZ9eqUco9aMdfqW
 HJ9xeoTDq/836ZtJiHNJ5li7/jEO6AYoImkhS3Cr5TXSqtgDdPWVKM7P7CTL42yV
 xVcp3OQSiCV9tbGoqJkLMlOqpuM4f2oCkNHxj/bmblCV0QGty7zNet+s4yks2B0d
 fnmzIwYRbHCV1+rPOiSTA7zhR9qQKIoPEw2tsuWm+Pw/LnraTvnc6Mf0TefK68O/
 y/fEq2/zuvt4uHdBjDTUYqwzi9qYyglU/zga3FIRCG87WbOSUXSwVWIDaOgSrg9O
 FX+KKdA3
 =/bd9
 -----END PGP SIGNATURE-----

Merge tag '6.1-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:

 - two missing and one incorrect return value checks

 - fix leak on tlink mount failure

* tag '6.1-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: add check for returning value of SMB2_set_info_init
  cifs: Fix wrong return value checking when GETFLAGS
  cifs: add check for returning value of SMB2_close_init
  cifs: Fix connections leak when tlink setup failed
2022-11-19 08:58:58 -08:00
Peter Griffin
406c706c7b vfs: vfs_tmpfile: ensure O_EXCL flag is enforced
If O_EXCL is *not* specified, then linkat() can be
used to link the temporary file into the filesystem.
If O_EXCL is specified then linkat() should fail (-1).

After commit 863f144f12 ("vfs: open inside ->tmpfile()")
the O_EXCL flag is no longer honored by the vfs layer for
tmpfile, which means the file can be linked even if O_EXCL
flag is specified, which is a change in behaviour for
userspace!

The open flags was previously passed as a parameter, so it
was uneffected by the changes to file->f_flags caused by
finish_open(). This patch fixes the issue by storing
file->f_flags in a local variable so the O_EXCL test
logic is restored.

This regression was detected by Android CTS Bionic fcntl()
tests running on android-mainline [1].

[1] https://android.googlesource.com/platform/bionic/+/
    refs/heads/master/tests/fcntl_test.cpp#352

Fixes: 863f144f12 ("vfs: open inside ->tmpfile()")
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Will McVicker <willmcvicker@google.com>
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-19 02:22:11 -05:00
Linus Torvalds
bf5003a0dc zonefs fixes for 6.1-rc6
- Fix the IO error recovery path for failures happening in the last
    zone of device, and that zone is a "runt" zone (smaller than the
    other zone). The current code was failing to properly obtain a zone
    report in that case.
 
  - Remove the unused to_attr() function as it is unused, causing
    compilation warnings with clang.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCY3gofwAKCRDdoc3SxdoY
 dpUlAQDa1mHGQzT0Lrg7y+4HILKsX1dmzJ6wIWTKaFg1f68GNgD/cjZB/jp3UKEi
 nmXAv/dGVx6ZBtIEybLDb522i1xccAg=
 =gAqT
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:

 - Fix the IO error recovery path for failures happening in the last
   zone of device, and that zone is a "runt" zone (smaller than the
   other zone). The current code was failing to properly obtain a zone
   report in that case.

 - Remove the unused to_attr() function as it is unused, causing
   compilation warnings with clang.

* tag 'zonefs-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Remove to_attr() helper function
  zonefs: fix zone report size in __zonefs_io_error()
2022-11-18 17:17:42 -08:00
Linus Torvalds
63c8c0d7dc Driver core fixes for 6.1-rc6
Here are 2 small driver core fixes for 6.1-rc6:
 	- utsname fix, this one should already be in your tree as it
 	  came from a different tree earlier.
 	- kernfs bugfix for a much reported syzbot report that seems to
 	  keep getting triggered.
 
 Both of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY3dPug8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynaxgCfbhh4z3ARWRl1L5H+vtnH+MeaLsoAnRBKQ2yv
 GUXBXpiyz5TSzfGiP/Im
 =80uW
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are two small driver core fixes for 6.1-rc6:

   - utsname fix, this one should already be in your tree as it came
     from a different tree earlier.

   - kernfs bugfix for a much reported syzbot report that seems to keep
     getting triggered.

  Both of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: Fix spurious lockdep warning in kernfs_find_and_get_node_by_id()
  kernel/utsname_sysctl.c: Add missing enum uts_proc value
2022-11-18 10:49:53 -08:00
Linus Torvalds
ae75334011 Three filesystem bug fixes, intended for stable.
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmN2pWUTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi4M8CACkcWyoJUN8yQwjJTOW+y1t5UP+o49i
 kFGGQMZTAgtTXkwhO4e8eJa+3QJFkbDZg2KEeJyrr64Y4NzM4K2oBzE2zVcKg78k
 gtZUbXsK5eiRvj12Nw59WPBY/quZNuweIM0cVgrz71CuWrnT0I6oAGVvFqz5x3vp
 w0Z8CUy+tB+FKlYEjLSY3qBXzXCB3S+9Vi6C1JCIEdJUUzxvCRoZThTLCm6XrXOo
 CJrNsUmNmXL5IlDBTiomJNAgnGGICgf4Q3xPlrRdOwKUZ4wMrMOPcfpfcmi2J4a2
 G5zenneauAwuX03yCjhQ+UvdjYfIBZYSUBMpXLFPL0iaW7ScpEEpYKqc
 =GAP4
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.1-rc6' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Three filesystem bug fixes, intended for stable"

* tag 'ceph-for-6.1-rc6' of https://github.com/ceph/ceph-client:
  ceph: fix NULL pointer dereference for req->r_session
  ceph: avoid putting the realm twice when decoding snaps fails
  ceph: fix a NULL vs IS_ERR() check when calling ceph_lookup_inode()
  MAINTAINERS: git://github.com -> https://github.com for ceph
2022-11-17 13:28:28 -08:00
Linus Torvalds
81ac25651a Fixes:
- Fix another tracepoint crash
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmN2YDoACgkQM2qzM29m
 f5cVcg//cit7+4c4/1S14dKhAxWaTXY02KOAD2rJ3AOfBVsTiTELQu7FzZV8r6L1
 zmrtHyjKAlxdNjBvZUUSVTrGqBMcUCoOS8+lK8Xjo3c0uOmI6gB5Eog9EQjI9pQe
 MUVGnHciv4yDV8ph1qTAwL5py+lO2b0BJs/ADrbOnXUw+NC17hivjta/yQyG0mLe
 hsNv3sGmXdOHgDUpqPz5ZY/b++pkF9lntaU53F7timkWc2v2lXyHCZRDUcp6AV5c
 +rMuJXdG5odU5UafUSeuL6towet+Hkhs3I5uLAvtBl+8WqzJ3dW9Je9kNpeJL3XD
 PAuKOjYdvuLMJ74/l9lF35Wyn3MZRCYjz5vIDICC0HxVqvN8raJPaLFlGA5UsPyQ
 7PDwatFrVEFqlDH/mv9EId/1l0PsBwrGaHr/Yh+FTSidzW4RXt27yIR/Gl55e7Da
 lMGO1+nr9wM461VM4T3JtqLRbE3TbvjtiNZZnpsJ0X8HzTRLGJoSVXGSO8t+d0qF
 CE+XmQoMJ2f02bJiSQweq7sq/LpAnqTXJ0/zq8JcmpE4Kda8t64f8jVIMe0S4Uym
 T6ei0UUebBg957zwDqVp6F9sXE8FpPf2TQEJgk49QMFt9Azs9zD/L7/icG86sriZ
 2j6vjEElUW22iDV+13aiABrK4tXQPv2ksKZkWEvQeSaTmVgK5tk=
 =CiAC
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix another tracepoint crash

* tag 'nfsd-6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Fix trace_nfsd_fh_verify_err() crasher
2022-11-17 09:04:50 -08:00
Anastasia Belova
a51e5d293d cifs: add check for returning value of SMB2_set_info_init
If the returning value of SMB2_set_info_init is an error-value,
exit the function.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 0967e54579 ("cifs: use a compound for setting an xattr")

Signed-off-by: Anastasia Belova <abelova@astralinux.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-16 12:24:26 -06:00
Damien Le Moal
61ba9e9712 zonefs: Remove to_attr() helper function
to_attr() in zonefs sysfs code is unused, which it causes a warning when
compiling with clang and W=1. Delete it to prevent the warning.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2022-11-16 16:08:31 +09:00
Damien Le Moal
7dd12d65ac zonefs: fix zone report size in __zonefs_io_error()
When an IO error occurs, the function __zonefs_io_error() is used to
issue a zone report to obtain the latest zone information from the
device. This function gets a zone report for all zones used as storage
for a file, which is always 1 zone except for files representing
aggregated conventional zones.

The number of zones of a zone report for a file is calculated in
__zonefs_io_error() by doing a bit-shift of the inode i_zone_size field,
which is equal to or larger than the device zone size. However, this
calculation does not take into account that the last zone of a zoned
device may be smaller than the zone size reported by bdev_zone_sectors()
(which is used to set the bit shift size). As a result, if an error
occurs for an IO targetting such last smaller zone, the zone report will
ask for 0 zones, leading to an invalid zone report.

Fix this by using the fact that all files require a 1 zone report,
except if the inode i_zone_size field indicates a zone size larger than
the device zone size. This exception case corresponds to a mount with
aggregated conventional zones.

A check for this exception is added to the file inode initialization
during mount. If an invalid setup is detected, emit an error and fail
the mount (check contributed by Johannes Thumshirn).

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-11-16 16:08:28 +09:00
Zhang Xiaoxu
92bbd67a55 cifs: Fix wrong return value checking when GETFLAGS
The return value of CIFSGetExtAttr is negative, should be checked
with -EOPNOTSUPP rather than EOPNOTSUPP.

Fixes: 64a5cfa6db ("Allow setting per-file compression via SMB2/3")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-16 00:21:04 -06:00
Linus Torvalds
59d0d52c30 netfslib fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmNzxK8ACgkQ+7dXa6fL
 C2uN9g//bKgnlQpAbhEV7OQc4TqGatxHWzzOne77ceGjedyT1/IAFUJWIP8bAHSa
 e8UlJ/4rLBraoD1JaaFBARprJcy6vv8iX7hFWGgBWWcM23V5FQqSwogzLBqKI3LI
 Ig3w8159CHSnG0r9LImB7cPuZtE/3W+GxapnaUITeSSiirocW0q3/KafGn8QvFy3
 /jYbRNKr6q7Ya6QBx8xKF6iI5jN0WmsqroUdebGOZSoLSq/45OaZoksFeVJT8FLR
 wwlpdcQu3Ag/3wELIZNUSu39UMQE6ttcXehTybTYiAMSeZq71G8TooHGAxFzqdRK
 peYoneBcG2l4GtZm6/SXRjBdtju3ZaL8PU9JPK2q3tUx6sQY8HG6w4bbDohA2BhM
 upG1rC2bclZ6FJ+Q0FRI5y/YoOc+FsVydTZIaW/yd8DUKLYP0+2e7tM5xVTnqqBj
 UXiTe8J+kSHRSmZPX+DEcdVVYiR770iyEYXNHBGIlFvcYh2bi76pW5dVmn5bEMSU
 AV1R0rAnB9u6wTdVwDJUrDIVMOii2qNn6+XU908fida/T6P4d+9cTC8R8WAR454i
 7mHY3saavl8arSasXtZyG3kffm3UNilFdrNxdITlk+K6lcjF4la9cFEhM7T3w73M
 nJuaTLFI0iiT/321mSnTsCHB+HXN1OiV8H0v/Eani41HecCplsQ=
 =VxQ6
 -----END PGP SIGNATURE-----

AMerge tag 'netfs-fixes-20221115' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull netfx fixes from David Howells:
 "Two fixes, affecting the functions that iterates over the pagecache
  unmarking or unlocking pages after an op is complete:

   - xas_for_each() loops must call xas_retry() first thing and
     immediately do a "continue" in the case that the extracted value is
     a special value that indicates that the walk raced with a
     modification. Fix the unlock and unmark loops to do this.

   - The maths in the unlock loop is dodgy as it could, theoretically,
     at some point in the future end up with a starting file pointer
     that is in the middle of a folio. This will cause a subtraction to
     go negative - but the number is unsigned. Fix the maths to use
     absolute file positions instead of relative page indices"

* tag 'netfs-fixes-20221115' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  netfs: Fix dodgy maths
  netfs: Fix missing xas_retry() calls in xarray iteration
2022-11-15 14:56:23 -08:00
Anastasia Belova
d520de6cb4 cifs: add check for returning value of SMB2_close_init
If the returning value of SMB2_close_init is an error-value,
exit the function.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 352d96f3ac ("cifs: multichannel: move channel selection above transport layer")

Signed-off-by: Anastasia Belova <abelova@astralinux.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-15 12:57:54 -06:00
Linus Torvalds
81e7cfa3a9 Changes since last update:
- Fix packed_inode invalid access when reading fragments on crafted
    images;
 
  - Add a missing erofs_put_metabuf() in an error path in fscache mode;
 
  - Fix incorrect `count' for unmapped extents in fscache mode;
 
  - Fix use-after-free of fsid and domain_id string when remounting;
 
  - Fix missing xas_retry() in fscache mode.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCY3OcchEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBCzgAP92t7Lfu7gBuyhXfCJwJVFK0Iku8j9mhOiT
 +C/RVB+9zQEAk/2vy3ULcGN5k6k2q7OgEzNxQ/jM3hVQnoK+sQzwVwA=
 =XHmr
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "Most patches randomly fix error paths or corner cases in fscache mode
  reported recently. One fixes an invalid access relating to fragments
  on crafted images.

  Summary:

   - Fix packed_inode invalid access when reading fragments on crafted
     images

   - Add a missing erofs_put_metabuf() in an error path in fscache mode

   - Fix incorrect `count' for unmapped extents in fscache mode

   - Fix use-after-free of fsid and domain_id string when remounting

   - Fix missing xas_retry() in fscache mode"

* tag 'erofs-for-6.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix missing xas_retry() in fscache mode
  erofs: fix use-after-free of fsid and domain_id string
  erofs: get correct count for unmapped range in fscache mode
  erofs: put metabuf in error path in fscache mode
  erofs: fix general protection fault when reading fragment
2022-11-15 10:30:34 -08:00
David Howells
5e51c627c5 netfs: Fix dodgy maths
Fix the dodgy maths in netfs_rreq_unlock_folios().  start_page could be
inside the folio, in which case the calculation of pgpos will be come up
with a negative number (though for the moment rreq->start is rounded down
earlier and folios would have to get merged whilst locked)

Alter how this works to just frame the tracking in terms of absolute file
positions, rather than offsets from the start of the I/O request.  This
simplifies the maths and makes it easier to follow.

Fix the issue by using folio_pos() and folio_size() to calculate the end
position of the page.

Fixes: 3d3c950467 ("netfs: Provide readahead and readpage netfs helpers")
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/Y2SJw7w1IsIik3nb@casper.infradead.org/
Link: https://lore.kernel.org/r/166757988611.950645.7626959069846893164.stgit@warthog.procyon.org.uk/ # v2
2022-11-15 16:56:07 +00:00
David Howells
7e043a80b5 netfs: Fix missing xas_retry() calls in xarray iteration
netfslib has a number of places in which it performs iteration of an xarray
whilst being under the RCU read lock.  It *should* call xas_retry() as the
first thing inside of the loop and do "continue" if it returns true in case
the xarray walker passed out a special value indicating that the walk needs
to be redone from the root[*].

Fix this by adding the missing retry checks.

[*] I wonder if this should be done inside xas_find(), xas_next_node() and
    suchlike, but I'm told that's not an simple change to effect.

This can cause an oops like that below.  Note the faulting address - this
is an internal value (|0x2) returned from xarray.

BUG: kernel NULL pointer dereference, address: 0000000000000402
...
RIP: 0010:netfs_rreq_unlock+0xef/0x380 [netfs]
...
Call Trace:
 netfs_rreq_assess+0xa6/0x240 [netfs]
 netfs_readpage+0x173/0x3b0 [netfs]
 ? init_wait_var_entry+0x50/0x50
 filemap_read_page+0x33/0xf0
 filemap_get_pages+0x2f2/0x3f0
 filemap_read+0xaa/0x320
 ? do_filp_open+0xb2/0x150
 ? rmqueue+0x3be/0xe10
 ceph_read_iter+0x1fe/0x680 [ceph]
 ? new_sync_read+0x115/0x1a0
 new_sync_read+0x115/0x1a0
 vfs_read+0xf3/0x180
 ksys_read+0x5f/0xe0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Changes:
========
ver #2)
 - Changed an unsigned int to a size_t to reduce the likelihood of an
   overflow as per Willy's suggestion.
 - Added an additional patch to fix the maths.

Fixes: 3d3c950467 ("netfs: Provide readahead and readpage netfs helpers")
Reported-by: George Law <glaw@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/166749229733.107206.17482609105741691452.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166757987929.950645.12595273010425381286.stgit@warthog.procyon.org.uk/ # v2
2022-11-15 16:49:06 +00:00