2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-29 23:53:55 +08:00
Commit Graph

71584 Commits

Author SHA1 Message Date
Linus Torvalds
e9338abf0e fallthrough fixes for Clang for 5.14-rc2
Hi Linus,
 
 Please, pull the following patches that fix many fall-through
 warnings when building with Clang and -Wimplicit-fallthrough.
 
 This pull-request also contains the patch for Makefile that enables
 -Wimplicit-fallthrough for Clang, globally.
 
 It's also important to notice that since we have adopted the use of
 the pseudo-keyword macro fallthrough; we also want to avoid having
 more /* fall through */ comments being introduced. Notice that contrary
 to GCC, Clang doesn't recognize any comments as implicit fall-through
 markings when the -Wimplicit-fallthrough option is enabled. So, in
 order to avoid having more comments being introduced, we have to use
 the option -Wimplicit-fallthrough=5 for GCC, which similar to Clang,
 will cause a warning in case a code comment is intended to be used
 as a fall-through marking. The patch for Makefile also enforces this.
 
 We had almost 4,000 of these issues for Clang in the beginning,
 and there might be a couple more out there when building some
 architectures with certain configurations. However, with the
 recent fixes I think we are in good shape and it is now possible
 to enable -Wimplicit-fallthrough for Clang. :)
 
 Thanks!
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmDvPV0ACgkQRwW0y0cG
 2zGJ1xAAsbSN8I+ESxFCKi4UwQC71988isb0rHGL6rQPBbEYD2bbI+82aGu6WH8z
 2ZVQZa5x3tUoIsqR2GMK5Npn1dFvuNWGZkYzlgjAp+FgzaULCLUc4NW9DEWU/Kbd
 UBz8ilXL7tUAlDalUYjXAVVTxMnRTZ+BDCbEmIdRZ5X/pMlD2xHeMGG4kh8/cHjJ
 +mqFwCFPZoyLJzPBnm37OEfIr8JxrLZAo1gfmz5sxfmBdYDY3hjV/BVgP8h0bkY9
 ITaq0LMEAPMXvU+4DP3DxuqzRr9COYMcddmbaWwsXPd7UAuoXwGLvqx7JE0QqY3g
 J80w4/hXqEa4o4/fUAsfjYEQoNTEnhl0iGskBJD2xy5Lj17/m4aOSL44ibr2Ag67
 yfJPWi9tooYELEGNobPVHPuqk4ts6amKTOKqHWMBEcTEOIGzaGWPEjRoyaMivfZ3
 G3ZbEPEYERcJRizm63UbciLeslGsxb6tdYQEy5VI8Wa7caz9Ci9HT+jjS7Vm2fuq
 1zg+vIgwuSxvwxrSV/PDTRcQm9EM/MoRL0D4jbr7gtYD6KNwH8Vo6M21CoaB9gH3
 FgMiCT4zy9KiONjVDJKkjtzuk+9OwpI5AAg0/fAQ4Ak9TzUN3Xfg7HpytETCIJ0Z
 Ii3Jqwe+3IRTNq94ZTIiX/0hT7i5GsPxUfPzI2vwttCJn9ctUcM=
 =XVNN
 -----END PGP SIGNATURE-----

Merge tag 'Wimplicit-fallthrough-clang-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull fallthrough fixes from Gustavo Silva:
 "This fixes many fall-through warnings when building with Clang and
  -Wimplicit-fallthrough, and also enables -Wimplicit-fallthrough for
  Clang, globally.

  It's also important to notice that since we have adopted the use of
  the pseudo-keyword macro fallthrough, we also want to avoid having
  more /* fall through */ comments being introduced. Contrary to GCC,
  Clang doesn't recognize any comments as implicit fall-through markings
  when the -Wimplicit-fallthrough option is enabled.

  So, in order to avoid having more comments being introduced, we use
  the option -Wimplicit-fallthrough=5 for GCC, which similar to Clang,
  will cause a warning in case a code comment is intended to be used as
  a fall-through marking. The patch for Makefile also enforces this.

  We had almost 4,000 of these issues for Clang in the beginning, and
  there might be a couple more out there when building some
  architectures with certain configurations. However, with the recent
  fixes I think we are in good shape and it is now possible to enable
  the warning for Clang"

* tag 'Wimplicit-fallthrough-clang-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
  Makefile: Enable -Wimplicit-fallthrough for Clang
  powerpc/smp: Fix fall-through warning for Clang
  dmaengine: mpc512x: Fix fall-through warning for Clang
  usb: gadget: fsl_qe_udc: Fix fall-through warning for Clang
  powerpc/powernv: Fix fall-through warning for Clang
  MIPS: Fix unreachable code issue
  MIPS: Fix fall-through warnings for Clang
  ASoC: Mediatek: MT8183: Fix fall-through warning for Clang
  power: supply: Fix fall-through warnings for Clang
  dmaengine: ti: k3-udma: Fix fall-through warning for Clang
  s390: Fix fall-through warnings for Clang
  dmaengine: ipu: Fix fall-through warning for Clang
  iommu/arm-smmu-v3: Fix fall-through warning for Clang
  mmc: jz4740: Fix fall-through warning for Clang
  PCI: Fix fall-through warning for Clang
  scsi: libsas: Fix fall-through warning for Clang
  video: fbdev: Fix fall-through warning for Clang
  math-emu: Fix fall-through warning
  cpufreq: Fix fall-through warning for Clang
  drm/msm: Fix fall-through warning in msm_gem_new_impl()
  ...
2021-07-15 13:57:31 -07:00
Linus Torvalds
dd9c7df94c Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "13 patches.

  Subsystems affected by this patch series: mm (kasan, pagealloc, rmap,
  hmm, and hugetlb), and hfs"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/hugetlb: fix refs calculation from unaligned @vaddr
  hfs: add lock nesting notation to hfs_find_init
  hfs: fix high memory mapping in hfs_bnode_read
  hfs: add missing clean-up in hfs_fill_super
  lib/test_hmm: remove set but unused page variable
  mm: fix the try_to_unmap prototype for !CONFIG_MMU
  mm/page_alloc: further fix __alloc_pages_bulk() return value
  mm/page_alloc: correct return value when failing at preparing
  mm/page_alloc: avoid page allocator recursion with pagesets.lock held
  Revert "mm/page_alloc: make should_fail_alloc_page() static"
  kasan: fix build by including kernel.h
  kasan: add memzero init for unaligned size at DEBUG
  mm: move helper to check slub_debug_enabled
2021-07-15 12:17:05 -07:00
Desmond Cheong Zhi Xi
b3b2177a2d hfs: add lock nesting notation to hfs_find_init
Syzbot reports a possible recursive lock in [1].

This happens due to missing lock nesting information.  From the logs, we
see that a call to hfs_fill_super is made to mount the hfs filesystem.
While searching for the root inode, the lock on the catalog btree is
grabbed.  Then, when the parent of the root isn't found, a call to
__hfs_bnode_create is made to create the parent of the root.  This
eventually leads to a call to hfs_ext_read_extent which grabs a lock on
the extents btree.

Since the order of locking is catalog btree -> extents btree, this lock
hierarchy does not lead to a deadlock.

To tell lockdep that this locking is safe, we add nesting notation to
distinguish between catalog btrees, extents btrees, and attributes
btrees (for HFS+).  This has already been done in hfsplus.

Link: https://syzkaller.appspot.com/bug?id=f007ef1d7a31a469e3be7aeb0fde0769b18585db [1]
Link: https://lkml.kernel.org/r/20210701030756.58760-4-desmondcheongzx@gmail.com
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reported-by: syzbot+b718ec84a87b7e73ade4@syzkaller.appspotmail.com
Tested-by: syzbot+b718ec84a87b7e73ade4@syzkaller.appspotmail.com
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-15 10:13:49 -07:00
Desmond Cheong Zhi Xi
54a5ead6f5 hfs: fix high memory mapping in hfs_bnode_read
Pages that we read in hfs_bnode_read need to be kmapped into kernel
address space.  However, currently only the 0th page is kmapped.  If the
given offset + length exceeds this 0th page, then we have an invalid
memory access.

To fix this, we kmap relevant pages one by one and copy their relevant
portions of data.

An example of invalid memory access occurring without this fix can be seen
in the following crash report:

  ==================================================================
  BUG: KASAN: use-after-free in memcpy include/linux/fortify-string.h:191 [inline]
  BUG: KASAN: use-after-free in hfs_bnode_read+0xc4/0xe0 fs/hfs/bnode.c:26
  Read of size 2 at addr ffff888125fdcffe by task syz-executor5/4634

  CPU: 0 PID: 4634 Comm: syz-executor5 Not tainted 5.13.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:79 [inline]
   dump_stack+0x195/0x1f8 lib/dump_stack.c:120
   print_address_description.constprop.0+0x1d/0x110 mm/kasan/report.c:233
   __kasan_report mm/kasan/report.c:419 [inline]
   kasan_report.cold+0x7b/0xd4 mm/kasan/report.c:436
   check_region_inline mm/kasan/generic.c:180 [inline]
   kasan_check_range+0x154/0x1b0 mm/kasan/generic.c:186
   memcpy+0x24/0x60 mm/kasan/shadow.c:65
   memcpy include/linux/fortify-string.h:191 [inline]
   hfs_bnode_read+0xc4/0xe0 fs/hfs/bnode.c:26
   hfs_bnode_read_u16 fs/hfs/bnode.c:34 [inline]
   hfs_bnode_find+0x880/0xcc0 fs/hfs/bnode.c:365
   hfs_brec_find+0x2d8/0x540 fs/hfs/bfind.c:126
   hfs_brec_read+0x27/0x120 fs/hfs/bfind.c:165
   hfs_cat_find_brec+0x19a/0x3b0 fs/hfs/catalog.c:194
   hfs_fill_super+0xc13/0x1460 fs/hfs/super.c:419
   mount_bdev+0x331/0x3f0 fs/super.c:1368
   hfs_mount+0x35/0x40 fs/hfs/super.c:457
   legacy_get_tree+0x10c/0x220 fs/fs_context.c:592
   vfs_get_tree+0x93/0x300 fs/super.c:1498
   do_new_mount fs/namespace.c:2905 [inline]
   path_mount+0x13f5/0x20e0 fs/namespace.c:3235
   do_mount fs/namespace.c:3248 [inline]
   __do_sys_mount fs/namespace.c:3456 [inline]
   __se_sys_mount fs/namespace.c:3433 [inline]
   __x64_sys_mount+0x2b8/0x340 fs/namespace.c:3433
   do_syscall_64+0x37/0xc0 arch/x86/entry/common.c:47
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x45e63a
  Code: 48 c7 c2 bc ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d2 e8 88 04 00 00 0f 1f 84 00 00 00 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f9404d410d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
  RAX: ffffffffffffffda RBX: 0000000020000248 RCX: 000000000045e63a
  RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007f9404d41120
  RBP: 00007f9404d41120 R08: 00000000200002c0 R09: 0000000020000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
  R13: 0000000000000003 R14: 00000000004ad5d8 R15: 0000000000000000

  The buggy address belongs to the page:
  page:00000000dadbcf3e refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x125fdc
  flags: 0x2fffc0000000000(node=0|zone=2|lastcpupid=0x3fff)
  raw: 02fffc0000000000 ffffea000497f748 ffffea000497f6c8 0000000000000000
  raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff888125fdce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
   ffff888125fdcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  >ffff888125fdcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                                  ^
   ffff888125fdd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
   ffff888125fdd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ==================================================================

Link: https://lkml.kernel.org/r/20210701030756.58760-3-desmondcheongzx@gmail.com
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-15 10:13:49 -07:00
Desmond Cheong Zhi Xi
16ee572eaf hfs: add missing clean-up in hfs_fill_super
Patch series "hfs: fix various errors", v2.

This series ultimately aims to address a lockdep warning in
hfs_find_init reported by Syzbot [1].

The work done for this led to the discovery of another bug, and the
Syzkaller repro test also reveals an invalid memory access error after
clearing the lockdep warning.  Hence, this series is broken up into
three patches:

1. Add a missing call to hfs_find_exit for an error path in
   hfs_fill_super

2. Fix memory mapping in hfs_bnode_read by fixing calls to kmap

3. Add lock nesting notation to tell lockdep that the observed locking
   hierarchy is safe

This patch (of 3):

Before exiting hfs_fill_super, the struct hfs_find_data used in
hfs_find_init should be passed to hfs_find_exit to be cleaned up, and to
release the lock held on the btree.

The call to hfs_find_exit is missing from an error path.  We add it back
in by consolidating calls to hfs_find_exit for error paths.

Link: https://syzkaller.appspot.com/bug?id=f007ef1d7a31a469e3be7aeb0fde0769b18585db [1]
Link: https://lkml.kernel.org/r/20210701030756.58760-1-desmondcheongzx@gmail.com
Link: https://lkml.kernel.org/r/20210701030756.58760-2-desmondcheongzx@gmail.com
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-15 10:13:49 -07:00
Christian Brauner
d1d488d813 fs: add vfs_parse_fs_param_source() helper
Add a simple helper that filesystems can use in their parameter parser
to parse the "source" parameter. A few places open-coded this function
and that already caused a bug in the cgroup v1 parser that we fixed.
Let's make it harder to get this wrong by introducing a helper which
performs all necessary checks.

Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-14 09:19:06 -07:00
Linus Torvalds
40226a3d96 Signed tag for a set of bugfixes for vboxsf for 5.14
This patch series adds support for the atomic_open
 directory-inode op to vboxsf.
 
 Note this is not just an enhancement this also fixes an actual issue
 which users are hitting, see the commit message of the
 "boxsf: Add support for the atomic_open directory-inode" patch.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmDTLTsUHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9ytCAf8DjQurYh0B+E5i9pFL1hLgS715rnD
 4qu8GT7+DF/9Yj8Mpg7aS/v1GNwjWBE506Fj6bc8E3s637OrflBiqqtFM6a/jcfP
 i3RDtfCiTD9jeDT5OPhV4esuQvXnQ63ldXFSHf1TxaNb4Be8OmACibnSvslyC+Eb
 YhKtMRH+oKeQfob3rbTJBglkDRe1KUuA2zGPBuYheaLLaYSHrj1xSRCoGY6mJMBJ
 pP5FCT/nOsgxD6zej3/aa57put9kZoYVlu1TLnCfkggzuirL+82/pABC3ZYTtsM8
 jeby97djOI/fufIlVD1yX7q+kzyVWj3ouparoAKsu5TDSmmIRYnfu/RlNA==
 =yFXG
 -----END PGP SIGNATURE-----

Merge tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux

Pull vboxsf fixes from Hans de Goede:
 "This adds support for the atomic_open directory-inode op to vboxsf.

  Note this is not just an enhancement this also fixes an actual issue
  which users are hitting, see the commit message of the "boxsf: Add
  support for the atomic_open directory-inode" patch"

* tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux:
  vboxsf: Add support for the atomic_open directory-inode op
  vboxsf: Add vboxsf_[create|release]_sf_handle() helpers
  vboxsf: Make vboxsf_dir_create() return the handle for the created file
  vboxsf: Honor excl flag to the dir-inode create op
2021-07-13 12:07:18 -07:00
Linus Torvalds
f02bf8578b for-5.14-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmDsjSEACgkQxWXV+ddt
 WDtnZRAAieSXta8GaJYNF4cKs7xHttIkNl0ljJHsJsKoN5kCxW22RWsf8gAyToT3
 XERkJfRksgMH0Th3StJqTxg0fQTSiSi1bcz+wJjMVvQev2gX8dw7O05GLZT5GTzx
 zquI57+OGDEpQdEM6YzrUl+tYnO0roibI2LQeMWUXYXJTy6F75zWjBqKcTGcnfGc
 d8bOi6ijN4F148zIxvr6ahHrQN9WGwD5OWA1I5RqHBadgwCDWsQIdE6/N1Kdavf5
 uW785lJ8a4VqOWyM7Y0kp4madnF9rwZ/CFyoQFJ51oG/NrUf469+bCBFM8VOEwSa
 c3ZaqvF8CF3sndSAYiI4MEBFbM2O4hIVl/B9NkjDXDu3VlkRwwHDxZfadvc4BzsG
 kfisaw/GbOvOv8ojxBq4ux2nbRIVul096HpZH4UWHs/MCQ5Ct40OP5sG77YZKQgf
 o+D65V3NMn1gnp+B8wqyNnraY4hAoBePoK9f3IH+WXF5hlk6gWkbWxmXxCIPvJM4
 XTJUcNCXDZtKA9KRgOmcP9fZSu4gyD3hbDRgU5nKkLLSGE+mE4BRmtnq91VnT7FA
 5Nxlrjw9Na9LoyXYaoHcCksj207KU6WVgIjK4OFJarLMWlSDYBwAQCX0+voG+ZBq
 qa6BuLpq2aJhB6Q4M3MdAQSbhfR6tcI+HENCQlFHa6Je7oY9NVQ=
 =v00S
 -----END PGP SIGNATURE-----

Merge tag 'for-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs zoned mode fixes from David Sterba:

 - fix deadlock when allocating system chunk

 - fix wrong mutex unlock on an error path

 - fix extent map splitting for append operation

 - update and fix message reporting unusable chunk space

 - don't block when background zone reclaim runs with balance in
   parallel

* tag 'for-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: fix wrong mutex unlock on failure to allocate log root tree
  btrfs: don't block if we can't acquire the reclaim lock
  btrfs: properly split extent_map for REQ_OP_ZONE_APPEND
  btrfs: rework chunk allocation to avoid exhaustion of the system chunk array
  btrfs: fix deadlock with concurrent chunk allocations involving system chunks
  btrfs: zoned: print unusable percentage when reclaiming block groups
  btrfs: zoned: fix types for u64 division in btrfs_reclaim_bgs_work
2021-07-13 12:02:07 -07:00
Gustavo A. R. Silva
e8865537a6 fcntl: Fix unreachable code in do_fcntl()
Fix the following warning:

fs/fcntl.c:373:3: warning: fallthrough annotation in unreachable code [-Wimplicit-fallthrough]
                   fallthrough;
                   ^
   include/linux/compiler_attributes.h:210:41: note: expanded from macro 'fallthrough'
   # define fallthrough                    __attribute__((__fallthrough__))

by placing the fallthrough; statement inside ifdeffery.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-07-12 11:09:13 -05:00
Gustavo A. R. Silva
5937e00017 xfs: Fix multiple fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix
the following warnings by replacing /* fallthrough */ comments,
and its variants, with the new pseudo-keyword macro fallthrough:

fs/xfs/libxfs/xfs_attr.c:487:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:500:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:532:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:594:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:607:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:1410:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:1445:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:1473:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]

Notice that Clang doesn't recognize /* fallthrough */ comments as
implicit fall-through markings, so in order to globally enable
-Wimplicit-fallthrough for Clang, these comments need to be
replaced with fallthrough; in the whole codebase.

Link: https://github.com/KSPP/linux/issues/115
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-07-12 00:25:09 -05:00
Linus Torvalds
1e16624d7b 13 cifs/smb3 fixes. Most are to address minor issues pointed out by Coverity. Also includes a packet signing enhancement and mount improvement
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmDpFjwACgkQiiy9cAdy
 T1GriwwApJ/dwPeiWeotsw8kiFQjM0S/wu0JByPLJIjZ27LN3D4aGO8uqaB861G4
 4Q+w545poIcRmn//2PwhN6LZUnBY6RkPCTra4mbVtMB6c1MU9NsqRe3vqiqKN167
 qc2zCTl/z67jamUIwuWHB+ljphecGKBsSpik1ZqQkTbMIz+SXoMfkgr3R3CAXkpY
 EyzuLCaIU3PC8MCb749LEG6Xkq+7sT1GIaoW9BPle/ZFKbZ3acvPGzVcA4s4qlDH
 /mNZ3cVKxGDv0CxUqa9wkpelfuaooFLmWwah3mvG7YhQ2gxmFlty8/+ppP2LvTpm
 ZRLmpeot7PctDnNTSRhrvm+XC3HqzLLrXXtaGvoW4zTKa3RrWFEcr5tppMmiHjJt
 gTJhkkINTHv0Ewp677/vu8xnEPLgBcR0t2IEPNjgI0jqPuYrjT6EF0wsSUP3K00O
 ODJGT/q/dmtqjezenmOi9MQnGKox/9C2uuAcagWPzHoeesYfK8Sh+q5tc+bfykHz
 6QN5jKp/
 =7Jd9
 -----END PGP SIGNATURE-----

Merge tag '5.14-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "13 cifs/smb3 fixes. Most are to address minor issues pointed out by
  Coverity.

  Also includes a packet signing enhancement and mount improvement"

* tag '5.14-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal version number
  cifs: prevent NULL deref in cifs_compose_mount_options()
  SMB3.1.1: Add support for negotiating signing algorithm
  cifs: use helpers when parsing uid/gid mount options and validate them
  CIFS: Clarify SMB1 code for POSIX Lock
  CIFS: Clarify SMB1 code for rename open file
  CIFS: Clarify SMB1 code for delete
  CIFS: Clarify SMB1 code for SetFileSize
  smb3: fix typo in header file
  CIFS: Clarify SMB1 code for UnixSetPathInfo
  CIFS: Clarify SMB1 code for UnixCreateSymLink
  cifs: clarify SMB1 code for UnixCreateHardLink
  cifs: make locking consistent around the server session status
2021-07-10 12:04:58 -07:00
Linus Torvalds
50be9417e2 io_uring-5.14-2021-07-09
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmDoXgsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpv7XEADFFTtlR/xLAgPoVA5XX25g4m/k/Uo2zY+Y
 JXDHX55MIpSHTycnfXESiwf2a6fTkbKCAlcGbkqKxySxyFF36vvcRXF4PnspeRI5
 LjZlhUgqqFgWQ5Evl3LcKpG8sLsg1fB6vjD0kI+x9nlNA36Ly6egIgZ+i39tk7zK
 m72ez0A5B1sKTn78BNze+akaSSjDmPTvb53E0Jl4eqn4cTKySXQ5JJlG71ke/4lJ
 VUNdufrxZMP0EvBTwxjQCg0DwhCy57tYiB98/OrMagfbOxxsF9GxKafvmtpN4fev
 rLLnKs4tU68aRK3nVeIbUYfQAquNiNdy9Hoqv4XYzN7NBMnGiyRmmYLN1pbA7OlW
 lmc0v2c4ORoeHMTChQlEcx6xKVpGrRS7+S8P+Vz81JPpn+OO1sRhgzjc9/FdEdjy
 sTBnQg23V274zL6jZsIL9HZzdI0fv5ybzuG061Tj7P6TUs9E/nn0wGo6m8FWTpSd
 B5Gl42SOWlybFETXessXFECd7TrbvT6Dkghg//GnbhtfUNtG4NwBfuIJLVrFGwJr
 RGu5YXxSP79YuAdqP0cOlr3O/i/BrdqjmveDSUbpMHANEz6gGTplcCOwOs4QWBCI
 HO2ExxEP2NmLOtwXfGabTlt6Wlo3zj1ToHhpOmXCYVfEZonhHxcwV4odWfRRx43A
 OTSfC1Tvlg==
 =74oU
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.14-2021-07-09' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few fixes that should go into this merge.

  One fixes a regression introduced in this release, others are just
  generic fixes, mostly related to handling fallback task_work"

* tag 'io_uring-5.14-2021-07-09' of git://git.kernel.dk/linux-block:
  io_uring: remove dead non-zero 'poll' check
  io_uring: mitigate unlikely iopoll lag
  io_uring: fix drain alloc fail return code
  io_uring: fix exiting io_req_task_work_add leaks
  io_uring: simplify task_work func
  io_uring: fix stuck fallback reqs
2021-07-09 12:17:38 -07:00
Linus Torvalds
a022f7d575 block-5.14-2021-07-08
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmDnGVYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpv6UEAC78zkseI8TmKaowNfkz/+MkP9eSFb1pVn3
 rxpbPOsZompHoZpeWt4oHL+3Rmm3a9iRo/APA2ELas4zvp+Q+6uG7eha2Dc4hUA9
 YgeO4z9YfG8wQNZc3x7bncb6ZwqEE5nnbFe/m25SyrAZVLlZ7FKHxfoZDqjhlGFC
 eLNiYO6vdvwgCoBMcotyCDttrPfEu6947/5vB1zevv57twdQQaEWGUhvyx1XrlDX
 0YD5fmdOjNU2isgxt4xo2Ur2zL6w254/hvj58sV3Z7JfkJpI9DCK+ztKEfzuyEhA
 WYz06rDAT1+1KuVLfowaZ+pYiPPOIsL0+QXI83r3nLaE7WGGlfS8Hmz//1FbziYs
 ZSZI826kEN+/lKeWTcKOOMhmkYyXEFFuQZS34eg9KI4xwML8v+ILlHmcp+tjebw9
 vzNF6f7N2ki+jnyxxyNxeMHxeAMWsqnIRROOhZg6bbs6UVNpDy4qRzpQaDOaJsVe
 uSAQ6PTd/etR9KE+ClhLe6X7Rmp/lfZCPe64wqM/3k1qV2KWhE1fwCQO4c5o1MBN
 rpk3Ef5PZYP3aakCvZnfcjMWlpZNbq/xMc6vPc+yq32akq1t1KbODVBiR5odcH0C
 Gt5N11im50SO06haBt7EOe4JMQLbK5sxG15t4C6mNQZgPegGfaLlVkKpzIkOzUha
 OkRofKMcDA==
 =gHse
 -----END PGP SIGNATURE-----

Merge tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block

Pull more block updates from Jens Axboe:
 "A combination of changes that ended up depending on both the driver
  and core branch (and/or the IDE removal), and a few late arriving
  fixes. In detail:

   - Fix io ticks wrap-around issue (Chunguang)

   - nvme-tcp sock locking fix (Maurizio)

   - s390-dasd fixes (Kees, Christoph)

   - blk_execute_rq polling support (Keith)

   - blk-cgroup RCU iteration fix (Yu)

   - nbd backend ID addition (Prasanna)

   - Partition deletion fix (Yufen)

   - Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph)

   - Removal of now dead block request types due to IDE removal
     (Christoph)

   - Loop probing and control device cleanups (Christoph)

   - Device uevent fix (Christoph)

   - Misc cleanups/fixes (Tetsuo, Christoph)"

* tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits)
  blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs
  block: fix the problem of io_ticks becoming smaller
  nvme-tcp: can't set sk_user_data without write_lock
  loop: remove unused variable in loop_set_status()
  block: remove the bdgrab in blk_drop_partitions
  block: grab a device refcount in disk_uevent
  s390/dasd: Avoid field over-reading memcpy()
  dasd: unexport dasd_set_target_state
  block: check disk exist before trying to add partition
  ubd: remove dead code in ubd_setup_common
  nvme: use return value from blk_execute_rq()
  block: return errors from blk_execute_rq()
  nvme: use blk_execute_rq() for passthrough commands
  block: support polling through blk_execute_rq
  block: remove REQ_OP_SCSI_{IN,OUT}
  block: mark blk_mq_init_queue_data static
  loop: rewrite loop_exit using idr_for_each_entry
  loop: split loop_lookup
  loop: don't allow deleting an unspecified loop device
  loop: move loop_ctl_mutex locking into loop_add
  ...
2021-07-09 12:05:33 -07:00
Steve French
4d069f6022 cifs: update internal version number
To 2.33

Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-09 13:03:36 -05:00
Paulo Alcantara
03313d1c3a cifs: prevent NULL deref in cifs_compose_mount_options()
The optional @ref parameter might contain an NULL node_name, so
prevent dereferencing it in cifs_compose_mount_options().

Addresses-Coverity: 1476408 ("Explicit null dereferenced")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-09 12:57:05 -05:00
Steve French
53d31a3ffd SMB3.1.1: Add support for negotiating signing algorithm
Support for faster packet signing (using GMAC instead of CMAC) can
now be negotiated to some newer servers, including Windows.
See MS-SMB2 section 2.2.3.17.

This patch adds support for sending the new negotiate context
with the first of three supported signing algorithms (AES-CMAC)
and decoding the response.  A followon patch will add support
for sending the other two (including AES-GMAC, which is fastest)
and changing the signing algorithm used based on what was
negotiated.

To allow the client to request GMAC signing set module parameter
"enable_negotiate_signing" to 1.

Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-09 12:48:58 -05:00
Linus Torvalds
7a400bf283 This pull request contains the following changes for UBIFS:
- Fix for a race xattr list and modification
 - Various minor fixes (spelling, return codes, ...)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmDnaSYWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wd36D/9ZJDHP1D1ZTTN6eFbAy55fDV+U
 6nx36Ot+DA5BqaPYX1YRBNxshBIdd620W4n07US20SL14k7dzky/y+oJt+ZZY44k
 1ev6zqbmWTKdZhITwoaeVWRGYkEUVTvGW4MoOERZSfLhodrRVCfqs9/1a8ol/x1g
 AhZFxBRBUWLWlHYNplqdZdIuCo0qGQBWITv/A8of0pXH9cGxRTvYV5V6CbeXrnuD
 LgvckvwCZHPtRHgil3R1l/oGo109QNKhEycFBBLf1ydqIyIjOBcpI2ETZmifyv9D
 v2dgyu9u1SM2mzMEZmdTfWVBjqhKH2vYiv1otF2BD7DCJ48Y4yb/YQby3MOrZfDI
 rI3B8FlfNeXkqzoKjuLjiHe21KXrifPNsuTcunNd8Thnw+0mqsY3OimY4pYpQO2O
 JhDlnt8V4QLcUeg2NYGrQHx1qfLSMgc5pYd0ntOWWhJKXtreFYtX2WeorMroIHZQ
 B4li4VasaLvg+HSwJsgJ4qrPHl0fMT8eu0yb4KjsJi/d9Zl4FhmRe/QVs43GayAT
 zczxywAAN81sP2q/TQPtDgHzI90QTA0E89JkQVa0cGA2/VznM9RmAhrxGIjwGKZE
 4owC5RMA8w7eb60aLZ4HEooLwNXbKRd3ayR9Hatbp+Cq3+GBUG77em+2sLtwyqiD
 mDWJo6AWRZxBDpG4jg==
 =2Wk7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBIFS updates from Richard Weinberger:

 - Fix for a race xattr list and modification

 - Various minor fixes (spelling, return codes, ...)

* tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode
  ubifs: Fix spelling mistakes
  ubifs: Remove ui_mutex in ubifs_xattr_get and change_xattr
  ubifs: Fix races between xattr_{set|get} and listxattr operations
  ubifs: fix snprintf() checking
  ubifs: journal: Fix error return code in ubifs_jnl_write_inode()
2021-07-09 10:10:47 -07:00
Linus Torvalds
e49d68ce7c Ext4 regression and bug fixes for v5.14-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmDnU2AACgkQ8vlZVpUN
 gaOBIAgApIAIeGbppf7aFjRN4h4wxfRpr7w6lux3GVmz7D+6djRi21X5dT5xq01m
 u6DkLAcKrCATIidyP6qHlvBbxxcPt2PX1FcQbruj9WcnSng1Ngl7RW8BEqp/eIRo
 Nb7MY0pg8HIJVMEniWQcdEjFWKDL3ksWR9+X3V3nhSzp+0kXFF1ySjk+TWi/ZGSn
 T/Q1sEyeUOiVfV75cIW5JbKoJEgvCvrclFvGJLYVcIAYeqJfQKQ0+tlkhDeYnWfQ
 nZgh1UU350bO629LGIhbRAkLbAloEb0d57mOQCrATo0JFrAZ52+0ZCkrTXtIyoOF
 TUILVf3zsqgdO8HLDkbH1G+lGn9WOA==
 =qU+W
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Ext4 regression and bug fixes"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: inline jbd2_journal_[un]register_shrinker()
  ext4: fix flags validity checking for EXT4_IOC_CHECKPOINT
  ext4: fix possible UAF when remounting r/o a mmp-protected file system
  ext4: use ext4_grp_locked_error in mb_find_extent
  ext4: fix WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblock
  Revert "ext4: consolidate checks for resize of bigalloc into ext4_resize_begin"
2021-07-09 09:57:27 -07:00
Linus Torvalds
47a7ce6288 We have new filesystem client metrics for reporting I/O sizes from
Xiubo, two patchsets from Jeff that begin to untangle some heavyweight
 blocking locks in the filesystem and a bunch of code cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmDnVcgTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi+d9CACqbWorDRCksqBB40muthHfgArYAc8A
 WZEvrcieymV6P+A3KJj9wtNeRgT8iSdJDweD/5Yl0ZfZUx3i0x78600fe5cls3u3
 XiX154G8KZpnAQbuDXnSny+4PiEQMkbfL3Zk++TSClBWb2PqYF/LvEsCfdBIuHYm
 BRMTpZ9rGWD+WWnz1iroubhMfmUTdyGzsgA4zjBNr46d2k1gZVviB0TDsEfhC8lP
 qio7IABkIWmvVJk9MCwp4JJQMMKuaN9DRddoA2Q/NZzevxHRUWCvW5a6o6vpO1+W
 d74Zzf9kbwCy+qbO1YpS0yrpNXP2IBVa0ZPNChOVDluPTmgVyQmrRjnU
 =wXsA
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.14-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "We have new filesystem client metrics for reporting I/O sizes from
  Xiubo, two patchsets from Jeff that begin to untangle some heavyweight
  blocking locks in the filesystem and a bunch of code cleanups"

* tag 'ceph-for-5.14-rc1' of git://github.com/ceph/ceph-client:
  ceph: take reference to req->r_parent at point of assignment
  ceph: eliminate ceph_async_iput()
  ceph: don't take s_mutex in ceph_flush_snaps
  ceph: don't take s_mutex in try_flush_caps
  ceph: don't take s_mutex or snap_rwsem in ceph_check_caps
  ceph: eliminate session->s_gen_ttl_lock
  ceph: allow ceph_put_mds_session to take NULL or ERR_PTR
  ceph: clean up locking annotation for ceph_get_snap_realm and __lookup_snap_realm
  ceph: add some lockdep assertions around snaprealm handling
  ceph: decoding error in ceph_update_snap_realm should return -EIO
  ceph: add IO size metrics support
  ceph: update and rename __update_latency helper to __update_stdev
  ceph: simplify the metrics struct
  libceph: fix doc warnings in cls_lock_client.c
  libceph: remove unnecessary ret variable in ceph_auth_init()
  libceph: fix some spelling mistakes
  libceph: kill ceph_none_authorizer::reply_buf
  ceph: make ceph_queue_cap_snap static
  ceph: make ceph_netfs_read_ops static
  ceph: remove bogus checks and WARN_ONs from ceph_set_page_dirty
2021-07-09 09:52:13 -07:00
Linus Torvalds
96890bc2ea NFS client updates for Linux 5.14
Highlights include:
 
 Stable fixes:
 - Two sunrpc fixes for deadlocks involving privileged rpc_wait_queues
 
 Bugfixes
 - SUNRPC: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base()
 - SUNRPC: prevent port reuse on transports which don't request it.
 - NFSv3: Fix memory leak in posix_acl_create()
 - NFS: Various fixes to attribute revalidation timeouts
 - NFSv4: Fix handling of non-atomic change attribute updates
 - NFSv4: If a server is down, don't cause mounts to other servers to
   hang as well
 - pNFS: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT
 - NFS: Fix mount failures due to incorrect setting of the has_sec_mnt_opts
   filesystem flag
  - NFS: Ensure nfs_readpage returns promptly when an internal error occurs
  - NFS: Fix fscache read from NFS after cache error
  - pNFS: Various bugfixes around the LAYOUTGET operation
 
 Features
 - Multiple patches to add support for fcntl() leases over NFSv4.
 - A sysfs interface to display more information about the various
   transport connections used by the RPC client
 - A sysfs interface to allow a suitably privileged user to offline a
   transport that may no longer point to a valid server
 - A sysfs interface to allow a suitably privileged user to change the
   server IP address used by the RPC client
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmDnPrkACgkQZwvnipYK
 APKGAw/9EjdGoic6VpShQyb5uxaRoDd4uwWgFLBOfPhIWC7qNMtkj49wHIOEUm7e
 YfdF5RlCdmshaoMxjY84wjl8NTMwHbPahgooDd4+UsZUs2qxZ8dBsr0itfbFsJv8
 BpaCYKQt6XGQngGrWfC7SiCETnMej2YsmjDfHvhD58TxnRfPWexHUvx9xi9uGRCS
 sIWRA2QMNs7LwdShkkRotagodRLhu/zo4g0lon5lI8D/SRg6o8RoO4YP6oKH1FN4
 OyVzy1aWZGocgwCMUtNeuigJSRyDa+bJTfJ2c27uw5g18s0XWZ3j2DxD5I+HCEuE
 B4rhg+ujtPIifYLHf2Aj3nlxdBePZ5L67a2MOOUo+wSD+nPmNMZF1eIT/3Jsg/HA
 Z8gqcBiTIkBfVGJxWWbrbHfxPXQiK1IRGQx9acyhLCN9M6Kv5bbkn4R4dnronvJR
 g6O968fgC5uvl60CXdc8NCpWtSitXB/nH8pn7MbJ8JBGq7QIYNkS0d4E8ePhYwxk
 sRYJt21O+ryjodfQDHaUxodzCKGcpRoknpirMmgoAp4zdkva4ltViNsQvHa7jFh8
 HIuhU6Aia1xVYpUMDEXf2WMXCT9yLa2TyMDuS5KDfb69wBkQJWeKNkebf+1k03wQ
 saEmdoP4aEEujimkA7rqyOlI8XhsudKvBd3HXg+w9+xIt4yoie0=
 =NaOI
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - Multiple patches to add support for fcntl() leases over NFSv4.

   - A sysfs interface to display more information about the various
     transport connections used by the RPC client

   - A sysfs interface to allow a suitably privileged user to offline a
     transport that may no longer point to a valid server

   - A sysfs interface to allow a suitably privileged user to change the
     server IP address used by the RPC client

  Stable fixes:

   - Two sunrpc fixes for deadlocks involving privileged rpc_wait_queues

  Bugfixes:

   - SUNRPC: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base()

   - SUNRPC: prevent port reuse on transports which don't request it.

   - NFSv3: Fix memory leak in posix_acl_create()

   - NFS: Various fixes to attribute revalidation timeouts

   - NFSv4: Fix handling of non-atomic change attribute updates

   - NFSv4: If a server is down, don't cause mounts to other servers to
     hang as well

   - pNFS: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT

   - NFS: Fix mount failures due to incorrect setting of the
     has_sec_mnt_opts filesystem flag

   - NFS: Ensure nfs_readpage returns promptly when an internal error
     occurs

   - NFS: Fix fscache read from NFS after cache error

   - pNFS: Various bugfixes around the LAYOUTGET operation"

* tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits)
  NFSv4/pNFS: Return an error if _nfs4_pnfs_v3_ds_connect can't load NFSv3
  NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times
  NFSv4/pnfs: Clean up layout get on open
  NFSv4/pnfs: Fix layoutget behaviour after invalidation
  NFSv4/pnfs: Fix the layout barrier update
  NFS: Fix fscache read from NFS after cache error
  NFS: Ensure nfs_readpage returns promptly when internal error occurs
  sunrpc: remove an offlined xprt using sysfs
  sunrpc: provide showing transport's state info in the sysfs directory
  sunrpc: display xprt's queuelen of assigned tasks via sysfs
  sunrpc: provide multipath info in the sysfs directory
  NFSv4.1 identify and mark RPC tasks that can move between transports
  sunrpc: provide transport info in the sysfs directory
  SUNRPC: take a xprt offline using sysfs
  sunrpc: add dst_attr attributes to the sysfs xprt directory
  SUNRPC for TCP display xprt's source port in sysfs xprt_info
  SUNRPC query transport's source port
  SUNRPC display xprt's main value in sysfs's xprt_info
  SUNRPC mark the first transport
  sunrpc: add add sysfs directory per xprt under each xprt_switch
  ...
2021-07-09 09:43:57 -07:00
Linus Torvalds
227c4d507c f2fs-for-5.14-rc1
In this round, we've improved the compression support especially for Android
 such as allowing compression for mmap files, replacing the immutable bit with
 internal bit to prohibits data writes explicitly, and adding a mount option,
 "compress_cache", to improve random reads. And, we added "readonly" feature to
 compact the partition w/ compression enabled, which will be useful for Android
 RO partitions.
 
 Enhancement:
  - support compression for mmap file
  - use an f2fs flag instead of IMMUTABLE bit for compression
  - support RO feature w/ extent_cache
  - fully support swapfile with file pinning
  - improve atgc tunability
  - add nocompress extensions to unselect files for compression
 
 Bug fix:
  - fix false alaram on iget failure during GC
  - fix race condition on global pointers when there are multiple f2fs instances
  - add MODULE_SOFTDEP for initramfs
 
 As usual, we've also cleaned up some places for better code readability.
 (e.g., sysfs/feature, debugging messages, slab cache name, and docs)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmDmAKIACgkQQBSofoJI
 UNIGGA//XOcQ+BF6t/os55UZFZc2E5F9VDNkKKBw9WIXp2g2/5D8hl+dT7qMFzMA
 ndZiRl0wEElce1ZqSzrt8TRcbwoDLgguUdD7B70CR2NMZ36kdVDet3BW01SnwhXy
 JoiLBkdB5VUdK6QMnP7+xE3tvKhEk3FDohpOoaOtmYyPpk8Q7C8n0PX2KADgmepl
 6aNi1aJGy+hf6UyhcHlgfKLpndyX6gGJVmoLN+yJF2D9tLALaLHcZgJUBG9ZajNH
 yLNwiBG7ybcs1TrWan0887Ne3GVXQpWqrBLou3s4b8abTLpYdFBnIPrN00jmu3Hp
 I/uenkBS4HI5AZgKFrF1i2QiKlBL/+PmQ1fJtraVJj6auhnwHqZGIdTb8c3AbeSJ
 pKhC3w8W8K6Fs2Awuyc4qlWDdNquAeM0wf5qlD0yLgZO6/KgpXbBDCQIOhhBn2hW
 6h6jZ4KJwEuTXEZB39PJ83YXPpzoxoZW4hVZXlAcsgX2s3Ke1R0BrjZ+ff6JeG78
 Oj13PCWFbwd+UjS3I2W1R9taBDeuYkyMF7CKl+BtAzkYD1bO+KfO4sfSoR/XLSVx
 OwB36LFzklNJeJ1koJZw4n/czJQYZTOQgThDPFDNL1m8zpyB5p8MNL7f84cOJ2Yo
 aU7mpTX/i56bGsO3sr4OyO37MlZTBkyNjMI/sBgouPXYlN3YmHs=
 =1IHL
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've improved the compression support especially for
  Android such as allowing compression for mmap files, replacing the
  immutable bit with internal bit to prohibits data writes explicitly,
  and adding a mount option, "compress_cache", to improve random reads.
  And, we added "readonly" feature to compact the partition w/
  compression enabled, which will be useful for Android RO partitions.

  Enhancements:
   - support compression for mmap file
   - use an f2fs flag instead of IMMUTABLE bit for compression
   - support RO feature w/ extent_cache
   - fully support swapfile with file pinning
   - improve atgc tunability
   - add nocompress extensions to unselect files for compression

  Bug fixes:
   - fix false alaram on iget failure during GC
   - fix race condition on global pointers when there are multiple f2fs
     instances
   - add MODULE_SOFTDEP for initramfs

  As usual, we've also cleaned up some places for better code
  readability (e.g., sysfs/feature, debugging messages, slab cache
  name, and docs)"

* tag 'f2fs-for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits)
  f2fs: drop dirty node pages when cp is in error status
  f2fs: initialize page->private when using for our internal use
  f2fs: compress: add nocompress extensions support
  MAINTAINERS: f2fs: update my email address
  f2fs: remove false alarm on iget failure during GC
  f2fs: enable extent cache for compression files in read-only
  f2fs: fix to avoid adding tab before doc section
  f2fs: introduce f2fs_casefolded_name slab cache
  f2fs: swap: support migrating swapfile in aligned write mode
  f2fs: swap: remove dead codes
  f2fs: compress: add compress_inode to cache compressed blocks
  f2fs: clean up /sys/fs/f2fs/<disk>/features
  f2fs: add pin_file in feature list
  f2fs: Advertise encrypted casefolding in sysfs
  f2fs: Show casefolding support only when supported
  f2fs: support RO feature
  f2fs: logging neatening
  f2fs: introduce FI_COMPRESS_RELEASED instead of using IMMUTABLE bit
  f2fs: compress: remove unneeded preallocation
  f2fs: atgc: export entries for better tunability via sysfs
  ...
2021-07-09 09:37:56 -07:00
Jens Axboe
9ce85ef2cb io_uring: remove dead non-zero 'poll' check
Colin reports that Coverity complains about checking for poll being
non-zero after having dereferenced it multiple times. This is a valid
complaint, and actually a leftover from back when this code was based
on the aio poll code.

Kill the redundant check.

Link: https://lore.kernel.org/io-uring/fe70c532-e2a7-3722-58a1-0fa4e5c5ff2c@canonical.com/
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-09 08:20:28 -06:00
Ronnie Sahlberg
e0a3cbcd5c cifs: use helpers when parsing uid/gid mount options and validate them
Use the nice helpers to initialize and the uid/gid/cred_uid when passed as mount arguments.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-08 18:25:04 -05:00
Pavel Begunkov
8f487ef2cb io_uring: mitigate unlikely iopoll lag
We have requests like IORING_OP_FILES_UPDATE that don't go through
->iopoll_list but get completed in place under ->uring_lock, and so
after dropping the lock io_iopoll_check() should expect that some CQEs
might have get completed in a meanwhile.

Currently such events won't be accounted in @nr_events, and the loop
will continue to poll even if there is enough of CQEs. It shouldn't be a
problem as it's not likely to happen and so, but not nice either. Just
return earlier in this case, it should be enough.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/66ef932cc66a34e3771bbae04b2953a8058e9d05.1625747741.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-08 14:07:43 -06:00
Trond Myklebust
878b3dfc42 Merge part 2 of branch 'sysfs-devel'
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Trond Myklebust
dd5c153ed7 NFSv4/pNFS: Return an error if _nfs4_pnfs_v3_ds_connect can't load NFSv3
Currently we fail to return an error if the NFSv3 module failed to load
when we're trying to connect to a pNFS data server.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Trond Myklebust
f46f84931a NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times
After we grab the lock in nfs4_pnfs_ds_connect(), there is no check for
whether or not ds->ds_clp has already been initialised, so we can end up
adding the same transports multiple times.

Fixes: fc821d5920 ("pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Trond Myklebust
b4e89bcba2 NFSv4/pnfs: Clean up layout get on open
Cache the layout in the arguments so we don't have to keep looking it up
from the inode.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Trond Myklebust
0b77f97a7e NFSv4/pnfs: Fix layoutget behaviour after invalidation
If the layout gets invalidated, we should wait for any outstanding
layoutget requests for that layout to complete, and we should resend
them only after re-establishing the layout stateid.

Fixes: d29b468da4 ("pNFS/NFSv4: Improve rejection of out-of-order layouts")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Trond Myklebust
aa95edf309 NFSv4/pnfs: Fix the layout barrier update
If we have multiple outstanding layoutget requests, the current code to
update the layout barrier assumes that the outstanding layout stateids
are updated in order. That's not necessarily the case.

Instead of using the value of lo->plh_outstanding as a guesstimate for
the window of values we need to accept, just wait to update the window
until we're processing the last one. The intention here is just to
ensure that we don't process 2^31 seqid updates without also updating
the barrier.

Fixes: 1bcf34fdac ("pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Dave Wysochanski
ba512c1bc3 NFS: Fix fscache read from NFS after cache error
Earlier commits refactored some NFS read code and removed
nfs_readpage_async(), but neglected to properly fixup
nfs_readpage_from_fscache_complete().  The code path is
only hit when something unusual occurs with the cachefiles
backing filesystem, such as an IO error or while a cookie
is being invalidated.

Mark page with PG_checked if fscache IO completes in error,
unlock the page, and let the VM decide to re-issue based on
PG_uptodate.  When the VM reissues the readpage, PG_checked
allows us to skip over fscache and read from the server.

Link: https://marc.info/?l=linux-nfs&m=162498209518739
Fixes: 1e83b173b2 ("NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Dave Wysochanski
e0340f16a0 NFS: Ensure nfs_readpage returns promptly when internal error occurs
A previous refactoring of nfs_readpage() might end up calling
wait_on_page_locked_killable() even if readpage_async_filler() failed
with an internal error and pg_error was non-zero (for example, if
nfs_create_request() failed).  In the case of an internal error,
skip over wait_on_page_locked_killable() as this is only needed
when the read is sent and an error occurs during completion handling.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Olga Kornievskaia
85e39feead NFSv4.1 identify and mark RPC tasks that can move between transports
In preparation for when we can re-try a task on a different transport,
identify and mark such RPC tasks as moveable. Only 4.1+ operarations can
be re-tried on a different transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Theodore Ts'o
0705e8d1e2 ext4: inline jbd2_journal_[un]register_shrinker()
The function jbd2_journal_unregister_shrinker() was getting called
twice when the file system was getting unmounted.  On Power and ARM
platforms this was causing kernel crash when unmounting the file
system, when a percpu_counter was destroyed twice.

Fix this by removing jbd2_journal_[un]register_shrinker() functions,
and inlining the shrinker setup and teardown into
journal_init_common() and jbd2_journal_destroy().  This means that
ext4 and ocfs2 now no longer need to know about registering and
unregistering jbd2's shrinker.

Also, while we're at it, rename the percpu counter from
j_jh_shrink_count to j_checkpoint_jh_count, since this makes it
clearer what this counter is intended to track.

Link: https://lore.kernel.org/r/20210705145025.3363130-1-tytso@mit.edu
Fixes: 4ba3fcdde7 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-07-08 08:37:31 -04:00
Theodore Ts'o
0955901908 ext4: fix flags validity checking for EXT4_IOC_CHECKPOINT
Use the correct bitmask when checking for any not-yet-supported flags.

Link: https://lore.kernel.org/r/20210702173425.1276158-1-tytso@mit.edu
Fixes: 351a0a3fbc ("ext4: add ioctl EXT4_IOC_CHECKPOINT")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Leah Rumancik <leah.rumancik@gmail.com>
2021-07-08 08:37:31 -04:00
Theodore Ts'o
61bb4a1c41 ext4: fix possible UAF when remounting r/o a mmp-protected file system
After commit 618f003199 ("ext4: fix memory leak in
ext4_fill_super"), after the file system is remounted read-only, there
is a race where the kmmpd thread can exit, causing sbi->s_mmp_tsk to
point at freed memory, which the call to ext4_stop_mmpd() can trip
over.

Fix this by only allowing kmmpd() to exit when it is stopped via
ext4_stop_mmpd().

Link: https://lore.kernel.org/r/20210707002433.3719773-1-tytso@mit.edu
Reported-by: Ye Bin <yebin10@huawei.com>
Bug-Report-Link: <20210629143603.2166962-1-yebin10@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2021-07-08 08:36:09 -04:00
Steve French
d4dc277c48 CIFS: Clarify SMB1 code for POSIX Lock
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the
header structure rather than from the beginning of the struct
plus 4 bytes) for SMB1 PosixLock. This changeset
doesn't change the address but makes it slightly clearer.

Addresses-Coverity: 711520 ("Out of bounds write")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-07 16:43:17 -05:00
Steve French
f371793d6e CIFS: Clarify SMB1 code for rename open file
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the
header structure rather than from the beginning of the struct
plus 4 bytes) for SMB1 RenameOpenFile. This changeset
doesn't change the address but makes it slightly clearer.

Addresses-Coverity: 711521 ("Out of bounds write")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-07 16:42:25 -05:00
Linus Torvalds
0cc2ea8ceb Some highlights:
- add tracepoints for callbacks and for client creation and
 	  destruction
 	- cache the mounts used for server-to-server copies
 	- expose callback information in /proc/fs/nfsd/clients/*/info
 	- don't hold locks unnecessarily while waiting for commits
 	- update NLM to use xdr_stream, as we have for NFSv2/v3/v4
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAmDlvjIVHGJmaWVsZHNA
 ZmllbGRzZXMub3JnAAoJECebzXlCjuG+0MoP/RJ8Q7zwIz6WFHn3bCRaEXpnnkAH
 mmMfELhmgvH0V5nXWbb2rAfhllY+/zeWtf8QHSEKUPCnVLmB7WeXKdjXSy7EnYJ8
 R8DuuuII85McIrg93nJ8hxm4wXTaTZKXpS4Vxkuxc6YKxoeJoXOaTjbgRLIw8mfX
 w4wPfjAsnROboVxvDHUmBS9zNKaAi2dZ0jH2x2eS7eZSWzoJC30yd+pFSxyYoOac
 3fZUntDskQDGIpXHuTf53WcaK7h1bUHrwS7Joez8Z0ctg4vcbJsfdhKZUZwAxOZh
 3xWAgm3PFcze5xqHuX8BYBThHfB3uTeygZQRb3zI9sG2UQtQfundrtlxZRSjMMkC
 cwlSi2SQNL66EBIgOcS3U/9OeorLALnnRax1KWMWjpFzaBJJQTJDumwLRx4zogI1
 Ouiu0fI+hApck+L+qCzJMidA2wxOBsDzH471YiGiqQSmgNZc6wBc+aC/JKN8QAWb
 jG53vvpa3gCZa8Rs3KyOoUvtcCCdiQc+nljbzqtVfIvvGa9MSixufa+U5fojLEO7
 i8aangK+mteMxrrejEKvRu1efDIfpFq0HW7ev1mzW2Jl/AguDXM5XUeGK2mMMPtc
 WqT3arbtGVcXJN+Oh5TzTVuED/DecyO0Fig77G+WJTiWONgoHfs+E5nC4aHSpohn
 bMpmQMIOmTa5zgQP
 =BQyR
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.14' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:

 - add tracepoints for callbacks and for client creation and destruction

 - cache the mounts used for server-to-server copies

 - expose callback information in /proc/fs/nfsd/clients/*/info

 - don't hold locks unnecessarily while waiting for commits

 - update NLM to use xdr_stream, as we have for NFSv2/v3/v4

* tag 'nfsd-5.14' of git://linux-nfs.org/~bfields/linux: (69 commits)
  nfsd: fix NULL dereference in nfs3svc_encode_getaclres
  NFSD: Prevent a possible oops in the nfs_dirent() tracepoint
  nfsd: remove redundant assignment to pointer 'this'
  nfsd: Reduce contention for the nfsd_file nf_rwsem
  lockd: Update the NLMv4 SHARE results encoder to use struct xdr_stream
  lockd: Update the NLMv4 nlm_res results encoder to use struct xdr_stream
  lockd: Update the NLMv4 TEST results encoder to use struct xdr_stream
  lockd: Update the NLMv4 void results encoder to use struct xdr_stream
  lockd: Update the NLMv4 FREE_ALL arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 SHARE arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 SM_NOTIFY arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 nlm_res arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 UNLOCK arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 CANCEL arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 LOCK arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 TEST arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 void arguments decoder to use struct xdr_stream
  lockd: Update the NLMv1 SHARE results encoder to use struct xdr_stream
  lockd: Update the NLMv1 nlm_res results encoder to use struct xdr_stream
  lockd: Update the NLMv1 TEST results encoder to use struct xdr_stream
  ...
2021-07-07 12:50:08 -07:00
Pavel Begunkov
c32aace0cf io_uring: fix drain alloc fail return code
After a recent change io_drain_req() started to fail requests with
result=0 in case of allocation failure, where it should be and have
been -ENOMEM.

Fixes: 76cc33d791 ("io_uring: refactor io_req_defer()")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e068110ac4293e0c56cfc4d280d0f22b9303ec08.1625682153.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-07 12:49:32 -06:00
Steve French
2a780e8b64 CIFS: Clarify SMB1 code for delete
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the
header structure rather than from the beginning of the struct
plus 4 bytes) for SMB1 SetFileDisposition (which is used to
unlink a file by setting the delete on close flag).  This
changeset doesn't change the address but makes it slightly
clearer.

Addresses-Coverity: 711524 ("Out of bounds write")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-07 11:53:17 -05:00
Steve French
e3973ea3a7 CIFS: Clarify SMB1 code for SetFileSize
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the header
structure rather than from the beginning of the struct plus
4 bytes) for setting the file size using SMB1. This changeset
doesn't change the address but makes it slightly clearer.

Addresses-Coverity: 711525 ("Out of bounds write")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-07 11:52:53 -05:00
Filipe Manana
ea32af47f0 btrfs: zoned: fix wrong mutex unlock on failure to allocate log root tree
When syncing the log, if we fail to allocate the root node for the log
root tree:

1) We are unlocking fs_info->tree_log_mutex, but at this point we have
   not yet locked this mutex;

2) We have locked fs_info->tree_root->log_mutex, but we end up not
   unlocking it;

So fix this by unlocking fs_info->tree_root->log_mutex instead of
fs_info->tree_log_mutex.

Fixes: e75f9fd194 ("btrfs: zoned: move log tree node allocation out of log_root_tree->log_mutex")
CC: stable@vger.kernel.org # 5.13+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 18:27:44 +02:00
Johannes Thumshirn
9cc0b837e1 btrfs: don't block if we can't acquire the reclaim lock
If we can't acquire the reclaim_bgs_lock on block group reclaim, we
block until it is free. This can potentially stall for a long time.

While reclaim of block groups is necessary for a good user experience on
a zoned file system, there still is no need to block as it is best
effort only, just like when we're deleting unused block groups.

CC: stable@vger.kernel.org # 5.13
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 17:42:57 +02:00
Naohiro Aota
abb99cfdaf btrfs: properly split extent_map for REQ_OP_ZONE_APPEND
Damien reported a test failure with btrfs/209. The test itself ran fine,
but the fsck ran afterwards reported a corrupted filesystem.

The filesystem corruption happens because we're splitting an extent and
then writing the extent twice. We have to split the extent though, because
we're creating too large extents for a REQ_OP_ZONE_APPEND operation.

When dumping the extent tree, we can see two EXTENT_ITEMs at the same
start address but different lengths.

$ btrfs inspect dump-tree /dev/nullb1 -t extent
...
   item 19 key (269484032 EXTENT_ITEM 126976) itemoff 15470 itemsize 53
           refs 1 gen 7 flags DATA
           extent data backref root FS_TREE objectid 257 offset 786432 count 1
   item 20 key (269484032 EXTENT_ITEM 262144) itemoff 15417 itemsize 53
           refs 1 gen 7 flags DATA
           extent data backref root FS_TREE objectid 257 offset 786432 count 1

The duplicated EXTENT_ITEMs originally come from wrongly split extent_map in
extract_ordered_extent(). Since extract_ordered_extent() uses
create_io_em() to split an existing extent_map, we will have
split->orig_start != split->start. Then, it will be logged with non-zero
"extent data offset". Finally, the logged entries are replayed into
a duplicated EXTENT_ITEM.

Introduce and use proper splitting function for extent_map. The function is
intended to be simple and specific usage for extract_ordered_extent() e.g.
not supporting compression case (we do not allow splitting compressed
extent_map anyway).

There was a question raised by Qu, in summary why we want to split the
extent map (and not the bio):

The problem is not the limit on the zone end, which as you mention is
the same as the block group end. The problem is that data write use zone
append (ZA) operations. ZA BIOs cannot be split so a large extent may
need to be processed with multiple ZA BIOs, While that is also true for
regular writes, the major difference is that ZA are "nameless" write
operation giving back the written sectors on completion. And ZA
operations may be reordered by the block layer (not intentionally
though). Combine both of these characteristics and you can see that the
data for a large extent may end up being shuffled when written resulting
in data corruption and the impossibility to map the extent to some start
sector.

To avoid this problem, zoned btrfs uses the principle "one data extent
== one ZA BIO". So large extents need to be split. This is unfortunate,
but we can revisit this later and optimize, e.g. merge back together the
fragments of an extent once written if they actually were written
sequentially in the zone.

Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
Fixes: d22002fd37 ("btrfs: zoned: split ordered extent when bio is sent")
CC: stable@vger.kernel.org # 5.12+
CC: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 17:42:45 +02:00
Filipe Manana
79bd37120b btrfs: rework chunk allocation to avoid exhaustion of the system chunk array
Commit eafa4fd0ad ("btrfs: fix exhaustion of the system chunk array
due to concurrent allocations") fixed a problem that resulted in
exhausting the system chunk array in the superblock when there are many
tasks allocating chunks in parallel. Basically too many tasks enter the
first phase of chunk allocation without previous tasks having finished
their second phase of allocation, resulting in too many system chunks
being allocated. That was originally observed when running the fallocate
tests of stress-ng on a PowerPC machine, using a node size of 64K.

However that commit also introduced a deadlock where a task in phase 1 of
the chunk allocation waited for another task that had allocated a system
chunk to finish its phase 2, but that other task was waiting on an extent
buffer lock held by the first task, therefore resulting in both tasks not
making any progress. That change was later reverted by a patch with the
subject "btrfs: fix deadlock with concurrent chunk allocations involving
system chunks", since there is no simple and short solution to address it
and the deadlock is relatively easy to trigger on zoned filesystems, while
the system chunk array exhaustion is not so common.

This change reworks the chunk allocation to avoid the system chunk array
exhaustion. It accomplishes that by making the first phase of chunk
allocation do the updates of the device items in the chunk btree and the
insertion of the new chunk item in the chunk btree. This is done while
under the protection of the chunk mutex (fs_info->chunk_mutex), in the
same critical section that checks for available system space, allocates
a new system chunk if needed and reserves system chunk space. This way
we do not have chunk space reserved until the second phase completes.

The same logic is applied to chunk removal as well, since it keeps
reserved system space long after it is done updating the chunk btree.

For direct allocation of system chunks, the previous behaviour remains,
because otherwise we would deadlock on extent buffers of the chunk btree.
Changes to the chunk btree are by large done by chunk allocation and chunk
removal, which first reserve chunk system space and then later do changes
to the chunk btree. The other remaining cases are uncommon and correspond
to adding a device, removing a device and resizing a device. All these
other cases do not pre-reserve system space, they modify the chunk btree
right away, so they don't hold reserved space for a long period like chunk
allocation and chunk removal do.

The diff of this change is huge, but more than half of it is just addition
of comments describing both how things work regarding chunk allocation and
removal, including both the new behavior and the parts of the old behavior
that did not change.

CC: stable@vger.kernel.org # 5.12+
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 17:42:41 +02:00
Filipe Manana
1cb3db1cf3 btrfs: fix deadlock with concurrent chunk allocations involving system chunks
When a task attempting to allocate a new chunk verifies that there is not
currently enough free space in the system space_info and there is another
task that allocated a new system chunk but it did not finish yet the
creation of the respective block group, it waits for that other task to
finish creating the block group. This is to avoid exhaustion of the system
chunk array in the superblock, which is limited, when we have a thundering
herd of tasks allocating new chunks. This problem was described and fixed
by commit eafa4fd0ad ("btrfs: fix exhaustion of the system chunk array
due to concurrent allocations").

However there are two very similar scenarios where this can lead to a
deadlock:

1) Task B allocated a new system chunk and task A is waiting on task B
   to finish creation of the respective system block group. However before
   task B ends its transaction handle and finishes the creation of the
   system block group, it attempts to allocate another chunk (like a data
   chunk for an fallocate operation for a very large range). Task B will
   be unable to progress and allocate the new chunk, because task A set
   space_info->chunk_alloc to 1 and therefore it loops at
   btrfs_chunk_alloc() waiting for task A to finish its chunk allocation
   and set space_info->chunk_alloc to 0, but task A is waiting on task B
   to finish creation of the new system block group, therefore resulting
   in a deadlock;

2) Task B allocated a new system chunk and task A is waiting on task B to
   finish creation of the respective system block group. By the time that
   task B enter the final phase of block group allocation, which happens
   at btrfs_create_pending_block_groups(), when it modifies the extent
   tree, the device tree or the chunk tree to insert the items for some
   new block group, it needs to allocate a new chunk, so it ends up at
   btrfs_chunk_alloc() and keeps looping there because task A has set
   space_info->chunk_alloc to 1, but task A is waiting for task B to
   finish creation of the new system block group and release the reserved
   system space, therefore resulting in a deadlock.

In short, the problem is if a task B needs to allocate a new chunk after
it previously allocated a new system chunk and if another task A is
currently waiting for task B to complete the allocation of the new system
chunk.

Unfortunately this deadlock scenario introduced by the previous fix for
the system chunk array exhaustion problem does not have a simple and short
fix, and requires a big change to rework the chunk allocation code so that
chunk btree updates are all made in the first phase of chunk allocation.
And since this deadlock regression is being frequently hit on zoned
filesystems and the system chunk array exhaustion problem is triggered
in more extreme cases (originally observed on PowerPC with a node size
of 64K when running the fallocate tests from stress-ng), revert the
changes from that commit. The next patch in the series, with a subject
of "btrfs: rework chunk allocation to avoid exhaustion of the system
chunk array" does the necessary changes to fix the system chunk array
exhaustion problem.

Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Link: https://lore.kernel.org/linux-btrfs/20210621015922.ewgbffxuawia7liz@naota-xeon/
Fixes: eafa4fd0ad ("btrfs: fix exhaustion of the system chunk array due to concurrent allocations")
CC: stable@vger.kernel.org # 5.12+
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 17:42:40 +02:00
Johannes Thumshirn
5f93e776c6 btrfs: zoned: print unusable percentage when reclaiming block groups
When we're automatically reclaiming a zone, because its zone_unusable
value is above the reclaim threshold, we're only logging how much
percent of the zone's capacity are used, but not how much of the
capacity is unusable.

Also print the percentage of the unusable space in the block group
before we're reclaiming it.

Example:

  BTRFS info (device sdg): reclaiming chunk 230686720 with 13% used 86% unusable

CC: stable@vger.kernel.org # 5.13
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 17:42:37 +02:00
David Sterba
54afaae34e btrfs: zoned: fix types for u64 division in btrfs_reclaim_bgs_work
The types in calculation of the used percentage in the reclaiming
messages are both u64, though bg->length is either 1GiB (non-zoned) or
the zone size in the zoned mode. The upper limit on zone size is 8GiB so
this could theoretically overflow in the future, right now the values
fit.

Fixes: 18bb8bbf13 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.13
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 17:42:34 +02:00
Jaegeuk Kim
28607bf3aa f2fs: drop dirty node pages when cp is in error status
Otherwise, writeback is going to fall in a loop to flush dirty inode forever
before getting SBI_CLOSING.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-07-06 22:05:06 -07:00