Commit Graph

66207 Commits

Author SHA1 Message Date
Linus Torvalds
2ac69819ba Fixes:
- Eliminate an oops introduced in v5.8
 - Remove a duplicate #include added by nfsd-5.9
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAl8/4gUACgkQM2qzM29m
 f5cxKBAAp7UjD3YNlLhSviowuOfYpWNjyk1cEQ6hWFA9oVeSfZfU3/axW8uYTHPm
 QZ6ams6gjorP4CXwVkFGFHpTRg4CfVN9g5lKxrcjvELqNllWBhE9UupRgbX3+XBE
 qselRI22M64o2tfDE+tPrDB8w8PwHmqrHwRydXfgiFlHk7nt6xD7NitaJBnPlYPM
 21OBl6mrjLwtRwvX9n5wpy/+bfOTHbGV5VNez0fAfKXggNmRdt/UNROC4doLg4M0
 2khAV3vgx49FRpCPL6SZPcBYd6zfrYOcj3iSf6wpxS5nTb2MifXFqz1MvKRTj863
 gzvSmh7vuf0+EaOAXuLjCD9dURZpuG/k0vJGijOgaSt0+vNQHjIgZ1XRFHQtQCp4
 zPJ/Qyk5k7uajHzcBFuNPUFAkOovH6LRoOzpqGvXhwaxrMPWti0LyyVKidVJrt/d
 EtOKQR+HCN0zAwjadXSPK8Nw1PjMzplkF7TaxXvF2LdO/4vpEZZNoz+if59gRcFY
 65h2++7y+0MCX8l83uUZfs+jQU2aR1w5a0DjVzi86xzJtyhr6gEyTj3Z6L9HIHwW
 dnSpUmoiaCoN0eqxvEBjw0VEPqB806CuiUER0Jdd8k7mPk04fsQ/9+UsYyliSLEG
 N56LFSWLXLHsySa2WkuB/ghzT2/Q0vFoZKXW0KNSD7W4C5XMxi4=
 =czB3
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.9-1' of git://git.linux-nfs.org/projects/cel/cel-2.6

Pull nfs server fixes from Chuck Lever:

 - Eliminate an oops introduced in v5.8

 - Remove a duplicate #include added by nfsd-5.9

* tag 'nfsd-5.9-1' of git://git.linux-nfs.org/projects/cel/cel-2.6:
  SUNRPC: remove duplicate include
  nfsd: fix oops on mixed NFSv4/NFSv3 client access
2020-08-25 18:01:36 -07:00
Linus Torvalds
abb3438d69 m68knommu: fixes for 5.9-rc3
Fixes include:
 . revert binfmt_flat data offset removal
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEmsfM6tQwfNjBOxr3TiQVqaG9L4AFAl9FEmEACgkQTiQVqaG9
 L4CzmBAAqqvgz04SYakQ1CLaeG2syWW3JM9XmPdYgoca+YJwm1SRaSpAxXJRQxfG
 /4iQucbjFW7lD+KiemepHkYQapesNW8UaHdwN2YEemRMfjCj69WJ9wVh2jCyaBN8
 t7Qg1CuK5L9TfjqXYvAFW5dyeO2/hNQuFGYpy6sSrE2MW1m1vC7pG8Ku+yhhRKn6
 W6xt3+R9PVHtYszBf8aMMcc+G394C4K7B8cbZbyllayFMX59ZKyArZJx6A43FT1I
 Lb/yaEh4uujnL1XnlT2LOy2RA7heJpfCFH8fwz2oA5tMhzqHIpEZ82iCaaMyRvXe
 fbx2IDXD/NqwxgrbU1PuUP9lyUuWTiqczwwv5FQDFoEBH9My04QR6Do/fUojWoIz
 ah8SzS2H96o20CJZXQaUpwdLXyj9E5OCME0J2EBIu6RkM2kbaDoRHwKIKEyFx0qx
 vE/vC/KY2b81xwQgmKQa28HhCmQRlQvLBZTm8pO41ut3y6zgGzFs9/MsAmcQrrAk
 0dsIlOR6ApKDPM2QCF8WXB9cSY2D7ZdEPd1Sujam65n0rXfESPBpQmr5Izx0j6KZ
 9p1FVphJT1UYa1SZiCZq8BYz4vIqgKvJgMTFW2p0WtNC9T7Dc9/uBmWxnrLdbcqe
 Y2+IyialY2YAvnIEkQV7dBrSaAkslxwe9a723dbhx4vkq2HBU0Y=
 =ENzl
 -----END PGP SIGNATURE-----

Merge tag 'm68knommu-for-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

Pull m68knommu fix from Greg Ungerer:
 "Only a single fix for the binfmt_flat loader (reverting a recent
  change)"

* tag 'm68knommu-for-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  binfmt_flat: revert "binfmt_flat: don't offset the data start"
2020-08-25 11:59:23 -07:00
Linus Torvalds
9907ab3714 for-5.9-rc2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl9D5EkACgkQxWXV+ddt
 WDto+g/6A/2QzxhgOmqqHTiDvn3DkL60XfjB6lmq3NEvinrST+VH20EoX/EuX2Kn
 u2+gMiWrgBUwlvERkoSasxdJf/6dCCc+9zYDjjKkAxCckENT85Np71o3iEc7Z5z+
 LFgS26mt6aYlCCHyIsHutzHtK2MKiUz7/oaUYZMJBHHkKS/5hL1mzIbwiWAqfU2H
 q0iMz9L2mjp1kZnpwa/yhg/NJ/oGZsKm3UPGDhdc0RlCWHBbDXHFk1wvNRo/yKQW
 l+yy0dh6PAZ45pRL0/WZwvOzcAglb+uSmwa64UOvwio4Na9P7oAcBzTFmtbBtvP4
 WBrOUPCTzkvgQcmoAsWFpD4nrzgW4oS71EICTOIRlPx7A86TP3wYpFEygUlLCoZC
 Pd4e9mPClmW78hcRT12eJeGcJIzgoKWhR8597jNUEYz3R5T2wKHOcNnq9a1E1PLv
 zR+5MFShsylUHd7HbMC1O86XnfXe5esegNQMvx36kTS+cR9Dyt5EWMNIAYK4BPM3
 /tXWZRqlZPOh3T7DZ4QR5oSSDDNq7ROTdv9jmsleno+woG0MNDYsA7jCbeJnGTmI
 CtTUP+p41otyM2lFZjV8PG/XyXDKb3UfU5gcsDOZdGP5S0tkyBiKSA6eqhz6DaTi
 fQOLGZdkNpNN/burbMq7d7YEHr3F6LC17U3L4k5V4MTAm2lp7ZQ=
 =ONgI
 -----END PGP SIGNATURE-----

Merge tag 'for-5.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix swapfile activation on subvolumes with deleted snapshots

 - error value mixup when removing directory entries from tree log

 - fix lzo compression level reset after previous level setting

 - fix space cache memory leak after transaction abort

 - fix const function attribute

 - more error handling improvements

* tag 'for-5.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: detect nocow for swap after snapshot delete
  btrfs: check the right error variable in btrfs_del_dir_entries_in_log
  btrfs: fix space cache memory leak after transaction abort
  btrfs: use the correct const function attribute for btrfs_get_num_csums
  btrfs: reset compression level for lzo on remount
  btrfs: handle errors from async submission
2020-08-24 12:01:20 -07:00
Max Filippov
2217b98262 binfmt_flat: revert "binfmt_flat: don't offset the data start"
binfmt_flat loader uses the gap between text and data to store data
segment pointers for the libraries. Even in the absence of shared
libraries it stores at least one pointer to the executable's own data
segment. Text and data can go back to back in the flat binary image and
without offsetting data segment last few instructions in the text
segment may get corrupted by the data segment pointer.

Fix it by reverting commit a2357223c5 ("binfmt_flat: don't offset the
data start").

Cc: stable@vger.kernel.org
Fixes: a2357223c5 ("binfmt_flat: don't offset the data start")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2020-08-24 08:49:13 +10:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Linus Torvalds
f320ac6e13 Merge branch 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull epoll fixes from Al Viro:
 "Fix reference counting and clean up exit paths"

* 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  do_epoll_ctl(): clean the failure exits up a bit
  epoll: Keep a reference on files added to the check list
2020-08-22 17:11:38 -07:00
Al Viro
52c479697c do_epoll_ctl(): clean the failure exits up a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-22 18:25:52 -04:00
Marc Zyngier
a9ed4a6560 epoll: Keep a reference on files added to the check list
When adding a new fd to an epoll, and that this new fd is an
epoll fd itself, we recursively scan the fds attached to it
to detect cycles, and add non-epool files to a "check list"
that gets subsequently parsed.

However, this check list isn't completely safe when deletions
can happen concurrently. To sidestep the issue, make sure that
a struct file placed on the check list sees its f_count increased,
ensuring that a concurrent deletion won't result in the file
disapearing from under our feet.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-22 18:23:57 -04:00
Linus Torvalds
f873db9acd io_uring-5.9-2020-08-21
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9AMgoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsqjEAC3hNnlu7BwVRFMeJzOyxZUqvvtT2ktYTZs
 9duV7qezounq022nKihl0/wy0KWOxfg4HDT492la54nKYAPf6xhszrbuAUx9pW8Z
 pwcUccaci7nB29V4a4wedtxz+jegCN2LXbRNk4DOpchlVKULfrOIcfW5/rL/7gkp
 15n/AAIZNChJ6y9dJDqYRoiF152/6uk7t+BolU/+W9QCKi2PW40nTOgfkzSnBvJV
 WaHlYHKAOUaiurIUjZQolgohNNBUzNwWtF/4HSeT5n8c94gSpI3IKFkmNCjxQQ96
 I0gjJZIss7N8ysKFBy3WALqx9FqxSWS3pi/G9fai4o/VPEFj+fhfBTh+H1fzLaoM
 V+oOHMCt5Cwlw+n8vSgtUU0JF6ZnmoolfpHWPchtCJyQ42i/gt41MrePdu/tUC+n
 tV7wvftuM/+AN36vDDgbDc5BTKjCnRQSHz80M3EwUznJJjaeTAPxnQ+pVlpN9IS+
 sbywlg+Xake9F19qA/astAH9n3U2+m3HdmoIXfG1vrXKFt/I9d36gh5hzlCh5//5
 zAu1/iwy1fAlaI4CWRR14+e8/ozu5SCxlswsI79sGZcFuv+WQsQ84q297rq8v0Wr
 HdtmiRDGlBFfcuiEOjoSzSEwMWPc1F+8EcmiEp8SZBglKDM+kQI9XMKKXakqh7K0
 yEWGAMm+1g==
 =dLiS
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.9-2020-08-21' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Make sure the head link cancelation includes async work

 - Get rid of kiocb_wait_page_queue_init(), makes no sense to have it as
   a separate function since you moved it into io_uring itself

 - io_import_iovec cleanups (Pavel, me)

 - Use system_unbound_wq for ring exit work, to avoid spawning tons of
   these if we have tons of rings exiting at the same time

 - Fix req->flags overflow flag manipulation (Pavel)

* tag 'io_uring-5.9-2020-08-21' of git://git.kernel.dk/linux-block:
  io_uring: kill extra iovec=NULL in import_iovec()
  io_uring: comment on kfree(iovec) checks
  io_uring: fix racy req->flags modification
  io_uring: use system_unbound_wq for ring exit work
  io_uring: cleanup io_import_iovec() of pre-mapped request
  io_uring: get rid of kiocb_wait_page_queue_init()
  io_uring: find and cancel head link async work on files exit
2020-08-21 14:59:16 -07:00
Linus Torvalds
349111f050 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "11 patches.

  Subsystems affected by this: misc, mm/hugetlb, mm/vmalloc, mm/misc,
  romfs, relay, uprobes, squashfs, mm/cma, mm/pagealloc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, page_alloc: fix core hung in free_pcppages_bulk()
  mm: include CMA pages in lowmem_reserve at boot
  squashfs: avoid bio_alloc() failure with 1Mbyte blocks
  uprobes: __replace_page() avoid BUG in munlock_vma_page()
  kernel/relay.c: fix memleak on destroy relay channel
  romfs: fix uninitialized memory leak in romfs_dev_read()
  mm/rodata_test.c: fix missing function declaration
  mm/vunmap: add cond_resched() in vunmap_pmd_range
  khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
  hugetlb_cgroup: convert comma to semicolon
  mailmap: add Andi Kleen
2020-08-21 14:44:48 -07:00
Linus Torvalds
d723b99ec9 Improvements to ext4's block allocator performance for very large file
systems, especially when the file system or files which are highly
 fragmented.  There is a new mount option, prefetch_block_bitmaps which
 will pull in the block bitmaps and set up the in-memory buddy bitmaps
 when the file system is initially mounted.
 
 Beyond that, a lot of bug fixes and cleanups.  In particular, a number
 of changes to make ext4 more robust in the face of write errors or
 file system corruptions.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl8/Q9YACgkQ8vlZVpUN
 gaPz+wgAkiWwpge0pfcukABW9FcHK9R82IPggA/NnFu0I+3trpqVQP8mYWqg+1l7
 X0W6B6GHMcITGdwxVDNGHHv0WabXCqFPT0ENwW1cnl9UL6I91Ev2NjmG9HP6hVZa
 g3+NyXJwiOP38xsxpPJGPoYFw2wZyv8/e41MMnsE6goYjMmB04sHvXCUQkbN41Fn
 3CMdsiueYZDAKflvAlL50Jy7Imz5tq9oy81/z+amqvWo4T0U8zRwQuf25nBAhr25
 1WdT4CbCNGO2Qwyu9X+t/KGNVIQhCctkx/yz71l3p2piEGkw/XE4VJNrkmWb0zN7
 k9F5uGOZlAlQEzx+5PN//Qtz1Db0QQ==
 =E6vv
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Improvements to ext4's block allocator performance for very large file
  systems, especially when the file system or files which are highly
  fragmented. There is a new mount option, prefetch_block_bitmaps which
  will pull in the block bitmaps and set up the in-memory buddy bitmaps
  when the file system is initially mounted.

  Beyond that, a lot of bug fixes and cleanups. In particular, a number
  of changes to make ext4 more robust in the face of write errors or
  file system corruptions"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
  ext4: limit the length of per-inode prealloc list
  ext4: reorganize if statement of ext4_mb_release_context()
  ext4: add mb_debug logging when there are lost chunks
  ext4: Fix comment typo "the the".
  jbd2: clean up checksum verification in do_one_pass()
  ext4: change to use fallthrough macro
  ext4: remove unused parameter of ext4_generic_delete_entry function
  mballoc: replace seq_printf with seq_puts
  ext4: optimize the implementation of ext4_mb_good_group()
  ext4: delete invalid comments near ext4_mb_check_limits()
  ext4: fix typos in ext4_mb_regular_allocator() comment
  ext4: fix checking of directory entry validity for inline directories
  fs: prevent BUG_ON in submit_bh_wbc()
  ext4: correctly restore system zone info when remount fails
  ext4: handle add_system_zone() failure in ext4_setup_system_zone()
  ext4: fold ext4_data_block_valid_rcu() into the caller
  ext4: check journal inode extents more carefully
  ext4: don't allow overlapping system zones
  ext4: handle error of ext4_setup_system_zone() on remount
  ext4: delete the invalid BUGON in ext4_mb_load_buddy_gfp()
  ...
2020-08-21 11:03:38 -07:00
David Howells
5e0b17b026 afs: Fix NULL deref in afs_dynroot_depopulate()
If an error occurs during the construction of an afs superblock, it's
possible that an error occurs after a superblock is created, but before
we've created the root dentry.  If the superblock has a dynamic root
(ie.  what's normally mounted on /afs), the afs_kill_super() will call
afs_dynroot_depopulate() to unpin any created dentries - but this will
oops if the root hasn't been created yet.

Fix this by skipping that bit of code if there is no root dentry.

This leads to an oops looking like:

	general protection fault, ...
	KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
	...
	RIP: 0010:afs_dynroot_depopulate+0x25f/0x529 fs/afs/dynroot.c:385
	...
	Call Trace:
	 afs_kill_super+0x13b/0x180 fs/afs/super.c:535
	 deactivate_locked_super+0x94/0x160 fs/super.c:335
	 afs_get_tree+0x1124/0x1460 fs/afs/super.c:598
	 vfs_get_tree+0x89/0x2f0 fs/super.c:1547
	 do_new_mount fs/namespace.c:2875 [inline]
	 path_mount+0x1387/0x2070 fs/namespace.c:3192
	 do_mount fs/namespace.c:3205 [inline]
	 __do_sys_mount fs/namespace.c:3413 [inline]
	 __se_sys_mount fs/namespace.c:3390 [inline]
	 __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
	 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

which is oopsing on this line:

	inode_lock(root->d_inode);

presumably because sb->s_root was NULL.

Fixes: 0da0b7fd73 ("afs: Display manually added cells in dynamic root mount")
Reported-by: syzbot+c1eff8205244ae7e11a6@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 10:56:40 -07:00
Phillip Lougher
f26044c83e squashfs: avoid bio_alloc() failure with 1Mbyte blocks
This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".

Bio_alloc() is limited to 256 pages (1 Mbyte).  This can cause a failure
when reading 1 Mbyte block filesystems.  The problem is a datablock can be
fully (or almost uncompressed), requiring 256 pages, but, because blocks
are not aligned to page boundaries, it may require 257 pages to read.

Bio_kmalloc() can handle 1024 pages, and so use this for the edge
condition.

Fixes: 93e72b3c61 ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: Nicolas Prochazka <nicolas.prochazka@gmail.com>
Reported-by: Tomoatsu Shimada <shimada@walbrix.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Cc: Philippe Liard <pliard@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Adrien Schildknecht <adrien+dev@schischi.me>
Cc: Daniel Rosenberg <drosen@google.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200815035637.15319-1-phillip@squashfs.org.uk
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:53 -07:00
Jann Horn
bcf85fcedf romfs: fix uninitialized memory leak in romfs_dev_read()
romfs has a superblock field that limits the size of the filesystem; data
beyond that limit is never accessed.

romfs_dev_read() fetches a caller-supplied number of bytes from the
backing device.  It returns 0 on success or an error code on failure;
therefore, its API can't represent short reads, it's all-or-nothing.

However, when romfs_dev_read() detects that the requested operation would
cross the filesystem size limit, it currently silently truncates the
requested number of bytes.  This e.g.  means that when the content of a
file with size 0x1000 starts one byte before the filesystem size limit,
->readpage() will only fill a single byte of the supplied page while
leaving the rest uninitialized, leaking that uninitialized memory to
userspace.

Fix it by returning an error code instead of truncating the read when the
requested read operation would go beyond the end of the filesystem.

Fixes: da4458bda2 ("NOMMU: Make it possible for RomFS to use MTD devices directly")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200818013202.2246365-1-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-21 09:52:53 -07:00
Boris Burkov
a84d5d429f btrfs: detect nocow for swap after snapshot delete
can_nocow_extent and btrfs_cross_ref_exist both rely on a heuristic for
detecting a must cow condition which is not exactly accurate, but saves
unnecessary tree traversal. The incorrect assumption is that if the
extent was created in a generation smaller than the last snapshot
generation, it must be referenced by that snapshot. That is true, except
the snapshot could have since been deleted, without affecting the last
snapshot generation.

The original patch claimed a performance win from this check, but it
also leads to a bug where you are unable to use a swapfile if you ever
snapshotted the subvolume it's in. Make the check slower and more strict
for the swapon case, without modifying the general cow checks as a
compromise. Turning swap on does not seem to be a particularly
performance sensitive operation, so incurring a possibly unnecessary
btrfs_search_slot seems worthwhile for the added usability.

Note: Until the snapshot is competely cleaned after deletion,
check_committed_refs will still cause the logic to think that cow is
necessary, so the user must until 'btrfs subvolu sync' finished before
activating the swapfile swapon.

CC: stable@vger.kernel.org # 5.4+
Suggested-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-21 12:21:23 +02:00
Josef Bacik
fb2fecbad5 btrfs: check the right error variable in btrfs_del_dir_entries_in_log
With my new locking code dbench is so much faster that I tripped over a
transaction abort from ENOSPC.  This turned out to be because
btrfs_del_dir_entries_in_log was checking for ret == -ENOSPC, but this
function sets err on error, and returns err.  So instead of properly
marking the inode as needing a full commit, we were returning -ENOSPC
and aborting in __btrfs_unlink_inode.  Fix this by checking the proper
variable so that we return the correct thing in the case of ENOSPC.

The ENOENT needs to be checked, because btrfs_lookup_dir_item_index()
can return -ENOENT if the dir item isn't in the tree log (which would
happen if we hadn't fsync'ed this guy).  We actually handle that case in
__btrfs_unlink_inode, so it's an expected error to get back.

Fixes: 4a500fd178 ("Btrfs: Metadata ENOSPC handling for tree log")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note and comment about ENOENT ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-21 12:20:01 +02:00
David Howells
ba8e42077b afs: Fix key ref leak in afs_put_operation()
The afs_put_operation() function needs to put the reference to the key
that's authenticating the operation.

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-20 10:41:45 -07:00
Pavel Begunkov
867a23eab5 io_uring: kill extra iovec=NULL in import_iovec()
If io_import_iovec() returns an error, return iovec is undefined and
must not be used, so don't set it to NULL when failing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-20 05:36:19 -06:00
Pavel Begunkov
f261c16861 io_uring: comment on kfree(iovec) checks
kfree() handles NULL pointers well, but io_{read,write}() checks it
because of performance reasons. Leave a comment there for those who are
tempted to patch it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-20 05:36:17 -06:00
Pavel Begunkov
bb175342aa io_uring: fix racy req->flags modification
Setting and clearing REQ_F_OVERFLOW in io_uring_cancel_files() and
io_cqring_overflow_flush() are racy, because they might be called
asynchronously.

REQ_F_OVERFLOW flag in only needed for files cancellation, so if it can
be guaranteed that requests _currently_ marked inflight can't be
overflown, the problem will be solved with removing the flag
altogether.

That's how the patch works, it removes inflight status of a request
in io_cqring_fill_event() whenever it should be thrown into CQ-overflow
list. That's Ok to do, because no opcode specific handling can be done
after io_cqring_fill_event(), the same assumption as with "struct
io_completion" patches.
And it already have a good place for such cleanups, which is
io_clean_op(). A nice side effect of this is removing this inflight
check from the hot path.

note on synchronisation: now __io_cqring_fill_event() may be taking two
spinlocks simultaneously, completion_lock and inflight_lock. It's fine,
because we never do that in reverse order, and CQ-overflow of inflight
requests shouldn't happen often.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-20 05:36:15 -06:00
Jens Axboe
fc666777da io_uring: use system_unbound_wq for ring exit work
We currently use system_wq, which is unbounded in terms of number of
workers. This means that if we're exiting tons of rings at the same
time, then we'll briefly spawn tons of event kworkers just for a very
short blocking time as the rings exit.

Use system_unbound_wq instead, which has a sane cap on the concurrency
level.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-19 11:10:51 -06:00
Filipe Manana
bbc37d6e47 btrfs: fix space cache memory leak after transaction abort
If a transaction aborts it can cause a memory leak of the pages array of
a block group's io_ctl structure. The following steps explain how that can
happen:

1) Transaction N is committing, currently in state TRANS_STATE_UNBLOCKED
   and it's about to start writing out dirty extent buffers;

2) Transaction N + 1 already started and another task, task A, just called
   btrfs_commit_transaction() on it;

3) Block group B was dirtied (extents allocated from it) by transaction
   N + 1, so when task A calls btrfs_start_dirty_block_groups(), at the
   very beginning of the transaction commit, it starts writeback for the
   block group's space cache by calling btrfs_write_out_cache(), which
   allocates the pages array for the block group's io_ctl with a call to
   io_ctl_init(). Block group A is added to the io_list of transaction
   N + 1 by btrfs_start_dirty_block_groups();

4) While transaction N's commit is writing out the extent buffers, it gets
   an IO error and aborts transaction N, also setting the file system to
   RO mode;

5) Task A has already returned from btrfs_start_dirty_block_groups(), is at
   btrfs_commit_transaction() and has set transaction N + 1 state to
   TRANS_STATE_COMMIT_START. Immediately after that it checks that the
   filesystem was turned to RO mode, due to transaction N's abort, and
   jumps to the "cleanup_transaction" label. After that we end up at
   btrfs_cleanup_one_transaction() which calls btrfs_cleanup_dirty_bgs().
   That helper finds block group B in the transaction's io_list but it
   never releases the pages array of the block group's io_ctl, resulting in
   a memory leak.

In fact at the point when we are at btrfs_cleanup_dirty_bgs(), the pages
array points to pages that were already released by us at
__btrfs_write_out_cache() through the call to io_ctl_drop_pages(). We end
up freeing the pages array only after waiting for the ordered extent to
complete through btrfs_wait_cache_io(), which calls io_ctl_free() to do
that. But in the transaction abort case we don't wait for the space cache's
ordered extent to complete through a call to btrfs_wait_cache_io(), so
that's why we end up with a memory leak - we wait for the ordered extent
to complete indirectly by shutting down the work queues and waiting for
any jobs in them to complete before returning from close_ctree().

We can solve the leak simply by freeing the pages array right after
releasing the pages (with the call to io_ctl_drop_pages()) at
__btrfs_write_out_cache(), since we will never use it anymore after that
and the pages array points to already released pages at that point, which
is currently not a problem since no one will use it after that, but not a
good practice anyway since it can easily lead to use-after-free issues.

So fix this by freeing the pages array right after releasing the pages at
__btrfs_write_out_cache().

This issue can often be reproduced with test case generic/475 from fstests
and kmemleak can detect it and reports it with the following trace:

unreferenced object 0xffff9bbf009fa600 (size 512):
  comm "fsstress", pid 38807, jiffies 4298504428 (age 22.028s)
  hex dump (first 32 bytes):
    00 a0 7c 4d 3d ed ff ff 40 a0 7c 4d 3d ed ff ff  ..|M=...@.|M=...
    80 a0 7c 4d 3d ed ff ff c0 a0 7c 4d 3d ed ff ff  ..|M=.....|M=...
  backtrace:
    [<00000000f4b5cfe2>] __kmalloc+0x1a8/0x3e0
    [<0000000028665e7f>] io_ctl_init+0xa7/0x120 [btrfs]
    [<00000000a1f95b2d>] __btrfs_write_out_cache+0x86/0x4a0 [btrfs]
    [<00000000207ea1b0>] btrfs_write_out_cache+0x7f/0xf0 [btrfs]
    [<00000000af21f534>] btrfs_start_dirty_block_groups+0x27b/0x580 [btrfs]
    [<00000000c3c23d44>] btrfs_commit_transaction+0xa6f/0xe70 [btrfs]
    [<000000009588930c>] create_subvol+0x581/0x9a0 [btrfs]
    [<000000009ef2fd7f>] btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
    [<00000000474e5187>] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
    [<00000000708ee349>] btrfs_ioctl_snap_create_v2+0xb0/0xf0 [btrfs]
    [<00000000ea60106f>] btrfs_ioctl+0x12c/0x3130 [btrfs]
    [<000000005c923d6d>] __x64_sys_ioctl+0x83/0xb0
    [<0000000043ace2c9>] do_syscall_64+0x33/0x80
    [<00000000904efbce>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-19 18:39:46 +02:00
David Sterba
604997b4a3 btrfs: use the correct const function attribute for btrfs_get_num_csums
The build robot reports

compiler: h8300-linux-gcc (GCC) 9.3.0
   In file included from fs/btrfs/tests/extent-map-tests.c:8:
>> fs/btrfs/tests/../ctree.h:2166:8: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
    2166 | size_t __const btrfs_get_num_csums(void);
         |        ^~~~~~~

The function attribute for const does not follow the expected scheme and
in this case is confused with a const type qualifier.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-19 18:39:21 +02:00
Marcos Paulo de Souza
282dd7d771 btrfs: reset compression level for lzo on remount
Currently a user can set mount "-o compress" which will set the
compression algorithm to zlib, and use the default compress level for
zlib (3):

  relatime,compress=zlib:3,space_cache

If the user remounts the fs using "-o compress=lzo", then the old
compress_level is used:

  relatime,compress=lzo:3,space_cache

But lzo does not expose any tunable compression level. The same happens
if we set any compress argument with different level, also with zstd.

Fix this by resetting the compress_level when compress=lzo is
specified.  With the fix applied, lzo is shown without compress level:

  relatime,compress=lzo,space_cache

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-19 18:39:12 +02:00
Johannes Thumshirn
c965d6402f btrfs: handle errors from async submission
Btrfs' async submit mechanism is able to handle errors in the submission
path and the meta-data async submit function correctly passes the error
code to the caller.

In btrfs_submit_bio_start() and btrfs_submit_bio_start_direct_io() we're
not handling the errors returned by btrfs_csum_one_bio() correctly though
and simply call BUG_ON(). This is unnecessary as the caller of these two
functions - run_one_async_start - correctly checks for the return values
and sets the status of the async_submit_bio. The actual bio submission
will be handled later on by run_one_async_done only if
async_submit_bio::status is 0, so the data won't be written if we
encountered an error in the checksum process.

Simply return the error from btrfs_csum_one_bio() to the async submitters,
like it's done in btree_submit_bio_start().

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-19 18:26:14 +02:00
brookxu
27bc446e2d ext4: limit the length of per-inode prealloc list
In the scenario of writing sparse files, the per-inode prealloc list may
be very long, resulting in high overhead for ext4_mb_use_preallocated().
To circumvent this problem, we limit the maximum length of per-inode
prealloc list to 512 and allow users to modify it.

After patching, we observed that the sys ratio of cpu has dropped, and
the system throughput has increased significantly. We created a process
to write the sparse file, and the running time of the process on the
fixed kernel was significantly reduced, as follows:

Running time on unfixed kernel:
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m2.051s
user    0m0.008s
sys     0m2.026s

Running time on fixed kernel:
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m0.471s
user    0m0.004s
sys     0m0.395s

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:36 -04:00
brookxu
66d5e0277e ext4: reorganize if statement of ext4_mb_release_context()
Reorganize the if statement of ext4_mb_release_context(), make it
easier to read.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Link: https://lore.kernel.org/r/5439ac6f-db79-ad68-76c1-a4dda9aa0cc3@gmail.com
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:36 -04:00
brookxu
c55ee7d202 ext4: add mb_debug logging when there are lost chunks
Lost chunks are when some other process raced with the current thread
to grab a particular block allocation.  Add mb_debug log for
developers who wants to see how often this is happening for a
particular workload.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Link: https://lore.kernel.org/r/0a165ac0-1912-aebd-8a0d-b42e7cd1aea1@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:36 -04:00
kyoungho koo
7ca4fcba92 ext4: Fix comment typo "the the".
I have found double typed comments "the the". So i modified it to
one "the"

Signed-off-by: kyoungho koo <rnrudgh@gmail.com>
Link: https://lore.kernel.org/r/20200424171620.GA11943@koo-Z370-HD3
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:35 -04:00
Shijie Luo
00a3fff071 jbd2: clean up checksum verification in do_one_pass()
Remove the unnecessary chksum_err and checksum_seen variables as well as
some redundant code to make the function easier to understand.

[ With changes suggested by jack@ and tytso@ ]

Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200819122955.33526-1-luoshijie1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19 12:04:35 -04:00
Jens Axboe
8452fd0ce6 io_uring: cleanup io_import_iovec() of pre-mapped request
io_rw_prep_async() goes through a dance of clearing req->io, calling
the iovec import, then re-setting req->io. Provide an internal helper
that does the right thing without needing state tweaked to get there.

This enables further cleanups in io_read, io_write, and
io_resubmit_prep(), but that's left for another time.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-18 14:03:15 -07:00
Shijie Luo
70d7ced2ed ext4: change to use fallthrough macro
Change to use fallthrough macro in switch case.

Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200810114435.24182-1-luoshijie1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18 14:27:40 -04:00
Kyoungho Koo
2fe34d2938 ext4: remove unused parameter of ext4_generic_delete_entry function
The ext4_generic_delete_entry function does not use the parameter
handle, so it can be removed.

Signed-off-by: Kyoungho Koo <rnrudgh@gmail.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200810080701.GA14160@koo-Z370-HD3
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18 14:25:54 -04:00
Xu Wang
e0d438c72a mballoc: replace seq_printf with seq_puts
seq_puts is a lot cheaper than seq_printf, so use that to print
literal strings.

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200810022158.9167-1-vulab@iscas.ac.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18 14:21:59 -04:00
brookxu
dddcd2f9eb ext4: optimize the implementation of ext4_mb_good_group()
It might be better to adjust the code in two places:
1. Determine whether grp is currupt or not should be placed first.
2. (cr<=2 && free <ac->ac_g_ex.fe_len)should may belong to the crx
   strategy, and it may be more appropriate to put it in the
   subsequent switch statement block. For cr1, cr2, the conditions
   in switch potentially realize the above judgment. For cr0, we
   should add (free <ac->ac_g_ex.fe_len) judgment, and then delete
   (free / fragments) >= ac->ac_g_ex.fe_len), because cr0 returns
   true by default.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/e20b2d8f-1154-adb7-3831-a9e11ba842e9@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18 14:18:36 -04:00
brookxu
051e2ce8cb ext4: delete invalid comments near ext4_mb_check_limits()
These comments do not seem to be related to ext4_mb_check_limits(),
it may be invalid.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/c49faf0c-d5d5-9c51-6911-9e0ff57c6bfa@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18 14:15:54 -04:00
brookxu
e9a3cd48d6 ext4: fix typos in ext4_mb_regular_allocator() comment
Fix typos in ext4_mb_regular_allocator() comment

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/d6514145-73b3-808b-ec5a-a8be27c51f9c@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-18 14:14:16 -04:00
Jens Axboe
3b2a4439e0 io_uring: get rid of kiocb_wait_page_queue_init()
The 5.9 merge moved this function io_uring, which means that we don't
need to retain the generic nature of it. Clean up this part by removing
redundant checks, and just inlining the small remainder in
io_rw_should_retry().

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-16 14:36:31 -07:00
Jens Axboe
b711d4eaf0 io_uring: find and cancel head link async work on files exit
Commit f254ac04c8 ("io_uring: enable lookup of links holding inflight files")
only handled 2 out of the three head link cases we have, we also need to
lookup and cancel work that is blocked in io-wq if that work has a link
that's holding a reference to the files structure.

Put the "cancel head links that hold this request pending" logic into
io_attempt_cancel(), which will to through the motions of finding and
canceling head links that hold the current inflight files stable request
pending.

Cc: stable@vger.kernel.org
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-16 14:36:31 -07:00
J. Bruce Fields
34b09af4f5 nfsd: fix oops on mixed NFSv4/NFSv3 client access
If an NFSv2/v3 client breaks an NFSv4 client's delegation, it will hit a
NULL dereference in nfsd_breaker_owns_lease().

Easily reproduceable with for example

	mount -overs=4.2 server:/export /mnt/
	sleep 1h </mnt/file &
	mount -overs=3 server:/export /mnt2/
	touch /mnt2/file

Reported-by: Robert Dinse <nanook@eskimo.com>
Fixes: 28df3d1539 ("nfsd: clients don't need to break their own delegations")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208807
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-08-16 16:51:18 -04:00
Linus Torvalds
2cc3c4b3c2 io_uring-5.9-2020-08-15
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl84aLkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgps72D/96/HCiUArhxmRltiwqB9KjemEPaY7nWDkz
 0hrUtTCr/MjqFIzgPwx6pIjdiSP4GbmcMCrBO67E+mLbOdO1hKte2ElysRAsTlLE
 fGrcdrs3is5+QK8aqJJks74NzM7XG6jfrR8ewV9cz6aJRWbgRjCWvSxx/03Iha2B
 t897xuwJg4K30Z83IxPfnu+Xp0dIfmFLUuXQApsZ3bNTuQW3sR4CC/v418+1Wmk4
 kXGbQtxcEsrhCy9OxnNyU6GPEJ3b99ANPbRE8OUNQwaHiejvMOCWpcmaoT6TwKeU
 aku+XcVyoMjxJQk2k0uzr8Ecj5G1FJv4fUHhTZBxcGqkrxhkTjQ520HtrqPlc7uV
 BjyXutZ8yjmeCbvGrsXu8f8ktjHHkntkRDA8hgzW1OpmWYuWKF/2OIjNpmcmtvbj
 XqwBDEKdQW9X4dHoQKsVExtzeT6nNP0dxaeZX8OeB2GGkitP7rCm8k/SOuDPTCLi
 MX/qWpERo4hRfCLjY+4nezxkFMLIF7Jej3tzwuVRshFYsRVQzTPQpbnkmkuwibhi
 ObEwVI+lLkbatnR2wmJwoVKcywH13U68VNJXACyw1GZnPlp2lYylT/MV80y/iELE
 mj4zDklqwfIrnoHEuCkERwgXrYsffhUrFvajmAMnJncUFOI4khYJ+dWBVVlBkyn0
 e7UK1Sd7Jw==
 =T+fx
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few differerent things in here.

  Seems like syzbot got some more io_uring bits wired up, and we got a
  handful of reports and the associated fixes are in here.

  General fixes too, and a lot of them marked for stable.

  Lastly, a bit of fallout from the async buffered reads, where we now
  more easily trigger short reads. Some applications don't really like
  that, so the io_read() code now handles short reads internally, and
  got a cleanup along the way so that it's now easier to read (and
  documented). We're now passing tests that failed before"

* tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block:
  io_uring: short circuit -EAGAIN for blocking read attempt
  io_uring: sanitize double poll handling
  io_uring: internally retry short reads
  io_uring: retain iov_iter state over io_read/io_write calls
  task_work: only grab task signal lock when needed
  io_uring: enable lookup of links holding inflight files
  io_uring: fail poll arm on queue proc failure
  io_uring: hold 'ctx' reference around task_work queue + execute
  fs: RWF_NOWAIT should imply IOCB_NOIO
  io_uring: defer file table grabbing request cleanup for locked requests
  io_uring: add missing REQ_F_COMP_LOCKED for nested requests
  io_uring: fix recursive completion locking on oveflow flush
  io_uring: use TWA_SIGNAL for task_work uncondtionally
  io_uring: account locked memory before potential error case
  io_uring: set ctx sq/cq entry count earlier
  io_uring: Fix NULL pointer dereference in loop_rw_iter()
  io_uring: add comments on how the async buffered read retry works
  io_uring: io_async_buf_func() need not test page bit
2020-08-16 10:55:12 -07:00
Jens Axboe
f91daf565b io_uring: short circuit -EAGAIN for blocking read attempt
One case was missed in the short IO retry handling, and that's hitting
-EAGAIN on a blocking attempt read (eg from io-wq context). This is a
problem on sockets that are marked as non-blocking when created, they
don't carry any REQ_F_NOWAIT information to help us terminate them
instead of perpetually retrying.

Fixes: 227c0c9673 ("io_uring: internally retry short reads")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15 15:58:42 -07:00
Jens Axboe
d4e7cd36a9 io_uring: sanitize double poll handling
There's a bit of confusion on the matching pairs of poll vs double poll,
depending on if the request is a pure poll (IORING_OP_POLL_ADD) or
poll driven retry.

Add io_poll_get_double() that returns the double poll waitqueue, if any,
and io_poll_get_single() that returns the original poll waitqueue. With
that, remove the argument to io_poll_remove_double().

Finally ensure that wait->private is cleared once the double poll handler
has run, so that remove knows it's already been seen.

Cc: stable@vger.kernel.org # v5.8
Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com
Fixes: 18bceab101 ("io_uring: allow POLL_ADD with double poll_wait() users")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-15 11:48:18 -07:00
Linus Torvalds
410520d07f 9p pull request for inclusion in 5.9
- some code cleanup
 - a couple of static analysis fixes
 - setattr: try to pick a fid associated with the file rather than the
 dentry, which might sometimes matter
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAl83dvkACgkQq06b7GqY
 5nDk4g/+KNpbIFHhDyhCzmZ1WbaUYlZseFm8VBR9dzwOkXCuVl+WQd9ducYlEifT
 cZ29xmEuom01LhFOs6VzC+IODQq3h1WvEbVRfCSlLEWvfHyPODSDY8mrn+1TdOZL
 UOOP1wUjaDqmAcBQ4PSDNmj2cONNWPqkgnuyswjsBpabOY/U8FxG9J/QiPfC12zs
 QoF5800OiuLYS5SefI9kKJ47DkTLpm/aq7/Z1liAxJBzHrguORjg16cssXhWeysD
 jEMD3pIrsqMDX/vNkAO0DaR8FWvxTFZR1Bza1ylJiUy3K3rLXKfo+42jiCnuo5e9
 uhA012RfiCZB3kIsLe5hCPXUWz0MhizaJAPkceHsgUtkuiU2qXyY05HEvS2HHl0s
 inqR8gRkZ2NhROx1R/0q+Wu2posCBderDwg7Mkzt/7j5Ue3cTIKvbeBxH9iLLaK+
 rHp0NbfP6wYN5qKjH1rru1IOmaUCV2HHV6x/alnwaVBfZqg5HYKkzlS7M+5hAbtb
 niLFXXCtPBGqS+u9w7sruj6utuiJjXBfB1ejaawY+frS41+v3NkHjjUWCH67u8I9
 u3pmPpvm5o7gRoRmvmxEdSb+jAYZwLTcbkUhRMHLQOHCBZXtcul41p2BUpoBm0zw
 42mlMMMBAjaxtzLlryvACg5lx6x7ykE4pExKAc0dzrzWfgzwEZQ=
 =hwGq
 -----END PGP SIGNATURE-----

Merge tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux

Pull 9p updates from Dominique Martinet:

 - some code cleanup

 - a couple of static analysis fixes

 - setattr: try to pick a fid associated with the file rather than the
   dentry, which might sometimes matter

* tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux:
  9p: Remove unneeded cast from memory allocation
  9p: remove unused code in 9p
  net/9p: Fix sparse endian warning in trans_fd.c
  9p: Fix memory leak in v9fs_mount
  9p: retrieve fid from file when file instance exist.
2020-08-15 08:34:36 -07:00
Linus Torvalds
f6513bd39c 3 small cifs/smb3 fixes, one for stable fixing mkdir path with idsfromsid mount option
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl83ch8ACgkQiiy9cAdy
 T1ExdAwAqHLP2o4lADcHsACPWxswMwuSwMpNQhSOV4bnRaVGkh3EGbvFZ8dmop1u
 XrEyppM8TlbMUzIW6OO/Rntbl91Ll4Mi7d+15vdBgwOSKS2dJ+tkbV1iWQoJffUv
 cxfPSLlnAe8dFaitRky3HWcYi4+vwY/TksaaQpdbVFVGJpvdKG8zu/wYmJEZ57TD
 0Hjy4BhbGH1iu6X7mutslRCJlx1kOw5jyvqsECXqhF80+tKze4QlC5dZ6hjMPb+6
 C9sp03/KMe84/1GNDbsTxc0Vjd0K0+1E+lgDl2N7XVC6HNTg47Smzm7X+SgOc7J9
 rF2keAadbvD+prNx52nojI27y4JW99mGwuSpjZBIFJrJ6hq2ukbH7DBOQXSJ7Ovt
 +0OgreUDlZS2FuLaVAIQak5lQbQ34xd/zIrQ1bQHefkmGuAE+2Dm86GIP4/WUSlY
 RTNzvUrCq0EnVQncMlG0pPiZg6Yiz3ie3nSI9MohYl+niqhpsd3M/uFh68CNnSgV
 aZovWPp4
 =FHq0
 -----END PGP SIGNATURE-----

Merge tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Three small cifs/smb3 fixes, one for stable fixing mkdir path with
  the 'idsfromsid' mount option"

* tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  SMB3: Fix mkdir when idsfromsid configured on mount
  cifs: Convert to use the fallthrough macro
  cifs: Fix an error pointer dereference in cifs_mount()
2020-08-15 08:31:39 -07:00
Linus Torvalds
37711e5e23 NFS client updates for Linux 5.9
Highlights include:
 
 Stable fixes:
 - pNFS: Don't return layout segments that are being used for I/O
 - pNFS: Don't move layout segments off the active list when being used for I/O
 
 Features:
 - NFS: Add support for user xattrs through the NFSv4.2 protocol
 - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC
 - NFSv4.0 allow nconnect for v4.0
 
 Bugfixes and cleanups:
 - nfs: ensure correct writeback errors are returned on close()
 - nfs: nfs_file_write() should check for writeback errors
 - nfs: Fix getxattr kernel panic and memory overflow
 - NFS: Fix the pNFS/flexfiles mirrored read failover code
 - SUNRPC: dont update timeout value on connection reset
 - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
 - sunrpc: destroy rpc_inode_cachep after unregister_filesystem
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl83CYEACgkQZwvnipYK
 APLx4g/9EQNTG5VkUToElRHs4b3vFxbmh9Odnk/JwPHxY5GQ8/AyGqwWHBgMZc7/
 2AV/C83pk7pJsDNsKbVAaFCT1cwjmItHM63vKJYGBYbE8LhkZ/1sYkdtPBYwHoVl
 7CWfpVY/4NjYw5GJfrVA5Y0m7lrQInRtIMzfaENw2tpw+/cKUpadxgEJltzFNvpa
 Ploinr1ZRBl1tvfeHNRP5ZMPk2AfgGWtQKQ/b2UWUk5tXALoQm2Eu+/oku39uqhy
 hW5tCbU2BzR91gg5JwF9n7VowkCHXfe7lMzDBTVfwZOELOmoyys/1wKv550FWcWi
 yymljWiPGZOnXGT1vfKptPESQjdtElMfanvEZ0BzS+yNR0HZGnIupaxGlYlG9ZGU
 2sXHQPp3mk2Q+L1IgbTSCnSju0YlZo32JQpYCZiROjIXnPWPQ50YNhr8GL18M1FW
 hTeShg2avWH+59GB6moEBmsuvui7Dy1jkimblToLEoGJ4kbvEl72FYSqTCkAXXbB
 rVUzhmJFgfk/EOS4d+QKJoBqNzw3aw79wyT7PLkoCYBqPZBHQexlmI+ktMbgUEdw
 c/fM7l5/Vcb9weIHKzul2Jbk5q6bFME/xPnIkr3v/oKIFlLFwQ04BX6R/42AMJHw
 V5Q9Wp81Vy6RXQMn8P0ZMeY0WQC/rhpijOEVMUC+Ni+spz44AdM=
 =k4pE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Stable fixes:
   - pNFS: Don't return layout segments that are being used for I/O
   - pNFS: Don't move layout segments off the active list when being used for I/O

  Features:
   - NFS: Add support for user xattrs through the NFSv4.2 protocol
   - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC
   - NFSv4.0 allow nconnect for v4.0

  Bugfixes and cleanups:
   - nfs: ensure correct writeback errors are returned on close()
   - nfs: nfs_file_write() should check for writeback errors
   - nfs: Fix getxattr kernel panic and memory overflow
   - NFS: Fix the pNFS/flexfiles mirrored read failover code
   - SUNRPC: dont update timeout value on connection reset
   - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
   - sunrpc: destroy rpc_inode_cachep after unregister_filesystem"

* tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits)
  NFS: Fix flexfiles read failover
  fs: nfs: delete repeated words in comments
  rpc_pipefs: convert comma to semicolon
  nfs: Fix getxattr kernel panic and memory overflow
  NFS: Don't return layout segments that are in use
  NFS: Don't move layouts to plh_return_segs list while in use
  NFS: Add layout segment info to pnfs read/write/commit tracepoints
  NFS: Add tracepoints for layouterror and layoutstats.
  NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close()
  SUNRPC dont update timeout value on connection reset
  nfs: nfs_file_write() should check for writeback errors
  nfs: ensure correct writeback errors are returned on close()
  NFSv4.2: xattr cache: get rid of cache discard work queue
  NFS: remove redundant initialization of variable result
  NFSv4.0 allow nconnect for v4.0
  freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
  sunrpc: destroy rpc_inode_cachep after unregister_filesystem
  NFSv4.2: add client side xattr caching.
  NFSv4.2: hook in the user extended attribute handlers
  NFSv4.2: add the extended attribute proc functions.
  ...
2020-08-15 08:26:55 -07:00
Randy Dunlap
c734124c5c fs: autofs: delete repeated words in comments
Drop duplicated words {the, at} in comments.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ian Kent <raven@themaw.net>
Link: http://lkml.kernel.org/r/20200811021817.24982-1-rdunlap@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14 19:56:56 -07:00
Kees Cook
fc4177be96 exec: restore EACCES of S_ISDIR execve()
Patch series "Fix S_ISDIR execve() errno".

Fix an errno change for execve() of directories, noticed by Marc Zyngier.
Along with the fix, include a regression test to avoid seeing this return
in the future.

This patch (of 2):

The return code for attempting to execute a directory has always been
EACCES.  Adjust the S_ISDIR exec test to reflect the old errno instead of
the general EISDIR for other kinds of "open" attempts on directories.

Fixes: 633fb6ac39 ("exec: move S_ISREG() check earlier")
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Greg Kroah-Hartman <gregkh@android.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@google.com>
Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org
Link: https://lore.kernel.org/lkml/20200813151305.6191993b@why
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14 19:56:56 -07:00
Steve French
c8c412f976 SMB3: Fix mkdir when idsfromsid configured on mount
mkdir uses a compounded create operation which was not setting
the security descriptor on create of a directory. Fix so
mkdir now sets the mode and owner info properly when idsfromsid
and modefromsid are configured on the mount.

Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org> # v5.8
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-08-13 19:41:01 -05:00
Jens Axboe
227c0c9673 io_uring: internally retry short reads
We've had a few application cases of not handling short reads properly,
and it is understandable as short reads aren't really expected if the
application isn't doing non-blocking IO.

Now that we retain the iov_iter over retries, we can implement internal
retry pretty trivially. This ensures that we don't return a short read,
even for buffered reads on page cache conflicts.

Cleanup the deep nesting and hard to read nature of io_read() as well,
it's much more straight forward now to read and understand. Added a
few comments explaining the logic as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-13 16:00:31 -07:00