2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-19 02:34:01 +08:00
Commit Graph

61973 Commits

Author SHA1 Message Date
Jens Axboe
35cb6d54c1 fs: make build_open_flags() available internally
This is a prep patch for supporting non-blocking open from io_uring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:01:53 -07:00
Jens Axboe
d63d1b5edb io_uring: add support for fallocate()
This exposes fallocate(2) through io_uring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:01:53 -07:00
Jens Axboe
4d92748373 Merge branch 'io_uring-5.5' into for-5.6/io_uring-vfs
Pull in compatability fix for the files_update command.

* io_uring-5.5:
  io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
2020-01-20 17:01:17 -07:00
Eugene Syromiatnikov
1292e972ff io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
fds field of struct io_uring_files_update is problematic with regards
to compat user space, as pointer size is different in 32-bit, 32-on-64-bit,
and 64-bit user space.  In order to avoid custom handling of compat in
the syscall implementation, make fds __u64 and use u64_to_user_ptr in
order to retrieve it.  Also, align the field naturally and check that
no garbage is passed there.

Fixes: c3a31e6056 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:00:44 -07:00
Jens Axboe
fa7773deb3 Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into for-5.6/io_uring-vfs
Pull in Al's openat2 branch, since we'll need that for the openat2
support.

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Documentation: path-lookup: include new LOOKUP flags
  selftests: add openat2(2) selftests
  open: introduce openat2(2) syscall
  namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
  namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
  namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
  namei: LOOKUP_NO_XDEV: block mountpoint crossing
  namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
  namei: LOOKUP_NO_SYMLINKS: block symlink resolution
  namei: allow set_root() to produce errors
  namei: allow nd_jump_link() to produce errors
  nsfs: clean-up ns_get_path() signature to return int
  namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-19 19:47:04 -07:00
Aleksa Sarai
fddb5d430a open: introduce openat2(2) syscall
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].

This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).

Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.

In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.

Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).

/* Syscall Prototype. */
  /*
   * open_how is an extensible structure (similar in interface to
   * clone3(2) or sched_setattr(2)). The size parameter must be set to
   * sizeof(struct open_how), to allow for future extensions. All future
   * extensions will be appended to open_how, with their zero value
   * acting as a no-op default.
   */
  struct open_how { /* ... */ };

  int openat2(int dfd, const char *pathname,
              struct open_how *how, size_t size);

/* Description. */
The initial version of 'struct open_how' contains the following fields:

  flags
    Used to specify openat(2)-style flags. However, any unknown flag
    bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
    will result in -EINVAL. In addition, this field is 64-bits wide to
    allow for more O_ flags than currently permitted with openat(2).

  mode
    The file mode for O_CREAT or O_TMPFILE.

    Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.

  resolve
    Restrict path resolution (in contrast to O_* flags they affect all
    path components). The current set of flags are as follows (at the
    moment, all of the RESOLVE_ flags are implemented as just passing
    the corresponding LOOKUP_ flag).

    RESOLVE_NO_XDEV       => LOOKUP_NO_XDEV
    RESOLVE_NO_SYMLINKS   => LOOKUP_NO_SYMLINKS
    RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
    RESOLVE_BENEATH       => LOOKUP_BENEATH
    RESOLVE_IN_ROOT       => LOOKUP_IN_ROOT

open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.

Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).

After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.

/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.

In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).

/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).

Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.

Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).

[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb8 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
[6]: https://youtu.be/ggD-eb3yPVs

Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-18 09:19:18 -05:00
Linus Torvalds
25e73aadf2 io_uring-5.5-2020-01-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4hQEoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjZ0D/9X0JXHTlP8qebVw0Rjnb838TJtygwBZ8bm
 EsvEYP9lOJbR15V2WGWO2daNaaKMouglMQ8OWYMNGvREDtfNBxy3mE8ZVnpG385R
 RUweqCIK0rHpAfSRr4Nh9GwIeMyomLzOeumjVzXATsUS1o2+bCfv34pe22uikpgx
 njA2ab389hS2b9fMOFf78odazOMiCQSW7a2dwO1+5TNWtmYCei3SNPZuqZucvRPr
 9iSnZswJZb8KqyGyuJo6dQQhvurXgAM8LRglc6KIJ1NpyJCgPzyULYEYOvLyLHLo
 USvvivi5xFeUQy1x7w72Xu3dQ0Jg+i9nSDiAACM+ehCdVcKC0OcbFcvPJ06iH9V3
 RRdBUBJHHXSzklVHpo44iwZcmPNQNAWwM/vtlsrT9ln9fkgLeHG3zsScKOcv9fFw
 9YmtmZQkw9Zst5wghiOQsLhwsUndOPLLUbtiNGmUr1eKXeRYekFpO++HI/DwkWhN
 rFVJiHbMxIP0k7uk54sNPoHrXthfNiiFjOf4eZDV20xwVJ0xenmYpfW8XW447r3W
 C2dGRtRBbm598OCV0PzXFd1vIUKAr8b8fJwS3gZzZOH0uYbYr79AOn1cs2F//0M0
 MUXZo9LHfpfeGkMzimiPZj7lrZEI4LPAjYc1mnt4fXhuzPhYAinkcU3tQRY0T+ia
 4YqjdDtD3Q==
 =MhHp
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.5-2020-01-16' of git://git.kernel.dk/linux-block

Pull io_uring fixes form Jens Axboe:

 - Ensure ->result is always set when IO is retried (Bijan)

 - In conjunction with the above, fix a regression in polled IO issue
   when retried (me/Bijan)

 - Don't setup async context for read/write fixed, otherwise we may
   wrongly map the iovec on retry (me)

 - Cancel io-wq work if we fail getting mm reference (me)

 - Ensure dependent work is always initialized correctly (me)

 - Only allow original task to submit IO, don't allow it from a passed
   ring fd (me)

* tag 'io_uring-5.5-2020-01-16' of git://git.kernel.dk/linux-block:
  io_uring: only allow submit from owning task
  io_uring: ensure workqueue offload grabs ring mutex for poll list
  io_uring: clear req->result always before issuing a read/write request
  io_uring: be consistent in assigning next work from handler
  io-wq: cancel work if we fail getting a mm reference
  io_uring: don't setup async context for read/write fixed
2020-01-17 11:25:45 -08:00
Linus Torvalds
effaf90137 for-5.5-rc6-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl4h7wwACgkQxWXV+ddt
 WDtFgxAAqanZ3wqq8xNqybfWmmFNrKtdkakErRuQFWe/+kZH59HuBifTbYeVMD4Z
 hFgveFJpBIqo7uMUxUItSTtHcr3qx7TP9ejGYOaQO997oNPxPQXuEY8Lq5ebDBVB
 89Gn+Eg/Q+uPvCJSctxx4dblSiGZKb3iOEh+lJuWJV4bj8beekcTrqsg01ZchPRO
 Ygk1ltW7Vpf0wVkdts4FKiKiwX02M2C9zxh9NQjpNwH1DMow4XtBPsbqHbiHzRym
 SoD4+0dbhfdnKkNnBTFEJBbjbZcYwM9EQnfiyVL+/hDMHX4XTetqeFN1G8usfXXX
 2kxvwttPUtluJqlQXQnUU4mQEA4p5ORTgGgw1WBF3h+Aezumkql+27Bd6aiDKGZz
 SPc9sveft60R23TxorlrYVqfADgyZKEaZ+2wEM99Xoz4OdvP7jkqDentJW9us1Xh
 Xmfovq5xcRY17f9tdhiwqH5vgwxrLgmjBvTm/kcGX3ImhU8Yxk8xKw1JoV0P9cjW
 7awK4l8pyPbOUhekdT8hYqWXlL/DXhAMHraV1zfBKIbu1omlGByeg23jNM2iS/0B
 YtRkEEen0tRHpuKLB08twTKCak94wObBamKNFE6Snt1cDudLwGDpUosazM9l4uPR
 2D3SHs7UWNgtvTRCfq2LVoRMRSR2BA1b19EkylUig4ay7khmZ2k=
 =2e9q
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes that have been in the works during last twp weeks.
  All have a user visible effect and are stable material:

   - scrub: properly update progress after calling cancel ioctl, calling
     'resume' would start from the beginning otherwise

   - fix subvolume reference removal, after moving out of the original
     path the reference is not recognized and will lead to transaction
     abort

   - fix reloc root lifetime checks, could lead to crashes when there's
     subvolume cleaning running in parallel

   - fix memory leak when quotas get disabled in the middle of extent
     accounting

   - fix transaction abort in case of balance being started on degraded
     mount on eg. RAID1"

* tag 'for-5.5-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: check rw_devices, not num_devices for balance
  Btrfs: always copy scrub arguments back to user space
  btrfs: relocation: fix reloc_root lifespan and access
  btrfs: fix memory leak in qgroup accounting
  btrfs: do not delete mismatched root refs
  btrfs: fix invalid removal of root ref
  btrfs: rework arguments of btrfs_unlink_subvol
2020-01-17 11:21:05 -08:00
Linus Torvalds
ab7541c3ad fuse fixes for 5.5-rc7
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXiF1QwAKCRDh3BK/laaZ
 POf9AQCoPHnT7oH1gYUHfZAhS4cYX72+v6F75gYKUce0/jSDPQEAhbcMhoo31aO2
 BGTXRkeCVtg77IhxUmhXCLoQYjpSoQc=
 =UOsx
 -----END PGP SIGNATURE-----

Merge tag 'fuse-fixes-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fix from Miklos Szeredi:
 "Fix a regression in the last release affecting the ftp module of the
  gvfs filesystem"

* tag 'fuse-fixes-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix fuse_send_readpages() in the syncronous read case
2020-01-17 08:42:02 -08:00
Josef Bacik
b35cf1f0bf btrfs: check rw_devices, not num_devices for balance
The fstest btrfs/154 reports

  [ 8675.381709] BTRFS: Transaction aborted (error -28)
  [ 8675.383302] WARNING: CPU: 1 PID: 31900 at fs/btrfs/block-group.c:2038 btrfs_create_pending_block_groups+0x1e0/0x1f0 [btrfs]
  [ 8675.390925] CPU: 1 PID: 31900 Comm: btrfs Not tainted 5.5.0-rc6-default+ #935
  [ 8675.392780] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
  [ 8675.395452] RIP: 0010:btrfs_create_pending_block_groups+0x1e0/0x1f0 [btrfs]
  [ 8675.402672] RSP: 0018:ffffb2090888fb00 EFLAGS: 00010286
  [ 8675.404413] RAX: 0000000000000000 RBX: ffff92026dfa91c8 RCX: 0000000000000001
  [ 8675.406609] RDX: 0000000000000000 RSI: ffffffff8e100899 RDI: ffffffff8e100971
  [ 8675.408775] RBP: ffff920247c61660 R08: 0000000000000000 R09: 0000000000000000
  [ 8675.410978] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffe4
  [ 8675.412647] R13: ffff92026db74000 R14: ffff920247c616b8 R15: ffff92026dfbc000
  [ 8675.413994] FS:  00007fd5e57248c0(0000) GS:ffff92027d800000(0000) knlGS:0000000000000000
  [ 8675.416146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8675.417833] CR2: 0000564aa51682d8 CR3: 000000006dcbc004 CR4: 0000000000160ee0
  [ 8675.419801] Call Trace:
  [ 8675.420742]  btrfs_start_dirty_block_groups+0x355/0x480 [btrfs]
  [ 8675.422600]  btrfs_commit_transaction+0xc8/0xaf0 [btrfs]
  [ 8675.424335]  reset_balance_state+0x14a/0x190 [btrfs]
  [ 8675.425824]  btrfs_balance.cold+0xe7/0x154 [btrfs]
  [ 8675.427313]  ? kmem_cache_alloc_trace+0x235/0x2c0
  [ 8675.428663]  btrfs_ioctl_balance+0x298/0x350 [btrfs]
  [ 8675.430285]  btrfs_ioctl+0x466/0x2550 [btrfs]
  [ 8675.431788]  ? mem_cgroup_charge_statistics+0x51/0xf0
  [ 8675.433487]  ? mem_cgroup_commit_charge+0x56/0x400
  [ 8675.435122]  ? do_raw_spin_unlock+0x4b/0xc0
  [ 8675.436618]  ? _raw_spin_unlock+0x1f/0x30
  [ 8675.438093]  ? __handle_mm_fault+0x499/0x740
  [ 8675.439619]  ? do_vfs_ioctl+0x56e/0x770
  [ 8675.441034]  do_vfs_ioctl+0x56e/0x770
  [ 8675.442411]  ksys_ioctl+0x3a/0x70
  [ 8675.443718]  ? trace_hardirqs_off_thunk+0x1a/0x1c
  [ 8675.445333]  __x64_sys_ioctl+0x16/0x20
  [ 8675.446705]  do_syscall_64+0x50/0x210
  [ 8675.448059]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [ 8675.479187] BTRFS: error (device vdb) in btrfs_create_pending_block_groups:2038: errno=-28 No space left

We now use btrfs_can_overcommit() to see if we can flip a block group
read only.  Before this would fail because we weren't taking into
account the usable un-allocated space for allocating chunks.  With my
patches we were allowed to do the balance, which is technically correct.

The test is trying to start balance on degraded mount.  So now we're
trying to allocate a chunk and cannot because we want to allocate a
RAID1 chunk, but there's only 1 device that's available for usage.  This
results in an ENOSPC.

But we shouldn't even be making it this far, we don't have enough
devices to restripe.  The problem is we're using btrfs_num_devices(),
that also includes missing devices. That's not actually what we want, we
need to use rw_devices.

The chunk_mutex is not needed here, rw_devices changes only in device
add, remove or replace, all are excluded by EXCL_OP mechanism.

Fixes: e4d8ec0f65 ("Btrfs: implement online profile changing")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add stacktrace, update changelog, drop chunk_mutex ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-17 15:40:54 +01:00
Filipe Manana
5afe6ce748 Btrfs: always copy scrub arguments back to user space
If scrub returns an error we are not copying back the scrub arguments
structure to user space. This prevents user space to know how much
progress scrub has done if an error happened - this includes -ECANCELED
which is returned when users ask for scrub to stop. A particular use
case, which is used in btrfs-progs, is to resume scrub after it is
canceled, in that case it relies on checking the progress from the scrub
arguments structure and then use that progress in a call to resume
scrub.

So fix this by always copying the scrub arguments structure to user
space, overwriting the value returned to user space with -EFAULT only if
copying the structure failed to let user space know that either that
copying did not happen, and therefore the structure is stale, or it
happened partially and the structure is probably not valid and corrupt
due to the partial copy.

Reported-by: Graham Cobb <g.btrfs@cobb.uk.net>
Link: https://lore.kernel.org/linux-btrfs/d0a97688-78be-08de-ca7d-bcb4c7fb397e@cobb.uk.net/
Fixes: 06fe39ab15 ("Btrfs: do not overwrite scrub error with fault error in scrub ioctl")
CC: stable@vger.kernel.org # 5.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Tested-by: Graham Cobb <g.btrfs@cobb.uk.net>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-17 15:28:52 +01:00
Jens Axboe
44d282796f io_uring: only allow submit from owning task
If the credentials or the mm doesn't match, don't allow the task to
submit anything on behalf of this ring. The task that owns the ring can
pass the file descriptor to another task, but we don't want to allow
that task to submit an SQE that then assumes the ring mm and creds if
it needs to go async.

Cc: stable@vger.kernel.org
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-16 21:43:24 -07:00
Miklos Szeredi
7df1e988c7 fuse: fix fuse_send_readpages() in the syncronous read case
Buffered read in fuse normally goes via:

 -> generic_file_buffered_read()
   -> fuse_readpages()
     -> fuse_send_readpages()
       ->fuse_simple_request() [called since v5.4]

In the case of a read request, fuse_simple_request() will return a
non-negative bytecount on success or a negative error value.  A positive
bytecount was taken to be an error and the PG_error flag set on the page.
This resulted in generic_file_buffered_read() falling back to ->readpage(),
which would repeat the read request and succeed.  Because of the repeated
read succeeding the bug was not detected with regression tests or other use
cases.

The FTP module in GVFS however fails the second read due to the
non-seekable nature of FTP downloads.

Fix by checking and ignoring positive return value from
fuse_simple_request().

Reported-by: Ondrej Holy <oholy@redhat.com>
Link: https://gitlab.gnome.org/GNOME/gvfs/issues/441
Fixes: 134831e36b ("fuse: convert readpages to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-16 11:09:36 +01:00
Jens Axboe
11ba820bf1 io_uring: ensure workqueue offload grabs ring mutex for poll list
A previous commit moved the locking for the async sqthread, but didn't
take into account that the io-wq workers still need it. We can't use
req->in_async for this anymore as both the sqthread and io-wq workers
set it, gate the need for locking on io_wq_current_is_worker() instead.

Fixes: 8a4955ff1c ("io_uring: sqthread should grab ctx->uring_lock for submissions")
Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-15 21:51:17 -07:00
Bijan Mottahedeh
797f3f535d io_uring: clear req->result always before issuing a read/write request
req->result is cleared when io_issue_sqe() calls io_read/write_pre()
routines.  Those routines however are not called when the sqe
argument is NULL, which is the case when io_issue_sqe() is called from
io_wq_submit_work().  io_issue_sqe() may then examine a stale result if
a polled request had previously failed with -EAGAIN:

        if (ctx->flags & IORING_SETUP_IOPOLL) {
                if (req->result == -EAGAIN)
                        return -EAGAIN;

                io_iopoll_req_issued(req);
        }

and in turn cause a subsequently completed request to be re-issued in
io_wq_submit_work().

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-15 21:36:13 -07:00
Linus Torvalds
84bf39461e Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Fixes for mountpoint_last() bugs (by converting to use of
  lookup_last()) and an autofs regression fix from this cycle (caused by
  follow_managed() breakage introduced in barrier fixes series)"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix autofs regression caused by follow_managed() changes
  reimplement path_mountpoint() with less magic
2020-01-15 09:58:14 -08:00
Al Viro
508c877276 fix autofs regression caused by follow_managed() changes
we need to reload ->d_flags after the call of ->d_manage() - the thing
might've been called with dentry still negative and have the damn thing
turned positive while we'd waited.

Fixes: d41efb522e "fs/namei.c: pull positivity check into follow_managed()"
Reported-by: Ian Kent <raven@themaw.net>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-15 01:36:46 -05:00
Al Viro
c64cd6e34e reimplement path_mountpoint() with less magic
... and get rid of a bunch of bugs in it.  Background:
the reason for path_mountpoint() is that umount() really doesn't
want attempts to revalidate the root of what it's trying to umount.
The thing we want to avoid actually happen from complete_walk();
solution was to do something parallel to normal path_lookupat()
and it both went overboard and got the boilerplate subtly
(and not so subtly) wrong.

A better solution is to do pretty much what the normal path_lookupat()
does, but instead of complete_walk() do unlazy_walk().  All it takes
to avoid that ->d_weak_revalidate() call...  mountpoint_last() goes
away, along with everything it got wrong, and so does the magic around
LOOKUP_NO_REVAL.

Another source of bugs is that when we traverse mounts at the final
location (and we need to do that - umount . expects to get whatever's
overmounting ., if any, out of the lookup) we really ought to take
care of ->d_manage() - as it is, manual umount of autofs automount
in progress can lead to unpleasant surprises for the daemon.  Easily
solved by using handle_lookup_down() instead of follow_mount().

Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-15 01:36:06 -05:00
Jens Axboe
78912934f4 io_uring: be consistent in assigning next work from handler
If we pass back dependent work in case of links, we need to always
ensure that we call the link setup and work prep handler. If not, we
might be missing some setup for the next work item.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-14 22:09:06 -07:00
Jens Axboe
e0bbb3461a io-wq: cancel work if we fail getting a mm reference
If we require mm and user context, mark the request for cancellation
if we fail to acquire the desired mm.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-14 22:06:11 -07:00
Linus Torvalds
e033e7d4a8 Merge branch 'dhowells' (patches from DavidH)
Merge misc fixes from David Howells.

Two afs fixes and a key refcounting fix.

* dhowells:
  afs: Fix afs_lookup() to not clobber the version on a new dentry
  afs: Fix use-after-loss-of-ref
  keys: Fix request_key() cache
2020-01-14 09:56:31 -08:00
David Howells
f52b83b0b1 afs: Fix afs_lookup() to not clobber the version on a new dentry
Fix afs_lookup() to not clobber the version set on a new dentry by
afs_do_lookup() - especially as it's using the wrong version of the
version (we need to use the one given to us by whatever op the dir
contents correspond to rather than what's in the afs_vnode).

Fixes: 9dd0b82ef5 ("afs: Fix missing dentry data version updating")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14 09:40:06 -08:00
David Howells
40a708bd62 afs: Fix use-after-loss-of-ref
afs_lookup() has a tracepoint to indicate the outcome of
d_splice_alias(), passing it the inode to retrieve the fid from.
However, the function gave up its ref on that inode when it called
d_splice_alias(), which may have failed and dropped the inode.

Fix this by caching the fid.

Fixes: 80548b0399 ("afs: Add more tracepoints")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14 09:40:06 -08:00
Jens Axboe
74566df3a7 io_uring: don't setup async context for read/write fixed
We don't need it, and if we have it, then the retry handler will attempt
to copy the non-existent iovec with the inline iovec, with a segment
count that doesn't make sense.

Fixes: f67676d160 ("io_uring: ensure async punted read/write requests copy iovec")
Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-13 19:25:29 -07:00
Qu Wenruo
6282675e67 btrfs: relocation: fix reloc_root lifespan and access
[BUG]
There are several different KASAN reports for balance + snapshot
workloads.  Involved call paths include:

   should_ignore_root+0x54/0xb0 [btrfs]
   build_backref_tree+0x11af/0x2280 [btrfs]
   relocate_tree_blocks+0x391/0xb80 [btrfs]
   relocate_block_group+0x3e5/0xa00 [btrfs]
   btrfs_relocate_block_group+0x240/0x4d0 [btrfs]
   btrfs_relocate_chunk+0x53/0xf0 [btrfs]
   btrfs_balance+0xc91/0x1840 [btrfs]
   btrfs_ioctl_balance+0x416/0x4e0 [btrfs]
   btrfs_ioctl+0x8af/0x3e60 [btrfs]
   do_vfs_ioctl+0x831/0xb10

   create_reloc_root+0x9f/0x460 [btrfs]
   btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs]
   create_pending_snapshot+0xa9b/0x15f0 [btrfs]
   create_pending_snapshots+0x111/0x140 [btrfs]
   btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
   btrfs_mksubvol+0x915/0x960 [btrfs]
   btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
   btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
   btrfs_ioctl+0x241b/0x3e60 [btrfs]
   do_vfs_ioctl+0x831/0xb10

   btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs]
   create_pending_snapshot+0x209/0x15f0 [btrfs]
   create_pending_snapshots+0x111/0x140 [btrfs]
   btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
   btrfs_mksubvol+0x915/0x960 [btrfs]
   btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
   btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
   btrfs_ioctl+0x241b/0x3e60 [btrfs]
   do_vfs_ioctl+0x831/0xb10

[CAUSE]
All these call sites are only relying on root->reloc_root, which can
undergo btrfs_drop_snapshot(), and since we don't have real refcount
based protection to reloc roots, we can reach already dropped reloc
root, triggering KASAN.

[FIX]
To avoid such access to unstable root->reloc_root, we should check
BTRFS_ROOT_DEAD_RELOC_TREE bit first.

This patch introduces wrappers that provide the correct way to check the
bit with memory barriers protection.

Most callers don't distinguish merged reloc tree and no reloc tree.  The
only exception is should_ignore_root(), as merged reloc tree can be
ignored, while no reloc tree shouldn't.

[CRITICAL SECTION ANALYSIS]
Although test_bit()/set_bit()/clear_bit() doesn't imply a barrier, the
DEAD_RELOC_TREE bit has extra help from transaction as a higher level
barrier, the lifespan of root::reloc_root and DEAD_RELOC_TREE bit are:

	NULL: reloc_root is NULL	PTR: reloc_root is not NULL
	0: DEAD_RELOC_ROOT bit not set	DEAD: DEAD_RELOC_ROOT bit set

	(NULL, 0)    Initial state		 __
	  |					 /\ Section A
        btrfs_init_reloc_root()			 \/
	  |				 	 __
	(PTR, 0)     reloc_root initialized      /\
          |					 |
	btrfs_update_reloc_root()		 |  Section B
          |					 |
	(PTR, DEAD)  reloc_root has been merged  \/
          |					 __
	=== btrfs_commit_transaction() ====================
	  |					 /\
	clean_dirty_subvols()			 |
	  |					 |  Section C
	(NULL, DEAD) reloc_root cleanup starts   \/
          |					 __
	btrfs_drop_snapshot()			 /\
	  |					 |  Section D
	(NULL, 0)    Back to initial state	 \/

Every have_reloc_root() or test_bit(DEAD_RELOC_ROOT) caller holds
transaction handle, so none of such caller can cross transaction boundary.

In Section A, every caller just found no DEAD bit, and grab reloc_root.

In the cross section A-B, caller may get no DEAD bit, but since reloc_root
is still completely valid thus accessing reloc_root is completely safe.

No test_bit() caller can cross the boundary of Section B and Section C.

In Section C, every caller found the DEAD bit, so no one will access
reloc_root.

In the cross section C-D, either caller gets the DEAD bit set, avoiding
access reloc_root no matter if it's safe or not.  Or caller get the DEAD
bit cleared, then access reloc_root, which is already NULL, nothing will
be wrong.

The memory write barriers are between the reloc_root updates and bit
set/clear, the pairing read side is before test_bit.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Fixes: d2311e6985 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ barriers ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-13 23:10:56 +01:00
Linus Torvalds
9fb7007de8 Char/Misc patch for 5.5-rc6
Here is a single fix, for the chrdev core, for 5.5-rc6
 
 There's been a long-standing race condition triggered by syzbot, and
 occasionally real people, in the chrdev open() path.  Will finally took
 the time to track it down and fix it for real before the holidays.
 
 Here's that one patch, it's been in linux-next for a while with no
 reported issues and it does fix the reported problem.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXhjcRA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykIyQCfcrNOyyFktEj7/qiVJrMLbzVWoWYAoMHtNQcG
 3IYmNNJ+eXXJEiOgeZ4J
 =J0bS
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg KH:
 "Here is a single fix, for the chrdev core, for 5.5-rc6

  There's been a long-standing race condition triggered by syzbot, and
  occasionally real people, in the chrdev open() path. Will finally took
  the time to track it down and fix it for real before the holidays.

  Here's that one patch, it's been in linux-next for a while with no
  reported issues and it does fix the reported problem"

* tag 'char-misc-5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  chardev: Avoid potential use-after-free in 'chrdev_open()'
2020-01-10 13:25:24 -08:00
Linus Torvalds
4e4cd21c64 block-5.5-2020-01-10
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4YvdoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoOuD/0eZkvtim/ZyEzj081PZUkslWwjNvEZV9+o
 iYWMp0PBDyYgR79ca86EVTevcMiVxPnKpQl5DT+p9L1JzZ7dFc8U7fTpygjwbzx0
 FdHlPQt+oN4TsGl3hpTGGnw2ArbCHnqqj31ahgo87zo0a01xv33C3QeGFyXEIYoR
 F0QQ5E7EAyT2umKKflX9PWnrbOQZ91p2P3m+AE0TtOXMUgTi2KJKHUFu+G5OOwZB
 dM41GvyZDY9WA7bUlFeOp0mRZsiGkfEsI59VP6AR8ZkxwsOeHLrVB5iBEGPiTDL2
 dUwLwbGrLYFtwLEh4yd0aKt9++H2RZjJwi4ssyaDkkWCMQHECQXwd34DBmrV/qia
 hgh/4DV0E1X3MZFYOk44zp8kwjgpmU9MCH3dFU0bWnzm9WrvtS9uBDjgEDkn6zty
 xONSQeyHWVFQFwIjG260YEbuTplOTFP5rNWEf2CHWMHuk9kp8kfATWt9wlazYhtz
 OUELfWmkrGk8nqMN4Ee+ty582I8gxk48IGwiJHOYh1gMHHgFnJbgr97Pe2NCLLee
 9elkJnUQSdXuF314uznrAf7XLiEC0hfHGnPCTD8pAMf2DYOGKNeivh2wtdhd98cu
 AvHew9qnI2C/oahY+DaE4wunP4VlNZ/ZeNAl0h8KG7uNBCA4uFtyBMNimnKVNblw
 7KDsQ3vDIg==
 =1bsG
 -----END PGP SIGNATURE-----

Merge tag 'block-5.5-2020-01-10' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A few fixes that should go into this round.

  This pull request contains two NVMe fixes via Keith, removal of a dead
  function, and a fix for the bio op for read truncates (Ming)"

* tag 'block-5.5-2020-01-10' of git://git.kernel.dk/linux-block:
  nvmet: fix per feat data len for get_feature
  nvme: Translate more status codes to blk_status_t
  fs: move guard_bio_eod() after bio_set_op_attrs
  block: remove unused mp_bvec_last_segment
2020-01-10 12:05:26 -08:00
Linus Torvalds
30b6487d15 io_uring-5.5-2020-01-10
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4YvQsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpkf2D/9nY2CFUfVtYKv2nNpFTXSnTIT2r9WZEXfI
 vD75OF/w3RxRUJHyJNKw8YIP8fRDmpqzvryaPjb33o1nXwu2ioKe1Yrd63vbXd41
 vr6X+jAp7iieO1PoG/QFb7p7ZBuhnFFKor/1scoD+4SQEOTtmwroeL6ECw1LdQGA
 A8F5nQzHVzrfhXNconPlXuXKVpoErP4AQ4ls5YmQI3AugK4XEYeRpRuGWuecdjJM
 In3TJ47heNPOCU1Eo7MO7lkFBR+CPi4uqsdFgsYJl7F1KiuwHL0e7qsWvmUMxz+9
 Olq0hAVFAXLY6J8uEz+dBKN7Xu0zdsG2ILfOJrDUjbGQ+/aPb++08W162wxg1HlP
 +WKjpZtQpEQ3lUwk/e2fplSTC9bCA/90h/tmwjL3Olbl9+H5mxnovTevcxDX+L6x
 sBfHRlDPo8r1r71jXOjCKSYYLwEBHGmAA1DTXQt11386oIwP4RFXaregbkxRxvAu
 qqaChgnm9wb6wyApxYnU8r14qtRwN7lpK0STJB+krqoMywKpeJPirjoTDgMeovgf
 1VYDDHz/PYkSWo+nRQWtXPK0k6sk2K2YnM5t4O0LZC2Pskc6C7V5PT7ZUyTw/Hfi
 7IbWu5FCzsgCwC4UEhCcd9kRtFFQceZGW/SA1+VU98EyVUKrewXmq0NpmSiNfscV
 0cdAHuVHhw==
 =TzX1
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.5-2020-01-10' of git://git.kernel.dk/linux-block

Pull io_uring fix from Jens Axboe:
 "Single fix for this series, fixing a regression with the short read
  handling.

  This just removes it, as it cannot safely be done for all cases"

* tag 'io_uring-5.5-2020-01-10' of git://git.kernel.dk/linux-block:
  io_uring: remove punt of short reads to async context
2020-01-10 12:03:12 -08:00
Linus Torvalds
bef1d88263 pstore fix for rare error path
- Fix label allocation lifetime/visibility to avoid further mistakes
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl4YAM8WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJl9mD/9il1pBZmLTQFVzfLeVnDemQ1t1
 zhPywuR/FzE6KXX4kqaRJZ2ttLsHFaGqttbegQ5Jh8g5SdHZgMWRIz9y9Mb/re4g
 5ny19VH4dvJHY6CYQxl3SlL+uSx29dNQOC4XGRtVwLrs6XeCIoWhXw1KI6ahiuVW
 AK1lelPlLIdVOLrvDn3qHy6iQBXgEA0PSosxUEjxC43ekoIRwKwsMKCeT3awHrOY
 uTamIU8vXwqCl+eFsgoQBoq8dgBvNI121DbKB2rToC2Pb3cE84Ez2Ul7hrqDQqzo
 0wLPYJhvzMDqCaRwc5a42PqYFmRxSZrNHVbAnA9n/gEFPluJTklsq8fOEkXYml16
 BKiC96PcKYfJZdEePbNx3tQVtXYj/mlncLKb8FxIlybyC+P92XGlHWDkrXyva8uY
 6XyTyGpkEN1zwZnA+R6otvaOX8+I5XuX4vsRRlTaDd+1Y6SY7HxvMjxFuTko0yfY
 y/X0rfrhzvtrXUCoC+bkaA73EwaSXpie+8goFWQz9Erjpl7o0lH4lCIqm7J0hynM
 0kmlusSEfqNQ8HBnbP25BVjW4Ws/ASrr5yGDNTigsaYeCdHkwYy+Ses50I9gMWuB
 oNz7jUblC0rPlO1geBQTFpec2u+9sS7Y+xhYhh+h8GLwCX09zOB8gUTE+GBgVhsY
 ptjh+yD7ybBiXrX8jQ==
 =biFc
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore fix from Kees Cook:
 "Cengiz Can forwarded a Coverity report about more problems with a rare
  pstore initialization error path, so the allocation lifetime was
  rearranged to avoid needing to share the kfree() responsibilities
  between caller and callee"

* tag 'pstore-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: Regularize prz label allocation lifetime
2020-01-09 21:42:05 -08:00
Ming Lei
83c9c54716 fs: move guard_bio_eod() after bio_set_op_attrs
Commit 85a8ce62c2 ("block: add bio_truncate to fix guard_bio_eod")
adds bio_truncate() for handling bio EOD. However, bio_truncate()
doesn't use the passed 'op' parameter from guard_bio_eod's callers.

So bio_trunacate() may retrieve wrong 'op', and zering pages may
not be done for READ bio.

Fixes this issue by moving guard_bio_eod() after bio_set_op_attrs()
in submit_bh_wbc() so that bio_truncate() can always retrieve correct
op info.

Meantime remove the 'op' parameter from guard_bio_eod() because it isn't
used any more.

Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Fixes: 85a8ce62c2 ("block: add bio_truncate to fix guard_bio_eod")
Signed-off-by: Ming Lei <ming.lei@redhat.com>

Fold in kerneldoc and bio_op() change.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-09 08:16:12 -07:00
Kees Cook
e163fdb3f7 pstore/ram: Regularize prz label allocation lifetime
In my attempt to fix a memory leak, I introduced a double-free in the
pstore error path. Instead of trying to manage the allocation lifetime
between persistent_ram_new() and its callers, adjust the logic so
persistent_ram_new() always takes a kstrdup() copy, and leaves the
caller's allocation lifetime up to the caller. Therefore callers are
_always_ responsible for freeing their label. Before, it only needed
freeing when the prz itself failed to allocate, and not in any of the
other prz failure cases, which callers would have no visibility into,
which is the root design problem that lead to both the leak and now
double-free bugs.

Reported-by: Cengiz Can <cengiz@kernel.wtf>
Link: https://lore.kernel.org/lkml/d4ec59002ede4aaf9928c7f7526da87c@kernel.wtf
Fixes: 8df955a32a ("pstore/ram: Fix error-path memory leak in persistent_ram_new() callers")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-01-08 17:05:45 -08:00
Johannes Thumshirn
26ef8493e1 btrfs: fix memory leak in qgroup accounting
When running xfstests on the current btrfs I get the following splat from
kmemleak:

unreferenced object 0xffff88821b2404e0 (size 32):
  comm "kworker/u4:7", pid 26663, jiffies 4295283698 (age 8.776s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 10 ff fd 26 82 88 ff ff  ...........&....
    10 ff fd 26 82 88 ff ff 20 ff fd 26 82 88 ff ff  ...&.... ..&....
  backtrace:
    [<00000000f94fd43f>] ulist_alloc+0x25/0x60 [btrfs]
    [<00000000fd023d99>] btrfs_find_all_roots_safe+0x41/0x100 [btrfs]
    [<000000008f17bd32>] btrfs_find_all_roots+0x52/0x70 [btrfs]
    [<00000000b7660afb>] btrfs_qgroup_rescan_worker+0x343/0x680 [btrfs]
    [<0000000058e66778>] btrfs_work_helper+0xac/0x1e0 [btrfs]
    [<00000000f0188930>] process_one_work+0x1cf/0x350
    [<00000000af5f2f8e>] worker_thread+0x28/0x3c0
    [<00000000b55a1add>] kthread+0x109/0x120
    [<00000000f88cbd17>] ret_from_fork+0x35/0x40

This corresponds to:

  (gdb) l *(btrfs_find_all_roots_safe+0x41)
  0x8d7e1 is in btrfs_find_all_roots_safe (fs/btrfs/backref.c:1413).
  1408
  1409            tmp = ulist_alloc(GFP_NOFS);
  1410            if (!tmp)
  1411                    return -ENOMEM;
  1412            *roots = ulist_alloc(GFP_NOFS);
  1413            if (!*roots) {
  1414                    ulist_free(tmp);
  1415                    return -ENOMEM;
  1416            }
  1417

Following the lifetime of the allocated 'roots' ulist, it gets freed
again in btrfs_qgroup_account_extent().

But this does not happen if the function is called with the
'BTRFS_FS_QUOTA_ENABLED' flag cleared, then btrfs_qgroup_account_extent()
does a short leave and directly returns.

Instead of directly returning we should jump to the 'out_free' in order to
free all resources as expected.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-08 17:56:17 +01:00
Josef Bacik
423a716cd7 btrfs: do not delete mismatched root refs
btrfs_del_root_ref() will simply WARN_ON() if the ref doesn't match in
any way, and then continue to delete the reference.  This shouldn't
happen, we have these values because there's more to the reference than
the original root and the sub root.  If any of these checks fail, return
-ENOENT.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-08 14:44:24 +01:00
Josef Bacik
d49d3287e7 btrfs: fix invalid removal of root ref
If we have the following sequence of events

  btrfs sub create A
  btrfs sub create A/B
  btrfs sub snap A C
  mkdir C/foo
  mv A/B C/foo
  rm -rf *

We will end up with a transaction abort.

The reason for this is because we create a root ref for B pointing to A.
When we create a snapshot of C we still have B in our tree, but because
the root ref points to A and not C we will make it appear to be empty.

The problem happens when we move B into C.  This removes the root ref
for B pointing to A and adds a ref of B pointing to C.  When we rmdir C
we'll see that we have a ref to our root and remove the root ref,
despite not actually matching our reference name.

Now btrfs_del_root_ref() allowing this to work is a bug as well, however
we know that this inode does not actually point to a root ref in the
first place, so we shouldn't be calling btrfs_del_root_ref() in the
first place and instead simply look up our dir index for this item and
do the rest of the removal.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-08 14:44:23 +01:00
Josef Bacik
045d3967b6 btrfs: rework arguments of btrfs_unlink_subvol
btrfs_unlink_subvol takes the name of the dentry and the root objectid
based on what kind of inode this is, either a real subvolume link or a
empty one that we inherited as a snapshot.  We need to fix how we unlink
in the case for BTRFS_EMPTY_SUBVOL_DIR_OBJECTID in the future, so rework
btrfs_unlink_subvol to just take the dentry and handle getting the right
objectid given the type of inode this is.  There is no functional change
here, simply pushing the work into btrfs_unlink_subvol() proper.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-08 14:43:34 +01:00
Jens Axboe
eacc6dfaea io_uring: remove punt of short reads to async context
We currently punt any short read on a regular file to async context,
but this fails if the short read is due to running into EOF. This is
especially problematic since we only do the single prep for commands
now, as we don't reset kiocb->ki_pos. This can result in a 4k read on
a 1k file returning zero, as we detect the short read and then retry
from async context. At the time of retry, the position is now 1k, and
we end up reading nothing, and hence return 0.

Instead of trying to patch around the fact that short reads can be
legitimate and won't succeed in case of retry, remove the logic to punt
a short read to async context. Simply return it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 13:08:56 -07:00
Will Deacon
68faa679b8 chardev: Avoid potential use-after-free in 'chrdev_open()'
'chrdev_open()' calls 'cdev_get()' to obtain a reference to the
'struct cdev *' stashed in the 'i_cdev' field of the target inode
structure. If the pointer is NULL, then it is initialised lazily by
looking up the kobject in the 'cdev_map' and so the whole procedure is
protected by the 'cdev_lock' spinlock to serialise initialisation of
the shared pointer.

Unfortunately, it is possible for the initialising thread to fail *after*
installing the new pointer, for example if the subsequent '->open()' call
on the file fails. In this case, 'cdev_put()' is called, the reference
count on the kobject is dropped and, if nobody else has taken a reference,
the release function is called which finally clears 'inode->i_cdev' from
'cdev_purge()' before potentially freeing the object. The problem here
is that a racing thread can happily take the 'cdev_lock' and see the
non-NULL pointer in the inode, which can result in a refcount increment
from zero and a warning:

  |  ------------[ cut here ]------------
  |  refcount_t: addition on 0; use-after-free.
  |  WARNING: CPU: 2 PID: 6385 at lib/refcount.c:25 refcount_warn_saturate+0x6d/0xf0
  |  Modules linked in:
  |  CPU: 2 PID: 6385 Comm: repro Not tainted 5.5.0-rc2+ #22
  |  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  |  RIP: 0010:refcount_warn_saturate+0x6d/0xf0
  |  Code: 05 55 9a 15 01 01 e8 9d aa c8 ff 0f 0b c3 80 3d 45 9a 15 01 00 75 ce 48 c7 c7 00 9c 62 b3 c6 08
  |  RSP: 0018:ffffb524c1b9bc70 EFLAGS: 00010282
  |  RAX: 0000000000000000 RBX: ffff9e9da1f71390 RCX: 0000000000000000
  |  RDX: ffff9e9dbbd27618 RSI: ffff9e9dbbd18798 RDI: ffff9e9dbbd18798
  |  RBP: 0000000000000000 R08: 000000000000095f R09: 0000000000000039
  |  R10: 0000000000000000 R11: ffffb524c1b9bb20 R12: ffff9e9da1e8c700
  |  R13: ffffffffb25ee8b0 R14: 0000000000000000 R15: ffff9e9da1e8c700
  |  FS:  00007f3b87d26700(0000) GS:ffff9e9dbbd00000(0000) knlGS:0000000000000000
  |  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  |  CR2: 00007fc16909c000 CR3: 000000012df9c000 CR4: 00000000000006e0
  |  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  |  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  |  Call Trace:
  |   kobject_get+0x5c/0x60
  |   cdev_get+0x2b/0x60
  |   chrdev_open+0x55/0x220
  |   ? cdev_put.part.3+0x20/0x20
  |   do_dentry_open+0x13a/0x390
  |   path_openat+0x2c8/0x1470
  |   do_filp_open+0x93/0x100
  |   ? selinux_file_ioctl+0x17f/0x220
  |   do_sys_open+0x186/0x220
  |   do_syscall_64+0x48/0x150
  |   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  |  RIP: 0033:0x7f3b87efcd0e
  |  Code: 89 54 24 08 e8 a3 f4 ff ff 8b 74 24 0c 48 8b 3c 24 41 89 c0 44 8b 54 24 08 b8 01 01 00 00 89 f4
  |  RSP: 002b:00007f3b87d259f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
  |  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b87efcd0e
  |  RDX: 0000000000000000 RSI: 00007f3b87d25a80 RDI: 00000000ffffff9c
  |  RBP: 00007f3b87d25e90 R08: 0000000000000000 R09: 0000000000000000
  |  R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe188f504e
  |  R13: 00007ffe188f504f R14: 00007f3b87d26700 R15: 0000000000000000
  |  ---[ end trace 24f53ca58db8180a ]---

Since 'cdev_get()' can already fail to obtain a reference, simply move
it over to use 'kobject_get_unless_zero()' instead of 'kobject_get()',
which will cause the racing thread to return -ENXIO if the initialising
thread fails unexpectedly.

Cc: Hillf Danton <hdanton@sina.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+82defefbbd8527e1c2cb@syzkaller.appspotmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191219120203.32691-1-will@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-06 20:10:26 +01:00
Gang He
b73eba2a86 ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
Because ocfs2_get_dlm_debug() function is called once less here, ocfs2
file system will trigger the system crash, usually after ocfs2 file
system is unmounted.

This system crash is caused by a generic memory corruption, these crash
backtraces are not always the same, for exapmle,

    ocfs2: Unmounting device (253,16) on (node 172167785)
    general protection fault: 0000 [#1] SMP PTI
    CPU: 3 PID: 14107 Comm: fence_legacy Kdump:
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
    RIP: 0010:__kmalloc+0xa5/0x2a0
    Code: 00 00 4d 8b 07 65 4d 8b
    RSP: 0018:ffffaa1fc094bbe8 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: d310a8800d7a3faf RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff96e68fc036c0
    RBP: d310a8800d7a3faf R08: ffff96e6ffdb10a0 R09: 00000000752e7079
    R10: 000000000001c513 R11: 0000000004091041 R12: 0000000000000dc0
    R13: 0000000000000039 R14: ffff96e68fc036c0 R15: ffff96e68fc036c0
    FS:  00007f699dfba540(0000) GS:ffff96e6ffd80000(0000) knlGS:00000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055f3a9d9b768 CR3: 000000002cd1c000 CR4: 00000000000006e0
    Call Trace:
     ext4_htree_store_dirent+0x35/0x100 [ext4]
     htree_dirblock_to_tree+0xea/0x290 [ext4]
     ext4_htree_fill_tree+0x1c1/0x2d0 [ext4]
     ext4_readdir+0x67c/0x9d0 [ext4]
     iterate_dir+0x8d/0x1a0
     __x64_sys_getdents+0xab/0x130
     do_syscall_64+0x60/0x1f0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f699d33a9fb

This regression problem was introduced by commit e581595ea2 ("ocfs: no
need to check return value of debugfs_create functions").

Link: http://lkml.kernel.org/r/20191225061501.13587-1-ghe@suse.com
Fixes: e581595ea2 ("ocfs: no need to check return value of debugfs_create functions")
Signed-off-by: Gang He <ghe@suse.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>	[5.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Kai Li
397eac17f8 ocfs2: call journal flush to mark journal as empty after journal recovery when mount
If journal is dirty when mount, it will be replayed but jbd2 sb log tail
cannot be updated to mark a new start because journal->j_flag has
already been set with JBD2_ABORT first in journal_init_common.

When a new transaction is committed, it will be recored in block 1
first(journal->j_tail is set to 1 in journal_reset).  If emergency
restart happens again before journal super block is updated
unfortunately, the new recorded trans will not be replayed in the next
mount.

The following steps describe this procedure in detail.
1. mount and touch some files
2. these transactions are committed to journal area but not checkpointed
3. emergency restart
4. mount again and its journals are replayed
5. journal super block's first s_start is 1, but its s_seq is not updated
6. touch a new file and its trans is committed but not checkpointed
7. emergency restart again
8. mount and journal is dirty, but trans committed in 6 will not be
replayed.

This exception happens easily when this lun is used by only one node.
If it is used by multi-nodes, other node will replay its journal and its
journal super block will be updated after recovery like what this patch
does.

ocfs2_recover_node->ocfs2_replay_journal.

The following jbd2 journal can be generated by touching a new file after
journal is replayed, and seq 15 is the first valid commit, but first seq
is 13 in journal super block.

logdump:
  Block 0: Journal Superblock
  Seq: 0   Type: 4 (JBD2_SUPERBLOCK_V2)
  Blocksize: 4096   Total Blocks: 32768   First Block: 1
  First Commit ID: 13   Start Log Blknum: 1
  Error: 0
  Feature Compat: 0
  Feature Incompat: 2 block64
  Feature RO compat: 0
  Journal UUID: 4ED3822C54294467A4F8E87D2BA4BC36
  FS Share Cnt: 1   Dynamic Superblk Blknum: 0
  Per Txn Block Limit    Journal: 0    Data: 0

  Block 1: Journal Commit Block
  Seq: 14   Type: 2 (JBD2_COMMIT_BLOCK)

  Block 2: Journal Descriptor
  Seq: 15   Type: 1 (JBD2_DESCRIPTOR_BLOCK)
  No. Blocknum        Flags
   0. 587             none
  UUID: 00000000000000000000000000000000
   1. 8257792         JBD2_FLAG_SAME_UUID
   2. 619             JBD2_FLAG_SAME_UUID
   3. 24772864        JBD2_FLAG_SAME_UUID
   4. 8257802         JBD2_FLAG_SAME_UUID
   5. 513             JBD2_FLAG_SAME_UUID JBD2_FLAG_LAST_TAG
  ...
  Block 7: Inode
  Inode: 8257802   Mode: 0640   Generation: 57157641 (0x3682809)
  FS Generation: 2839773110 (0xa9437fb6)
  CRC32: 00000000   ECC: 0000
  Type: Regular   Attr: 0x0   Flags: Valid
  Dynamic Features: (0x1) InlineData
  User: 0 (root)   Group: 0 (root)   Size: 7
  Links: 1   Clusters: 0
  ctime: 0x5de5d870 0x11104c61 -- Tue Dec  3 11:37:20.286280801 2019
  atime: 0x5de5d870 0x113181a1 -- Tue Dec  3 11:37:20.288457121 2019
  mtime: 0x5de5d870 0x11104c61 -- Tue Dec  3 11:37:20.286280801 2019
  dtime: 0x0 -- Thu Jan  1 08:00:00 1970
  ...
  Block 9: Journal Commit Block
  Seq: 15   Type: 2 (JBD2_COMMIT_BLOCK)

The following is journal recovery log when recovering the upper jbd2
journal when mount again.

syslog:
  ocfs2: File system on device (252,1) was not unmounted cleanly, recovering it.
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 0
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 1
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 2
  fs/jbd2/recovery.c:(jbd2_journal_recover, 278): JBD2: recovery, exit status 0, recovered transactions 13 to 13

Due to first commit seq 13 recorded in journal super is not consistent
with the value recorded in block 1(seq is 14), journal recovery will be
terminated before seq 15 even though it is an unbroken commit, inode
8257802 is a new file and it will be lost.

Link: http://lkml.kernel.org/r/20191217020140.2197-1-li.kai4@h3c.com
Signed-off-by: Kai Li <li.kai4@h3c.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Changwei Ge <gechangwei@live.cn>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Randy Dunlap
e39e773ad1 fs/posix_acl.c: fix kernel-doc warnings
Fix kernel-doc warnings in fs/posix_acl.c.
Also fix one typo (setgit -> setgid).

  fs/posix_acl.c:647: warning: Function parameter or member 'inode' not described in 'posix_acl_update_mode'
  fs/posix_acl.c:647: warning: Function parameter or member 'mode_p' not described in 'posix_acl_update_mode'
  fs/posix_acl.c:647: warning: Function parameter or member 'acl' not described in 'posix_acl_update_mode'

Link: http://lkml.kernel.org/r/29b0dc46-1f28-a4e5-b1d0-ba2b65629779@infradead.org
Fixes: 073931017b ("posix_acl: Clear SGID bit when setting file permissions")

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Eric Biggers
213921f967 fs/namespace.c: make to_mnt_ns() static
Make to_mnt_ns() static to address the following 'sparse' warning:

    fs/namespace.c:1731:22: warning: symbol 'to_mnt_ns' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191209234830.156260-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Eric Biggers
7bebd69ecf fs/nsfs.c: include headers for missing declarations
Include linux/proc_fs.h and fs/internal.h to address the following
'sparse' warnings:

    fs/nsfs.c:41:32: warning: symbol 'ns_dentry_operations' was not declared. Should it be static?
    fs/nsfs.c:145:5: warning: symbol 'open_related_ns' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191209234822.156179-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Eric Biggers
b16155a0b0 fs/direct-io.c: include fs/internal.h for missing prototype
Include fs/internal.h to address the following 'sparse' warning:

    fs/direct-io.c:591:5: warning: symbol 'sb_init_dio_done_wq' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191209234544.128302-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Linus Torvalds
3a562aee72 for-5.5-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl4PenMACgkQxWXV+ddt
 WDtsyg/6A86Da1zpLQKHiQtfhH629oeL/lFI0Cz+52xcgKSAUX8pHrWzKawSVpTT
 OTgksskHREPDjyDQytxAUKJnuplFEF7sCFNhzhsjQ7cvhC0lLDmW0VQr1Wob1/c4
 miW18YXN2DrYFQqdCeypvq1FItglxuBmsa9eITYzytRZcATY+3b506FhT/d8NIiH
 f3nnChUiNuyfAiez1SC4FavDpVma1O6SpPMk29wUN/+/Dnd4aBrfvhuygy7xyIOK
 rzotRR5xagQDpei+99lT2hys4Pv0yEOajoGmgbgpaZNP0vgQcBLaF5UTeMv/Ib2j
 i+muu1rWi4R6lS3hh30kRSifKmPF7I9JB1dbd5gPfZiERlDQk+qZWrPFv9l0lf7M
 R3jn04EaPVNr0dtftRFF2VvpIlUQYgIvwZHx4Td6Oy9XO0X4n5vlKxYg9aHmybJ2
 Vni13ad2oWNLo2Dd1eUYrEJ/QwGc1pq7JU9HiOST7yEJv1FN++AF4qTYTyvc7Yl4
 HrN349/2k2J9vU88BIlXIxYG73AL9lIpGF2ROiEw+xn4oOevDP/5HCP8ZvELoNVM
 NqhJGoURu+AhZ7Rm3RYAYySShjQPF+qw/Dnz8sowQsEFUQ990E1wUxWZwTOP6+JQ
 cdwzYiXYIExTghfQdUDdwNlpCXUbSuJxFOGjKmhBuzA7HHwGf1Q=
 =jGvW
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few fixes for btrfs:

   - blkcg accounting problem with compression that could stall writes

   - setting up blkcg bio for compression crashes due to NULL bdev
     pointer

   - fix possible infinite loop in writeback for nocow files (here
     possible means almost impossible, 13 things that need to happen to
     trigger it)"

* tag 'for-5.5-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix infinite loop during nocow writeback due to race
  btrfs: fix compressed write bio blkcg attribution
  btrfs: punt all bios created in btrfs_submit_compressed_write()
2020-01-03 12:20:21 -08:00
Linus Torvalds
b6b4aafc99 block-5.5-20200103
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4PauoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpkqMD/4zcRGP8LhkVG3I8QMqmhhJNyZb5WP5O/mV
 ekWYYzxaPhn/WQjM17Y6MBuvPTqUk1R/FiZdjW+OPUOrXUPsasSSKmLvcpiHIRHY
 YTMhzNNRpainakM7q+BwhfeNm/T3d0JXMfN8oufIfrdFHy7+bcZupctv0aQgnLdz
 Z8f40nR8t/uYfRrMzCjEM9mY2P9qelwP3ylnloS2XC0r+O/j/tBZEtnormhFwnfa
 NGx732WyHBK4Qt9bRK8FzmuUmUmhxU+MWHcR5erCCzeFEY4DnzzthFuKGvZ3Ai0f
 qWMjYJBrWHvNJL6M4Dm3zE/tVWIFg3//zo7buuZD29Ms3GoqGj88mbrS/BFKNp1+
 U5cShEMqd/7y5RUB0nOzRnfGVVHz/2WRRKvhc0/cFHulJb0q28u5pHhmlrduvxr/
 VogoSiimo7qLVGiBeXgfubuyB3nvzGvDICgiqt3rIVgpgMytoiBz53HgnNLB8QtO
 CtYDJOm8sCTtnkEPshij9Ly0dfIYUVKz7QIx65qSfcv+vxag/WmWuG5woRtqtB6N
 AoFqf8PpHXHtVrokbqAG4j4R7QrLSrYSTh/vDa6muTaiu7fe5Gb2gXKWPbIGU5n+
 b+2oKqlpVN1ZqOvhUv/PXW0U/nMx2j5Untank1ebHd+416nfNqM5lN6jsfZ3HnaR
 tpST+JtLDg==
 =ll2G
 -----END PGP SIGNATURE-----

Merge tag 'block-5.5-20200103' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Three fixes in here:

   - Fix for a missing split on default memory boundary mask (4G) (Ming)

   - Fix for multi-page read bio truncate (Ming)

   - Fix for null_blk zone close request handling (Damien)"

* tag 'block-5.5-20200103' of git://git.kernel.dk/linux-block:
  null_blk: Fix REQ_OP_ZONE_CLOSE handling
  block: fix splitting segments on boundary masks
  block: add bio_truncate to fix guard_bio_eod
2020-01-03 12:11:30 -08:00
Jan Stancek
15f0ec941f mm/hugetlbfs: fix for_each_hstate() loop in init_hugetlbfs_fs()
LTP memfd_create04 started failing for some huge page sizes
after v5.4-10135-gc3bfc5dd73c6.

The problem is the check introduced to for_each_hstate() loop that
should skip default_hstate_idx.  Since it doesn't update 'i' counter,
all subsequent huge page sizes are skipped as well.

Fixes: 8fc312b32b ("mm/hugetlbfs: fix error handling when setting up mounts")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-03 10:39:08 -08:00
Linus Torvalds
278b14eb92 pstore bug fixes
- always reset circular buffer state when writing new dump (Aleksandr Yashkin)
 - fix rare error-path memory leak (Kees Cook)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl4OWFsWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJg3FD/0Vp/Qf0zfjGlV657kNhbYMKYSM
 h/ItNhFovnRnxobY2CzjZyYoU9ZCsv2bEhplILlJxG84l+nb3nUubHGhUVcoQ6iE
 aDJAdz632qB+R8l/Ouk0xUp/UnVvoheA68RCaeY6R7G/vAHqceoGD72Ji5kFe1T8
 QxPqVAJAIn+hdBHtZVhuueqINQ+3nCUUwE4Yu0veqEG5wcf+awGDjDrha2ZsMyfM
 Zl54co7l5Z5MwLLTxXO8RRFXlGtmMXnLQcKUwdPIxi+ZBZRan2pdj2zX5Sr0eGA2
 HtZchn9Bc5bbwERRobbwWzQeqbwfoRHMOWFq7mUHyaho/5Pbv67A0lOib2cuXG+U
 WjpXV9nLiAbVqIH0wph4Vppx5acDI7o0JRS0oct/f7trZE58zXfHLJHLqPrblV8J
 3pr989yGBEYI8YCf2dJXIKQlT0+s24Iyby0RmkCwQC/DkRWf9+a6BBYLHvGS96Y2
 gZ4wKGJRXbDP3m4NuojYRv3DwF71jtSw5bR1kLAdjEygbtxG064mTcACH+DsScKK
 6zBbOA2qBf1ReFdz4KVN9TAOnPIsTgeBG5ttmzGgkKZUUAe9+65AJH8jS22NSD2S
 mn4S2hXio5hV3bMUJtItEzMjYwMkdeyac8EubPlLH4DoNdZ01ER6L07x83m5cPjl
 wZ845V9VZ14i4bkq8w==
 =yG+c
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore bug fixes from Kees Cook:

 - always reset circular buffer state when writing new dump (Aleksandr
   Yashkin)

 - fix rare error-path memory leak (Kees Cook)

* tag 'pstore-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: Write new dumps to start of recycled zones
  pstore/ram: Fix error-path memory leak in persistent_ram_new() callers
2020-01-02 16:39:51 -08:00
Dominik Brodowski
74f1a29910 Revert "fs: remove ksys_dup()"
This reverts commit 8243186f0c ("fs: remove ksys_dup()") and the
subsequent fix for it in commit 2d3145f8d2 ("early init: fix error
handling when opening /dev/console").

Trying to use filp_open() and f_dupfd() instead of pseudo-syscalls
caused more trouble than what is worth it: it requires accessing vfs
internals and it turns out there were other bugs in it too.

In particular, the file reference counting was wrong - because unlike
the original "open+2*dup" sequence it used "filp_open+3*f_dupfd" and
thus had an extra leaked file reference.

That in turn then caused odd problems with Androidx86 long after boot
becaue of how the extra reference to the console kept the session active
even after all file descriptors had been closed.

Reported-by: youling 257 <youling257@gmail.com>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-02 16:15:33 -08:00
Aleksandr Yashkin
9e5f1c1980 pstore/ram: Write new dumps to start of recycled zones
The ram_core.c routines treat przs as circular buffers. When writing a
new crash dump, the old buffer needs to be cleared so that the new dump
doesn't end up in the wrong place (i.e. at the end).

The solution to this problem is to reset the circular buffer state before
writing a new Oops dump.

Signed-off-by: Aleksandr Yashkin <a.yashkin@inango-systems.com>
Signed-off-by: Nikolay Merinov <n.merinov@inango-systems.com>
Signed-off-by: Ariel Gilman <a.gilman@inango-systems.com>
Link: https://lore.kernel.org/r/20191223133816.28155-1-n.merinov@inango-systems.com
Fixes: 896fc1f0c4 ("pstore/ram: Switch to persistent_ram routines")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-01-02 12:30:50 -08:00
Kees Cook
8df955a32a pstore/ram: Fix error-path memory leak in persistent_ram_new() callers
For callers that allocated a label for persistent_ram_new(), if the call
fails, they must clean up the allocation.

Suggested-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Fixes: 1227daa43b ("pstore/ram: Clarify resource reservation labels")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/20191211191353.14385-1-navid.emamdoost@gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-01-02 12:30:39 -08:00