Commit Graph

897963 Commits

Author SHA1 Message Date
He Zhe
8c96f1bc6f mm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_t
kmemleak_lock as a rwlock on RT can possibly be acquired in atomic
context which does work.

Since the kmemleak operation is performed in atomic context make it a
raw_spinlock_t so it can also be acquired on RT.  This is used for
debugging and is not enabled by default in a production like environment
(where performance/latency matters) so it makes sense to make it a
raw_spinlock_t instead trying to get rid of the atomic context.  Turn
also the kmemleak_object->lock into raw_spinlock_t which is acquired
(nested) while the kmemleak_lock is held.

The time spent in "echo scan > kmemleak" slightly improved on 64core box
with this patch applied after boot.

[bigeasy@linutronix.de: redo the description, update comments. Merge the individual bits:  He Zhe did the kmemleak_lock, Liu Haitao the ->lock and Yongxin Liu forwarded Liu's patch.]
Link: http://lkml.kernel.org/r/20191219170834.4tah3prf2gdothz4@linutronix.de
Link: https://lkml.kernel.org/r/20181218150744.GB20197@arrakis.emea.arm.com
Link: https://lkml.kernel.org/r/1542877459-144382-1-git-send-email-zhe.he@windriver.com
Link: https://lkml.kernel.org/r/20190927082230.34152-1-yongxin.liu@windriver.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Liu Haitao <haitao.liu@windriver.com>
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Yu Zhao
90e9f6a66c mm/slub.c: avoid slub allocation while holding list_lock
If we are already under list_lock, don't call kmalloc().  Otherwise we
will run into a deadlock because kmalloc() also tries to grab the same
lock.

Fix the problem by using a static bitmap instead.

  WARNING: possible recursive locking detected
  --------------------------------------------
  mount-encrypted/4921 is trying to acquire lock:
  (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

  but task is already holding lock:
  (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&n->list_lock)->rlock);
    lock(&(&n->list_lock)->rlock);

   *** DEADLOCK ***

Link: http://lkml.kernel.org/r/20191108193958.205102-2-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
wangyan
25b69918d9 ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction
For the uniform format, we use ocfs2_update_inode_fsync_trans() to
access t_tid in handle->h_transaction

Link: http://lkml.kernel.org/r/6ff9a312-5f7d-0e27-fb51-bc4e062fcd97@huawei.com
Signed-off-by: Yan Wang <wangyan122@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
wangyan
9f16ca48fc ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()
I found a NULL pointer dereference in ocfs2_update_inode_fsync_trans(),
handle->h_transaction may be NULL in this situation:

ocfs2_file_write_iter
  ->__generic_file_write_iter
      ->generic_perform_write
        ->ocfs2_write_begin
          ->ocfs2_write_begin_nolock
            ->ocfs2_write_cluster_by_desc
              ->ocfs2_write_cluster
                ->ocfs2_mark_extent_written
                  ->ocfs2_change_extent_flag
                    ->ocfs2_split_extent
                      ->ocfs2_try_to_merge_extent
                        ->ocfs2_extend_rotate_transaction
                          ->ocfs2_extend_trans
                            ->jbd2_journal_restart
                              ->jbd2__journal_restart
                                // handle->h_transaction is NULL here
                                ->handle->h_transaction = NULL;
                                ->start_this_handle
                                  /* journal aborted due to storage
                                     network disconnection, return error */
                                  ->return -EROFS;
                         /* line 3806 in ocfs2_try_to_merge_extent (),
                            it will ignore ret error. */
                        ->ret = 0;
        ->...
        ->ocfs2_write_end
          ->ocfs2_write_end_nolock
            ->ocfs2_update_inode_fsync_trans
              // NULL pointer dereference
              ->oi->i_sync_tid = handle->h_transaction->t_tid;

The information of NULL pointer dereference as follows:
    JBD2: Detected IO errors while flushing file data on dm-11-45
    Aborting journal on device dm-11-45.
    JBD2: Error -5 detected when updating journal superblock for dm-11-45.
    (dd,22081,3):ocfs2_extend_trans:474 ERROR: status = -30
    (dd,22081,3):ocfs2_try_to_merge_extent:3877 ERROR: status = -30
    Unable to handle kernel NULL pointer dereference at
    virtual address 0000000000000008
    Mem abort info:
      ESR = 0x96000004
      Exception class = DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
    Data abort info:
      ISV = 0, ISS = 0x00000004
      CM = 0, WnR = 0
    user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000e74e1338
    [0000000000000008] pgd=0000000000000000
    Internal error: Oops: 96000004 [#1] SMP
    Process dd (pid: 22081, stack limit = 0x00000000584f35a9)
    CPU: 3 PID: 22081 Comm: dd Kdump: loaded
    Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
    pstate: 60400009 (nZCv daif +PAN -UAO)
    pc : ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2]
    lr : ocfs2_write_end_nolock+0x2a0/0x550 [ocfs2]
    sp : ffff0000459fba70
    x29: ffff0000459fba70 x28: 0000000000000000
    x27: ffff807ccf7f1000 x26: 0000000000000001
    x25: ffff807bdff57970 x24: ffff807caf1d4000
    x23: ffff807cc79e9000 x22: 0000000000001000
    x21: 000000006c6cd000 x20: ffff0000091d9000
    x19: ffff807ccb239db0 x18: ffffffffffffffff
    x17: 000000000000000e x16: 0000000000000007
    x15: ffff807c5e15bd78 x14: 0000000000000000
    x13: 0000000000000000 x12: 0000000000000000
    x11: 0000000000000000 x10: 0000000000000001
    x9 : 0000000000000228 x8 : 000000000000000c
    x7 : 0000000000000fff x6 : ffff807a308ed6b0
    x5 : ffff7e01f10967c0 x4 : 0000000000000018
    x3 : d0bc661572445600 x2 : 0000000000000000
    x1 : 000000001b2e0200 x0 : 0000000000000000
    Call trace:
     ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2]
     ocfs2_write_end+0x4c/0x80 [ocfs2]
     generic_perform_write+0x108/0x1a8
     __generic_file_write_iter+0x158/0x1c8
     ocfs2_file_write_iter+0x668/0x950 [ocfs2]
     __vfs_write+0x11c/0x190
     vfs_write+0xac/0x1c0
     ksys_write+0x6c/0xd8
     __arm64_sys_write+0x24/0x30
     el0_svc_common+0x78/0x130
     el0_svc_handler+0x38/0x78
     el0_svc+0x8/0xc

To prevent NULL pointer dereference in this situation, we use
is_handle_aborted() before using handle->h_transaction->t_tid.

Link: http://lkml.kernel.org/r/03e750ab-9ade-83aa-b000-b9e81e34e539@huawei.com
Signed-off-by: Yan Wang <wangyan122@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Andy Shevchenko
dd3e7cba16 ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use
There are users already and will be more of BITS_TO_BYTES() macro.  Move
it to bitops.h for wider use.

In the case of ocfs2 the replacement is identical.

As for bnx2x, there are two places where floor version is used.  In the
first case to calculate the amount of structures that can fit one memory
page.  In this case obviously the ceiling variant is correct and
original code might have a potential bug, if amount of bits % 8 is not
0.  In the second case the macro is used to calculate bytes transmitted
in one microsecond.  This will work for all speeds which is multiply of
1Gbps without any change, for the rest new code will give ceiling value,
for instance 100Mbps will give 13 bytes, while old code gives 12 bytes
and the arithmetically correct one is 12.5 bytes.  Further the value is
used to setup timer threshold which in any case has its own margins due
to certain resolution.  I don't see here an issue with slightly shifting
thresholds for low speed connections, the card is supposed to utilize
highest available rate, which is usually 10Gbps.

Link: http://lkml.kernel.org/r/20200108121316.22411-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Colin Ian King
d8f1875069 ocfs2/dlm: remove redundant assignment to ret
The variable ret is being initialized with a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.

Addresses Coverity ("Unused value")

Link: http://lkml.kernel.org/r/20191202164833.62865-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Masahiro Yamada
ca322fb603 ocfs2: make local header paths relative to C files
Gang He reports the failure of building fs/ocfs2/ as an external module
of the kernel installed on the system:

 $ cd fs/ocfs2
 $ make -C /lib/modules/`uname -r`/build M=`pwd` modules

If you want to make it work reliably, I'd recommend to remove ccflags-y
from the Makefiles, and to make header paths relative to the C files.  I
think this is the correct usage of the #include "..." directive.

Link: http://lkml.kernel.org/r/20191227022950.14804-1-ghe@suse.com
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Gang He <ghe@suse.com>
Reported-by: Gang He <ghe@suse.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
zhengbin
5b43d6453a ocfs2: remove unneeded semicolons
Fixes coccicheck warnings:

  fs/ocfs2/cluster/quorum.c:76:2-3: Unneeded semicolon
  fs/ocfs2/dlmglue.c:573:2-3: Unneeded semicolon

Link: http://lkml.kernel.org/r/6ee3aa16-9078-30b1-df3f-22064950bd98@linux.alibaba.com
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Aditya Pakki
67e2d2eb54 fs: ocfs: remove unnecessary assertion in dlm_migrate_lockres
In the only caller of dlm_migrate_lockres() - dlm_empty_lockres(),
target is checked for O2NM_MAX_NODES.  Thus, the assertion in
dlm_migrate_lockres() is unnecessary and can be removed.  The patch
eliminates such a check.

Link: http://lkml.kernel.org/r/20191218194111.26041-1-pakki001@umn.edu
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Luca Ceresoli
4efc61c798 scripts/spelling.txt: add "issus" typo
Add "issus" and correct it as "issues".

Link: http://lkml.kernel.org/r/20200105221950.8384-1-luca@lucaceresoli.net
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Xiong
2ab1278fe4 scripts/spelling.txt: add more spellings to spelling.txt
Here are some of the common spelling mistakes and typos that I've found
while fixing up spelling mistakes in the kernel.  Most of them still
exist in more than two source files.

Link: http://lkml.kernel.org/r/20191229143626.51238-1-xndchn@gmail.com
Signed-off-by: Xiong <xndchn@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Chris Paterson <chris.paterson2@renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Yang Shi
5984fabb6e mm: move_pages: report the number of non-attempted pages
Since commit a49bd4d716 ("mm, numa: rework do_pages_move"), the
semantic of move_pages() has changed to return the number of
non-migrated pages if they were result of a non-fatal reasons (usually a
busy page).

This was an unintentional change that hasn't been noticed except for LTP
tests which checked for the documented behavior.

There are two ways to go around this change.  We can even get back to
the original behavior and return -EAGAIN whenever migrate_pages is not
able to migrate pages due to non-fatal reasons.  Another option would be
to simply continue with the changed semantic and extend move_pages
documentation to clarify that -errno is returned on an invalid input or
when migration simply cannot succeed (e.g.  -ENOMEM, -EBUSY) or the
number of pages that couldn't have been migrated due to ephemeral
reasons (e.g.  page is pinned or locked for other reasons).

This patch implements the second option because this behavior is in
place for some time without anybody complaining and possibly new users
depending on it.  Also it allows to have a slightly easier error
handling as the caller knows that it is worth to retry when err > 0.

But since the new semantic would be aborted immediately if migration is
failed due to ephemeral reasons, need include the number of
non-attempted pages in the return value too.

Link: http://lkml.kernel.org/r/1580160527-109104-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: a49bd4d716 ("mm, numa: rework do_pages_move")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Cc: <stable@vger.kernel.org>    [4.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Wei Yang
fac0516b55 mm: thp: don't need care deferred split queue in memcg charge move path
If compound is true, this means it is a PMD mapped THP.  Which implies
the page is not linked to any defer list.  So the first code chunk will
not be executed.

Also with this reason, it would not be proper to add this page to a
defer list.  So the second code chunk is not correct.

Based on this, we should remove the defer list related code.

[yang.shi@linux.alibaba.com: better patch title]
Link: http://lkml.kernel.org/r/20200117233836.3434-1-richardw.yang@linux.intel.com
Fixes: 87eaceb3fa ("mm: thp: make deferred split shrinker memcg aware")
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>    [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Dan Williams
f1037ec0cc mm/memory_hotplug: fix remove_memory() lockdep splat
The daxctl unit test for the dax_kmem driver currently triggers the
(false positive) lockdep splat below.  It results from the fact that
remove_memory_block_devices() is invoked under the mem_hotplug_lock()
causing lockdep entanglements with cpu_hotplug_lock() and sysfs (kernfs
active state tracking).  It is a false positive because the sysfs
attribute path triggering the memory remove is not the same attribute
path associated with memory-block device.

sysfs_break_active_protection() is not applicable since there is no real
deadlock conflict, instead move memory-block device removal outside the
lock.  The mem_hotplug_lock() is not needed to synchronize the
memory-block device removal vs the page online state, that is already
handled by lock_device_hotplug().  Specifically, lock_device_hotplug()
is sufficient to allow try_remove_memory() to check the offline state of
the memblocks and be assured that any in progress online attempts are
flushed / blocked by kernfs_drain() / attribute removal.

The add_memory() path safely creates memblock devices under the
mem_hotplug_lock().  There is no kernfs active state synchronization in
the memblock device_register() path, so nothing to fix there.

This change is only possible thanks to the recent change that refactored
memory block device removal out of arch_remove_memory() (commit
4c4b7f9ba9 "mm/memory_hotplug: remove memory block devices before
arch_remove_memory()"), and David's due diligence tracking down the
guarantees afforded by kernfs_drain().  Not flagged for -stable since
this only impacts ongoing development and lockdep validation, not a
runtime issue.

    ======================================================
    WARNING: possible circular locking dependency detected
    5.5.0-rc3+ #230 Tainted: G           OE
    ------------------------------------------------------
    lt-daxctl/6459 is trying to acquire lock:
    ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80

    but task is already holding lock:
    ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #2 (mem_hotplug_lock.rw_sem){++++}:
           __lock_acquire+0x39c/0x790
           lock_acquire+0xa2/0x1b0
           get_online_mems+0x3e/0xb0
           kmem_cache_create_usercopy+0x2e/0x260
           kmem_cache_create+0x12/0x20
           ptlock_cache_init+0x20/0x28
           start_kernel+0x243/0x547
           secondary_startup_64+0xb6/0xc0

    -> #1 (cpu_hotplug_lock.rw_sem){++++}:
           __lock_acquire+0x39c/0x790
           lock_acquire+0xa2/0x1b0
           cpus_read_lock+0x3e/0xb0
           online_pages+0x37/0x300
           memory_subsys_online+0x17d/0x1c0
           device_online+0x60/0x80
           state_store+0x65/0xd0
           kernfs_fop_write+0xcf/0x1c0
           vfs_write+0xdb/0x1d0
           ksys_write+0x65/0xe0
           do_syscall_64+0x5c/0xa0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe

    -> #0 (kn->count#241){++++}:
           check_prev_add+0x98/0xa40
           validate_chain+0x576/0x860
           __lock_acquire+0x39c/0x790
           lock_acquire+0xa2/0x1b0
           __kernfs_remove+0x25f/0x2e0
           kernfs_remove_by_name_ns+0x41/0x80
           remove_files.isra.0+0x30/0x70
           sysfs_remove_group+0x3d/0x80
           sysfs_remove_groups+0x29/0x40
           device_remove_attrs+0x39/0x70
           device_del+0x16a/0x3f0
           device_unregister+0x16/0x60
           remove_memory_block_devices+0x82/0xb0
           try_remove_memory+0xb5/0x130
           remove_memory+0x26/0x40
           dev_dax_kmem_remove+0x44/0x6a [kmem]
           device_release_driver_internal+0xe4/0x1c0
           unbind_store+0xef/0x120
           kernfs_fop_write+0xcf/0x1c0
           vfs_write+0xdb/0x1d0
           ksys_write+0x65/0xe0
           do_syscall_64+0x5c/0xa0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe

    other info that might help us debug this:

    Chain exists of:
      kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(mem_hotplug_lock.rw_sem);
                                   lock(cpu_hotplug_lock.rw_sem);
                                   lock(mem_hotplug_lock.rw_sem);
      lock(kn->count#241);

     *** DEADLOCK ***

No fixes tag as this has been a long standing issue that predated the
addition of kernfs lockdep annotations.

Link: http://lkml.kernel.org/r/157991441887.2763922.4770790047389427325.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Wei Yang
dfe9aa23ca mm/migrate.c: also overwrite error when it is bigger than zero
If we get here after successfully adding page to list, err would be 1 to
indicate the page is queued in the list.

Current code has two problems:

  * on success, 0 is not returned
  * on error, if add_page_for_migratioin() return 1, and the following err1
    from do_move_pages_to_node() is set, the err1 is not returned since err
    is 1

And these behaviors break the user interface.

Link: http://lkml.kernel.org/r/20200119065753.21694-1-richardw.yang@linux.intel.com
Fixes: e0153fc2c7 ("mm: move_pages: return valid node id in status if the page is already on the target node").
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Pingfan Liu
1f503443e7 mm/sparse.c: reset section's mem_map when fully deactivated
After commit ba72b4c8cf ("mm/sparsemem: support sub-section hotplug"),
when a mem section is fully deactivated, section_mem_map still records
the section's start pfn, which is not used any more and will be
reassigned during re-addition.

In analogy with alloc/free pattern, it is better to clear all fields of
section_mem_map.

Beside this, it breaks the user space tool "makedumpfile" [1], which
makes assumption that a hot-removed section has mem_map as NULL, instead
of checking directly against SECTION_MARKED_PRESENT bit.  (makedumpfile
will be better to change the assumption, and need a patch)

The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
trigger a crash, and save vmcore by makedumpfile

[1]: makedumpfile, commit e73016540293 ("[v1.6.7] Update version")

Link: http://lkml.kernel.org/r/1579487594-28889-1-git-send-email-kernelfans@gmail.com
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Dan Carpenter
c7a91bc7c2 mm/mempolicy.c: fix out of bounds write in mpol_parse_str()
What we are trying to do is change the '=' character to a NUL terminator
and then at the end of the function we restore it back to an '='.  The
problem is there are two error paths where we jump to the end of the
function before we have replaced the '=' with NUL.

We end up putting the '=' in the wrong place (possibly one element
before the start of the buffer).

Link: http://lkml.kernel.org/r/20200115055426.vdjwvry44nfug7yy@kili.mountain
Reported-by: syzbot+e64a13c5369a194d67df@syzkaller.appspotmail.com
Fixes: 095f1fc4eb ("mempolicy: rework shmem mpol parsing and display")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Theodore Ts'o
68f23b8906 memcg: fix a crash in wb_workfn when a device disappears
Without memcg, there is a one-to-one mapping between the bdi and
bdi_writeback structures.  In this world, things are fairly
straightforward; the first thing bdi_unregister() does is to shutdown
the bdi_writeback structure (or wb), and part of that writeback ensures
that no other work queued against the wb, and that the wb is fully
drained.

With memcg, however, there is a one-to-many relationship between the bdi
and bdi_writeback structures; that is, there are multiple wb objects
which can all point to a single bdi.  There is a refcount which prevents
the bdi object from being released (and hence, unregistered).  So in
theory, the bdi_unregister() *should* only get called once its refcount
goes to zero (bdi_put will drop the refcount, and when it is zero,
release_bdi gets called, which calls bdi_unregister).

Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about
the Brave New memcg World, and calls bdi_unregister directly.  It does
this without informing the file system, or the memcg code, or anything
else.  This causes the root wb associated with the bdi to be
unregistered, but none of the memcg-specific wb's are shutdown.  So when
one of these wb's are woken up to do delayed work, they try to
dereference their wb->bdi->dev to fetch the device name, but
unfortunately bdi->dev is now NULL, thanks to the bdi_unregister()
called by del_gendisk().  As a result, *boom*.

Fortunately, it looks like the rest of the writeback path is perfectly
happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to
create a bdi_dev_name() function which can handle bdi->dev being NULL.
This also allows us to bulletproof the writeback tracepoints to prevent
them from dereferencing a NULL pointer and crashing the kernel if one is
tracing with memcg's enabled, and an iSCSI device dies or a USB storage
stick is pulled.

The most common way of triggering this will be hotremoval of a device
while writeback with memcg enabled is going on.  It was triggering
several times a day in a heavily loaded production environment.

Google Bug Id: 145475544

Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu
Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Andy Shevchenko
69334ca530 lib/test_bitmap: correct test data offsets for 32-bit
On 32-bit platform the size of long is only 32 bits which makes wrong
offset in the array of 64 bit size.

Calculate offset based on BITS_PER_LONG.

Link: http://lkml.kernel.org/r/20200109103601.45929-1-andriy.shevchenko@linux.intel.com
Fixes: 30544ed5de ("lib/bitmap: introduce bitmap_replace() helper")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31 10:30:36 -08:00
Andy Shevchenko
dc2c733e65 kdb: Use for_each_console() helper
Replace open coded single-linked list iteration loop with for_each_console()
helper in use.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31 17:34:54 +00:00
Colin Ian King
a4f8a7fb19 kdb: remove redundant assignment to pointer bp
The point bp is assigned a value that is never read, it is being
re-assigned later to bp = &kdb_breakpoints[lowbp] in a for-loop.
Remove the redundant assignment.

Addresses-Coverity ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20191128130753.181246-1-colin.king@canonical.com
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31 17:34:06 +00:00
Douglas Anderson
bbfceba15f kdb: Get rid of confusing diag msg from "rd" if current task has no regs
If you switch to a sleeping task with the "pid" command and then type
"rd", kdb tells you this:

  No current kdb registers.  You may need to select another task
  diag: -17: Invalid register name

The first message makes sense, but not the second.  Fix it by just
returning 0 after commands accessing the current registers finish if
we've already printed the "No current kdb registers" error.

While fixing kdb_rd(), change the function to use "if" rather than
"ifdef".  It cleans the function up a bit and any modern compiler will
have no trouble handling still producing good code.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191109111624.5.I121f4c6f0c19266200bf6ef003de78841e5bfc3d@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31 17:34:03 +00:00
Douglas Anderson
9441d5f6b7 kdb: Gid rid of implicit setting of the current task / regs
Some (but not all?) of the kdb backtrace paths would cause the
kdb_current_task and kdb_current_regs to remain changed.  As discussed
in a review of a previous patch [1], this doesn't seem intuitive, so
let's fix that.

...but, it turns out that there's actually no longer any reason to set
the current task / current regs while backtracing anymore anyway.  As
of commit 2277b49258 ("kdb: Fix stack crawling on 'running' CPUs
that aren't the master") if we're backtracing on a task running on a
CPU we ask that CPU to do the backtrace itself.  Linux can do that
without anything fancy.  If we're doing backtrace on a sleeping task
we can also do that fine without updating globals.  So this patch
mostly just turns into deleting a bunch of code.

[1] https://lore.kernel.org/r/20191010150735.dhrj3pbjgmjrdpwr@holly.lan

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191109111624.4.Ibc3d982bbeb9e46872d43973ba808cd4c79537c7@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31 17:34:00 +00:00
Douglas Anderson
a8649fb0a8 kdb: kdb_current_task shouldn't be exported
The kdb_current_task variable has been declared in
"kernel/debug/kdb/kdb_private.h" since 2010 when kdb was added to the
mainline kernel.  This is not a public header.  There should be no
reason that kdb_current_task should be exported and there are no
in-kernel users that need it.  Remove the export.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191109111623.3.I14b22b5eb15ca8f3812ab33e96621231304dc1f7@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31 17:33:57 +00:00
Douglas Anderson
c67c10a67f kdb: kdb_current_regs should be private
As of the patch ("MIPS: kdb: Remove old workaround for backtracing on
other CPUs") there is no reason for kdb_current_regs to be in the
public "kdb.h".  Let's move it next to kdb_current_task.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191109111623.2.Iadbfb484e90b557cc4b5ac9890bfca732cd99d77@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31 17:33:54 +00:00
Douglas Anderson
b356e89b89 MIPS: kdb: Remove old workaround for backtracing on other CPUs
As of commit 2277b49258 ("kdb: Fix stack crawling on 'running' CPUs
that aren't the master") we no longer need any special case for doing
stack dumps on CPUs that are not the kdb master.  Let's remove.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20191109111623.1.I30a0cac4d9880040c8d41495bd9a567fe3e24989@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-01-31 17:33:51 +00:00
Linus Torvalds
e813e65038 ARM: Cleanups and corner case fixes
PPC: Bugfixes
 
 x86:
 * Support for mapping DAX areas with large nested page table entries.
 * Cleanups and bugfixes here too.  A particularly important one is
 a fix for FPU load when the thread has TIF_NEED_FPU_LOAD.  There is
 also a race condition which could be used in guest userspace to exploit
 the guest kernel, for which the embargo expired today.
 * Fast path for IPI delivery vmexits, shaving about 200 clock cycles
 from IPI latency.
 * Protect against "Spectre-v1/L1TF" (bring data in the cache via
 speculative out of bound accesses, use L1TF on the sibling hyperthread
 to read it), which unfortunately is an even bigger whack-a-mole game
 than SpectreV1.
 
 Sean continues his mission to rewrite KVM.  In addition to a sizable
 number of x86 patches, this time he contributed a pretty large refactoring
 of vCPU creation that affects all architectures but should not have any
 visible effect.
 
 s390 will come next week together with some more x86 patches.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeMxtCAAoJEL/70l94x66DQxIIAJv9hMmXLQHGFnUMskjGErR6
 DCLSC0YRdRMwE50CerblyJtGsMwGsPyHZwvZxoAceKJ9w0Yay9cyaoJ87ItBgHoY
 ce0HrqIUYqRSJ/F8WH2lSzkzMBr839rcmqw8p1tt4D5DIsYnxHGWwRaaP+5M/1KQ
 YKFu3Hea4L00U339iIuDkuA+xgz92LIbsn38svv5fxHhPAyWza0rDEYHNgzMKuoF
 IakLf5+RrBFAh6ZuhYWQQ44uxjb+uQa9pVmcqYzzTd5t1g4PV5uXtlJKesHoAvik
 Eba8IEUJn+HgQJjhp3YxQYuLeWOwRF3bwOiZ578MlJ4OPfYXMtbdlqCQANHOcGk=
 =H/q1
 -----END PGP SIGNATURE-----

Merge tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "This is the first batch of KVM changes.

  ARM:
   - cleanups and corner case fixes.

  PPC:
   - Bugfixes

  x86:
   - Support for mapping DAX areas with large nested page table entries.

   - Cleanups and bugfixes here too. A particularly important one is a
     fix for FPU load when the thread has TIF_NEED_FPU_LOAD. There is
     also a race condition which could be used in guest userspace to
     exploit the guest kernel, for which the embargo expired today.

   - Fast path for IPI delivery vmexits, shaving about 200 clock cycles
     from IPI latency.

   - Protect against "Spectre-v1/L1TF" (bring data in the cache via
     speculative out of bound accesses, use L1TF on the sibling
     hyperthread to read it), which unfortunately is an even bigger
     whack-a-mole game than SpectreV1.

  Sean continues his mission to rewrite KVM. In addition to a sizable
  number of x86 patches, this time he contributed a pretty large
  refactoring of vCPU creation that affects all architectures but should
  not have any visible effect.

  s390 will come next week together with some more x86 patches"

* tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  x86/KVM: Clean up host's steal time structure
  x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed
  x86/kvm: Cache gfn to pfn translation
  x86/kvm: Introduce kvm_(un)map_gfn()
  x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
  KVM: PPC: Book3S PR: Fix -Werror=return-type build failure
  KVM: PPC: Book3S HV: Release lock on page-out failure path
  KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integer
  KVM: arm64: pmu: Only handle supported event counters
  KVM: arm64: pmu: Fix chained SW_INCR counters
  KVM: arm64: pmu: Don't mark a counter as chained if the odd one is disabled
  KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset
  KVM: x86: Use a typedef for fastop functions
  KVM: X86: Add 'else' to unify fastop and execute call path
  KVM: x86: inline memslot_valid_for_gpte
  KVM: x86/mmu: Use huge pages for DAX-backed files
  KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte()
  KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust()
  KVM: x86/mmu: Zap any compound page when collapsing sptes
  KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch)
  ...
2020-01-31 09:30:41 -08:00
Filipe Manana
9722b10148 Btrfs: send, fix emission of invalid clone operations within the same file
When doing an incremental send and a file has extents shared with itself
at different file offsets, it's possible for send to emit clone operations
that will fail at the destination because the source range goes beyond the
file's current size. This happens when the file size has increased in the
send snapshot, there is a hole between the shared extents and both shared
extents are at file offsets which are greater the file's size in the
parent snapshot.

Example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt/sdb

  $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar
  $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base
  $ btrfs send -f /tmp/1.snap /mnt/sdb/base

  # Create a 320K extent at file offset 512K.
  $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar

  # Clone part of that 320K extent into a lower file offset (192K).
  # This file offset is greater than the file's size in the parent
  # snapshot (64K). Also the clone range is a bit behind the offset of
  # the 320K extent so that we leave a hole between the shared extents.
  $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar

  $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr
  $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt/sdc

  $ btrfs receive -f /tmp/1.snap /mnt/sdc
  $ btrfs receive -f /tmp/2.snap /mnt/sdc
  ERROR: failed to clone extents to foobar: Invalid argument

The problem is that after processing the extent at file offset 256K, which
refers to the first 128K of the 320K extent created by the buffered write
operations, we have 'cur_inode_next_write_offset' set to 384K, which
corresponds to the end offset of the partially shared extent (256K + 128K)
and to the current file size in the receiver. Then when we process the
extent at offset 512K, we do extent backreference iteration to figure out
if we can clone the extent from some other inode or from the same inode,
and we consider the extent at offset 256K of the same inode as a valid
source for a clone operation, which is not correct because at that point
the current file size in the receiver is 384K, which corresponds to the
end of last processed extent (at file offset 256K), so using a clone
source range from 256K to 256K + 320K is invalid because that goes past
the current size of the file (384K) - this makes the receiver get an
-EINVAL error when attempting the clone operation.

So fix this by excluding clone sources that have a range that goes beyond
the current file size in the receiver when iterating extent backreferences.

A test case for fstests follows soon.

Fixes: 11f2069c11 ("Btrfs: send, allow clone operations within the same file")
CC: stable@vger.kernel.org # 5.5+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31 14:02:19 +01:00
Josef Bacik
f4b1363cae btrfs: do not do delalloc reservation under page lock
We ran into a deadlock in production with the fixup worker.  The stack
traces were as follows:

Thread responsible for the writeout, waiting on the page lock

  [<0>] io_schedule+0x12/0x40
  [<0>] __lock_page+0x109/0x1e0
  [<0>] extent_write_cache_pages+0x206/0x360
  [<0>] extent_writepages+0x40/0x60
  [<0>] do_writepages+0x31/0xb0
  [<0>] __writeback_single_inode+0x3d/0x350
  [<0>] writeback_sb_inodes+0x19d/0x3c0
  [<0>] __writeback_inodes_wb+0x5d/0xb0
  [<0>] wb_writeback+0x231/0x2c0
  [<0>] wb_workfn+0x308/0x3c0
  [<0>] process_one_work+0x1e0/0x390
  [<0>] worker_thread+0x2b/0x3c0
  [<0>] kthread+0x113/0x130
  [<0>] ret_from_fork+0x35/0x40
  [<0>] 0xffffffffffffffff

Thread of the fixup worker who is holding the page lock

  [<0>] start_delalloc_inodes+0x241/0x2d0
  [<0>] btrfs_start_delalloc_roots+0x179/0x230
  [<0>] btrfs_alloc_data_chunk_ondemand+0x11b/0x2e0
  [<0>] btrfs_check_data_free_space+0x53/0xa0
  [<0>] btrfs_delalloc_reserve_space+0x20/0x70
  [<0>] btrfs_writepage_fixup_worker+0x1fc/0x2a0
  [<0>] normal_work_helper+0x11c/0x360
  [<0>] process_one_work+0x1e0/0x390
  [<0>] worker_thread+0x2b/0x3c0
  [<0>] kthread+0x113/0x130
  [<0>] ret_from_fork+0x35/0x40
  [<0>] 0xffffffffffffffff

Thankfully the stars have to align just right to hit this.  First you
have to end up in the fixup worker, which is tricky by itself (my
reproducer does DIO reads into a MMAP'ed region, so not a common
operation).  Then you have to have less than a page size of free data
space and 0 unallocated space so you go down the "commit the transaction
to free up pinned space" path.  This was accomplished by a random
balance that was running on the host.  Then you get this deadlock.

I'm still in the process of trying to force the deadlock to happen on
demand, but I've hit other issues.  I can still trigger the fixup worker
path itself so this patch has been tested in that regard, so the normal
case is fine.

Fixes: 87826df0ec ("btrfs: delalloc for page dirtied out-of-band in fixup worker")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31 14:02:15 +01:00
Josef Bacik
5ab5805569 btrfs: drop the -EBUSY case in __extent_writepage_io
Now that we only return 0 or -EAGAIN from btrfs_writepage_cow_fixup, we
do not need this -EBUSY case.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31 14:02:11 +01:00
Chris Mason
25f3c50219 Btrfs: keep pages dirty when using btrfs_writepage_fixup_worker
For COW, btrfs expects pages dirty pages to have been through a few setup
steps.  This includes reserving space for the new block allocations and marking
the range in the state tree for delayed allocation.

A few places outside btrfs will dirty pages directly, especially when unmapping
mmap'd pages.  In order for these to properly go through COW, we run them
through a fixup worker to wait for stable pages, and do the delalloc prep.

87826df0ec added a window where the dirty pages were cleaned, but pending
more action from the fixup worker.  We clear_page_dirty_for_io() before
we call into writepage, so the page is no longer dirty.  The commit
changed it so now we leave the page clean between unlocking it here and
the fixup worker starting at some point in the future.

During this window, page migration can jump in and relocate the page.  Once our
fixup work actually starts, it finds page->mapping is NULL and we end up
freeing the page without ever writing it.

This leads to crc errors and other exciting problems, since it screws up the
whole statemachine for waiting for ordered extents.  The fix here is to keep
the page dirty while we're waiting for the fixup worker to get to work.
This is accomplished by returning -EAGAIN from btrfs_writepage_cow_fixup
if we queued the page up for fixup, which will cause the writepage
function to redirty the page.

Because we now expect the page to be dirty once it gets to the fixup
worker we must adjust the error cases to call clear_page_dirty_for_io()
on the page.  That is the bulk of the patch, but it is not the fix, the
fix is the -EAGAIN from btrfs_writepage_cow_fixup.  We cannot separate
these two changes out because the error conditions change with the new
expectations.

Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31 14:02:08 +01:00
Josef Bacik
a30a3d2067 btrfs: take overcommit into account in inc_block_group_ro
inc_block_group_ro does a calculation to see if we have enough room left
over if we mark this block group as read only in order to see if it's ok
to mark the block group as read only.

The problem is this calculation _only_ works for data, where our used is
always less than our total.  For metadata we will overcommit, so this
will almost always fail for metadata.

Fix this by exporting btrfs_can_overcommit, and then see if we have
enough space to remove the remaining free space in the block group we
are trying to mark read only.  If we do then we can mark this block
group as read only.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31 14:02:01 +01:00
Josef Bacik
a7a63acc65 btrfs: fix force usage in inc_block_group_ro
For some reason we've translated the do_chunk_alloc that goes into
btrfs_inc_block_group_ro to force in inc_block_group_ro, but these are
two different things.

force for inc_block_group_ro is used when we are forcing the block group
read only no matter what, for example when the underlying chunk is
marked read only.  We need to not do the space check here as this block
group needs to be read only.

btrfs_inc_block_group_ro() has a do_chunk_alloc flag that indicates that
we need to pre-allocate a chunk before marking the block group read
only.  This has nothing to do with forcing, and in fact we _always_ want
to do the space check in this case, so unconditionally pass false for
force in this case.

Then fixup inc_block_group_ro to honor force as it's expected and
documented to do.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31 14:01:55 +01:00
Nikolay Borisov
5750c37523 btrfs: Correctly handle empty trees in find_first_clear_extent_bit
Raviu reported that running his regular fs_trim segfaulted with the
following backtrace:

[  237.525947] assertion failed: prev, in ../fs/btrfs/extent_io.c:1595
[  237.525984] ------------[ cut here ]------------
[  237.525985] kernel BUG at ../fs/btrfs/ctree.h:3117!
[  237.525992] invalid opcode: 0000 [#1] SMP PTI
[  237.525998] CPU: 4 PID: 4423 Comm: fstrim Tainted: G     U     OE     5.4.14-8-vanilla #1
[  237.526001] Hardware name: ASUSTeK COMPUTER INC.
[  237.526044] RIP: 0010:assfail.constprop.58+0x18/0x1a [btrfs]
[  237.526079] Call Trace:
[  237.526120]  find_first_clear_extent_bit+0x13d/0x150 [btrfs]
[  237.526148]  btrfs_trim_fs+0x211/0x3f0 [btrfs]
[  237.526184]  btrfs_ioctl_fitrim+0x103/0x170 [btrfs]
[  237.526219]  btrfs_ioctl+0x129a/0x2ed0 [btrfs]
[  237.526227]  ? filemap_map_pages+0x190/0x3d0
[  237.526232]  ? do_filp_open+0xaf/0x110
[  237.526238]  ? _copy_to_user+0x22/0x30
[  237.526242]  ? cp_new_stat+0x150/0x180
[  237.526247]  ? do_vfs_ioctl+0xa4/0x640
[  237.526278]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[  237.526283]  do_vfs_ioctl+0xa4/0x640
[  237.526288]  ? __do_sys_newfstat+0x3c/0x60
[  237.526292]  ksys_ioctl+0x70/0x80
[  237.526297]  __x64_sys_ioctl+0x16/0x20
[  237.526303]  do_syscall_64+0x5a/0x1c0
[  237.526310]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

That was due to btrfs_fs_device::aloc_tree being empty. Initially I
thought this wasn't possible and as a percaution have put the assert in
find_first_clear_extent_bit. Turns out this is indeed possible and could
happen when a file system with SINGLE data/metadata profile has a 2nd
device added. Until balance is run or a new chunk is allocated on this
device it will be completely empty.

In this case find_first_clear_extent_bit should return the full range
[0, -1ULL] and let the caller handle this i.e for trim the end will be
capped at the size of actual device.

Link: https://lore.kernel.org/linux-btrfs/izW2WNyvy1dEDweBICizKnd2KDwDiDyY2EYQr4YCwk7pkuIpthx-JRn65MPBde00ND6V0_Lh8mW0kZwzDiLDv25pUYWxkskWNJnVP0kgdMA=@protonmail.com/
Fixes: 45bfcfc168 ("btrfs: Implement find_first_clear_extent_bit")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31 14:01:29 +01:00
Josef Bacik
42ffb0bf58 btrfs: flush write bio if we loop in extent_write_cache_pages
There exists a deadlock with range_cyclic that has existed forever.  If
we loop around with a bio already built we could deadlock with a writer
who has the page locked that we're attempting to write but is waiting on
a page in our bio to be written out.  The task traces are as follows

  PID: 1329874  TASK: ffff889ebcdf3800  CPU: 33  COMMAND: "kworker/u113:5"
   #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f
   #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3
   #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42
   #3 [ffffc900297bb708] __lock_page at ffffffff811f145b
   #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502
   #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684
   #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff
   #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0
   #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2
   #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd

  PID: 2167901  TASK: ffff889dc6a59c00  CPU: 14  COMMAND:
  "aio-dio-invalid"
   #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f
   #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3
   #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42
   #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6
   #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7
   #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359
   #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933
   #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8
   #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d
   #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032

I used drgn to find the respective pages we were stuck on

page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901
page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874

As you can see the kworker is waiting for bit 0 (PG_locked) on index
7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index
8148.  aio-dio-invalid has 7680, and the kworker epd looks like the
following

  crash> struct extent_page_data ffffc900297bbbb0
  struct extent_page_data {
    bio = 0xffff889f747ed830,
    tree = 0xffff889eed6ba448,
    extent_locked = 0,
    sync_io = 0
  }

Probably worth mentioning as well that it waits for writeback of the
page to complete while holding a lock on it (at prepare_pages()).

Using drgn I walked the bio pages looking for page
0xffffea00fbfc7500 which is the one we're waiting for writeback on

  bio = Object(prog, 'struct bio', address=0xffff889f747ed830)
  for i in range(0, bio.bi_vcnt.value_()):
      bv = bio.bi_io_vec[i]
      if bv.bv_page.value_() == 0xffffea00fbfc7500:
	  print("FOUND IT")

which validated what I suspected.

The fix for this is simple, flush the epd before we loop back around to
the beginning of the file during writeout.

Fixes: b293f02e14 ("Btrfs: Add writepages support")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31 14:01:25 +01:00
Filipe Manana
7227ff4de5 Btrfs: fix race between adding and putting tree mod seq elements and nodes
There is a race between adding and removing elements to the tree mod log
list and rbtree that can lead to use-after-free problems.

Consider the following example that explains how/why the problems happens:

1) Task A has mod log element with sequence number 200. It currently is
   the only element in the mod log list;

2) Task A calls btrfs_put_tree_mod_seq() because it no longer needs to
   access the tree mod log. When it enters the function, it initializes
   'min_seq' to (u64)-1. Then it acquires the lock 'tree_mod_seq_lock'
   before checking if there are other elements in the mod seq list.
   Since the list it empty, 'min_seq' remains set to (u64)-1. Then it
   unlocks the lock 'tree_mod_seq_lock';

3) Before task A acquires the lock 'tree_mod_log_lock', task B adds
   itself to the mod seq list through btrfs_get_tree_mod_seq() and gets a
   sequence number of 201;

4) Some other task, name it task C, modifies a btree and because there
   elements in the mod seq list, it adds a tree mod elem to the tree
   mod log rbtree. That node added to the mod log rbtree is assigned
   a sequence number of 202;

5) Task B, which is doing fiemap and resolving indirect back references,
   calls btrfs get_old_root(), with 'time_seq' == 201, which in turn
   calls tree_mod_log_search() - the search returns the mod log node
   from the rbtree with sequence number 202, created by task C;

6) Task A now acquires the lock 'tree_mod_log_lock', starts iterating
   the mod log rbtree and finds the node with sequence number 202. Since
   202 is less than the previously computed 'min_seq', (u64)-1, it
   removes the node and frees it;

7) Task B still has a pointer to the node with sequence number 202, and
   it dereferences the pointer itself and through the call to
   __tree_mod_log_rewind(), resulting in a use-after-free problem.

This issue can be triggered sporadically with the test case generic/561
from fstests, and it happens more frequently with a higher number of
duperemove processes. When it happens to me, it either freezes the VM or
it produces a trace like the following before crashing:

  [ 1245.321140] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  [ 1245.321200] CPU: 1 PID: 26997 Comm: pool Not tainted 5.5.0-rc6-btrfs-next-52 #1
  [ 1245.321235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
  [ 1245.321287] RIP: 0010:rb_next+0x16/0x50
  [ 1245.321307] Code: ....
  [ 1245.321372] RSP: 0018:ffffa151c4d039b0 EFLAGS: 00010202
  [ 1245.321388] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8ae221363c80 RCX: 6b6b6b6b6b6b6b6b
  [ 1245.321409] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ae221363c80
  [ 1245.321439] RBP: ffff8ae20fcc4688 R08: 0000000000000002 R09: 0000000000000000
  [ 1245.321475] R10: ffff8ae20b120910 R11: 00000000243f8bb1 R12: 0000000000000038
  [ 1245.321506] R13: ffff8ae221363c80 R14: 000000000000075f R15: ffff8ae223f762b8
  [ 1245.321539] FS:  00007fdee1ec7700(0000) GS:ffff8ae236c80000(0000) knlGS:0000000000000000
  [ 1245.321591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1245.321614] CR2: 00007fded4030c48 CR3: 000000021da16003 CR4: 00000000003606e0
  [ 1245.321642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 1245.321668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 1245.321706] Call Trace:
  [ 1245.321798]  __tree_mod_log_rewind+0xbf/0x280 [btrfs]
  [ 1245.321841]  btrfs_search_old_slot+0x105/0xd00 [btrfs]
  [ 1245.321877]  resolve_indirect_refs+0x1eb/0xc60 [btrfs]
  [ 1245.321912]  find_parent_nodes+0x3dc/0x11b0 [btrfs]
  [ 1245.321947]  btrfs_check_shared+0x115/0x1c0 [btrfs]
  [ 1245.321980]  ? extent_fiemap+0x59d/0x6d0 [btrfs]
  [ 1245.322029]  extent_fiemap+0x59d/0x6d0 [btrfs]
  [ 1245.322066]  do_vfs_ioctl+0x45a/0x750
  [ 1245.322081]  ksys_ioctl+0x70/0x80
  [ 1245.322092]  ? trace_hardirqs_off_thunk+0x1a/0x1c
  [ 1245.322113]  __x64_sys_ioctl+0x16/0x20
  [ 1245.322126]  do_syscall_64+0x5c/0x280
  [ 1245.322139]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [ 1245.322155] RIP: 0033:0x7fdee3942dd7
  [ 1245.322177] Code: ....
  [ 1245.322258] RSP: 002b:00007fdee1ec6c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [ 1245.322294] RAX: ffffffffffffffda RBX: 00007fded40210d8 RCX: 00007fdee3942dd7
  [ 1245.322314] RDX: 00007fded40210d8 RSI: 00000000c020660b RDI: 0000000000000004
  [ 1245.322337] RBP: 0000562aa89e7510 R08: 0000000000000000 R09: 00007fdee1ec6d44
  [ 1245.322369] R10: 0000000000000073 R11: 0000000000000246 R12: 00007fdee1ec6d48
  [ 1245.322390] R13: 00007fdee1ec6d40 R14: 00007fded40210d0 R15: 00007fdee1ec6d50
  [ 1245.322423] Modules linked in: ....
  [ 1245.323443] ---[ end trace 01de1e9ec5dff3cd ]---

Fix this by ensuring that btrfs_put_tree_mod_seq() computes the minimum
sequence number and iterates the rbtree while holding the lock
'tree_mod_log_lock' in write mode. Also get rid of the 'tree_mod_seq_lock'
lock, since it is now redundant.

Fixes: bd989ba359 ("Btrfs: add tree modification log functions")
Fixes: 097b8a7c9e ("Btrfs: join tree mod log code with the code holding back delayed refs")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-31 14:01:20 +01:00
Colin Ian King
8c173d5e04 thermal: stm32: fix spelling mistake "preprare" -> "prepare"
There is a spelling mistake in a dev_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200130100537.18069-1-colin.king@canonical.com
2020-01-31 10:52:13 +01:00
Randy Dunlap
fe27f13d67 Documentation: cpu-idle-cooling: fix a SEVERE docs build failure
Sphinx ('make htmldocs') stops with a SEVERE error:

Sphinx parallel build error:
SystemMessage: /home/rdunlap/lnx/next/linux-next-20200120/Documentation/driver-api/thermal/cpu-idle-cooling.rst:69: (SEVERE/4) Unexpected section title.

^
|

so fix the .rst file so that the SEVERE build error does not happen.
Also fix another minor formatting warning (unexpected unindent).

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/712c1152-56b5-307f-b3f3-ed03a30b804a@infradead.org
2020-01-31 10:50:33 +01:00
Rafael J. Wysocki
82b2c6ffd3 Merge branches 'pm-cpufreq' and 'pm-core'
* pm-cpufreq:
  cpufreq: Avoid creating excessively large stack frames

* pm-core:
  PM: core: Fix handling of devices deleted during system-wide resume
2020-01-31 10:28:23 +01:00
Yangbo Lu
a932872f1b clk: qoriq: add ls1088a hwaccel clocks support
This patch is to add hwaccel clocks information for ls1088a.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Link: https://lkml.kernel.org/r/20191216100111.17122-1-yangbo.lu@nxp.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2020-01-30 16:32:13 -08:00
Wen He
d37010a3c1 clk: ls1028a: Add clock driver for Display output interface
Add clock driver for QorIQ LS1028A Display output interfaces(LCD, DPHY),
as implemented in TSMC CLN28HPM PLL, this PLL supports the programmable
integer division and range of the display output pixel clock's 27-594MHz.

Signed-off-by: Wen He <wen.he_1@nxp.com>
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lkml.kernel.org/r/20191213083402.35678-2-wen.he_1@nxp.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2020-01-30 16:28:07 -08:00
Wen He
87a5ffb34b dt/bindings: clk: Add YAML schemas for LS1028A Display Clock bindings
LS1028A has a clock domain PXLCLK0 used for provide pixel clocks to Display
output interface. Add a YAML schema for this.

Signed-off-by: Wen He <wen.he_1@nxp.com>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lkml.kernel.org/r/20191213083402.35678-1-wen.he_1@nxp.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2020-01-30 16:28:06 -08:00
Linus Torvalds
ccaaaf6fe5 MPX requires recompiling applications, which requires compiler support.
Unfortunately, GCC 9.1 is expected to be be released without support for
 MPX.  This means that there was only a relatively small window where
 folks could have ever used MPX.  It failed to gain wide adoption in the
 industry, and Linux was the only mainstream OS to ever support it widely.
 
 Support for the feature may also disappear on future processors.
 
 This set completes the process that we started during the 5.4 merge window.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJeK1/pAAoJEGg1lTBwyZKwgC8QAIiVn1d7A9Uj/WpnpgfCChCZ
 9XiV6Ak999qD9fbAcrgNfPjieaD4mtokocSRVJuRgJu5iLnIJCINlozLPe4yVl7P
 7zebnxkLq0CIA8d56bEUoFlC0J+oWYlDVQePZzNQsSk5KHVGXVLpF6U4vDVzZeQy
 cprgvdeY+ehB7G6IIo0MWTg5ylKYAsOAyVvK8NIGpKY2k6/YqCnsptnsVE7bvlHy
 TrEOiUWLv+hh0bMkZdP1PwKQKEuMO/IZly0HtviFbMN7T4TB1spfg7ELoBucEq3T
 s4EVbYRe+nIE4tuEAveaX3CgxJek8cY5MlticskdaKSEACBwabdOF55qsZy0u+WA
 PYC4iUIXfbOH8OgieKWtGX4IuSkRYdQ2nP4BOpe4ZX4+zvU7zOCIyVSKRrwkX8cc
 ADtWI5FAtB36KCgUuWnHGHNZpOxPTbTLBuBataFY4Q2uBNJEBJpscZ5H9ObtyGFU
 ZjlzqFnM0nFNDKEI1EEtv9jLzgZTU1RQ46s7EFeSeEQ2/s9wJ3+s5sBlVbljsmus
 o658bLOEaRWC/aF15dgmEXW9GAO6uifNdmbzGnRn7oEMYyFQPTWbZvi1zGz58QaG
 Y6WTtigVtsSrHS4wpYd+p+n1W06VnB6J3BpBM4G1VQv1Vm0dNd1tUOfkqOzPjg7c
 33Itmsz2LaW1mb67GlgZ
 =g4cC
 -----END PGP SIGNATURE-----

Merge tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx

Pull x86 MPX removal from Dave Hansen:
 "MPX requires recompiling applications, which requires compiler
  support. Unfortunately, GCC 9.1 is expected to be be released without
  support for MPX. This means that there was only a relatively small
  window where folks could have ever used MPX. It failed to gain wide
  adoption in the industry, and Linux was the only mainstream OS to ever
  support it widely.

  Support for the feature may also disappear on future processors.

  This set completes the process that we started during the 5.4 merge
  window when the MPX prctl()s were removed. XSAVE support is left in
  place, which allows MPX-using KVM guests to continue to function"

* tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx:
  x86/mpx: remove MPX from arch/x86
  mm: remove arch_bprm_mm_init() hook
  x86/mpx: remove bounds exception code
  x86/mpx: remove build infrastructure
  x86/alternatives: add missing insn.h include
2020-01-30 16:11:50 -08:00
Linus Torvalds
35c222fd32 MTD core
* block2mtd: page index should use pgoff_t
 * maps: physmap: minimal Runtime PM support
 * maps: pcmciamtd: avoid possible sleep-in-atomic-context bugs
 * concat: Fix a comment referring to an unknown symbol
 
 Raw NAND
 * Macronix: Use match_string() helper
 * Atmel: switch to using devm_fwnode_gpiod_get()
 * Denali: rework the SKIP_BYTES feature and add reset controlling
 * Brcmnand: set appropriate DMA mask
 * Cadence: add unspecified HAS_IOMEM dependency
 * Various cleanup.
 
 Onenand
 * Rename Samsung and Omap2 drivers to avoid possible build warnings
 * Enable compile testing
 * Various build issues
 * Kconfig cleanup
 
 SPI-NAND
 * Support for Toshiba TC58CVG2S0HRAIJ
 
 SPI-NOR:
 - Add support for TB selection using SR bit 6,
 - Add support for few flashes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAl4zMfgACgkQJWrqGEe9
 VoS0xwf+KdaihRno4SkDovcHoF7K54N6CqBhwuV9uabfy4phEr38cyvaivYu0rG7
 k/n3CUNRDghTh7DAUT7pBsjUeZn9XxvKyQaZz34TBgoQYwGz57ssp8lMRmJkYoA6
 t9z95N9bRJ+IzZJlYELCbhNq+aOGyWYgWL+aaO0CE8OyOeWzdZumdd4k7cF7rSAu
 9gWV/6iX/qP081NexfjPEVmMtNQ+0p4T7zQ01nQA7rIZiVoIgMKwBu41aRYycEEs
 LeuV5gNEDn2vGBl+u85w5oF6o1TIzDeTmh0G7Jm3NQGGco2kOOZ1O39a0hrDONrA
 hEoEIG/rAMKOtaLr6rCGnV/5/i/Tlw==
 =WC+m
 -----END PGP SIGNATURE-----

Merge tag 'mtd/for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD updates from Miquel Raynal:
 "MTD core
   - block2mtd: page index should use pgoff_t
   - maps: physmap: minimal Runtime PM support
   - maps: pcmciamtd: avoid possible sleep-in-atomic-context bugs
   - concat: Fix a comment referring to an unknown symbol

  Raw NAND:
   - Macronix: Use match_string() helper
   - Atmel: switch to using devm_fwnode_gpiod_get()
   - Denali: rework the SKIP_BYTES feature and add reset controlling
   - Brcmnand: set appropriate DMA mask
   - Cadence: add unspecified HAS_IOMEM dependency
   - Various cleanup.

  Onenand:
   - Rename Samsung and Omap2 drivers to avoid possible build warnings
   - Enable compile testing
   - Various build issues
   - Kconfig cleanup

  SPI-NAND:
   - Support for Toshiba TC58CVG2S0HRAIJ

  SPI-NOR:
   - Add support for TB selection using SR bit 6,
   - Add support for few flashes"

* tag 'mtd/for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (41 commits)
  mtd: concat: Fix a comment referring to an unknown symbol
  mtd: rawnand: add unspecified HAS_IOMEM dependency
  mtd: block2mtd: page index should use pgoff_t
  mtd: maps: physmap: Add minimal Runtime PM support
  mtd: maps: pcmciamtd: fix possible sleep-in-atomic-context bugs in pcmciamtd_set_vpp()
  mtd: onenand: Rename omap2 driver to avoid a build warning
  mtd: onenand: Use a better name for samsung driver
  mtd: rawnand: atmel: switch to using devm_fwnode_gpiod_get()
  mtd: spinand: add support for Toshiba TC58CVG2S0HRAIJ
  mtd: rawnand: macronix: Use match_string() helper to simplify the code
  mtd: sharpslpart: Fix unsigned comparison to zero
  mtd: onenand: Enable compile testing of OMAP and Samsung drivers
  mtd: onenand: samsung: Fix printing format for size_t on 64-bit
  mtd: onenand: samsung: Fix pointer cast -Wpointer-to-int-cast warnings on 64 bit
  mtd: rawnand: denali: remove hard-coded DENALI_DEFAULT_OOB_SKIP_BYTES
  mtd: rawnand: denali_dt: add reset controlling
  dt-bindings: mtd: denali_dt: document reset property
  mtd: rawnand: denali_dt: Add support for configuring SPARE_AREA_SKIP_BYTES
  mtd: rawnand: denali_dt: error out if platform has no associated data
  mtd: rawnand: brcmnand: Set appropriate DMA mask
  ...
2020-01-30 15:46:02 -08:00
Linus Torvalds
e84bcd61f6 This pull request contains mostly fixes for UBI and UBIFS:
UBI:
  - Fixes for memory leaks in error paths
  - Fix for an logic error in a fastmap selfcheck
 
 UBIFS:
  - Fix for FS_IOC_SETFLAGS related to fscrypt flag
  - Support for FS_ENCRYPT_FL
  - Fix for a dead lock in bulk-read mode
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl4k1v8WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wTGuD/9PU3qZJq1w5F499YAJb2qx2hLD
 rseg7SZ4rzKwXI5m2g2EP63lgMQkJZC7u5YUv9c1m0gQnNvVXbTdhba3M1V437Kd
 F0Ce2SeqbVRi4faGWyEH0TEEmFo2Nk7Uz3iJeaXxUY8BqVrvQZaBYk6GWtj+wWIl
 Yc2ONKwzIF2BDTU5pFvP2yubHnTm00M4uP46MqAcaWoehd9L+xZhC0xiXyWFiAQE
 g/ITk4vgD3bJRkI0nYNuHxFIgafIweGlyuNfMMpfh2Yqo5/tGnppPE+H+Maokb8V
 6Gqmt9XR34ZGH8mOZsMFWxeK6e68DP2AkzL1EsiT2FlUc6hhCr+pOVEN17Y4eb//
 IRpy7l8f9BkHvR72roaQusE1UjANC2sw2VtDi4TJO6WpFRx4n94//bf+IxO32os8
 0AbIyzYCEo1Kql0wTxhqTZnHJr+zHjcFWOuzZ/95iH5wVQmb3hvlmmozL6ZPV9sG
 cqyV1sEcFhkUKgCSTmbtoYBKfEJLj4j3WYvLoI7apLYN014ExNJY7PIVfIUtMfZQ
 Sn0sN8+/gpQOOben67IQK9EdcvEhEkY4JdTHpuZEQmh3cS4HNnhzjM2A+n2sWTU8
 lxkakeemcO2sV6ue/Vg6Fq4fPEnQtyOVVsSUHjA7hIy5JXwprDvft4gHkI2r40FI
 fC9PuCjoUIIJtMyA7g==
 =9tMk
 -----END PGP SIGNATURE-----

Merge tag 'upstream-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBI/UBIFS updates from Miquel Raynal:
 "This pull request contains mostly fixes for UBI and UBIFS:

  UBI:
   - Fixes for memory leaks in error paths
   - Fix for an logic error in a fastmap selfcheck

  UBIFS:
   - Fix for FS_IOC_SETFLAGS related to fscrypt flag
   - Support for FS_ENCRYPT_FL
   - Fix for a dead lock in bulk-read mode"

Sent on behalf of Richard Weinberger who is traveling.

* tag 'upstream-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubi: Fix an error pointer dereference in error handling code
  ubifs: Fix memory leak from c->sup_node
  ubifs: Fix ino_t format warnings in orphan_delete()
  ubifs: Fix deadlock in concurrent bulk-read and writepage
  ubifs: Fix wrong memory allocation
  ubi: Free the normal volumes in error paths of ubi_attach_mtd_dev()
  ubi: Check the presence of volume before call ubi_fastmap_destroy_checkmap()
  ubifs: Add support for FS_ENCRYPT_FL
  ubifs: Fix FS_IOC_SETFLAGS unexpectedly clearing encrypt flag
  ubi: wl: Remove set but not used variable 'prev_e'
  ubi: fastmap: Fix inverted logic in seen selfcheck
2020-01-30 15:44:12 -08:00
Linus Torvalds
6e135baed8 f2fs-for-5.6
In this series, we've implemented transparent compression experimentally. It
 supports LZO and LZ4, but will add more later as we investigate in the field
 more. At this point, the feature doesn't expose compressed space to user
 directly in order to guarantee potential data updates later to the space.
 Instead, the main goal is to reduce data writes to flash disk as much as
 possible, resulting in extending disk life time as well as relaxing IO
 congestion. Alternatively, we're also considering to add ioctl() to reclaim
 compressed space and show it to user after putting the immutable bit.
 
 Enhancement:
  - add compression support
  - avoid unnecessary locks in quota ops
  - harden power-cut scenario for zoned block devices
  - use private bio_set to avoid IO congestion
  - replace GC mutex with rwsem to serialize callers
 
 Bug fix:
  - fix dentry consistency and memory corruption in rename()'s error case
  - fix wrong swap extent reports
  - fix casefolding bugs
  - change lock coverage to avoid deadlock
  - avoid GFP_KERNEL under f2fs_lock_op
 
 And, we've cleaned up sysfs entries to prepare no debugfs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl4zInwACgkQQBSofoJI
 UNL4Tg/+JBbVEFa3IUBGMdbjfgd/g0Jye++iMAYYGRWT6Ll/IGcHRV9NunITjgWU
 mBZqdhI28kXeiGCcewB1ZvivjLx22X4n6yevHk2B5A6PNe9IDCHi0HOAhJJHkjPH
 ecv2L+vX3Oj4y0+H7JNz9Fo3OIPJvMPtCQWlg1z+VQyhB85zNP7fZlvvIY4tG8yw
 ERo0YNotLqwcF1BxCwNbAhV3aJGDxar+MI//yNzpiwDX7IptVpqestfcoIYc9kKL
 4kSWRyEIGwcuIeyoM6aofGS9t4Z/Oe/gdqcxNr6l5n0Q/tMTpb4b/fJFGNr6RRx9
 X9NQo8flkQb2DEIOP0DVpO2aPebzsVtzg3LZUOLA83+wCHfwINtHai2Dy2zDJ2my
 BrVdou8fe2oxoaYihJg/Tz9cd0nA/6mZArtpYvDImAmX/xuGOvVk9zZkXNwc9nVX
 EyVzy0vW4lA6gAIJ95aG6DDhJcAtVoy0MhBRWG92Pufxhn9aW24AV63ChWUf9DRx
 /3RqpMAuQ3UC2gOxXKKnr54lsdhUIMn/y9sjROkVvQ1BvgRVxO8I4GFvMHMKv9pR
 9KXiVRdzyYERyoL4+MF7A2zTnw+RHL4RVILa85p2ALGy2jQ1UuNUQi0BN9x2u1v8
 S1ifNNX8SwOP+83ImFJhhn3HybpFQ45aLO3F7ZjKBQAnufJu+xw=
 =zeoY
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this series, we've implemented transparent compression
  experimentally. It supports LZO and LZ4, but will add more later as we
  investigate in the field more.

  At this point, the feature doesn't expose compressed space to user
  directly in order to guarantee potential data updates later to the
  space. Instead, the main goal is to reduce data writes to flash disk
  as much as possible, resulting in extending disk life time as well as
  relaxing IO congestion.

  Alternatively, we're also considering to add ioctl() to reclaim
  compressed space and show it to user after putting the immutable bit.

  Enhancements:
   - add compression support
   - avoid unnecessary locks in quota ops
   - harden power-cut scenario for zoned block devices
   - use private bio_set to avoid IO congestion
   - replace GC mutex with rwsem to serialize callers

  Bug fixes:
   - fix dentry consistency and memory corruption in rename()'s error case
   - fix wrong swap extent reports
   - fix casefolding bugs
   - change lock coverage to avoid deadlock
   - avoid GFP_KERNEL under f2fs_lock_op

  And, we've cleaned up sysfs entries to prepare no debugfs"

* tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (31 commits)
  f2fs: fix race conditions in ->d_compare() and ->d_hash()
  f2fs: fix dcache lookup of !casefolded directories
  f2fs: Add f2fs stats to sysfs
  f2fs: delete duplicate information on sysfs nodes
  f2fs: change to use rwsem for gc_mutex
  f2fs: update f2fs document regarding to fsync_mode
  f2fs: add a way to turn off ipu bio cache
  f2fs: code cleanup for f2fs_statfs_project()
  f2fs: fix miscounted block limit in f2fs_statfs_project()
  f2fs: show the CP_PAUSE reason in checkpoint traces
  f2fs: fix deadlock allocating bio_post_read_ctx from mempool
  f2fs: remove unneeded check for error allocating bio_post_read_ctx
  f2fs: convert inline_dir early before starting rename
  f2fs: fix memleak of kobject
  f2fs: fix to add swap extent correctly
  f2fs: run fsck when getting bad inode during GC
  f2fs: support data compression
  f2fs: free sysfs kobject
  f2fs: declare nested quota_sem and remove unnecessary sems
  f2fs: don't put new_page twice in f2fs_rename
  ...
2020-01-30 15:39:24 -08:00
Linus Torvalds
0196be12aa \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl4zANcACgkQnJ2qBz9k
 QNkyBQgA5/ppAhSC7Snc6BDm5PMiOJjN+FhYB1W9bHbkRlKfTetJxQTxbPpokZPq
 A+99KuuNb3Uay2XWqan2pwZ90/9SIUZT8HnwNYwEHh33Nt76A1ybqqM0IAk+RWus
 KjW7Jg/xCbbFKQX/estngjIlniUQ0WP7VTTwS/NPnvsIYNEpWJQvyIecm2DZhWGS
 fmbn5x7PYnyveADd2Tf9z0iOKKI0ysLYksUlx+Ndg3fwPaWsI57tgUZL0Tzf552S
 cCsRjQrcnhjuHTDEhH9HOGQlu45U4bBNkXKKoc1HUrp58UyTY2Rnn/QCM8jkTpzB
 7NwoFyqPtWguJTFDsUH1rmqQisYoMQ==
 =1v6t
 -----END PGP SIGNATURE-----

Merge tag 'for_v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull UDF, quota, reiserfs, ext2 fixes and cleanups from Jan Kara:
 "A few assorted fixes and cleanups for udf, quota, reiserfs, and ext2"

* tag 'for_v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fs/reiserfs: remove unused macros
  fs/quota: remove unused macro
  udf: Clarify meaning of f_files in udf_statfs
  udf: Allow writing to 'Rewritable' partitions
  udf: Disallow R/W mode for disk with Metadata partition
  udf: Fix meaning of ENTITYID_FLAGS_* macros to be really bitwise-or flags
  udf: Fix free space reporting for metadata and virtual partitions
  udf: Update header files to UDF 2.60
  udf: Move OSTA Identifier Suffix macros from ecma_167.h to osta_udf.h
  udf: Fix spelling in EXT_NEXT_EXTENT_ALLOCDESCS
  ext2: Adjust indentation in ext2_fill_super
  quota: avoid time_t in v1_disk_dqblk definition
  reiserfs: Fix spurious unlock in reiserfs_fill_super() error handling
  reiserfs: Fix memory leak of journal device string
  ext2: set proper errno in error case of ext2_fill_super()
2020-01-30 15:37:41 -08:00
Linus Torvalds
91f1a9566f New code for 5.6:
- Get rid of compat_time_t
 - Convert time_t to time64_t in quota code
 - Remove shadow variables
 - Prevent ATTR_ flag misuse in the attrmulti ioctls
 - Clean out strlen in the attr code
 - Remove some bogus asserts
 - Fix various file size limit calculation errors with 32-bit kernels
 - Pack xfs_dir2_sf_entry_t to fix build errors on arm oabi
 - Fix nowait inode locking calls for directio aio reads.
 - Fix memory corruption bugs when invalidating remote xattr value
   buffers.
 - Streamline remote attr value removal.
 - Make the buffer log format size consistent across platforms.
 - Strengthen buffer log format size checking.
 - Fix messed up return types of xfs_inode_need_cow.
 - Fix some unused variable warnings.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl4rGQcACgkQ+H93GTRK
 tOv1uhAAo5F38h9ZpdoMK4TPJNaREmLtM+MAA03l3sk+BY5zGDo690WDaxa5cd8h
 cJi8pqHpcnroJVptCkbdagx0BIPnHMVpw7CKFpBlCjNHpFeO1ULLpk/RcvwISRZy
 OvEOmpn6Z/FQ2jTBjr45KXIiS5DAK3KRMrmGdvh6j73YVhQ1lgadY/6Jj2TRjzT1
 FFjnYMGhufBsP0yV9Rg10AcIhNCIAr3RYmfjqFXMA8raEX33gdpsJ6GU6r5iyXRJ
 B9dByoBlGCL15ZlxZOqEiN4omqqBLux5jrJr1tg0L0hbfu7UIxMcwXiSTnoaO2SZ
 G7GjlEO3wszf3wGEeaaJd/tN58SQDvfz3yY9vZObrlTelN5iDrcziLHbWc+xkwPh
 mykly36x8+dZ8kqgBxiF1WFgRheYNQIWnZ9wtCfWvsPtjrklIZNZqKDEAxXnSg0w
 8lRXgeNfy6Jh78A817aPQibqkUAtxSk8RJhimkPWyKwUA3jSzgv1LsjgzoRCM8ik
 /vmmRKmXviFrQwIgDKYQc/wHx58khKBi2Kna+rZUJrozrBU+P0EfG+AU6Rxjw1ob
 cOOSNooI5kp+uTrmlaKKr9mua80m0hACgrFT9Fj+SJWgi313343VVNXubRgK/yV7
 WxTcTOZadkQgqgeQSMzwFQuQPjQob9bNoQf5PK01sJ/4v96Zsmg=
 =J5Em
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.6-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "In this release we clean out the last of the old 32-bit timestamp
  code, fix a number of bugs and memory corruptions on 32-bit platforms,
  and a refactoring of some of the extended attribute code.

  I think I'll be back next week with some refactoring of how the XFS
  buffer code returns error codes, however I prefer to hold onto that
  for another week to let it soak a while longer

  Summary:

   - Get rid of compat_time_t

   - Convert time_t to time64_t in quota code

   - Remove shadow variables

   - Prevent ATTR_ flag misuse in the attrmulti ioctls

   - Clean out strlen in the attr code

   - Remove some bogus asserts

   - Fix various file size limit calculation errors with 32-bit kernels

   - Pack xfs_dir2_sf_entry_t to fix build errors on arm oabi

   - Fix nowait inode locking calls for directio aio reads

   - Fix memory corruption bugs when invalidating remote xattr value
     buffers

   - Streamline remote attr value removal

   - Make the buffer log format size consistent across platforms

   - Strengthen buffer log format size checking

   - Fix messed up return types of xfs_inode_need_cow

   - Fix some unused variable warnings"

* tag 'xfs-5.6-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (24 commits)
  xfs: remove unused variable 'done'
  xfs: fix uninitialized variable in xfs_attr3_leaf_inactive
  xfs: change return value of xfs_inode_need_cow to int
  xfs: check log iovec size to make sure it's plausibly a buffer log format
  xfs: make struct xfs_buf_log_format have a consistent size
  xfs: complain if anyone tries to create a too-large buffer log item
  xfs: clean up xfs_buf_item_get_format return value
  xfs: streamline xfs_attr3_leaf_inactive
  xfs: fix memory corruption during remote attr value buffer invalidation
  xfs: refactor remote attr value buffer invalidation
  xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_read
  xfs: Add __packed to xfs_dir2_sf_entry_t definition
  xfs: fix s_maxbytes computation on 32-bit kernels
  xfs: truncate should remove all blocks, not just to the end of the page cache
  xfs: introduce XFS_MAX_FILEOFF
  xfs: remove bogus assertion when online repair isn't enabled
  xfs: Remove all strlen in all xfs_attr_* functions for attr names.
  xfs: fix misuse of the XFS_ATTR_INCOMPLETE flag
  xfs: also remove cached ACLs when removing the underlying attr
  xfs: reject invalid flags combinations in XFS_IOC_ATTRMULTI_BY_HANDLE
  ...
2020-01-30 15:24:24 -08:00
Linus Torvalds
e5da4c933c This merge window, we've added some performance improvements in how we
handle inode locking in the read/write paths, and improving the
 performance of Direct I/O overwrites.  We also now record the error
 code which caused the first and most recent ext4_error() report in the
 superblock, to make it easier to root cause problems in production
 systems.  There are also many of the usual cleanups and miscellaneous
 bug fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl4yBf0ACgkQ8vlZVpUN
 gaOK8Af9EsY1vyR/IvEosfXJoKIqnTXN1SLt94iAOUh6dNeVNcyv1SIzRGFrpmsg
 uHY02EkcTl68b/AjV7ieDpOnOSmlP7NzynuVoar2hrjKX0MzpEu03Vv1a3dUQKuU
 zcdchi83EwRjEvegsNK/VF3FFadk3TtC7x+7o6p840V6OAyp5CXhjm1akJqIJwvd
 A4gTpruTSRIFg6Jj36HEDNRgSAeILed3wC7Ywtxt51tLK7Lp/qB1EuvYodMQRvGz
 d0fRhbNHKepVYfxwpDUDMFnrqDPZ/SZGF73XBxP2zHd6SXy9dBLzGsRL+oj9tTUg
 YQJtt4Yxjjg8Q1UrMyMRzQpi4S8dAQ==
 =pVeR
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "This merge window, we've added some performance improvements in how we
  handle inode locking in the read/write paths, and improving the
  performance of Direct I/O overwrites.

  We also now record the error code which caused the first and most
  recent ext4_error() report in the superblock, to make it easier to
  root cause problems in production systems.

  There are also many of the usual cleanups and miscellaneous bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (49 commits)
  jbd2: clean __jbd2_journal_abort_hard() and __journal_abort_soft()
  jbd2: make sure ESHUTDOWN to be recorded in the journal superblock
  ext4, jbd2: ensure panic when aborting with zero errno
  jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record
  jbd2_seq_info_next should increase position index
  jbd2: remove pointless assertion in __journal_remove_journal_head
  ext4,jbd2: fix comment and code style
  jbd2: delete the duplicated words in the comments
  ext4: fix extent_status trace points
  ext4: fix symbolic enum printing in trace output
  ext4: choose hardlimit when softlimit is larger than hardlimit in ext4_statfs_project()
  ext4: fix race conditions in ->d_compare() and ->d_hash()
  ext4: make dioread_nolock the default
  ext4: fix extent_status fragmentation for plain files
  jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal
  ext4: drop ext4_kvmalloc()
  ext4: Add EXT4_IOC_FSGETXATTR/EXT4_IOC_FSSETXATTR to compat_ioctl
  ext4: remove unused macro MPAGE_DA_EXTENT_TAIL
  ext4: add missing braces in ext4_ext_drop_refs()
  ext4: fix some nonstandard indentation in extents.c
  ...
2020-01-30 15:17:05 -08:00
Ronnie Sahlberg
c54849ddd8 cifs: fix soft mounts hanging in the reconnect code
RHBZ: 1795429

In recent DFS updates we have a new variable controlling how many times we will
retry to reconnect the share.
If DFS is not used, then this variable is initialized to 0 in:

static inline int
dfs_cache_get_nr_tgts(const struct dfs_cache_tgt_list *tl)
{
        return tl ? tl->tl_numtgts : 0;
}

This means that in the reconnect loop in smb2_reconnect() we will immediately wrap retries to -1
and never actually get to pass this conditional:

                if (--retries)
                        continue;

The effect is that we no longer reach the point where we fail the commands with -EHOSTDOWN
and basically the kernel threads are virtually hung and unkillable.

Fixes: a3a53b7603 (cifs: Add support for failover in smb2_reconnect())
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
CC: Stable <stable@vger.kernel.org>
2020-01-30 15:23:55 -06:00