Commit Graph

1215413 Commits

Author SHA1 Message Date
Juntong Deng
bbe246f875 selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
According to the awk manual, the -e option does not need to be specified
in front of 'program' (unless you need to mix program-file).

The redundant -e option can cause error when users use awk tools other
than gawk (for example, mawk does not support the -e option).

Error Example:
awk: not an option: -e

Link: https://lkml.kernel.org/r/VI1P193MB075228810591AF2FDD7D42C599C3A@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:48 -07:00
Yang Shi
24526268f4 mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
When calling mbind() with MPOL_MF_{MOVE|MOVEALL} | MPOL_MF_STRICT, kernel
should attempt to migrate all existing pages, and return -EIO if there is
misplaced or unmovable page.  Then commit 6f4576e368 ("mempolicy: apply
page table walker on queue_pages_range()") messed up the return value and
didn't break VMA scan early ianymore when MPOL_MF_STRICT alone.  The
return value problem was fixed by commit a7f40cfe3b ("mm: mempolicy:
make mbind() return -EIO when MPOL_MF_STRICT is specified"), but it broke
the VMA walk early if unmovable page is met, it may cause some pages are
not migrated as expected.

The code should conceptually do:

 if (MPOL_MF_MOVE|MOVEALL)
     scan all vmas
     try to migrate the existing pages
     return success
 else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
     scan all vmas
     try to migrate the existing pages
     return -EIO if unmovable or migration failed
 else /* MPOL_MF_STRICT alone */
     break early if meets unmovable and don't call mbind_range() at all
 else /* none of those flags */
     check the ranges in test_walk, EFAULT without mbind_range() if discontig.

Fixed the behavior.

Link: https://lkml.kernel.org/r/20230920223242.3425775-1-yang@os.amperecomputing.com
Fixes: a7f40cfe3b ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>	[4.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:48 -07:00
Jinjie Ruan
45120b1574 mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
When CONFIG_DAMON_VADDR_KUNIT_TEST=y and making CONFIG_DEBUG_KMEMLEAK=y
and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, the below memory leak is detected.

Since commit 9f86d62429 ("mm/damon/vaddr-test: remove unnecessary
variables"), the damon_destroy_ctx() is removed, but still call
damon_new_target() and damon_new_region(), the damon_region which is
allocated by kmem_cache_alloc() in damon_new_region() and the damon_target
which is allocated by kmalloc in damon_new_target() are not freed.  And
the damon_region which is allocated in damon_new_region() in
damon_set_regions() is also not freed.

So use damon_destroy_target to free all the damon_regions and damon_target.

    unreferenced object 0xffff888107c9a940 (size 64):
      comm "kunit_try_catch", pid 1069, jiffies 4294670592 (age 732.761s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b  ............kkkk
        60 c7 9c 07 81 88 ff ff f8 cb 9c 07 81 88 ff ff  `...............
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c82be>] damon_test_apply_three_regions1+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff8881079cc740 (size 56):
      comm "kunit_try_catch", pid 1069, jiffies 4294670592 (age 732.761s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
        [<ffffffff819c82be>] damon_test_apply_three_regions1+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff888107c9ac40 (size 64):
      comm "kunit_try_catch", pid 1071, jiffies 4294670595 (age 732.843s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b  ............kkkk
        a0 cc 9c 07 81 88 ff ff 78 a1 76 07 81 88 ff ff  ........x.v.....
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c851e>] damon_test_apply_three_regions2+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff8881079ccc80 (size 56):
      comm "kunit_try_catch", pid 1071, jiffies 4294670595 (age 732.843s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
        [<ffffffff819c851e>] damon_test_apply_three_regions2+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff888107c9af40 (size 64):
      comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.011s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b  ............kkkk
        20 a2 76 07 81 88 ff ff b8 a6 76 07 81 88 ff ff   .v.......v.....
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff88810776a200 (size 56):
      comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.011s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
        [<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff88810776a740 (size 56):
      comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.025s)
      hex dump (first 32 bytes):
        3d 00 00 00 00 00 00 00 3f 00 00 00 00 00 00 00  =.......?.......
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819bfcc2>] damon_set_regions+0x4c2/0x8e0
        [<ffffffff819c7dbb>] damon_do_test_apply_three_regions.constprop.0+0xfb/0x3e0
        [<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff888108038240 (size 64):
      comm "kunit_try_catch", pid 1075, jiffies 4294670600 (age 733.022s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 03 00 00 00 6b 6b 6b 6b  ............kkkk
        48 ad 76 07 81 88 ff ff 98 ae 76 07 81 88 ff ff  H.v.......v.....
      backtrace:
        [<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
        [<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
        [<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
        [<ffffffff819c898d>] damon_test_apply_three_regions4+0x1cd/0x210
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
    unreferenced object 0xffff88810776ad28 (size 56):
      comm "kunit_try_catch", pid 1075, jiffies 4294670600 (age 733.022s)
      hex dump (first 32 bytes):
        05 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00  ................
        6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk
      backtrace:
        [<ffffffff819bc492>] damon_new_region+0x22/0x1c0
        [<ffffffff819bfcc2>] damon_set_regions+0x4c2/0x8e0
        [<ffffffff819c7dbb>] damon_do_test_apply_three_regions.constprop.0+0xfb/0x3e0
        [<ffffffff819c898d>] damon_test_apply_three_regions4+0x1cd/0x210
        [<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
        [<ffffffff81237cf6>] kthread+0x2b6/0x380
        [<ffffffff81097add>] ret_from_fork+0x2d/0x70
        [<ffffffff81003791>] ret_from_fork_asm+0x11/0x20

Link: https://lkml.kernel.org/r/20230925072100.3725620-1-ruanjinjie@huawei.com
Fixes: 9f86d62429 ("mm/damon/vaddr-test: remove unnecessary variables")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Michal Hocko
4597648fdd mm, memcg: reconsider kmem.limit_in_bytes deprecation
This reverts commits 86327e8eb9 ("memcg: drop kmem.limit_in_bytes") and
partially reverts 58056f7750 ("memcg, kmem: further deprecate
kmem.limit_in_bytes") which have incrementally removed support for the
kernel memory accounting hard limit.  Unfortunately it has turned out that
there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1].  The underlying functionality is not
really required but the non-existent file just confuses the userspace
which fails in the result.  The patch to fix this on the userspace side
has been submitted but it is hard to predict how it will propagate through
the maze of 3rd party consumers of the software.

Now, reverting alone 86327e8eb9 is not an option because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file.  Therefore we have to go and revisit 58056f7750 as
well.  There are two ways to go ahead.  Either we give up on the
deprecation and fully revert 58056f7750 as well or we can keep
kmem.limit_in_bytes but make the write a noop and warn about the fact. 
This should work for both known breaking workloads which depend on the
existence but do not depend on the hard limit enforcement.

Note to backporters to stable trees.  a8c49af3be ("memcg: add per-memcg
total kernel memory stat") introduced in 4.18 has added memcg_account_kmem
so the accounting is not done by obj_cgroup_charge_pages directly for v1
anymore.  Prior kernels need to add it explicitly (thanks to Johannes for
pointing this out).

[akpm@linux-foundation.org: fix build - remove unused local]
Link: http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net [1]
Link: https://lkml.kernel.org/r/ZRE5VJozPZt9bRPy@dhcp22.suse.cz
Fixes: 86327e8eb9 ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f7750 ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Domenico Cerasuolo
ca56489c2f mm: zswap: fix potential memory corruption on duplicate store
While stress-testing zswap a memory corruption was happening when writing
back pages.  __frontswap_store used to check for duplicate entries before
attempting to store a page in zswap, this was because if the store fails
the old entry isn't removed from the tree.  This change removes duplicate
entries in zswap_store before the actual attempt.

[cerasuolodomenico@gmail.com: add a warning and a comment, per Johannes]
  Link: https://lkml.kernel.org/r/20230925130002.1929369-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230922172211.1704917-1-cerasuolodomenico@gmail.com
Fixes: 42c06a0e8e ("mm: kill frontswap")
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Ryan Roberts
6f1bace9a9 arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
When called with a swap entry that does not embed a PFN (e.g. 
PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation of
set_huge_pte_at() would either cause a BUG() to fire (if CONFIG_DEBUG_VM
is enabled) or cause a dereference of an invalid address and subsequent
panic.

arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries. 
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written. 
It previously did this by grabbing the folio out of the pte and querying
its size.

However, there are cases when the pte being set is actually a swap entry. 
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries.  And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.

But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN.  And this causes the code to go
bang.  The triggering case is for the uffd poison test, commit
99aa77215a ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7.  Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.

Arguably, the root cause is really due to commit 18f3962953 ("mm:
hugetlb: kill set_huge_swap_pte_at()"), which aimed to simplify the
interface to the core code by removing set_huge_swap_pte_at() (which took
a page size parameter) and replacing it with calls to set_huge_pte_at()
where the size was inferred from the folio, as descibed above.  While that
commit didn't break anything at the time, it did break the interface
because it couldn't handle swap entries without PFNs.  And since then new
callers have come along which rely on this working.  But given the
brokeness is only observable after commit 8a13897fb0 ("mm: userfaultfd:
support UFFDIO_POISON for hugetlbfs"), that one gets the Fixes tag.

Now that we have modified the set_huge_pte_at() interface to pass the huge
page size in the previous patch, we can trivially fix this issue.

Link: https://lkml.kernel.org/r/20230922115804.2043771-3-ryan.roberts@arm.com
Fixes: 8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>	[6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Ryan Roberts
935d4f0c6d mm: hugetlb: add huge page size param to set_huge_pte_at()
Patch series "Fix set_huge_pte_at() panic on arm64", v2.

This series fixes a bug in arm64's implementation of set_huge_pte_at(),
which can result in an unprivileged user causing a kernel panic.  The
problem was triggered when running the new uffd poison mm selftest for
HUGETLB memory.  This test (and the uffd poison feature) was merged for
v6.5-rc7.

Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable
(correctly this time) to get it backported to v6.5, where the issue first
showed up.


Description of Bug
==================

arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries. 
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written. 
It previously did this by grabbing the folio out of the pte and querying
its size.

However, there are cases when the pte being set is actually a swap entry. 
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries.  And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.

But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN.  And this causes the code to go
bang.  The triggering case is for the uffd poison test, commit
99aa77215a ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7.  Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.

If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise,
it will dereference a bad pointer in page_folio():

    static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
    {
        VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));

        return page_folio(pfn_to_page(swp_offset_pfn(entry)));
    }


Fix
===

The simplest fix would have been to revert the dodgy cleanup commit
18f3962953 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at().  As per the original intent of the change, it
would also leave us open to future bugs when people invariably get it
wrong and call the wrong helper.

So instead, I've added a huge page size parameter to set_huge_pte_at(). 
This means that the arm64 code has the size in all cases.  It's a bigger
change, due to needing to touch the arches that implement the function,
but it is entirely mechanical, so in my view, low risk.

I've compile-tested all touched arches; arm64, parisc, powerpc, riscv,
s390, sparc (and additionally x86_64).  I've additionally booted and run
mm selftests against arm64, where I observe the uffd poison test is fixed,
and there are no other regressions.


This patch (of 2):

In order to fix a bug, arm64 needs to be told the size of the huge page
for which the pte is being set in set_huge_pte_at().  Provide for this by
adding an `unsigned long sz` parameter to the function.  This follows the
same pattern as huge_pte_clear().

This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, parisc, powerpc,
riscv, s390, sparc).  The actual arm64 bug will be fixed in a separate
commit.

No behavioral changes intended.

Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com
Fixes: 8a13897fb0 ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>	[powerpc 8xx]
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>	[vmalloc change]
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>	[6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Liam R. Howlett
a8091f039c maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
When updating the maple tree iterator to avoid rewalks, an issue was
introduced when shifting beyond the limits.  This can be seen by trying to
go to the previous address of 0, which would set the maple node to
MAS_NONE and keep the range as the last entry.

Subsequent calls to mas_find() would then search upwards from mas->last
and skip the value at mas->index/mas->last.  This showed up as a bug in
mprotect which skips the actual VMA at the current range after attempting
to go to the previous VMA from 0.

Since MAS_NONE may already be set when searching for a value that isn't
contained within a node, changing the handling of MAS_NONE in mas_find()
would make the code more complicated and error prone.  Furthermore, there
was no way to tell which limit was hit, and thus which action to take
(next or the entry at the current range).

This solution is to add two states to track what happened with the
previous iterator action.  This allows for the expected behaviour of the
next command to return the correct item (either the item at the range
requested, or the next/previous).

Tests are also added and updated accordingly.

Link: https://lkml.kernel.org/r/20230921181236.509072-3-Liam.Howlett@oracle.com
Link: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Link: https://lore.kernel.org/linux-mm/20230921181236.509072-1-Liam.Howlett@oracle.com/
Fixes: 39193685d5 ("maple_tree: try harder to keep active node with mas_prev()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Pedro Falcato <pedro.falcato@gmail.com>
Closes: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Closes: https://bugs.archlinux.org/task/79656
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:46 -07:00
Liam R. Howlett
5c590804b6 maple_tree: add mas_is_active() to detect in-tree walks
Patch series "maple_tree: Fix mas_prev() state regression".

Pedro Falcato retported an mprotect regression [1] which was bisected back
to the iterator changes for maple tree.  Root cause analysis showed the
mas_prev() running off the end of the VMA space (previous from 0) followed
by mas_find(), would skip the first value.

This patchset introduces maple state underflow/overflow so the sequence of
calls on the maple state will return what the user expects.

Users who encounter this bug may see mprotect(), userfaultfd_register(),
and mlock() fail on VMAs mapped with address 0.


This patch (of 2):

Instead of constantly checking each possibility of the maple state,
create a fast path that will skip over checking unlikely states.

Link: https://lkml.kernel.org/r/20230921181236.509072-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230921181236.509072-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:46 -07:00
Pan Bian
7ee29facd8 nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the
reference count of bh when the call to nilfs_dat_translate() fails.  If
the reference count hits 0 and its owner page gets unlocked, bh may be
freed.  However, bh->b_page is dereferenced to put the page after that,
which may result in a use-after-free bug.  This patch moves the release
operation after unlocking and putting the page.

NOTE: The function in question is only called in GC, and in combination
with current userland tools, address translation using DAT does not occur
in that function, so the code path that causes this issue will not be
executed.  However, it is possible to run that code path by intentionally
modifying the userland GC library or by calling the GC ioctl directly.

[konishi.ryusuke@gmail.com: NOTE added to the commit log]
Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163.com
Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com
Fixes: a3d93f709e ("nilfs2: block cache for garbage collection")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reported-by: Ferry Meng <mengferry@linux.alibaba.com>
Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:46 -07:00
Matthew Wilcox (Oracle)
ce60f27bb6 mm: abstract moving to the next PFN
In order to fix the L1TF vulnerability, x86 can invert the PTE bits for
PROT_NONE VMAs, which means we cannot move from one PTE to the next by
adding 1 to the PFN field of the PTE.  This results in the BUG reported at
[1].

Abstract advancing the PTE to the next PFN through a pte_next_pfn()
function/macro.

Link: https://lkml.kernel.org/r/20230920040958.866520-1-willy@infradead.org
Fixes: bcc6cc8325 ("mm: add default definition of set_ptes()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: syzbot+55cc72f8cc3a549119df@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000d099fa0604f03351@google.com [1]
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:46 -07:00
Matthew Wilcox (Oracle)
a501a07030 mm: report success more often from filemap_map_folio_range()
Even though we had successfully mapped the relevant page, we would rarely
return success from filemap_map_folio_range().  That leads to falling back
from the VMA lock path to the mmap_lock path, which is a speed &
scalability issue.  Found by inspection.

Link: https://lkml.kernel.org/r/20230920035336.854212-1-willy@infradead.org
Fixes: 617c28ecab ("filemap: batch PTE mappings")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:45 -07:00
Greg Ungerer
7c31515857 fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
The elf-fdpic loader hard sets the process personality to either
PER_LINUX_FDPIC for true elf-fdpic binaries or to PER_LINUX for normal ELF
binaries (in this case they would be constant displacement compiled with
-pie for example).  The problem with that is that it will lose any other
bits that may be in the ELF header personality (such as the "bug
emulation" bits).

On the ARM architecture the ADDR_LIMIT_32BIT flag is used to signify a
normal 32bit binary - as opposed to a legacy 26bit address binary.  This
matters since start_thread() will set the ARM CPSR register as required
based on this flag.  If the elf-fdpic loader loses this bit the process
will be mis-configured and crash out pretty quickly.

Modify elf-fdpic loader personality setting so that it preserves the upper
three bytes by using the SET_PERSONALITY macro to set it.  This macro in
the generic case sets PER_LINUX and preserves the upper bytes. 
Architectures can override this for their specific use case, and ARM does
exactly this.

The problem shows up quite easily running under qemu using the ARM
architecture, but not necessarily on all types of real ARM hardware.  If
the underlying ARM processor does not support the legacy 26-bit addressing
mode then everything will work as expected.

Link: https://lkml.kernel.org/r/20230907011808.2985083-1-gerg@kernel.org
Fixes: 1bde925d23 ("fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries")
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:45 -07:00
Ben Wolsieffer
fe44198016 proc: nommu: fix empty /proc/<pid>/maps
On no-MMU, /proc/<pid>/maps reads as an empty file.  This happens because
find_vma(mm, 0) always returns NULL (assuming no vma actually contains the
zero address, which is normally the case).

To fix this bug and improve the maintainability in the future, this patch
makes the no-MMU implementation as similar as possible to the MMU
implementation.

The only remaining differences are the lack of hold/release_task_mempolicy
and the extra code to shoehorn the gate vma into the iterator.

This has been tested on top of 6.5.3 on an STM32F746.

Link: https://lkml.kernel.org/r/20230915160055.971059-2-ben.wolsieffer@hefring.com
Fixes: 0c563f1480 ("proc: remove VMA rbtree use from nommu")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Giulio Benetti <giulio.benetti@benettiengineering.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:34 -07:00
Yin Fengwei
c8be038067 filemap: add filemap_map_order0_folio() to handle order0 folio
Kernel test robot reported regressions for several benchmarks [1].
The regression are related with commit:
de74976eb6 ("filemap: add filemap_map_folio_range()")

It turned out that function filemap_map_folio_range() brings these
regressions when handle folio with order0.

Add filemap_map_order0_folio() to handle order0 folio. The benefit
come from two perspectives:
  - the code size is smaller (around 126 bytes)
  - no loop

Testing showed the regressions reported by 0day [1] all are fixed:
commit 9f1f5b60e7: parent commit of de74976eb6
commit fbdf9263a3d7fdbd: latest mm-unstable commit
commit 7fbfe2003f84686d: this fixing patch

9f1f5b60e7 fbdf9263a3d7fdbd            7fbfe2003f84686d
---------------- --------------------------- ---------------------------
   3843810           -21.4%    3020268            +4.6%    4018708      stress-ng.bad-altstack.ops
     64061           -21.4%      50336            +4.6%      66977      stress-ng.bad-altstack.ops_per_sec

   1709026           -14.4%    1462102            +2.4%    1750757      stress-ng.fork.ops
     28483           -14.4%      24368            +2.4%      29179      stress-ng.fork.ops_per_sec

   3685088           -53.6%    1710976            +0.5%    3702454      stress-ng.zombie.ops
     56732           -65.3%      19667            +0.7%      57107      stress-ng.zombie.ops_per_sec

     61874           -12.1%      54416            +0.4%      62136      vm-scalability.median
  13527663           -11.7%   11942117            -0.1%   13513946      vm-scalability.throughput
 4.066e+09           -11.7%   3.59e+09            -0.1%  4.061e+09      vm-scalability.workload

[1]:
https://lore.kernel.org/oe-lkp/72e017b9-deb6-44fa-91d6-716ee2c39cbc@intel.com/T/#m7d2bba30f75a9cee8eab07e5809abd9b3b206c84

Link: https://lkml.kernel.org/r/20230914134741.1937654-1-fengwei.yin@intel.com
Fixes: de74976eb6 ("filemap: add filemap_map_folio_range()")
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202309111556.b2aa3d7a-oliver.sang@intel.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:34 -07:00
Ben Wolsieffer
578d7699e5 proc: nommu: /proc/<pid>/maps: release mmap read lock
The no-MMU implementation of /proc/<pid>/map doesn't normally release
the mmap read lock, because it uses !IS_ERR_OR_NULL(_vml) to determine
whether to release the lock.  Since _vml is NULL when the end of the
mappings is reached, the lock is not released.

Reading /proc/1/maps twice doesn't cause a hang because it only
takes the read lock, which can be taken multiple times and therefore
doesn't show any problem if the lock isn't released. Instead, you need
to perform some operation that attempts to take the write lock after
reading /proc/<pid>/maps. To actually reproduce the bug, compile the
following code as 'proc_maps_bug':

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(int argc, char *argv[]) {
        void *buf;
        sleep(1);
        buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        puts("mmap returned");
        return 0;
}

Then, run:

  ./proc_maps_bug &; cat /proc/$!/maps; fg

Without this patch, mmap() will hang and the command will never
complete.
	
This code was incorrectly adapted from the MMU implementation, which at
the time released the lock in m_next() before returning the last entry.

The MMU implementation has diverged further from the no-MMU version since
then, so this patch brings their locking and error handling into sync,
fixing the bug and hopefully avoiding similar issues in the future.

Link: https://lkml.kernel.org/r/20230914163019.4050530-2-ben.wolsieffer@hefring.com
Fixes: 47fecca15c ("fs/proc/task_nommu.c: don't use priv->task->mm")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Giulio Benetti <giulio.benetti@benettiengineering.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:34 -07:00
Johannes Weiner
9ea9cb00a8 mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement
Breno and Josef report a deadlock scenario from cgroup reclaim
re-entering the filesystem:

[  361.546690] ======================================================
[  361.559210] WARNING: possible circular locking dependency detected
[  361.571703] 6.5.0-0_fbk700_debug_rc0_kbuilder_13159_gbf787a128001 #1 Tainted: G S          E
[  361.589704] ------------------------------------------------------
[  361.602277] find/9315 is trying to acquire lock:
[  361.611625] ffff88837ba140c0 (&delayed_node->mutex){+.+.}-{4:4}, at: __btrfs_release_delayed_node+0x68/0x4f0
[  361.631437]
[  361.631437] but task is already holding lock:
[  361.643243] ffff8881765b8678 (btrfs-tree-01){++++}-{4:4}, at: btrfs_tree_read_lock+0x1e/0x40

[  362.904457]  mutex_lock_nested+0x1c/0x30
[  362.912414]  __btrfs_release_delayed_node+0x68/0x4f0
[  362.922460]  btrfs_evict_inode+0x301/0x770
[  362.982726]  evict+0x17c/0x380
[  362.988944]  prune_icache_sb+0x100/0x1d0
[  363.005559]  super_cache_scan+0x1f8/0x260
[  363.013695]  do_shrink_slab+0x2a2/0x540
[  363.021489]  shrink_slab_memcg+0x237/0x3d0
[  363.050606]  shrink_slab+0xa7/0x240
[  363.083382]  shrink_node_memcgs+0x262/0x3b0
[  363.091870]  shrink_node+0x1a4/0x720
[  363.099150]  shrink_zones+0x1f6/0x5d0
[  363.148798]  do_try_to_free_pages+0x19b/0x5e0
[  363.157633]  try_to_free_mem_cgroup_pages+0x266/0x370
[  363.190575]  reclaim_high+0x16f/0x1f0
[  363.208409]  mem_cgroup_handle_over_high+0x10b/0x270
[  363.246678]  try_charge_memcg+0xaf2/0xc70
[  363.304151]  charge_memcg+0xf0/0x350
[  363.320070]  __mem_cgroup_charge+0x28/0x40
[  363.328371]  __filemap_add_folio+0x870/0xd50
[  363.371303]  filemap_add_folio+0xdd/0x310
[  363.399696]  __filemap_get_folio+0x2fc/0x7d0
[  363.419086]  pagecache_get_page+0xe/0x30
[  363.427048]  alloc_extent_buffer+0x1cd/0x6a0
[  363.435704]  read_tree_block+0x43/0xc0
[  363.443316]  read_block_for_search+0x361/0x510
[  363.466690]  btrfs_search_slot+0xc8c/0x1520

This is caused by the mem_cgroup_handle_over_high() not respecting the
gfp_mask of the allocation context.  We used to only call this function on
resume to userspace, where no locks were held.  But c9afe31ec4 ("memcg:
synchronously enforce memory.high for large overcharges") added a call
from the allocation context without considering the gfp.

Link: https://lkml.kernel.org/r/20230914152139.100822-1-hannes@cmpxchg.org
Fixes: c9afe31ec4 ("memcg: synchronously enforce memory.high for large overcharges")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Breno Leitao <leitao@debian.org>
Reported-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>	[5.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:34 -07:00
Randy Dunlap
0c7752d5b1 pidfd: prevent a kernel-doc warning
Change the comment to match the function name that the SYSCALL_DEFINE()
macros generate to prevent a kernel-doc warning.

kernel/pid.c:628: warning: expecting prototype for pidfd_open(). Prototype was for sys_pidfd_open() instead

Link: https://lkml.kernel.org/r/20230912060822.2500-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:33 -07:00
Randy Dunlap
36ee98b555 argv_split: fix kernel-doc warnings
Use proper kernel-doc notation to prevent build warnings:

lib/argv_split.c:36: warning: Function parameter or member 'argv' not described in 'argv_free'
lib/argv_split.c:61: warning: No description found for return value of 'argv_split'

Link: https://lkml.kernel.org/r/20230912060838.3794-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:33 -07:00
Randy Dunlap
c80da1fb85 scatterlist: add missing function params to kernel-doc
Describe missing function parameters to prevent kernel-doc warnings:

lib/scatterlist.c:288: warning: Function parameter or member 'first_chunk' not described in '__sg_alloc_table'
lib/scatterlist.c:800: warning: Function parameter or member 'flags' not described in 'sg_miter_start'

Link: https://lkml.kernel.org/r/20230912060848.4673-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:33 -07:00
Alexey Dobriyan
9d1be94df5 selftests/proc: fixup proc-empty-vm test after KSM changes
/proc/${pid}/smaps_rollup is not empty file even if process's address
space is empty, update the test.

Link: https://lkml.kernel.org/r/725e041f-e9df-4f3d-b267-d4cd2774a78d@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:33 -07:00
Andrew Morton
493d4eecf4 revert "scripts/gdb/symbols: add specific ko module load command"
Revert 11f956538c ("scripts/gdb/symbols: add specific ko module load
command") due to breakage identified by Johannes Berg in [1].

Fixes: 11f956538c ("scripts/gdb/symbols: add specific ko module load command")
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Closes: https://lkml.kernel.org/r/c44b748307a074d0c250002cdcfe209b8cce93c9.camel@sipsolutions.net [1]
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:33 -07:00
Ryan Roberts
c652df8a4a selftests: link libasan statically for tests with -fsanitize=address
When dynamically linking, Address Sanitizer requires its library to be the
first one to be loaded; this is apparently to ensure that every call to
malloc is intercepted.  If using LD_PRELOAD, those listed libraries will
be loaded before the libraries listed in the program's ELF and will
therefore violate this requirement, leading to the below failure and
output from ASan.

commit 58e2847ad2 ("selftests: line buffer test program's stdout")
modified the kselftest runner to force line buffering by forcing the test
programs to run through `stdbuf`.  It turns out that stdbuf implements
line buffering by injecting a library via LD_PRELOAD.  Therefore selftests
that use ASan started failing.

Fix this by statically linking libasan in the affected test programs,
using the `-static-libasan` option.  Note this is already the default for
Clang, but not got GCC.

Test output sample for failing case:

  TAP version 13
  1..3
  # timeout set to 300
  # selftests: openat2: openat2_test
  # ==4052==ASan runtime does not come first in initial library list;
  you should either link runtime to your application or manually preload
  it with LD_PRELOAD.
  not ok 1 selftests: openat2: openat2_test # exit=1
  # timeout set to 300
  # selftests: openat2: resolve_test
  # ==4070==ASan runtime does not come first in initial library list;
  you should either link runtime to your application or manually preload
  it with LD_PRELOAD.
  not ok 2 selftests: openat2: resolve_test # exit=1

Link: https://lkml.kernel.org/r/20230912135048.1755771-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Fixes: 58e2847ad2 ("selftests: line buffer test program's stdout")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202309121342.97e2f008-oliver.sang@intel.com
Cc: David Hildenbrand <david@redhat.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:32 -07:00
Jens Axboe
4653e5dd04 task_work: add kerneldoc annotation for 'data' argument
A previous commit changed the arguments to task_work_cancel_match(), but
didn't document all of them.

Link: https://lkml.kernel.org/r/93938bff-baa3-4091-85f5-784aae297a07@kernel.dk
Fixes: c7aab1a7c5 ("task_work: add helper for more targeted task_work canceling")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309120307.zis3yQGe-lkp@intel.com/
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:32 -07:00
Johannes Weiner
7b086755fb mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
Commit 4b23a68f95 ("mm/page_alloc: protect PCP lists with a spinlock")
bypasses the pcplist on lock contention and returns the page directly to
the buddy list of the page's migratetype.

For pages that don't have their own pcplist, such as CMA and HIGHATOMIC,
the migratetype is temporarily updated such that the page can hitch a ride
on the MOVABLE pcplist.  Their true type is later reassessed when flushing
in free_pcppages_bulk().  However, when lock contention is detected after
the type was already overridden, the bypass will then put the page on the
wrong buddy list.

Once on the MOVABLE buddy list, the page becomes eligible for fallbacks
and even stealing.  In the case of HIGHATOMIC, otherwise ineligible
allocations can dip into the highatomic reserves.  In the case of CMA, the
page can be lost from the CMA region permanently.

Use a separate pcpmigratetype variable for the pcplist override.  Use the
original migratetype when going directly to the buddy.  This fixes the bug
and should make the intentions more obvious in the code.

Originally sent here to address the HIGHATOMIC case:
https://lore.kernel.org/lkml/20230821183733.106619-4-hannes@cmpxchg.org/

Changelog updated in response to the CMA-specific bug report.

[mgorman@techsingularity.net: updated changelog]
Link: https://lkml.kernel.org/r/20230911181108.GA104295@cmpxchg.org
Fixes: 4b23a68f95 ("mm/page_alloc: protect PCP lists with a spinlock")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Joe Liu <joe.liu@mediatek.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:32 -07:00
Geert Uytterhoeven
e72590fa56 sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning
When __ioremap_caller() was replaced by ioremap_prot(), the __ref
annotation added in commit af1415314a ("sh: Flag __ioremap_caller()
__init_refok.") was removed, causing a modpost warning:

    WARNING: modpost: vmlinux: section mismatch in reference: ioremap_prot+0x88 (section: .text) -> ioremap_fixed (section: .init.text)

ioremap_prot() calls ioremap_fixed() (which is marked __init), but only
before mem_init_done becomes true, so this is safe.  Hence fix this by
re-adding the lost __ref.

Link: https://lkml.kernel.org/r/20230911093850.1517389-1-geert+renesas@glider.be
Fixes: 0453c9a780 ("sh: mm: convert to GENERIC_IOREMAP")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-19 13:21:32 -07:00
Linus Torvalds
ce9ecca023 Linux 6.6-rc2 2023-09-17 14:40:24 -07:00
Linus Torvalds
e789286468 Misc fixes:
- Fix an UV boot crash,
 - Skip spurious ENDBR generation on _THIS_IP_,
 - Fix ENDBR use in putuser() asm methods,
 - Fix corner case boot crashes on 5-level paging,
 - and fix a false positive WARNING on LTO kernels.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmUHOrwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j6Jw/+PjUfh/4+KM/Z8VzcBy2UhY3NMF2ptGCN
 47FPLy+8ADqOvIfgsPsBEO9VXwdvkHfH64YWRUlULjvPNOSs+37drBYMe9AI9xKE
 u6NhoBHmsnOtoLkBFIQYlJys9GyAeoMlwSSHxzRwQS+3VztRjoH636jiBcg/h7DR
 zhakfnJD1SSOZuEyyDPnO0uIUarrcqC2jdZwucSqZnvZFdA/pexUHQEe2RtMXLln
 EIA5kuEo7UdADcSr/Lbty7MKO+6xpRTjxF0yNItPtwPWsnxrSAC7P+dQ37uB975U
 Z0CJ/kw54XG5Sdoech7XCWYmJzDxSPhiziA1USKpZJMo5phAG/apTJK6NG4TG94r
 U+3QhLopUoSd8N74/VtZn0FjOrMsk7YKD5o8twlTbpCd2VaiJk4X5oBIns6Tvq05
 0Vmsx15XO3mEzg1wWbbdjhjHW0czRgBpikS9QyexZKVkukYcW8QT6bk1gK1ypg94
 do4PHKB53QIt31dedy/ddpQj4u1sJ4+a9/+IG29kjUB33M4+uFedtw11vfe+CDN0
 XLRc6HfPblogIIEO/figJgwSq/TPCOsNHTwKkejq+D1Ey6DsrnX9Gg4BWVz/3dDW
 84SW7TaO2FGEDh4NkR2ijkZlbpofFnCvhCh/uohodPlqDrTVTuRKCZW9I5plmGVa
 qeXUpNDFs1E=
 =BMjT
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - Fix an UV boot crash

   - Skip spurious ENDBR generation on _THIS_IP_

   - Fix ENDBR use in putuser() asm methods

   - Fix corner case boot crashes on 5-level paging

   - and fix a false positive WARNING on LTO kernels"

* tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/purgatory: Remove LTO flags
  x86/boot/compressed: Reserve more memory for page tables
  x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*()
  x86/ibt: Suppress spurious ENDBR
  x86/platform/uv: Use alternate source for socket to node data
2023-09-17 11:13:37 -07:00
Linus Torvalds
e5a710d132 Fix a performance regression on large SMT systems, an Intel SMT4
balancing bug, and a topology setup bug on (Intel) hybrid processors.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmUHOVQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iOahAAj3YsoNbT/k6m9yp622n1OopaNEQvsK+/
 F2Q5g/hJrm3+W5764rF8CvjhDbmrv6owjp3yUyZLDIfSAFZYMvwoNody3a373Yr3
 VFBMJ00jNIv/TAFCJZYeybg3yViwObKKfpu4JBj//QU+4uGWCoBMolkVekU2bBti
 r50fMxBPgg2Yic57DCC8Y+JZzHI/ydQ3rvVXMzkrTZCO/zY4/YmERM9d+vp4wl4B
 uG9cfXQ4Yf/1gZo0XDlTUkOJUXPnkMgi+N4eHYlGuyOCoIZOfATI24hRaPBoQcdx
 PDwHcKmyNxH9iaRppNQMvi797g3KrKVEmZwlZg1IfsILhKC0F4GsQ85tw8qQWE8j
 brFPkWVUxAUSl4LXoqVInaxDHmJWR2UC3RA7b+BxFF/GMLTow0z4a+JMC6eKlNyR
 uBisZnuEuecqwF9TLhyD3KBHh7PihUPz8PuFHk+Um5sggaUE82I+VwX6uxEi5y8r
 ke2kAkpuMxPWT5lwDmFPAXWfvpZz5vvTIRUxGGj2+s4d8b0YfLtZsx5+uOIacaub
 Gw+wYFfSowph72tR/SUVq0An/UTSPPBxty8eFIVeE6sW9bw3ghTtkf8300xjV7Rj
 sKVxXy/podAo8wG7R6aZfTfsCpohmeEjskiatYdThYamPPx7V0R5pq4twmTXTHLJ
 bFvQ1GFCOu0=
 =jIeN
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Fix a performance regression on large SMT systems, an Intel SMT4
  balancing bug, and a topology setup bug on (Intel) hybrid processors"

* tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain
  sched/fair: Fix SMT4 group_smt_balance handling
  sched/fair: Optimize should_we_balance() for large SMT systems
2023-09-17 11:10:23 -07:00
Linus Torvalds
e54ca3c81f Fix a cold functions related false-positive objtool warning
that triggers on Clang.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmUHOFwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hQdRAAsekH6786PH2hiL7DL1KhCZMdC1V71ORr
 3YFj1LcG+mXB6nQLt961KgA4l4efGMMxBxhT47wqOm0tJXUOVSzXxi3aQ0eoIPH0
 m5MnSWEyfZRjcvNjS8IZ2N8CJr1AvnSZPJ3iaJD2knNqHOCMORXbrhXnc9ulL3PR
 r1eBaaylLtlhHUdvekUeW8qZBAFx3ZzWz3lf0IY8seBbBPTXVp6dS4PPMzZ5vwTB
 e9yyOiLaF1P5mNZnOBNfEVKTQTmaFECDRp9PhGcTxY0GY4+9apyD5h/aDJwRJyFN
 ciB+zvmxw3mjlhCCG1CllImjz/gvzdwqzxeYlHPyZvEbnuJqCkdBLSgRGwi9vtyw
 APsHYYAHr6CNR/15/PvmX+GGR6No0OkR9BoZL5ygJE5+sapKvyeItymqovRRKGZ/
 kEQK2fj6EiDiy2EejMZ9EFUtWfhkV5OkT0Jd0nd/ZxZi3UbBEfqq6JgSIe/+KzC3
 Iniovn77mpQHP1cM/OGbPByOMUygjNBwigCwo12imxrktud+/HQJ74gX7cBsYKEH
 fKbAbHoLpC7/hqGc/3nzZF7b1pBMf4Lehm6iePsXai6Fv9hO7/T5RH54xJGp4HTO
 EexuFJt/d7l4ymtGtO8i/V65iiVkXsddnBivYfOqisxwB0s5BcMgsh4XrWMGd8Q4
 KP9fcsOtUKM=
 =DIBp
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fix from Ingo Molnar:
 "Fix a cold functions related false-positive objtool warning that
  triggers on Clang"

* tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix _THIS_IP_ detection for cold functions
2023-09-17 10:59:37 -07:00
Linus Torvalds
99a73f9e8d Fix a missing preempt-enable in the WARN() slowpath.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmUHN6IRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hacRAAnNilMzfO8EDNwoPgE0eIseQNif/qSDxi
 jMZE9bkrgaQXCYnqBSukjIaGaGuckVFi5TFRbY/nGnkjCy0hkYwUQ4UqiewDXzP8
 i/Dpo9W2e9ubv9iPDy2x5okofcvWSKIw8cPkAkHiIMfRSPS5jsTeSfyY/DZyq6/c
 qJSuYPISn5Hq3KGln4xzL5bLRWvyUVlt5/urLH60Gbb8W4ZEhdNm82Y1nTWZVOa4
 QfIVirHbJdt/Va4UOAnaz24c5HI7/SjH8E2RKcKB/wUBEMoPEUfc6ba3/ZzYQbg6
 io+2bLbZppv4HiGcw98ofyVr+WL8S9EGmJpBiuvhnWJyAd4Ei9UamuDisbxl+0t3
 a2UEHVygokCvjJAeIy1BrBhuGdnZPrENi8qmdEpAHSING4ICKCGfpYOnQzbAwOlO
 57FFpulcvqhraqY8sfpIQImgslCvy5Dm854w2FUcjUsADNLcBYrMELKrBoQLznxm
 URzhXHbbDhGABITQnKkgNldVwM+/no3Z7/WusnevpMnxPb9ynhYl6rZMp84q+rOJ
 UsskzkWD19ONgc8aCvnMinHj+z+kKtbpzohrt1EcnH5Me0kM35lkyxwZ/O0wPfRp
 WQr2zf7ARTEuuB96JNBI6bc5A1a0ftp1wjItZnZ1AOV4FRTBE0V43zgWl2wbITZe
 3IrSWCBYcew=
 =znqQ
 -----END PGP SIGNATURE-----

Merge tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull WARN fix from Ingo Molnar:
 "Fix a missing preempt-enable in the WARN() slowpath"

* tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  panic: Reenable preemption in WARN slowpath
2023-09-17 10:55:35 -07:00
Linus Torvalds
42aadec8c7 stat: remove no-longer-used helper macros
The choose_32_64() macros were added to deal with an odd inconsistency
between the 32-bit and 64-bit layout of 'struct stat' way back when in
commit a52dd971f9 ("vfs: de-crapify "cp_new_stat()" function").

Then a decade later Mikulas noticed that said inconsistency had been a
mistake in the early x86-64 port, and shouldn't have existed in the
first place.  So commit 932aba1e16 ("stat: fix inconsistency between
struct stat and struct compat_stat") removed the uses of the helpers.

But the helpers remained around, unused.

Get rid of them.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-09-17 10:46:12 -07:00
Linus Torvalds
45c3c62722 three small SMB3 client fixes, one to improve a null check and two minor cleanup
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUGFSgACgkQiiy9cAdy
 T1H8ZQwAuJhiLsTJK0lnWWxZC+KsIvTXlNKqx3VUqhJeYdxAc1tNCVjHTgdm63QA
 gRA0Htt8UhUoVVIMiipW2/PHA4rrNU7i0ULXSasAL6d8pPuZfeCzoehSfFo4u2ra
 bVDjfQUDtRakSU//Aj+Bv2sO77UWz0pQ5y0v2LCpPQ9Ks5TmLgxT+40uXCXf/LAe
 3aBbvrgLOlt0JMXaIEaQoecMitUqajmuuq/5SVQ7lz0xvn7cCLKgk22LehtwHR0W
 Ae8GdCkfFipdq+gp76CZPHO9evmRCsjmF95z56/++HdLrftYln5W/TDfjTlOZM9V
 tP99hK/2EjsWL7TMCOG59w21sKuaOdBA7AV7blgWxZAbKsrBgtMEXgQxSZMiK+Vm
 lKR5lGLWoujQLcnzWRh+WL7XP0ZxzitTlrlLeFxciPSGP843GRx+0oINLKL8CInr
 9mTwkzlzODNKA+83yRs5+Q3i0mq161IugsRrk1NHRUsr7oXiWWIxhcqCy5N5+R2S
 SfB16ql5
 =WtnH
 -----END PGP SIGNATURE-----

Merge tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Three small SMB3 client fixes, one to improve a null check and two
  minor cleanups"

* tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix some minor typos and repeated words
  smb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPP
  smb3: move server check earlier when setting channel sequence number
2023-09-17 10:41:42 -07:00
Linus Torvalds
39e0c8afdc two ksmbd server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUGEycACgkQiiy9cAdy
 T1FJQwv/XxgRr8cCkGvoxJW+P969Aw/PMy4BhhL1s26zm85wEudfUldV19VeJad0
 4Jcc7w608pCAhR9Xsl3MLB437ttOID9jOK9G10RvqwAJSaCqO6whpXRaJTGe4g4n
 N71/18R2uPW3HswBe5KSk0pC6WtYaGDTDpJ7hV1zAwiQWQywMA9FgzOspQCRaBEf
 JqH9F8tnzpfT/lDTe64Q3mRXC1ppO/XdJHhxzgRv8l41bc/0PWCPk2GuxpMQNhCo
 BediiKyIa/kUPEDID9k02VVxoW+aitbcw5kYUfMO55V6IkstuDbjq5k7k+r0BKfK
 AM8YE/LyRM5izwaV73tS2mSVZlEQLSlfwAuAY5uvcnanUIegFypCHEclnNmkS3Qx
 dXAonMWGD4+8N/aywNg5Zm5ql3HzLzS4uCIVJbyeOLqd1GljaYjvWsGkXvY9NnyT
 ED5ya4jTFZeEbONEdnPcgmOEZifs93VnklCsaGMFRJbv1gnKsBOt75EooeB7+44j
 TyRaeNNe
 =MOeW
 -----END PGP SIGNATURE-----

Merge tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Two ksmbd server fixes"

* tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd:
  ksmbd: fix passing freed memory 'aux_payload_buf'
  ksmbd: remove unneeded mark_inode_dirty in set_info_sec()
2023-09-17 10:38:01 -07:00
Linus Torvalds
3fde3003ca Regression and bug fixes for ext4.
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmUGh1YACgkQ8vlZVpUN
 gaN9lQgAqmMWu3xLwOERgVbK3CYT8WMcv0m9/by+vSwghCoPVDWWENgEgAzo4YpK
 Lsp4q62wHaWs6AzvJEaJ8ryedo7e4FUHxcvp2f6dCuOPadOEZZZTa4G5fAr0kYXS
 TIoaFtv6F2QVnGU6Y5lhtfYzmgLRdLL0B6MfSTYGO2MSREqxapvfxyGBQdkOuXfO
 UEtrUUEqQ2GdDcKp+FRRnaUvNaTPEESY8d5eVwrMmyUhQWUQL/N2BPbFkk1TP6RG
 MLDNsUZpdhZvLs6qLuR7dvO5wa2fshvRJIXlPINM0R0as5LmHqVL/ifCNkCn4W+k
 ZNvdSPhqew68KHHq3sYFtm9rbZ3YOA==
 =DopS
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Regression and bug fixes for ext4"

* tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix rec_len verify error
  ext4: do not let fstrim block system suspend
  ext4: move setting of trimmed bit into ext4_try_to_trim_range()
  jbd2: Fix memory leak in journal_init_common()
  jbd2: Remove page size assumptions
  buffer: Make bh_offset() work for compound pages
2023-09-17 10:33:53 -07:00
Song Liu
75b2f7e4c9 x86/purgatory: Remove LTO flags
-flto* implies -ffunction-sections. With LTO enabled, ld.lld generates
multiple .text sections for purgatory.ro:

  $ readelf -S purgatory.ro  | grep " .text"
    [ 1] .text             PROGBITS         0000000000000000  00000040
    [ 7] .text.purgatory   PROGBITS         0000000000000000  000020e0
    [ 9] .text.warn        PROGBITS         0000000000000000  000021c0
    [13] .text.sha256_upda PROGBITS         0000000000000000  000022f0
    [15] .text.sha224_upda PROGBITS         0000000000000000  00002be0
    [17] .text.sha256_fina PROGBITS         0000000000000000  00002bf0
    [19] .text.sha224_fina PROGBITS         0000000000000000  00002cc0

This causes WARNING from kexec_purgatory_setup_sechdrs():

  WARNING: CPU: 26 PID: 110894 at kernel/kexec_file.c:919
  kexec_load_purgatory+0x37f/0x390

Fix this by disabling LTO for purgatory.

[ AFAICT, x86 is the only arch that supports LTO and purgatory. ]

We could also fix this with an explicit linker script to rejoin .text.*
sections back into .text. However, given the benefit of LTOing purgatory
is small, simply disable the production of more .text.* sections for now.

Fixes: b33fff07e3 ("x86, build: allow LTO to be selected")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230914170138.995606-1-song@kernel.org
2023-09-17 09:49:03 +02:00
Kirill A. Shutemov
f530ee95b7 x86/boot/compressed: Reserve more memory for page tables
The decompressor has a hard limit on the number of page tables it can
allocate. This limit is defined at compile-time and will cause boot
failure if it is reached.

The kernel is very strict and calculates the limit precisely for the
worst-case scenario based on the current configuration. However, it is
easy to forget to adjust the limit when a new use-case arises. The
worst-case scenario is rarely encountered during sanity checks.

In the case of enabling 5-level paging, a use-case was overlooked. The
limit needs to be increased by one to accommodate the additional level.
This oversight went unnoticed until Aaron attempted to run the kernel
via kexec with 5-level paging and unaccepted memory enabled.

Update wost-case calculations to include 5-level paging.

To address this issue, let's allocate some extra space for page tables.
128K should be sufficient for any use-case. The logic can be simplified
by using a single value for all kernel configurations.

[ Also add a warning, should this memory run low - by Dave Hansen. ]

Fixes: 34bbb0009f ("x86/boot/compressed: Enable 5-level paging during decompression stage")
Reported-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
2023-09-17 09:48:57 +02:00
Linus Torvalds
f0b0d403ea Kbuild fixes for v6.6
- Fix kernel-devel RPM and linux-headers Deb package
 
  - Fix too long argument list error in 'make modules_install'
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmUF0Y4VHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGXVcP/2Jiv5RLizT5Aq7O1WuvG37NgSck
 cP8JJnX8NQxtBxJaPN7z5+t3c8fucKb1M0oko0mu+8SanoeXfz2NlijztVgCOeI5
 DU8KPUQXmQLIwu2orpqrNqffBaiRpmrlo6HKsabmY8d67XwdWPxbwhUT8OOiDOQw
 7iAkp9fntxyHctzWiAyUXelublydfqJndyi73GYDr2QMu9NEC7ej06asTsdmyvKY
 JmIO31Xl3RwktUFUOPiF4+ZhR3c2Lqh54vZQTCs9KuCxNJGHB2w5pFh2YVZ6LhTE
 RDvn6qel9aoKZKSfTUCGkA5+YMN5boFjWv4Ld1xOXlLFTPIEzmi4k5+NuctUak+H
 KF8Zam9lgb/AKO9t2z+E52rB55NPc6l6kVs/4DkoEVRZ9t8itl/RDN51LgSYDu9e
 Hl172up3/mtXNS5x3FRClvwdZgKHPVtXudg/+6yXO6opyq55ePFnZrom3BOWXhj/
 BfUuI8g+Crb6Hfs4PB7II/ALaIVSqY3FvxfbKNSlDPUJ1s/OKg86Lc7ZG4r62mK4
 SRlwKrM75MYZNmVu7QULyMEVIJ6vY2FGcjq4vKS4612gF10TBFpAc49hVFZnctgf
 LEr+u79lcviM6oFaw+6jAEe5L2MldzFrT+hR1EeLTxYLEX39w4IKm/nk1o5Q0Zp+
 qxn5LPTtGrN5z35A
 =2LRy
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Fix kernel-devel RPM and linux-headers Deb package

 - Fix too long argument list error in 'make modules_install'

* tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: avoid long argument lists in make modules_install
  kbuild: fix kernel-devel RPM package and linux-headers Deb package
2023-09-16 15:27:00 -07:00
Linus Torvalds
3cec504909 vm: fix move_vma() memory accounting being off
Commit 408579cd62 ("mm: Update do_vmi_align_munmap() return
semantics") seems to have updated one of the callers of do_vmi_munmap()
incorrectly: it used to check for the error case (which didn't
change: negative means error).

That commit changed the check to the success case (which did change:
before that commit, 0 was success, and 1 was "success and lock
downgraded".  After the change, it's always 0 for success, and the lock
will have been released if requested).

This didn't change any actual VM behavior _except_ for memory accounting
when 'VM_ACCOUNT' was set on the vma.  Which made the wrong return value
test fairly subtle, since everything continues to work.

Or rather - it continues to work but the "Committed memory" accounting
goes all wonky (Committed_AS value in /proc/meminfo), and depending on
settings that then causes problems much much later as the VM relies on
bogus statistics for its heuristics.

Revert that one line of the change back to the original logic.

Fixes: 408579cd62 ("mm: Update do_vmi_align_munmap() return semantics")
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Reported-bisected-and-tested-by: Michael Labiuk <michael.labiuk@virtuozzo.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/all/1694366957@msgid.manchmal.in-ulm.de/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-09-16 15:23:31 -07:00
Linus Torvalds
ad8a69f361 SCSI fixes on 20230916
16 small(ish) fixes all in drivers.  The major fixes are in pm8001
 (fixes MSI-X issue going back to its origin), the qla2xxx endianness
 fix, which fixes a bug on big endian and the lpfc ones which can cause
 an oops on module removal without them.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZQXNvyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishRKxAP4kCOBR
 UOGLJJdhmotofj9GX0Rr1qAFheJmTSAT6ctAOgD+IWucKDqKfbRLbmzqiSvmWmul
 P0STz4VnUKs9LqdLyok=
 =3wre
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "16 small(ish) fixes all in drivers.

  The major fixes are in pm8001 (fixes MSI-X issue going back to its
  origin), the qla2xxx endianness fix, which fixes a bug on big endian
  and the lpfc ones which can cause an oops on module removal without
  them"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rports
  scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmo
  scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file()
  scsi: target: core: Fix target_cmd_counter leak
  scsi: pm8001: Setup IRQs on resume
  scsi: pm80xx: Avoid leaking tags when processing OPC_INB_SET_CONTROLLER_CONFIG command
  scsi: pm80xx: Use phy-specific SAS address when sending PHY_START command
  scsi: ufs: core: Poll HCS.UCRDY before issuing a UIC command
  scsi: ufs: core: Move __ufshcd_send_uic_cmd() outside host_lock
  scsi: qedf: Add synchronization between I/O completions and abort
  scsi: target: Replace strlcpy() with strscpy()
  scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir()
  scsi: qla2xxx: Use raw_smp_processor_id() instead of smp_processor_id()
  scsi: qla2xxx: Correct endianness for rqstlen and rsplen
  scsi: ppa: Fix accidentally reversed conditions for 16-bit and 32-bit EPP
  scsi: megaraid_sas: Fix deadlock on firmware crashdump
2023-09-16 11:54:48 -07:00
Linus Torvalds
cc3e5afc6a ata changes for 6.6-rc2
- Fix link power management transitions to disallow unsupported
    states (Niklas).
 
  - A small string handling fix for the sata_mv driver (Christophe).
 
  - Clear port pending interrupts before reset, as per AHCI
    specifications (Szuying). Followup fixes for this one are to not
    clear ATA_PFLAG_EH_PENDING in ata_eh_reset() to allow EH to
    continue on with other actions recorded with error interrupts
    triggered before EH completes. A~Nd an additional fix to avoid
    thawing a port twice in EH (Niklas).
 
  - Small code style fixes in the pata_parport driver to silence the
    build bot as it keeps complaining about bad indentation (me).
 
  - A fix for the recent CDL code to avoid fetching sense data for
    successful commands when not necessary for correct operation
    (Niklas).
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCZQWcYAAKCRDdoc3SxdoY
 dg0HAQDxkfzueH5T00LSsg9+jI73eMScmC7asR3cbwmEiTRATgEAxpWUgaR7e7YP
 ZM9XWTyfcCYTfAEaJduS5a6ThHl3pAI=
 =gTtf
 -----END PGP SIGNATURE-----

Merge tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata fixes from Damien Le Moal:

 - Fix link power management transitions to disallow unsupported states
   (Niklas)

 - A small string handling fix for the sata_mv driver (Christophe)

 - Clear port pending interrupts before reset, as per AHCI
   specifications (Szuying).

   Followup fixes for this one are to not clear ATA_PFLAG_EH_PENDING in
   ata_eh_reset() to allow EH to continue on with other actions recorded
   with error interrupts triggered before EH completes. And an
   additional fix to avoid thawing a port twice in EH (Niklas)

 - Small code style fixes in the pata_parport driver to silence the
   build bot as it keeps complaining about bad indentation (me)

 - A fix for the recent CDL code to avoid fetching sense data for
   successful commands when not necessary for correct operation (Niklas)

* tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-core: fetch sense data for successful commands iff CDL enabled
  ata: libata-eh: do not thaw the port twice in ata_eh_reset()
  ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
  ata: pata_parport: Fix code style issues
  ata: libahci: clear pending interrupt status
  ata: sata_mv: Fix incorrect string length computation in mv_dump_mem()
  ata: libata: disallow dev-initiated LPM transitions to unsupported states
2023-09-16 11:49:57 -07:00
Linus Torvalds
cce67b6bed USB fix for 6.6-rc2
Here is a single USB fix for a much-reported regression for 6.6-rc1.
 
 It resolves a crash in the typec debugfs code for many systems.  It's
 been in linux-next with no reported issues, and many people have
 reported it resolving their problem with 6.6-rc1.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZQWXNQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymxmwCcDS42Mt0fXkxsjzJy4KCNSCNEvcYAoJ/wfVw7
 K2NylIy78y0PEvO4i0H9
 =8qKC
 -----END PGP SIGNATURE-----

Merge tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fix from Greg KH:
 "Here is a single USB fix for a much-reported regression for 6.6-rc1.

  It resolves a crash in the typec debugfs code for many systems. It's
  been in linux-next with no reported issues, and many people have
  reported it resolving their problem with 6.6-rc1"

* tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: ucsi: Fix NULL pointer dereference
2023-09-16 11:37:11 -07:00
Linus Torvalds
205d049478 Driver core fix for 6.6-rc2
Here is a single driver core fix for a much-reported-by-sysbot issue
 that showed up in 6.6-rc1.  It's been submitted by many people, all in
 the same way, so it obviously fixes things for them all.
 
 Also in here is a single documentation update adding riscv to the
 embargoed hardware document in case there are any future issues with
 that processor family.
 
 Both of these have been in linux-next with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZQWXxA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymPaQCgg34ErWYWTR2q/uCf4cpBce0q2xAAn226mZ6f
 Tt90d/o0lXnhtV8mXnWm
 =3Ye9
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here is a single driver core fix for a much-reported-by-sysbot issue
  that showed up in 6.6-rc1. It's been submitted by many people, all in
  the same way, so it obviously fixes things for them all.

  Also in here is a single documentation update adding riscv to the
  embargoed hardware document in case there are any future issues with
  that processor family.

  Both of these have been in linux-next with no reported problems"

* tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Documentation: embargoed-hardware-issues.rst: Add myself for RISC-V
  driver core: return an error when dev_set_name() hasn't happened
2023-09-16 11:26:52 -07:00
Linus Torvalds
fd455e77a6 Char/Misc fix for 6.6-rc2
Here is a single patch for 6.6-rc2 that reverts a 6.5 change for the
 comedi subsystem that has ended up being incorrect and caused drivers
 that were working for people to be unable to be able to be selected to
 build at all.  To fix this, the Kconfig change needs to be reverted and
 a future set of fixes for the ioport dependancies will show up in
 6.7-rc1 (there's no rush for them.)
 
 This has been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZQWYWg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylW2ACdGSq8skVr8qCcaQ7z1nbrc/HWcXgAoKxrP53E
 08QGj1NOD8cqWIcClQfg
 =dvIN
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg KH:
 "Here is a single patch for 6.6-rc2 that reverts a 6.5 change for the
  comedi subsystem that has ended up being incorrect and caused drivers
  that were working for people to be unable to be able to be selected to
  build at all.

  To fix this, the Kconfig change needs to be reverted and a future set
  of fixes for the ioport dependancies will show up in 6.7-rc1 (there's
  no rush for them.)

  This has been in linux-next with no reported issues"

* tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Revert "comedi: add HAS_IOPORT dependencies"
2023-09-16 11:17:19 -07:00
Linus Torvalds
c37f8efcbc Main thing is the removal of 'probe_new' because all i2c client drivers
are converted now. Thanks Uwe, this marks the end of a long conversion
 process. Other than that, we have a few Kconfig updates and driver
 bugfixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmUFekwACgkQFA3kzBSg
 Kbbuag//WWvlEdixGIHwjiToGm4zvpVdV5txr2L1p1xWdXSrXtCVUifgvg6tTFj8
 04hyuEP0c8ml90ZpBb6AMSggNGN09UdaMqb+dj52XBFbqsFOyCjXjy8bHOWZjJ5C
 /3N9/qUpDW4aJ3IBSQgLp+tcW+r6ulGa7KmZGazKkKSLiDIsqS4UicgEzLCA0Wig
 Zidy7Eaq6oo+nGtZH8UOava+o4P9nIrKRf/ILyXK8amFat0rTD1eT+fc7TvI5iC7
 cZ5oFCAsaL1Jz7LNQxiCEQmQWBZOpyUbQlOfT2MCMNhWJG0x6FpF1cYqHHarNEJh
 UDxGQHG0Dj+AHtuZ4D4qAMxFlEYTVVNVNhSea02mxlqcOfbvvC73YfHAZIUsNdPt
 wStg+eSJjHAt/omqzOTTLEJAYnCWvjxc6+7/PN0HjLAipaR/KH1p67mCPKx5y/lt
 +OhvOS2eOpHWwiE9OZZCARqX5HlDJpGpK4TFy2rMOJvKnvYSzkHOd7Zy6Q9gRCW7
 vVeJb1h1tI10RAufk8Rxfp3g8UiAFF8F1rxjDAgmMpGVTqM+TIM5Z2/b2vZdNU1R
 g1mta5OtDzVILs1uwkzS/opSsGvm0e4kDDREyb0spZusi6GZbni89+u93AXhCnkl
 UVmnz65CvQ7eiE9RGLXnsAIIgDdiO7JOAS2ALzK8/lPs8eXs3fU=
 =hFC5
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "The main thing is the removal of 'probe_new' because all i2c client
  drivers are converted now. Thanks Uwe, this marks the end of a long
  conversion process.

  Other than that, we have a few Kconfig updates and driver bugfixes"

* tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: cadence: Fix the kernel-doc warnings
  i2c: aspeed: Reset the i2c controller when timeout occurs
  i2c: I2C_MLXCPLD on ARM64 should depend on ACPI
  i2c: Make I2C_ATR invisible
  i2c: Drop legacy callback .probe_new()
  w1: ds2482: Switch back to use struct i2c_driver's .probe()
2023-09-16 11:09:18 -07:00
Niklas Cassel
5e35a9ac3f ata: libata-core: fetch sense data for successful commands iff CDL enabled
Currently, we fetch sense data for a _successful_ command if either:
1) Command was NCQ and ATA_DFLAG_CDL_ENABLED flag set (flag
   ATA_DFLAG_CDL_ENABLED will only be set if the Successful NCQ command
   sense data supported bit is set); or
2) Command was non-NCQ and regular sense data reporting is enabled.

This means that case 2) will trigger for a non-NCQ command which has
ATA_SENSE bit set, regardless if CDL is enabled or not.

This decision was by design. If the device reports that it has sense data
available, it makes sense to fetch that sense data, since the sk/asc/ascq
could be important information regardless if CDL is enabled or not.

However, the fetching of sense data for a successful command is done via
ATA EH. Considering how intricate the ATA EH is, we really do not want to
invoke ATA EH unless absolutely needed.

Before commit 18bd7718b5 ("scsi: ata: libata: Handle completion of CDL
commands using policy 0xD") we never fetched sense data for successful
commands.

In order to not invoke the ATA EH unless absolutely necessary, even if the
device claims support for sense data reporting, only fetch sense data for
successful (NCQ and non-NCQ commands) commands that are using CDL.

[Damien] Modified the check to test the qc flag ATA_QCFLAG_HAS_CDL
instead of the device support for CDL, which is implied for commands
using CDL.

Fixes: 3ac873c76d ("ata: libata-core: fix when to fetch sense data for successful commands")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-09-16 21:12:19 +09:00
Niklas Cassel
7a3bc2b398 ata: libata-eh: do not thaw the port twice in ata_eh_reset()
commit 1e641060c4 ("libata: clear eh_info on reset completion") added
a workaround that broke the retry mechanism in ATA EH.

Tejun himself suggested to remove this workaround when it was identified
to cause additional problems:
https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/

He even said:
"Hmm... it seems I wasn't thinking straight when I added that work around."
https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/

While removing the workaround solved the issue, however, the workaround was
kept to avoid "spurious hotplug events during reset", and instead another
workaround was added on top of the existing workaround in commit
8c56cacc72 ("libata: fix unexpectedly frozen port after ata_eh_reset()").

Because these IRQs happened when the port was frozen, we know that they
were actually a side effect of PxIS and IS.IPS(x) not being cleared before
the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
pending interrupt status"), so these workarounds can now be removed.

Since commit 1e641060c4 ("libata: clear eh_info on reset completion") has
now been reverted, the ATA EH retry mechanism is functional again, so there
is once again no need to thaw the port more than once in ata_eh_reset().

This reverts "the workaround on top of the workaround" introduced in commit
8c56cacc72 ("libata: fix unexpectedly frozen port after ata_eh_reset()").

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-09-16 21:11:28 +09:00
Niklas Cassel
80cc944eca ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
ata_scsi_port_error_handler() starts off by clearing ATA_PFLAG_EH_PENDING,
before calling ap->ops->error_handler() (without holding the ap->lock).

If an error IRQ is received while ap->ops->error_handler() is running,
the irq handler will set ATA_PFLAG_EH_PENDING.

Once ap->ops->error_handler() returns, ata_scsi_port_error_handler()
checks if ATA_PFLAG_EH_PENDING is set, and if it is, another iteration
of ATA EH is performed.

The problem is that ATA_PFLAG_EH_PENDING is not only cleared by
ata_scsi_port_error_handler(), it is also cleared by ata_eh_reset().

ata_eh_reset() is called by ap->ops->error_handler(). This additional
clearing done by ata_eh_reset() breaks the whole retry logic in
ata_scsi_port_error_handler(). Thus, if an error IRQ is received while
ap->ops->error_handler() is running, the port will currently remain
frozen and will never get re-enabled.

The additional clearing in ata_eh_reset() was introduced in commit
1e641060c4 ("libata: clear eh_info on reset completion").

Looking at the original error report:
https://marc.info/?l=linux-ide&m=124765325828495&w=2

We can see the following happening:
[    1.074659] ata3: XXX port freeze
[    1.074700] ata3: XXX hardresetting link, stopping engine
[    1.074746] ata3: XXX flipping SControl

[    1.411471] ata3: XXX irq_stat=400040 CONN|PHY
[    1.411475] ata3: XXX port freeze

[    1.420049] ata3: XXX starting engine
[    1.420096] ata3: XXX rc=0, class=1
[    1.420142] ata3: XXX clearing IRQs for thawing
[    1.420188] ata3: XXX port thawed
[    1.420234] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

We are not supposed to be able to receive an error IRQ while the port is
frozen (PxIE is set to 0, i.e. all IRQs for the port are disabled).

AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) states:
"Each bit location can be thought of as reporting a '1' if the virtual
"interrupt line" for that port is indicating it wishes to generate an
interrupt. That is, if a port has one or more interrupt status bit set,
and the enables for those status bits are set, then this bit shall be set."

Additionally, AHCI state P:ComInit clearly shows that the state machine
will only jump to P:ComInitSetIS (which sets IS.IPS(x) to '1'), if PxIE.PCE
is set to '1'. In our case, PxIE is set to 0, so IS.IPS(x) won't get set.

So IS.IPS(x) only gets set if PxIS and PxIE is set.

AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) also states:
"The bits in this register are read/write clear. It is set by the level of
the virtual interrupt line being a set, and cleared by a write of '1' from
the software."

So if IS.IPS(x) is set, you need to explicitly clear it by writing a 1 to
IS.IPS(x) for that port.

Since PxIE is cleared, the only way to get an interrupt while the port is
frozen, is if IS.IPS(x) is set, and the only way IS.IPS(x) can be set when
the port is frozen, is if it was set before the port was frozen.

However, since commit 737dd811a3 ("ata: libahci: clear pending interrupt
status"), we clear both PxIS and IS.IPS(x) after freezing the port, but
before the COMRESET, so the problem that commit 1e641060c4 ("libata:
clear eh_info on reset completion") fixed can no longer happen.

Thus, revert commit 1e641060c4 ("libata: clear eh_info on reset
completion"), so that the retry logic in ata_scsi_port_error_handler()
works once again. (The retry logic is still needed, since we can still
get an error IRQ _after_ the port has been thawed, but before
ata_scsi_port_error_handler() takes the ap->lock in order to check
if ATA_PFLAG_EH_PENDING is set.)

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-09-16 21:10:37 +09:00
Linus Torvalds
57d88e8a59 linux-kselftest-fixes-6.6-rc2
This kselftest fixes update for Linux 6.6-rc2 consists of important
 fixes to user_events test and ftrace test.
 
 user_events test has been enabled for default run in Linux 6.6-rc1. The
 following fixes are for bugs found since then:
 
 - adds checks for dependencies and skips the test. user_events test
   requires root access, and tracefs and user_events enabled. It leaves
   tracefs mounted and a fix is in progress for that missing piece.
 
 - creates user_events test-specific Kconfig fragments.
 
 ftrace test fixes:
 
 - unmounts tracefs for recovering environment. Fix identified during the
   above mentioned user_events dependencies fix.
 
 - adds softlink to latest log directory improving usage.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmUEgFwACgkQCwJExA0N
 QxwhCQ//el9LBeL2siNsMY7tH9LyPfO1FTq3ZqECqGGf2CN66uT53oPJDfgEhG1k
 n+x9+lAPe7Pezy6/pzocfPnbFcFYpgoBclNCcXZW3Yhkt6XUgWDBEzWvYIvhaHjg
 AlYv/ez/IcOpvlph1LlfcNtmRJX/ss+K1sZmXD2ivT5ocoVnwTSELtzVxIdsTQGg
 snuPgz23ODXoK+TCQvp2tghEpgoQclTYkKH3C534GP1SV6Mv2btoz6p654x0DIR3
 nh8wUBAXhJR9g6eumgziWWWhLBZSz3IQWVJZ7pzOjSDkIiSd0HX18uVpzGx2Mq0E
 yScYcCfLQYHoUvskhKfntZjDlRtb07DAKxqv8Tjbu3rvssYSNVMlnb5qlcqfp7go
 CZyiPZ9es8ELbPy2iVn3sjNGhGYcBxJ0ZIF7EEYutRV9kcmG4j+2oaRUWdkz3a+P
 XlGnDsdTsHic2qWMPX80V6g1eecrDoWeID12EC5Vtr80Dp4elGi+PS5jviJ6wgP8
 Y5SksEyamCnSE71M1uQoSCbhOA5/xVpK65N9EvLrMCDNWQ/ZKVle4b0SKM7AQpZM
 dN4UTNEZqfSYW6xLpWOUmf1EtIewiJsdHbizHooxIxjnNAVvi0ZElDG//omxj1Vz
 bT4OWvPGDJIqxwYHWowcZC9tq+rjDcH5nNTn6x8dPHCRoQSvNX4=
 =HdKU
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull more kselftest fixes from Shuah Khan
 "Fixes to user_events test and ftrace test.

  The user_events test was enabled by default in Linux 6.6-rc1. The
  following fixes are for bugs found since then:

   - add checks for dependencies and skip the test if they aren't met.

     The user_events test requires root access, and tracefs and
     user_events enabled. It leaves tracefs mounted and a fix is in
     progress for that missing piece.

   - create user_events test-specific Kconfig fragments

  ftrace test fixes:

   - unmount tracefs for recovering environment. Fix identified during
     the above mentioned user_events dependencies fix.

   - adds softlink to latest log directory improving usage"

* tag 'linux-kselftest-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: tracing: Fix to unmount tracefs for recovering environment
  selftests: user_events: create test-specific Kconfig fragments
  ftrace/selftests: Add softlink to latest log directory
  selftests/user_events: Fix failures when user_events is not installed
2023-09-15 19:22:20 -07:00
Linus Torvalds
d8d7cd6563 nfsd-6.6 fixes:
- Use correct order when encoding NFSv4 RENAME change_info
 - Fix a potential oops during NFSD shutdown
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmUEf90ACgkQM2qzM29m
 f5dcrhAAjz3SLoz4nAlo441JMATp/auwqTBKlWvNUWbK/ZgYjUcTXH8RfPbBQ/Yl
 5NfsThubmPTSr/bNvV+C963C4c/25d45SYwabAdLoZWKIpWf1mCDNhE1JkGQoM83
 KyZcrLnEknh/fN6jAxueFhWPCaavGDK8eI8H6Ls33GvX0cG4S7IYseGZ1x2dMWpE
 XFnV8oAWSD3cjKrTGwPYja2W0ShISGn4UMnj+lPWEpfCDydnQzKh87dhWzvlAxCf
 4Ckikmv3vcCfm15Ja0JZX3K1cvh0A3oISbgN/QNqGpzDzQ1qyugsx9uNltDa76N8
 NzGENnvO2/WURkw4DHJYIsiNt82uB04NYSAL6mLat3GU4DeAKOl3r+9W0jZPFS17
 7mOLh9dqg3ubSxOr6BpeY4I6+Uq2enUTwh1+VkvC7hQ906ZY8kzaj5MtLddvkjCw
 p4BTDzH280AwtnwiJ7q0WvBR5TxdGc3fqqsLJ+yMX2aGdhM1iMOjjENoSc3+v+Cy
 pEK1tju/lDzcMZrl5A08Za30L5boHp11n5SLia7tctKiFs4biL3WwNu527Y/Klq6
 04DtkcXdF/tW702388sKL1UnnXhw4KA8k7Us4HrQL+zXDZ+/UJ+AzY6s8ls2ytbN
 0NESe/ntsKjoijrRphAp+RgC8fhC7iT0GMJP8OHyHUqEOU2Wrbw=
 =ZTRy
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Use correct order when encoding NFSv4 RENAME change_info

 - Fix a potential oops during NFSD shutdown

* tag 'nfsd-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: fix possible oops when nfsd/pool_stats is closed.
  nfsd: fix change_info in NFSv4 RENAME replies
2023-09-15 16:48:44 -07:00