Commit Graph

81866 Commits

Author SHA1 Message Date
Matthew Wilcox (Oracle)
3c98a41cc2 buffer: convert grow_dev_page() to use a folio
Get a folio from the page cache instead of a page, then use the folio API
throughout.  Removes a few calls to compound_head() and may be needed to
support block size > PAGE_SIZE.

Link: https://lkml.kernel.org/r/20230612210141.730128-11-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:31 -07:00
Matthew Wilcox (Oracle)
4a9622f2fd buffer: convert page_zero_new_buffers() to folio_zero_new_buffers()
Most of the callers already have a folio; convert reiserfs_write_end() to
have a folio.  Removes a couple of hidden calls to compound_head().

Link: https://lkml.kernel.org/r/20230612210141.730128-10-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:31 -07:00
Matthew Wilcox (Oracle)
8c6cb3e3d5 buffer: convert __block_commit_write() to take a folio
This removes a hidden call to compound_head() inside
__block_commit_write() and moves it to those callers which are still page
based.  Also make block_write_end() safe for large folios.

Link: https://lkml.kernel.org/r/20230612210141.730128-9-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:31 -07:00
Matthew Wilcox (Oracle)
fe181377a2 buffer: convert block_page_mkwrite() to use a folio
If any page in a folio is dirtied, dirty the entire folio.  Removes a
number of hidden calls to compound_head() and references to page->mapping
and page->index.  Fixes a pre-existing bug where we could mark a folio as
dirty if the file is truncated to a multiple of the page size just as we
take the page fault.  I don't believe this bug has any bad effect, it's
just inefficient.

Link: https://lkml.kernel.org/r/20230612210141.730128-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:31 -07:00
Matthew Wilcox (Oracle)
bb0ea5989c buffer: make block_write_full_page() handle large folios correctly
Keep the interface as struct page, but work entirely on the folio
internally.  Removes several PAGE_SIZE assumptions and removes some
references to page->index and page->mapping.

Link: https://lkml.kernel.org/r/20230612210141.730128-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:30 -07:00
Matthew Wilcox (Oracle)
285e0fc95a gfs2: support ludicrously large folios in gfs2_trans_add_databufs()
We may someday support folios larger than 4GB, so use a size_t for the
byte count within a folio to prevent unpleasant truncations.

Link: https://lkml.kernel.org/r/20230612210141.730128-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:30 -07:00
Matthew Wilcox (Oracle)
53418a18fc buffer: convert __block_write_full_page() to __block_write_full_folio()
Remove nine hidden calls to compound_head() by using a folio instead of a
page.

Link: https://lkml.kernel.org/r/20230612210141.730128-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:30 -07:00
Matthew Wilcox (Oracle)
c1401fd18f gfs2: convert gfs2_write_jdata_page() to gfs2_write_jdata_folio()
Add support for large folios and remove some accesses to page->mapping and
page->index.

Link: https://lkml.kernel.org/r/20230612210141.730128-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:30 -07:00
Matthew Wilcox (Oracle)
d0cfcaee0a gfs2: pass a folio to __gfs2_jdata_write_folio()
Remove a couple of folio->page conversions in the callers, and two calls
to compound_head() in the function itself.  Rename it from
__gfs2_jdata_writepage() to __gfs2_jdata_write_folio().

Link: https://lkml.kernel.org/r/20230612210141.730128-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:29 -07:00
Matthew Wilcox (Oracle)
c0ba597db9 gfs2: use a folio inside gfs2_jdata_writepage()
Patch series "gfs2/buffer folio changes for 6.5", v3.

This kind of started off as a gfs2 patch series, then became entwined with
buffer heads once I realised that gfs2 was the only remaining caller of
__block_write_full_page().  For those not in the gfs2 world, the big point
of this series is that block_write_full_page() should now handle large
folios correctly.


This patch (of 14):

Replace a few implicit calls to compound_head() with one explicit one.

Link: https://lkml.kernel.org/r/20230612210141.730128-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230612210141.730128-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:29 -07:00
Liam R. Howlett
65ac132027 userfaultfd: fix regression in userfaultfd_unmap_prep()
Android reported a performance regression in the userfaultfd unmap path. 
A closer inspection on the userfaultfd_unmap_prep() change showed that a
second tree walk would be necessary in the reworked code.

Fix the regression by passing each VMA that will be unmapped through to
the userfaultfd_unmap_prep() function as they are added to the unmap list,
instead of re-walking the tree for the VMA.

Link: https://lkml.kernel.org/r/20230601015402.2819343-1-Liam.Howlett@oracle.com
Fixes: 69dbe6daf1 ("userfaultfd: use maple tree iterator to iterate VMAs")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:28 -07:00
Ryan Roberts
c33c794828 mm: ptep_get() conversion
Convert all instances of direct pte_t* dereferencing to instead use
ptep_get() helper.  This means that by default, the accesses change from a
C dereference to a READ_ONCE().  This is technically the correct thing to
do since where pgtables are modified by HW (for access/dirty) they are
volatile and therefore we should always ensure READ_ONCE() semantics.

But more importantly, by always using the helper, it can be overridden by
the architecture to fully encapsulate the contents of the pte.  Arch code
is deliberately not converted, as the arch code knows best.  It is
intended that arch code (arm64) will override the default with its own
implementation that can (e.g.) hide certain bits from the core code, or
determine young/dirty status by mixing in state from another source.

Conversion was done using Coccinelle:

----

// $ make coccicheck \
//          COCCI=ptepget.cocci \
//          SPFLAGS="--include-headers" \
//          MODE=patch

virtual patch

@ depends on patch @
pte_t *v;
@@

- *v
+ ptep_get(v)

----

Then reviewed and hand-edited to avoid multiple unnecessary calls to
ptep_get(), instead opting to store the result of a single call in a
variable, where it is correct to do so.  This aims to negate any cost of
READ_ONCE() and will benefit arch-overrides that may be more complex.

Included is a fix for an issue in an earlier version of this patch that
was pointed out by kernel test robot.  The issue arose because config
MMU=n elides definition of the ptep helper functions, including
ptep_get().  HUGETLB_PAGE=n configs still define a simple
huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
So when both configs are disabled, this caused a build error because
ptep_get() is not defined.  Fix by continuing to do a direct dereference
when MMU=n.  This is safe because for this config the arch code cannot be
trying to virtualize the ptes because none of the ptep helpers are
defined.

Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:25 -07:00
Hugh Dickins
2b683a4ff6 mm/userfaultfd: retry if pte_offset_map() fails
Instead of worrying whether the pmd is stable, userfaultfd_must_wait()
call pte_offset_map() as before, but go back to try again if that fails.

Risk of endless loop?  It already broke out if pmd_none(), !pmd_present()
or pmd_trans_huge(), and pte_offset_map() would have cleared pmd_bad():
which leaves pmd_devmap().  Presumably pmd_devmap() is inappropriate in a
vma subject to userfaultfd (it would have been mistreated before), but add
a check just to avoid all possibility of endless loop there.

Link: https://lkml.kernel.org/r/54423f-3dff-fd8d-614a-632727cc4cfb@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zack Rusin <zackr@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:15 -07:00
Hugh Dickins
7780d04046 mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails
Simple walk_page_range() users should set ACTION_AGAIN to retry when
pte_offset_map_lock() fails.

No need to check pmd_trans_unstable(): that was precisely to avoid the
possiblity of calling pte_offset_map() on a racily removed or inserted THP
entry, but such cases are now safely handled inside it.  Likewise there is
no need to check pmd_none() or pmd_bad() before calling it.

Link: https://lkml.kernel.org/r/c77d9d10-3aad-e3ce-4896-99e91c7947f3@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: SeongJae Park <sj@kernel.org> for mm/damon part
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Song Liu <song@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zack Rusin <zackr@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:13 -07:00
Hugh Dickins
26e1a0c327 mm: use pmdp_get_lockless() without surplus barrier()
Patch series "mm: allow pte_offset_map[_lock]() to fail", v2.

What is it all about?  Some mmap_lock avoidance i.e.  latency reduction. 
Initially just for the case of collapsing shmem or file pages to THPs; but
likely to be relied upon later in other contexts e.g.  freeing of empty
page tables (but that's not work I'm doing).  mmap_write_lock avoidance
when collapsing to anon THPs?  Perhaps, but again that's not work I've
done: a quick attempt was not as easy as the shmem/file case.

I would much prefer not to have to make these small but wide-ranging
changes for such a niche case; but failed to find another way, and have
heard that shmem MADV_COLLAPSE's usefulness is being limited by that
mmap_write_lock it currently requires.

These changes (though of course not these exact patches) have been in
Google's data centre kernel for three years now: we do rely upon them.

What is this preparatory series about?

The current mmap locking will not be enough to guard against that tricky
transition between pmd entry pointing to page table, and empty pmd entry,
and pmd entry pointing to huge page: pte_offset_map() will have to
validate the pmd entry for itself, returning NULL if no page table is
there.  What to do about that varies: sometimes nearby error handling
indicates just to skip it; but in many cases an ACTION_AGAIN or "goto
again" is appropriate (and if that risks an infinite loop, then there must
have been an oops, or pfn 0 mistaken for page table, before).

Given the likely extension to freeing empty page tables, I have not
limited this set of changes to a THP config; and it has been easier, and
sets a better example, if each site is given appropriate handling: even
where deeper study might prove that failure could only happen if the pmd
table were corrupted.

Several of the patches are, or include, cleanup on the way; and by the
end, pmd_trans_unstable() and suchlike are deleted: pte_offset_map() and
pte_offset_map_lock() then handle those original races and more.  Most
uses of pte_lockptr() are deprecated, with pte_offset_map_nolock() taking
its place.


This patch (of 32):

Use pmdp_get_lockless() in preference to READ_ONCE(*pmdp), to get a more
reliable result with PAE (or READ_ONCE as before without PAE); and remove
the unnecessary extra barrier()s which got left behind in its callers.

HOWEVER: Note the small print in linux/pgtable.h, where it was designed
specifically for fast GUP, and depends on interrupts being disabled for
its full guarantee: most callers which have been added (here and before)
do NOT have interrupts disabled, so there is still some need for caution.

Link: https://lkml.kernel.org/r/f35279a9-9ac0-de22-d245-591afbfb4dc@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zack Rusin <zackr@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:12 -07:00
Roberto Sassu
36ce9d76b0 shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs
As the ramfs-based tmpfs uses ramfs_init_fs_context() for the
init_fs_context method, which allocates fc->s_fs_info, use ramfs_kill_sb()
to free it and avoid a memory leak.

Link: https://lkml.kernel.org/r/20230607161523.2876433-1-roberto.sassu@huaweicloud.com
Fixes: c3b1b1cbf0 ("ramfs: add support for "mode=" mount option")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:04 -07:00
Christoph Hellwig
64d1b4dd82 fuse: use direct_write_fallback
Use the generic direct_write_fallback helper instead of duplicating the
logic.

Link: https://lkml.kernel.org/r/20230601145904.1385409-13-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:54 -07:00
Christoph Hellwig
596df33d67 fuse: drop redundant arguments to fuse_perform_write
pos is always equal to iocb->ki_pos, and mapping is always equal to
iocb->ki_filp->f_mapping.

Link: https://lkml.kernel.org/r/20230601145904.1385409-12-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:54 -07:00
Christoph Hellwig
70e986c3b4 fuse: update ki_pos in fuse_perform_write
Both callers of fuse_perform_write need to updated ki_pos, move it into
common code.

Link: https://lkml.kernel.org/r/20230601145904.1385409-11-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:54 -07:00
Christoph Hellwig
44fff0fa08 fs: factor out a direct_write_fallback helper
Add a helper dealing with handling the syncing of a buffered write
fallback for direct I/O.

Link: https://lkml.kernel.org/r/20230601145904.1385409-10-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:53 -07:00
Christoph Hellwig
8ee93b4bb6 iomap: use kiocb_write_and_wait and kiocb_invalidate_pages
Use the common helpers for direct I/O page invalidation instead of open
coding the logic.  This leads to a slight reordering of checks in
__iomap_dio_rw to keep the logic straight.

Link: https://lkml.kernel.org/r/20230601145904.1385409-9-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:53 -07:00
Christoph Hellwig
219580eea1 iomap: update ki_pos in iomap_file_buffered_write
All callers of iomap_file_buffered_write need to updated ki_pos, move it
into common code.

Link: https://lkml.kernel.org/r/20230601145904.1385409-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:53 -07:00
Christoph Hellwig
c402a9a943 filemap: add a kiocb_invalidate_post_direct_write helper
Add a helper to invalidate page cache after a dio write.

Link: https://lkml.kernel.org/r/20230601145904.1385409-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:53 -07:00
Christoph Hellwig
182c25e9c1 filemap: update ki_pos in generic_perform_write
All callers of generic_perform_write need to updated ki_pos, move it into
common code.

Link: https://lkml.kernel.org/r/20230601145904.1385409-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:52 -07:00
Christoph Hellwig
936e114a24 iomap: update ki_pos a little later in iomap_dio_complete
Move the ki_pos update down a bit to prepare for a better common helper
that invalidates pages based of an iocb.

Link: https://lkml.kernel.org/r/20230601145904.1385409-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:52 -07:00
Christoph Hellwig
0d625446d0 backing_dev: remove current->backing_dev_info
Patch series "cleanup the filemap / direct I/O interaction", v4.

This series cleans up some of the generic write helper calling conventions
and the page cache writeback / invalidation for direct I/O.  This is a
spinoff from the no-bufferhead kernel project, for which we'll want to an
use iomap based buffered write path in the block layer.


This patch (of 12):

The last user of current->backing_dev_info disappeared in commit
b9b1335e64 ("remove bdi_congested() and wb_congested() and related
functions").  Remove the field and all assignments to it.

Link: https://lkml.kernel.org/r/20230601145904.1385409-1-hch@lst.de
Link: https://lkml.kernel.org/r/20230601145904.1385409-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:51 -07:00
Lorenzo Stoakes
ca5e863233 mm/gup: remove vmas parameter from get_user_pages_remote()
The only instances of get_user_pages_remote() invocations which used the
vmas parameter were for a single page which can instead simply look up the
VMA directly. In particular:-

- __update_ref_ctr() looked up the VMA but did nothing with it so we simply
  remove it.

- __access_remote_vm() was already using vma_lookup() when the original
  lookup failed so by doing the lookup directly this also de-duplicates the
  code.

We are able to perform these VMA operations as we already hold the
mmap_lock in order to be able to call get_user_pages_remote().

As part of this work we add get_user_page_vma_remote() which abstracts the
VMA lookup, error handling and decrementing the page reference count should
the VMA lookup fail.

This forms part of a broader set of patches intended to eliminate the vmas
parameter altogether.

[akpm@linux-foundation.org: avoid passing NULL to PTR_ERR]
Link: https://lkml.kernel.org/r/d20128c849ecdbf4dd01cc828fcec32127ed939a.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> (for arm64)
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com> (for s390)
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:26 -07:00
Yuanchu Xie
7bab8dfb12 mm: pagemap: restrict pagewalk to the requested range
The pagewalk in pagemap_read reads one PTE past the end of the requested
range, and stops when the buffer runs out of space. While it produces
the right result, the extra read is unnecessary and less performant.

I timed the following command before and after this patch:
	dd count=100000 if=/proc/self/pagemap of=/dev/null
The results are consistently within 0.001s across 5 runs.

Before:
100000+0 records in
100000+0 records out
51200000 bytes (51 MB) copied, 0.0763159 s, 671 MB/s

real    0m0.078s
user    0m0.012s
sys     0m0.065s

After:
100000+0 records in
100000+0 records out
51200000 bytes (51 MB) copied, 0.0487928 s, 1.0 GB/s

real    0m0.050s
user    0m0.011s
sys     0m0.039s

Link: https://lkml.kernel.org/r/20230515172608.3558391-1-yuanchu@google.com
Signed-off-by: Yuanchu Xie <yuanchu@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:20 -07:00
Ackerley Tng
adef080382 fs: hugetlbfs: set vma policy only when needed for allocating folio
Calling hugetlb_set_vma_policy() later avoids setting the vma policy
and then dropping it on a page cache hit.

Link: https://lkml.kernel.org/r/20230502235622.3652586-1-ackerleytng@google.com
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:17 -07:00
Yosry Ahmed
2816ea2abf writeback: move wb_over_bg_thresh() call outside lock section
Patch series "cgroup: eliminate atomic rstat flushing", v5.

A previous patch series [1] changed most atomic rstat flushing contexts to
become non-atomic.  This was done to avoid an expensive operation that
scales with # cgroups and # cpus to happen with irqs disabled and
scheduling not permitted.  There were two remaining atomic flushing
contexts after that series.  This series tries to eliminate them as well,
eliminating atomic rstat flushing completely.

The two remaining atomic flushing contexts are:
(a) wb_over_bg_thresh()->mem_cgroup_wb_stats()
(b) mem_cgroup_threshold()->mem_cgroup_usage()

For (a), flushing needs to be atomic as wb_writeback() calls
wb_over_bg_thresh() with a spinlock held.  However, it seems like the call
to wb_over_bg_thresh() doesn't need to be protected by that spinlock, so
this series proposes a refactoring that moves the call outside the lock
criticial section and makes the stats flushing in mem_cgroup_wb_stats()
non-atomic.

For (b), flushing needs to be atomic as mem_cgroup_threshold() is called
with irqs disabled.  We only flush the stats when calculating the root
usage, as it is approximated as the sum of some memcg stats (file, anon,
and optionally swap) instead of the conventional page counter.  This
series proposes changing this calculation to use the global stats instead,
eliminating the need for a memcg stat flush.

After these 2 contexts are eliminated, we no longer need
mem_cgroup_flush_stats_atomic() or cgroup_rstat_flush_atomic().  We can
remove them and simplify the code.

[1] https://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/


This patch (of 5):

wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
flush, which can be expensive on large systems. Currently,
wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
have to do the rstat flush atomically. On systems with a lot of
cpus and/or cgroups, this can cause us to disable irqs for a long time,
potentially causing problems.

Move the call to wb_over_bg_thresh() outside the lock section in
preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
The list_empty(&wb->work_list) check should be okay outside the lock
section of wb->list_lock as it is protected by a separate lock
(wb->work_lock), and wb_over_bg_thresh() doesn't seem like it is
modifying any of wb->b_* lists the wb->list_lock is protecting.
Also, the loop seems to be already releasing and reacquring the
lock, so this refactoring looks safe.

Link: https://lkml.kernel.org/r/20230421174020.2994750-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20230421174020.2994750-2-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:14 -07:00
Linus Torvalds
b158dd941b for-6.4-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmRwqRsACgkQxWXV+ddt
 WDuCPQ//T8JVY6usnGF/Fw/3zbtDNvrdQLDfp3HovIg7gmLIBda0bT05w4Q46FUU
 l4BV0bHyTUNWPlXUmrrSmt8HipRe2z4Wjwc16azdLmSs5zf0FO1LbsCKDmM8Ncid
 LTi2jzyyb3E44ZzC/i7RCaBt+vYRb2ZmtZ/glh3K4H0GgTAYl1GxZoAoYgBnvmlG
 nvmlWWDaM2cRKaUREm75il37LKLIlW5jvdUFQrqwWNgUH72ay5/7SZxHywlk8x6b
 qwhhp+s6bMUNzi6CqE2SLnESjI9yl0l/0gLebhDXVulo0BiCrti+YLpueP4eQs1B
 yYXX3PvHOXhoN4tUQ4yDF9G57To4Gw1aiQOnWOOLcbyGG1ZgyekpoRRXh6r74LKt
 FDyWT+u/xd78by1km3VzqmvKtqHnRFNMYfP+MMDIhyhy5prKCWeVo7bC+2FP+89o
 kv9+0Z0w0lkLycFfLaewZkEv0/WY8GMuT7kptHQ2Ao6ulAvG+j97sgVBFGXJjeCr
 B1OAGdeTF79IV139bCxPA62cat87Zrh15mZN+y7U32Vs2JkOqbT0LTQGKoVs/TCI
 AyHCDb8oOfGiebibnEDrDNtubz7NFCq4ntZRmuv5FJ+l2d1wl6ZvsI+DoYP7Zide
 DLR7ZtPs1Yvm27xDjs+fVmMx4nuNGikEbPZPxJro1CjLVzCEt7k=
 =elHB
 -----END PGP SIGNATURE-----

Merge tag 'for-6.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - handle memory allocation error in checksumming helper (reported by
   syzbot)

 - fix lockdep splat when aborting a transaction, add NOFS protection
   around invalidate_inode_pages2 that could allocate with GFP_KERNEL

 - reduce chances to hit an ENOSPC during scrub with RAID56 profiles

* tag 'for-6.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: use nofs when cleaning up aborted transactions
  btrfs: handle memory allocation failure in btrfs_csum_one_bio
  btrfs: scrub: try harder to mark RAID56 block groups read-only
2023-05-26 13:21:38 -07:00
Linus Torvalds
0d85b27b0c four smb3 client server fixes (3 also for stable) and 3 patches related to move of fs/cifs and fs/ksmbd directories to common fs/smb parent directory
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmRv7UIACgkQiiy9cAdy
 T1GzUQv+KF/IDyb5wzxamh35hSDLzQo1KKaGPdumN+xyQXUhiaml3XqWfQPWC3EO
 vDF4a5zvi5Wm0TNwwYqUFwgMBKFqlUHw64qEkIv6MW/IHOv8/CYepBIeTLwIQCyr
 REJgfU1oJJLa0U4DsPYpwgEVqnuFdb20oaKMPVTgCAHnnpKsBTtKa7ZDbCBZtHOV
 URg7at6c/Dc6uWGOWRif++llmq5a5b6sBxtZ+C99dQGDKvSqbFTOf6If1u6HAaO0
 m75DPcb9o2IA2lLjxALbbIeofPeEphkcH2WBUNHC2tfFo91EcndVxfKB/jo8wzP7
 /5MGHlFEjmupPsPJq6bbMNj+jyPa3UM/CGqsw8ij4SmmIIt0FbBsFPuEMIPmtAsW
 GJL0/Nf1cDiJJIeMahaW936VRK66VLkEGvhKFCxVpPA93IN0eNh1E0HSXsGrpYJ+
 lp4edXJam/2rHngbPgB+LUaPHoVTj0xZwTDGDzNlyI6S5HWcBv43CgzlFF1sWtpD
 4CNjpqzS
 =sHwf
 -----END PGP SIGNATURE-----

Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb directory moves and client fixes from Steve French:
 "Four smb3 client fixes (three of which marked for stable) and three
  patches to move of fs/cifs and fs/ksmbd to a new common "fs/smb"
  parent directory

   - Move the client and server source directories to a common parent
     directory:

       fs/cifs -> fs/smb/client
       fs/ksmbd -> fs/smb/server
       fs/smbfs_common -> fs/smb/common

   - important readahead fix

   - important fix for SMB1 regression

   - fix for missing mount option ("mapchars") in mount API conversion

   - minor debugging improvement"

* tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: move Documentation/filesystems/cifs to Documentation/filesystems/smb
  cifs: correct references in Documentation to old fs/cifs path
  smb: move client and server files to common directory fs/smb
  cifs: mapchars mount option ignored
  smb3: display debug information better for encryption
  cifs: fix smb1 mount regression
  cifs: Fix cifs_limit_bvec_subset() to correctly check the maxmimum size
2023-05-25 19:23:18 -07:00
Linus Torvalds
9db898594c vfs/v6.4-rc2/misc.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZG9CygAKCRCRxhvAZXjc
 opSUAP94up0d2bhB4CDRGkszpBefogBXyEylT8v+1EPtzs8K6QEA9OEbn4wWsIlh
 vYLUjejArgUGuxDl7iiZzAx8p6n9qws=
 =lEs3
 -----END PGP SIGNATURE-----

Merge tag 'vfs/v6.4-rc3/misc.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - During the acl rework we merged this cycle the generic_listxattr()
   helper had to be modified in a way that in principle it would allow
   for POSIX ACLs to be reported. At least that was the impression we
   had initially. Because before the acl rework POSIX ACLs would be
   reported if the filesystem did have POSIX ACL xattr handlers in
   sb->s_xattr. That logic changed and now we can simply check whether
   the superblock has SB_POSIXACL set and if the inode has
   inode->i_{default_}acl set report the appropriate POSIX ACL name.

   However, we didn't realize that generic_listxattr() was only ever
   used by two filesystems. Both of them don't support POSIX ACLs via
   sb->s_xattr handlers and so never reported POSIX ACLs via
   generic_listxattr() even if they raised SB_POSIXACL and did contain
   inodes which had acls set. The example here is nfs4.

   As a result, generic_listxattr() suddenly started reporting POSIX
   ACLs when it wouldn't have before. Since SB_POSIXACL implies that the
   umask isn't stripped in the VFS nfs4 can't just drop SB_POSIXACL from
   the superblock as it would also alter umask handling for them.

   So just have generic_listxattr() not report POSIX ACLs as it never
   did anyway. It's documented as such.

 - Our SB_* flags currently use a signed integer and we shift the last
   bit causing UBSAN to complain about undefined behavior. Switch to
   using unsigned. While the original patch used an explicit unsigned
   bitshift it's now pretty common to rely on the BIT() macro in a lot
   of headers nowadays. So the patch has been adjusted to use that.

 - Add Namjae as ntfs reviewer. They're already active this cycle so
   let's make it explicit right now.

* tag 'vfs/v6.4-rc3/misc.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  ntfs: Add myself as a reviewer
  fs: don't call posix_acl_listxattr in generic_listxattr
  fs: fix undefined behavior in bit shift for SB_NOUSER
2023-05-25 11:03:58 -07:00
Steve French
bf8a352d49 cifs: correct references in Documentation to old fs/cifs path
The fs/cifs directory has moved to fs/smb/client, correct mentions
of this in Documentation and comments.

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-24 16:29:21 -05:00
Steve French
38c8a9a520 smb: move client and server files to common directory fs/smb
Move CIFS/SMB3 related client and server files (cifs.ko and ksmbd.ko
and helper modules) to new fs/smb subdirectory:

   fs/cifs --> fs/smb/client
   fs/ksmbd --> fs/smb/server
   fs/smbfs_common --> fs/smb/common

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-24 16:29:21 -05:00
Steve French
cb8b02fd63 cifs: mapchars mount option ignored
There are two ways that special characters (not allowed in some
other operating systems like Windows, but allowed in POSIX) have
been mapped in the past ("SFU" and "SFM" mappings) to allow them
to be stored in a range reserved for special chars. The default
for Linux has been to use "mapposix" (ie the SFM mapping) but
the conversion to the new mount API in the 5.11 kernel broke
the ability to override the default mapping of the reserved
characters (like '?' and '*' and '\') via "mapchars" mount option.

This patch fixes that - so can now mount with "mapchars"
mount option to override the default ("mapposix" ie SFM) mapping.

Reported-by: Tyler Spivey <tspivey8@gmail.com>
Fixes: 24e0a1eff9 ("cifs: switch to new mount api")
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-24 16:26:44 -05:00
Steve French
8b4dd44f9b smb3: display debug information better for encryption
Fix /proc/fs/cifs/DebugData to use the same case for "encryption"
(ie "Encryption" with init capital letter was used in one place).
In addition, if gcm256 encryption (intead of gcm128) is used on
a connection to a server, note that in the DebugData as well.

It now displays (when gcm256 negotiated):
 Security type: RawNTLMSSP  SessionId: 0x86125800bc000b0d encrypted(gcm256)

Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-24 16:23:33 -05:00
Paulo Alcantara
72a7804a66 cifs: fix smb1 mount regression
cifs.ko maps NT_STATUS_NOT_FOUND to -EIO when SMB1 servers couldn't
resolve referral paths.  Proceed to tree connect when we get -EIO from
dfs_get_referral() as well.

Reported-by: Kris Karas (Bug Reporting) <bugs-a21@moonlit-rail.com>
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Fixes: 8e3554150d ("cifs: fix sharing of DFS connections")
Cc: stable@vger.kernel.org # v6.2+
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-24 16:22:44 -05:00
David Howells
4ef4aee67e cifs: Fix cifs_limit_bvec_subset() to correctly check the maxmimum size
Fix cifs_limit_bvec_subset() so that it limits the span to the maximum
specified and won't return with a size greater than max_size.

Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Cc: stable@vger.kernel.org # 6.3
Reported-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-23 15:35:44 -05:00
Linus Torvalds
5fe326b446 Changes since last update:
- Fix null-ptr-deref related to long xattr name prefixes;
 
  - Avoid pcpubuf compilation if CONFIG_EROFS_FS_ZIP is off;
 
  - Use high priority kthreads by default if per-cpu kthread workers are
    enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCZGzgSBEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBP4SAP9l5ct5U/aqteASSm+VkEjtZe546A3WwoYK
 dXgY8LzKAAD/QfWVpBocK605rbEBb2KfJMnvgQ20Pvzd2jQhox8x7Qg=
 =CaUC
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.4-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "One patch addresses a null-ptr-deref issue reported by syzbot weeks
  ago, which is caused by the new long xattr name prefix feature and
  needs to be fixed.

  The remaining two patches are minor cleanups to avoid unnecessary
  compilation and adjust per-cpu kworker configuration.

  Summary:

   - Fix null-ptr-deref related to long xattr name prefixes

   - Avoid pcpubuf compilation if CONFIG_EROFS_FS_ZIP is off

   - Use high priority kthreads by default if per-cpu kthread workers
     are enabled"

* tag 'erofs-for-6.4-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: use HIPRI by default if per-cpu kthreads are enabled
  erofs: avoid pcpubuf.c inclusion if CONFIG_EROFS_FS_ZIP is off
  erofs: fix null-ptr-deref caused by erofs_xattr_prefixes_init
2023-05-23 10:47:32 -07:00
Gao Xiang
cf7f2732b4 erofs: use HIPRI by default if per-cpu kthreads are enabled
As Sandeep shown [1], high priority RT per-cpu kthreads are
typically helpful for Android scenarios to minimize the scheduling
latencies.

Switch EROFS_FS_PCPU_KTHREAD_HIPRI on by default if
EROFS_FS_PCPU_KTHREAD is on since it's the typical use cases for
EROFS_FS_PCPU_KTHREAD.

Also clean up unneeded sched_set_normal().

[1] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@mail.gmail.com

Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230522092141.124290-1-hsiangkao@linux.alibaba.com
2023-05-23 16:57:08 +08:00
Yue Hu
285d0f85da erofs: avoid pcpubuf.c inclusion if CONFIG_EROFS_FS_ZIP is off
The function of pcpubuf.c is just for low-latency decompression
algorithms (e.g. lz4).

Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230515095758.10391-1-zbestahu@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-05-23 16:56:40 +08:00
Jingbo Xu
0a17567b4a erofs: fix null-ptr-deref caused by erofs_xattr_prefixes_init
Fragments and dedupe share one feature bit, and thus packed inode may not
exist when fragment feature bit (dedupe feature bit exactly) is set, e.g.
when deduplication feature is in use while fragments feature is not.  In
this case, sbi->packed_inode could be NULL while fragments feature bit
is set.

Fix this by accessing packed inode only when it exists.

Reported-by: syzbot+902d5a9373ae8f748a94@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=902d5a9373ae8f748a94
Reported-and-tested-by: syzbot+bbb353775d51424087f2@syzkaller.appspotmail.com
Fixes: 9e38291461 ("erofs: add helpers to load long xattr name prefixes")
Fixes: 6a318ccd7e ("erofs: enable long extended attribute name prefixes")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230515103941.129784-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-05-23 16:56:21 +08:00
Linus Torvalds
421ca22e31 NFS Client Bugfixes for Linux 6.4-rc
Stable Fix:
   * Don't change task->tk_status after the call to rpc_exit_task
 
 Other Bugfixes:
   * Convert kmap_atomic() to kmap_local_folio()
   * Fix a potential double free with READ_PLUS
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmRrttUACgkQ18tUv7Cl
 QOuhaA//QFHklXZk/vCkQnNQMYWL11GJliWawLoDfcZal6uQ/a2QCQV1Cbmav62B
 FR2BmXDxzM2PRdLu2VHGpkn0CQW3M1tvgaNjGD1xdOxpyIkn47T5lfAd/4X2XPiU
 M1ck2Usc258UB1yoKV+jbUD3ptn2BvC+VMWJInaA578hv8TA6Ouh7lP7rPJfDHoJ
 OfoLxx9/VqGqMWzfExAHnGw328oieXNnOwynETAdapVwjQeiEcYAED82pJmVsD7+
 m++6dRVQRA2bMIMRFWmW8HsO08sR32wzy76XgKws4Xu59Fiy+TQ8PoeUjCtTNq6/
 9ibPwH4R7VbcxXa2eT23EbtO2nSkZw/dFiL0s5VNYqeVrBwwlzyklU1uSvIEPegk
 zHamqxMMlVLkoMwJa83wIKB8/viPKwV5zcF9UjmrJy67+wXZet6M0c7S9HyiTj9U
 NzVbqyK3KhMtsD4ps/EGVWsgGKAIeWbE8wPlP7GF7PHwEw+hWa9pHir6L6BizNqG
 DJ/2zfZxDvOGy2r5OvSqGn07/zsj+0URixzEq0IOn1Li/osFZpvK3EVFncd/qsvW
 NwPRoF+70skFRdXhbdWa/HEUZlyN2uiIU24luraMrN0U4b4X7aw+EMnMekBi+Vec
 bEtWEUJ/vK3mlsOde4gVW0PZBhe8JE6PHlqkQBn5zobV3/cXXCw=
 =6xFZ
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "Stable Fix:

   - Don't change task->tk_status after the call to rpc_exit_task

  Other Bugfixes:

   - Convert kmap_atomic() to kmap_local_folio()

   - Fix a potential double free with READ_PLUS"

* tag 'nfs-for-6.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFSv4.2: Fix a potential double free with READ_PLUS
  SUNRPC: Don't change task->tk_status after the call to rpc_exit_task
  NFS: Convert kmap_atomic() to kmap_local_folio()
2023-05-22 12:01:13 -07:00
Linus Torvalds
e2065b8c1b four ksmbd server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmRppOoACgkQiiy9cAdy
 T1GXBAv9FP5orZJKZ2yR+k/xAccodIPUlAx9ZcfBw9rV8dihny0RzOhafRm4FUln
 EuXoS+nWAxNiaOfLZDQ6PezzeVtYbNvlx5EOZ3tZt2I4tb65hdgdiP9axgo6KtfY
 dXMH+Ml2wNxgey9HOfDzDnxdGpBXiNaKlIMbBf0BdtTzvo+BNQulP21P/8SLJg11
 mbHj9XBouae5D7yakJlefq09wKgzolK5ZYqQyLSF2gpVPzQHB+m0zNXBaaHFQbdC
 7xHr+wPBLERyNnEW6F9WBZ9d5ayqdt+UE6HjxeQtnXzkQgrWHKMqJfdEcwjitYCN
 CNTpGdJGxoi7JjbJczPcG3bglJPpOPwbOdu7MTMvom/o4DhR8jrxjtv69k8Kt8ZH
 WSHsS/740psJFnRf9nY82DHEY1Hy27V/5xtLOjvV2C2nR/Z0KUDIR6/lWnpuWUyU
 is/pTbTFGOqQ6xtxnfIFgSx6aYRgbR1chljBzalPKtzuNLipyAKNePRBELYo9hko
 y+M7HtAQ
 =ZNmq
 -----END PGP SIGNATURE-----

Merge tag '6.4-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull ksmbd server fixes from Steve French:

 - two fixes for incorrect SMB3 message validation (one for client which
   uses 8 byte padding, and one for empty bcc)

 - two fixes for out of bounds bugs: one for username offset checks (in
   session setup) and the other for create context name length checks in
   open requests

* tag '6.4-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: smb2: Allow messages padded to 8byte boundary
  ksmbd: allocate one more byte for implied bcc[0]
  ksmbd: fix wrong UserName check in session_user
  ksmbd: fix global-out-of-bounds in smb2_find_context_vals
2023-05-21 10:55:31 -07:00
Linus Torvalds
0c9dcf128e 2 smb3 client fixes, both related to deferred close, and also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmRpn+4ACgkQiiy9cAdy
 T1F07gv/dvtE23DaAsTtOsXMzc2fQ9jyQiexgUUMWjYWeWJS06r2o3QMWsSV86QT
 z645h6jYgUBeuWVFPF/h0WYjGn/C35Fy08SRuNSReNNahYbNh0A5fe+ic8AoA+f1
 LWQYOqRkAaZdcfuOP2Cg2OiNDswxLln4L0eTlJu7Hrdi/xUM5qa66VmFfvfVsu3/
 nUlV9KGV6lVoEJbD2Oy+9pfB/2ltgmauQqofXAh35BHSah8Q5U2E2QHHhyMwRBBc
 qSINxSoNDDyoW5sCXxzgBPH23lzlMNo0tHVRSqPMtLypzoehzwHmkFJVuGv2F82n
 Mj+pMD7As4d7/82IpmCMkhkOcUCRLa/d3gHqZMZVCFSXJ8tpTbRTBiiervJ3/94M
 IYfZiBuKy6z2mYdE8sW0zXCXzYE9+iAgySER5Ey2IXlbCSN7N81lV2KE8E4jjKhM
 Qoe5DL/AGSjDW0RFSOC7PPRpOqpc//PV2JpPmoYodV1i1nWq5dC1DhQcbXjg/r7c
 0fABdS0y
 =hi0y
 -----END PGP SIGNATURE-----

Merge tag '6.4-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:
 "Two smb3 client fixes, both related to deferred close, and also for
  stable:

   - send close for deferred handles before not after lease break
     response to avoid possible sharing violations

   - check all opens on an inode (looking for deferred handles) when
     lease break is returned not just the handle the lease break came in
     on"

* tag '6.4-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  SMB3: drop reference to cfile before sending oplock break
  SMB3: Close all deferred handles of inode in case of handle lease break
2023-05-21 10:20:58 -07:00
Anna Schumaker
43439d858b NFSv4.2: Fix a potential double free with READ_PLUS
kfree()-ing the scratch page isn't enough, we also need to set the pointer
back to NULL to avoid a double-free in the case of a resend.

Fixes: fbd2a05f29 (NFSv4.2: Rework scratch handling for READ_PLUS)
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-05-19 17:11:59 -04:00
Fabio M. De Francesco
4b71e2416e NFS: Convert kmap_atomic() to kmap_local_folio()
kmap_atomic() is deprecated in favor of kmap_local_{folio,page}().

Therefore, replace kmap_atomic() with kmap_local_folio() in
nfs_readdir_folio_array_append().

kmap_atomic() disables page-faults and preemption (the latter only for
!PREEMPT_RT kernels), However, the code within the mapping/un-mapping in
nfs_readdir_folio_array_append() does not depend on the above-mentioned
side effects.

Therefore, a mere replacement of the old API with the new one is all that
is required (i.e., there is no need to explicitly add any calls to
pagefault_disable() and/or preempt_disable()).

Tested with (x)fstests in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel
with HIGHMEM64GB enabled.

Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Fixes: ec108d3cc7 ("NFS: Convert readdir page array functions to use a folio")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-05-19 16:50:05 -04:00
Linus Torvalds
a594874588 A workaround for a just discovered bug in MClientSnap encoding which
goes back to 2017 (marked for stable) and a fixup to quieten a static
 checker.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmRnnmQTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi+UGB/9b2jo9bvRJXm3Z9baTyCYGCLmpOMYB
 gUDAHY9iTZBWdxbk+YppCWyh20oXz1082DV6vMn2FBhFgv4um/7GXesoVMGin73n
 5w3YB8nBW0LeFsuuLMp+tnWnsIbYxEdVmNSe5lNZX16UVRW+GUBJeLPeiJrB2YCE
 NuCWw4SUxRDKU1cCHWIBjIz0qJmvbW+8U7f0OwPqk1e5QmoE9Fs44sfJ9aBX4ap7
 nlPWsoNX0fRixKNcsueBHLr4xEqYG0qqyvCiZnz3r59Zlcs2HwcfixBfNnJPjDeu
 3ijPm+mYjAT8Vg2mVwf2fCXAtdXlzX9+ULHZDp2VoD/0LB+E5ep08HAO
 =Vixp
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.4-rc3' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A workaround for a just discovered bug in MClientSnap encoding which
  goes back to 2017 (marked for stable) and a fixup to quieten a static
  checker"

* tag 'ceph-for-6.4-rc3' of https://github.com/ceph/ceph-client:
  ceph: force updating the msg pointer in non-split case
  ceph: silence smatch warning in reconnect_caps_cb()
2023-05-19 12:02:12 -07:00
Linus Torvalds
ac92c27935 s390 updates for 6.4-rc3
- Add check whether the required facilities are installed
   before using the s390-specific ChaCha20 implementation.
 
 - Key blobs for s390 protected key interface IOCTLs commands
   PKEY_VERIFYKEY2 and PKEY_VERIFYKEY3 may contain clear key
   material. Zeroize copies of these keys in kernel memory
   after creating protected keys.
 
 - Set CONFIG_INIT_STACK_NONE=y in defconfigs to avoid extra
   overhead of initializing all stack variables by default.
 
 - Make sure that when a new channel-path is enabled all
   subchannels are evaluated: with and without any devices
   connected on it.
 
 - When SMT thread CPUs are added to CPU topology masks the
   nr_cpu_ids limit is not checked and could be exceeded.
   Respect the nr_cpu_ids limit and avoid a warning when
   CONFIG_DEBUG_PER_CPU_MAPS is set.
 
 - The pointer to IPL Parameter Information Block is stored
   in the absolute lowcore as a virtual address. Save it as
   the physical address for later use by dump tools.
 
 - Fix a Queued Direct I/O (QDIO) problem on z/VM guests using
   QIOASSIST with dedicated (pass through) QDIO-based devices
   such as FCP, real OSA or HiperSockets.
 
 - s390's struct statfs and struct statfs64 contain padding,
   which field-by-field copying does not set. Initialize the
   respective structures with zeros before filling them and
   copying to userspace.
 
 - Grow s390 compat_statfs64, statfs and statfs64 structures
   f_spare array member to cover padding and simplify things.
 
 - Remove obsolete SCHED_BOOK and SCHED_DRAWER configs.
 
 - Remove unneeded S390_CCW_IOMMU and S390_AP_IOM configs.
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCZGd5BRccYWdvcmRlZXZA
 bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8OqMAQCsdBG7eR3dp3mY8ao34dqlWt98
 rDQD8oiMgCkFyn77jQEAoo3HhqWY8oTu88fl82dkF0OpGW+7zgoNHUYhH8Z0gAY=
 =wtTO
 -----END PGP SIGNATURE-----

Merge tag 's390-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Alexander Gordeev:

 - Add check whether the required facilities are installed before using
   the s390-specific ChaCha20 implementation

 - Key blobs for s390 protected key interface IOCTLs commands
   PKEY_VERIFYKEY2 and PKEY_VERIFYKEY3 may contain clear key material.
   Zeroize copies of these keys in kernel memory after creating
   protected keys

 - Set CONFIG_INIT_STACK_NONE=y in defconfigs to avoid extra overhead of
   initializing all stack variables by default

 - Make sure that when a new channel-path is enabled all subchannels are
   evaluated: with and without any devices connected on it

 - When SMT thread CPUs are added to CPU topology masks the nr_cpu_ids
   limit is not checked and could be exceeded. Respect the nr_cpu_ids
   limit and avoid a warning when CONFIG_DEBUG_PER_CPU_MAPS is set

 - The pointer to IPL Parameter Information Block is stored in the
   absolute lowcore as a virtual address. Save it as the physical
   address for later use by dump tools

 - Fix a Queued Direct I/O (QDIO) problem on z/VM guests using QIOASSIST
   with dedicated (pass through) QDIO-based devices such as FCP, real
   OSA or HiperSockets

 - s390's struct statfs and struct statfs64 contain padding, which
   field-by-field copying does not set. Initialize the respective
   structures with zeros before filling them and copying to userspace

 - Grow s390 compat_statfs64, statfs and statfs64 structures f_spare
   array member to cover padding and simplify things

 - Remove obsolete SCHED_BOOK and SCHED_DRAWER configs

 - Remove unneeded S390_CCW_IOMMU and S390_AP_IOM configs

* tag 's390-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/iommu: get rid of S390_CCW_IOMMU and S390_AP_IOMMU
  s390/Kconfig: remove obsolete configs SCHED_{BOOK,DRAWER}
  s390/uapi: cover statfs padding by growing f_spare
  statfs: enforce statfs[64] structure initialization
  s390/qdio: fix do_sqbs() inline assembly constraint
  s390/ipl: fix IPIB virtual vs physical address confusion
  s390/topology: honour nr_cpu_ids when adding CPUs
  s390/cio: include subchannels without devices also for evaluation
  s390/defconfigs: set CONFIG_INIT_STACK_NONE=y
  s390/pkey: zeroize key blobs
  s390/crypto: use vector instructions only if available for ChaCha20
2023-05-19 11:11:04 -07:00