Go to file
Yang Shi e0f43fa506 mm: filemap: coding style cleanup for filemap_map_pmd()
Patch series "Solve silent data loss caused by poisoned page cache (shmem/tmpfs)", v5.

When discussing the patch that splits page cache THP in order to offline
the poisoned page, Noaya mentioned there is a bigger problem [1] that
prevents this from working since the page cache page will be truncated
if uncorrectable errors happen.  By looking this deeper it turns out
this approach (truncating poisoned page) may incur silent data loss for
all non-readonly filesystems if the page is dirty.  It may be worse for
in-memory filesystem, e.g.  shmem/tmpfs since the data blocks are
actually gone.

To solve this problem we could keep the poisoned dirty page in page
cache then notify the users on any later access, e.g.  page fault,
read/write, etc.  The clean page could be truncated as is since they can
be reread from disk later on.

The consequence is the filesystems may find poisoned page and manipulate
it as healthy page since all the filesystems actually don't check if the
page is poisoned or not in all the relevant paths except page fault.  In
general, we need make the filesystems be aware of poisoned page before
we could keep the poisoned page in page cache in order to solve the data
loss problem.

To make filesystems be aware of poisoned page we should consider:

 - The page should be not written back: clearing dirty flag could
   prevent from writeback.

 - The page should not be dropped (it shows as a clean page) by drop
   caches or other callers: the refcount pin from hwpoison could prevent
   from invalidating (called by cache drop, inode cache shrinking, etc),
   but it doesn't avoid invalidation in DIO path.

 - The page should be able to get truncated/hole punched/unlinked: it
   works as it is.

 - Notify users when the page is accessed, e.g. read/write, page fault
   and other paths (compression, encryption, etc).

The scope of the last one is huge since almost all filesystems need do
it once a page is returned from page cache lookup.  There are a couple
of options to do it:

 1. Check hwpoison flag for every path, the most straightforward way.

 2. Return NULL for poisoned page from page cache lookup, the most
    callsites check if NULL is returned, this should have least work I
    think. But the error handling in filesystems just return -ENOMEM,
    the error code will incur confusion to the users obviously.

 3. To improve #2, we could return error pointer, e.g. ERR_PTR(-EIO),
    but this will involve significant amount of code change as well
    since all the paths need check if the pointer is ERR or not just
    like option #1.

I did prototypes for both #1 and #3, but it seems #3 may require more
changes than #1.  For #3 ERR_PTR will be returned so all the callers
need to check the return value otherwise invalid pointer may be
dereferenced, but not all callers really care about the content of the
page, for example, partial truncate which just sets the truncated range
in one page to 0.  So for such paths it needs additional modification if
ERR_PTR is returned.  And if the callers have their own way to handle
the problematic pages we need to add a new FGP flag to tell FGP
functions to return the pointer to the page.

It may happen very rarely, but once it happens the consequence (data
corruption) could be very bad and it is very hard to debug.  It seems
this problem had been slightly discussed before, but seems no action was
taken at that time.  [2]

As the aforementioned investigation, it needs huge amount of work to
solve the potential data loss for all filesystems.  But it is much
easier for in-memory filesystems and such filesystems actually suffer
more than others since even the data blocks are gone due to truncating.
So this patchset starts from shmem/tmpfs by taking option #1.

TODO:
* The unpoison has been broken since commit 0ed950d1f2 ("mm,hwpoison: make
  get_hwpoison_page() call get_any_page()"), and this patch series make
  refcount check for unpoisoning shmem page fail.
* Expand to other filesystems.  But I haven't heard feedback from filesystem
  developers yet.

Patch breakdown:
Patch #1: cleanup, depended by patch #2
Patch #2: fix THP with hwpoisoned subpage(s) PMD map bug
Patch #3: coding style cleanup
Patch #4: refactor and preparation.
Patch #5: keep the poisoned page in page cache and handle such case for all
          the paths.
Patch #6: the previous patches unblock page cache THP split, so this patch
          add page cache THP split support.

This patch (of 4):

A minor cleanup to the indent.

Link: https://lkml.kernel.org/r/20211020210755.23964-1-shy828301@gmail.com
Link: https://lkml.kernel.org/r/20211020210755.23964-4-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:38 -07:00
arch s390: use generic version of arch_is_kernel_initmem_freed() 2021-11-06 13:30:38 -07:00
block block-5.15-2021-10-29 2021-10-29 11:10:29 -07:00
certs certs: Add support for using elliptic curve keys for signing modules 2021-08-23 19:55:42 +03:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2021-08-30 12:57:10 -07:00
Documentation Documentation: update pagemap with shmem exceptions 2021-11-06 13:30:36 -07:00
drivers arm64: support page mapping percpu first chunk allocator 2021-11-06 13:30:37 -07:00
fs fs: explicitly unregister per-superblock BDIs 2021-11-06 13:30:34 -07:00
include mm: fix data race in PagePoisoned() 2021-11-06 13:30:38 -07:00
init mm: create a new system state and fix core_kernel_text() 2021-11-06 13:30:38 -07:00
ipc ipc: remove memcg accounting for sops objects in do_semtimedop() 2021-09-14 10:22:11 -07:00
kernel mm: make generic arch_is_kernel_initmem_freed() do what it says 2021-11-06 13:30:38 -07:00
lib lib/test_vmalloc.c: use swap() to make code cleaner 2021-11-06 13:30:37 -07:00
LICENSES LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" 2021-07-15 06:31:24 -06:00
mm mm: filemap: coding style cleanup for filemap_map_pmd() 2021-11-06 13:30:38 -07:00
net mptcp: fix corrupt receiver key in MPC + data + checksum 2021-10-28 08:19:06 -07:00
samples samples/bpf: Relicense bpf_insn.h as GPL-2.0-only OR BSD-2-Clause 2021-09-29 16:03:55 +02:00
scripts Compiler Attributes: add __alloc_size() for better bounds checking 2021-11-06 13:30:33 -07:00
security Merge branch 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-10-21 17:27:17 -10:00
sound ALSA: usb-audio: Fix microphone sound on Jieli webcam. 2021-10-19 08:07:01 +02:00
tools perf script: Fix PERF_SAMPLE_WEIGHT_STRUCT support 2021-10-31 12:51:41 -03:00
usr .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
virt KVM: Remove tlbs_dirty 2021-09-23 11:01:12 -04:00
.clang-format clang-format: Update with the latest for_each macro list 2021-05-12 23:32:39 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap mailmap: add Andrej Shadura 2021-10-18 20:22:03 -10:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Daniel Drake to credits 2021-09-21 08:34:58 +03:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS drm fixes for 5.15 final 2021-10-28 12:17:01 -07:00
Makefile Compiler Attributes: add __alloc_size() for better bounds checking 2021-11-06 13:30:33 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.