Commit Graph

78794 Commits

Author SHA1 Message Date
Bo Liu
811b99fd23 fat (exportfs): fix some kernel-doc warnings
Fix the following W=1 kernel build warning(s):

  fs/fat/nfs.c:21: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  fs/fat/nfs.c:139: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst

Link: https://lkml.kernel.org/r/20221111075648.4005-1-liubo03@inspur.com
Signed-off-by: Bo Liu <liubo03@inspur.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 16:13:17 -08:00
Li Zetao
ce2fcf1516 ocfs2: fix memory leak in ocfs2_mount_volume()
There is a memory leak reported by kmemleak:

  unreferenced object 0xffff88810cc65e60 (size 32):
    comm "mount.ocfs2", pid 23753, jiffies 4302528942 (age 34735.105s)
    hex dump (first 32 bytes):
      10 00 00 00 00 00 00 00 00 01 01 01 01 01 01 01  ................
      01 01 01 01 01 01 01 01 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<ffffffff8170f73d>] __kmalloc+0x4d/0x150
      [<ffffffffa0ac3f51>] ocfs2_compute_replay_slots+0x121/0x330 [ocfs2]
      [<ffffffffa0b65165>] ocfs2_check_volume+0x485/0x900 [ocfs2]
      [<ffffffffa0b68129>] ocfs2_mount_volume.isra.0+0x1e9/0x650 [ocfs2]
      [<ffffffffa0b7160b>] ocfs2_fill_super+0xe0b/0x1740 [ocfs2]
      [<ffffffff818e1fe2>] mount_bdev+0x312/0x400
      [<ffffffff819a086d>] legacy_get_tree+0xed/0x1d0
      [<ffffffff818de82d>] vfs_get_tree+0x7d/0x230
      [<ffffffff81957f92>] path_mount+0xd62/0x1760
      [<ffffffff81958a5a>] do_mount+0xca/0xe0
      [<ffffffff81958d3c>] __x64_sys_mount+0x12c/0x1a0
      [<ffffffff82f26f15>] do_syscall_64+0x35/0x80
      [<ffffffff8300006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

This call stack is related to two problems.  Firstly, the ocfs2 super uses
"replay_map" to trace online/offline slots, in order to recover offline
slots during recovery and mount.  But when ocfs2_truncate_log_init()
returns an error in ocfs2_mount_volume(), the memory of "replay_map" will
not be freed in error handling path.  Secondly, the memory of "replay_map"
will not be freed if d_make_root() returns an error in ocfs2_fill_super().
But the memory of "replay_map" will be freed normally when completing
recovery and mount in ocfs2_complete_mount_recovery().

Fix the first problem by adding error handling path to free "replay_map"
when ocfs2_truncate_log_init() fails.  And fix the second problem by
calling ocfs2_free_replay_slots(osb) in the error handling path
"out_dismount".  In addition, since ocfs2_free_replay_slots() is static,
it is necessary to remove its static attribute and declare it in header
file.

Link: https://lkml.kernel.org/r/20221109074627.2303950-1-lizetao1@huawei.com
Fixes: 9140db04ef ("ocfs2: recover orphans in offline slots during recovery and mount")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 16:13:17 -08:00
Akinobu Mita
d472cf797c debugfs: fix error when writing negative value to atomic_t debugfs file
The simple attribute files do not accept a negative value since the commit
488dac0c92 ("libfs: fix error cast of negative value in
simple_attr_write()"), so we have to use a 64-bit value to write a
negative value for a debugfs file created by debugfs_create_atomic_t().

This restores the previous behaviour by introducing
DEFINE_DEBUGFS_ATTRIBUTE_SIGNED for a signed value.

Link: https://lkml.kernel.org/r/20220919172418.45257-4-akinobu.mita@gmail.com
Fixes: 488dac0c92 ("libfs: fix error cast of negative value in simple_attr_write()")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reported-by: Zhao Gongyi <zhaogongyi@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 16:13:16 -08:00
Akinobu Mita
2e41f274f9 libfs: add DEFINE_SIMPLE_ATTRIBUTE_SIGNED for signed value
Patch series "fix error when writing negative value to simple attribute
files".

The simple attribute files do not accept a negative value since the commit
488dac0c92 ("libfs: fix error cast of negative value in
simple_attr_write()"), but some attribute files want to accept a negative
value.


This patch (of 3):

The simple attribute files do not accept a negative value since the commit
488dac0c92 ("libfs: fix error cast of negative value in
simple_attr_write()"), so we have to use a 64-bit value to write a
negative value.

This adds DEFINE_SIMPLE_ATTRIBUTE_SIGNED for a signed value.

Link: https://lkml.kernel.org/r/20220919172418.45257-1-akinobu.mita@gmail.com
Link: https://lkml.kernel.org/r/20220919172418.45257-2-akinobu.mita@gmail.com
Fixes: 488dac0c92 ("libfs: fix error cast of negative value in simple_attr_write()")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reported-by: Zhao Gongyi <zhaogongyi@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 16:13:16 -08:00
Baokun Li
c7e8d3279c squashfs: fix null-ptr-deref in squashfs_fill_super
When squashfs_read_table() returns an error or `sb->s_magic !=
SQUASHFS_MAGIC`, enters the error branch and calls
msblk->thread_ops->destroy(msblk) to destroy msblk.  However,
msblk->thread_ops has not been initialized.  Therefore, the following
problem is triggered:

==================================================================
BUG: KASAN: null-ptr-deref in squashfs_fill_super+0xe7a/0x13b0
Read of size 8 at addr 0000000000000008 by task swapper/0/1

CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3-next-20221031 #367
Call Trace:
 <TASK>
 dump_stack_lvl+0x73/0x9f
 print_report+0x743/0x759
 kasan_report+0xc0/0x120
 __asan_load8+0xd3/0x140
 squashfs_fill_super+0xe7a/0x13b0
 get_tree_bdev+0x27b/0x450
 squashfs_get_tree+0x19/0x30
 vfs_get_tree+0x49/0x150
 path_mount+0xaae/0x1350
 init_mount+0xad/0x100
 do_mount_root+0xbc/0x1d0
 mount_block_root+0x173/0x316
 mount_root+0x223/0x236
 prepare_namespace+0x1eb/0x237
 kernel_init_freeable+0x528/0x576
 kernel_init+0x29/0x250
 ret_from_fork+0x1f/0x30
 </TASK>
==================================================================

To solve this issue, msblk->thread_ops is initialized immediately after
msblk is assigned a value.

Link: https://lkml.kernel.org/r/20221101073343.3961562-1-libaokun1@huawei.com
Fixes: b0645770d3c7 ("squashfs: add the mount parameter theads=<single|multi|percpu>")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Xiaoming Ni <nixiaoming@huawei.com>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:09 -08:00
Shang XiaoJing
13b6269dd0 ocfs2: fix memory leak in ocfs2_stack_glue_init()
ocfs2_table_header should be free in ocfs2_stack_glue_init() if
ocfs2_sysfs_init() failed, otherwise kmemleak will report memleak.

BUG: memory leak
unreferenced object 0xffff88810eeb5800 (size 128):
  comm "modprobe", pid 4507, jiffies 4296182506 (age 55.888s)
  hex dump (first 32 bytes):
    c0 40 14 a0 ff ff ff ff 00 00 00 00 01 00 00 00  .@..............
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000001e59e1cd>] __register_sysctl_table+0xca/0xef0
    [<00000000c04f70f7>] 0xffffffffa0050037
    [<000000001bd12912>] do_one_initcall+0xdb/0x480
    [<0000000064f766c9>] do_init_module+0x1cf/0x680
    [<000000002ba52db0>] load_module+0x6441/0x6f20
    [<000000009772580d>] __do_sys_finit_module+0x12f/0x1c0
    [<00000000380c1f22>] do_syscall_64+0x3f/0x90
    [<000000004cf473bc>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Link: https://lkml.kernel.org/r/41651ca1-432a-db34-eb97-d35744559de1@linux.alibaba.com
Fixes: 3878f110f7 ("ocfs2: Move the hb_ctl_path sysctl into the stack glue.")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:09 -08:00
Xiaoming Ni
fb40fe04f9 squashfs: allows users to configure the number of decompression threads
The maximum number of threads in the decompressor_multi.c file is fixed
and cannot be adjusted according to user needs.  Therefore, the mount
parameter needs to be added to allow users to configure the number of
threads as required.  The upper limit is num_online_cpus() * 2.

Link: https://lkml.kernel.org/r/20221019030930.130456-3-nixiaoming@huawei.com
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Jianguo Chen <chenjianguo3@huawei.com>
Cc: Jubin Zhong <zhongjubin@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:08 -08:00
Xiaoming Ni
80f784098f squashfs: add the mount parameter theads=<single|multi|percpu>
Patch series 'squashfs: Add the mount parameter "threads="'.

Currently, Squashfs supports multiple decompressor parallel modes. 
However, this mode can be configured only during kernel building and does
not support flexible selection during runtime.

In the current patch set, the mount parameter "threads=" is added to allow
users to select the parallel decompressor mode and configure the number of
decompressors when mounting a file system.

"threads=<single|multi|percpu|1|2|3|...>"
The upper limit is num_online_cpus() * 2.


This patch (of 2):

Squashfs supports three decompression concurrency modes:
	Single-thread mode: concurrent reads are blocked and the memory
		overhead is small.
	Multi-thread mode/percpu mode: reduces concurrent read blocking but
		increases memory overhead.

The corresponding schema must be fixed at compile time. During mounting,
the concurrent decompression mode cannot be adjusted based on file read
blocking.

The mount parameter theads=<single|multi|percpu> is added to select
the concurrent decompression mode of a single SquashFS file system
image.

Link: https://lkml.kernel.org/r/20221019030930.130456-1-nixiaoming@huawei.com
Link: https://lkml.kernel.org/r/20221019030930.130456-2-nixiaoming@huawei.com
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Jianguo Chen <chenjianguo3@huawei.com>
Cc: Jubin Zhong <zhongjubin@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:08 -08:00
Ryusuke Konishi
ebeccaaef6 nilfs2: fix shift-out-of-bounds due to too large exponent of block size
If field s_log_block_size of superblock data is corrupted and too large,
init_nilfs() and load_nilfs() still can trigger a shift-out-of-bounds
warning followed by a kernel panic (if panic_on_warn is set):

 shift exponent 38973 is too large for 32-bit type 'int'
 Call Trace:
  <TASK>
  dump_stack_lvl+0xcd/0x134
  ubsan_epilogue+0xb/0x50
  __ubsan_handle_shift_out_of_bounds.cold.12+0x17b/0x1f5
  init_nilfs.cold.11+0x18/0x1d [nilfs2]
  nilfs_mount+0x9b5/0x12b0 [nilfs2]
  ...

This fixes the issue by adding and using a new helper function for getting
block size with sanity check.

Link: https://lkml.kernel.org/r/20221027044306.42774-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:08 -08:00
Ryusuke Konishi
610a2a3d7d nilfs2: fix shift-out-of-bounds/overflow in nilfs_sb2_bad_offset()
Patch series "nilfs2: fix UBSAN shift-out-of-bounds warnings on mount
time".

The first patch fixes a bug reported by syzbot, and the second one fixes
the remaining bug of the same kind.  Although they are triggered by the
same super block data anomaly, I divided it into the above two because the
details of the issues and how to fix it are different.

Both are required to eliminate the shift-out-of-bounds issues at mount
time.


This patch (of 2):

If the block size exponent information written in an on-disk superblock is
corrupted, nilfs_sb2_bad_offset helper function can trigger
shift-out-of-bounds warning followed by a kernel panic (if panic_on_warn
is set):

 shift exponent 38983 is too large for 64-bit type 'unsigned long long'
 Call Trace:
  <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
  ubsan_epilogue lib/ubsan.c:151 [inline]
  __ubsan_handle_shift_out_of_bounds+0x33d/0x3b0 lib/ubsan.c:322
  nilfs_sb2_bad_offset fs/nilfs2/the_nilfs.c:449 [inline]
  nilfs_load_super_block+0xdf5/0xe00 fs/nilfs2/the_nilfs.c:523
  init_nilfs+0xb7/0x7d0 fs/nilfs2/the_nilfs.c:577
  nilfs_fill_super+0xb1/0x5d0 fs/nilfs2/super.c:1047
  nilfs_mount+0x613/0x9b0 fs/nilfs2/super.c:1317
  ...

In addition, since nilfs_sb2_bad_offset() performs multiplication without
considering the upper bound, the computation may overflow if the disk
layout parameters are not normal.

This fixes these issues by inserting preliminary sanity checks for those
parameters and by converting the comparison from one involving
multiplication and left bit-shifting to one using division and right
bit-shifting.

Link: https://lkml.kernel.org/r/20221027044306.42774-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20221027044306.42774-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+e91619dd4c11c4960706@syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:08 -08:00
Alexey Dobriyan
941baf6feb proc: give /proc/cmdline size
Most /proc files don't have length (in fstat sense).  This leads to
inefficiencies when reading such files with APIs commonly found in modern
programming languages.  They open file, then fstat descriptor, get st_size
== 0 and either assume file is empty or start reading without knowing
target size.

cat(1) does OK because it uses large enough buffer by default.  But naive
programs copy-pasted from SO aren't:

	let mut f = std::fs::File::open("/proc/cmdline").unwrap();
	let mut buf: Vec<u8> = Vec::new();
	f.read_to_end(&mut buf).unwrap();

will result in

	openat(AT_FDCWD, "/proc/cmdline", O_RDONLY|O_CLOEXEC) = 3
	statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address)
	statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0
	lseek(3, 0, SEEK_CUR)                   = 0
	read(3, "BOOT_IMAGE=(hd3,gpt2)/vmlinuz-5.", 32) = 32
	read(3, "19.6-100.fc35.x86_64 root=/dev/m", 32) = 32
	read(3, "apper/fedora_localhost--live-roo"..., 64) = 64
	read(3, "ocalhost--live-swap rd.lvm.lv=fe"..., 128) = 116
	read(3, "", 12)

open/stat is OK, lseek looks silly but there are 3 unnecessary reads
because Rust starts with 32 bytes per Vec<u8> and grows from there.

In case of /proc/cmdline, the length is known precisely.

Make variables readonly while I'm at it.

P.S.: I tried to scp /proc/cpuinfo today and got empty file
	but this is separate story.

Link: https://lkml.kernel.org/r/YxoywlbM73JJN3r+@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:07 -08:00
Ivan Babrou
f1f1f25699 proc: report open files as size in stat() for /proc/pid/fd
Many monitoring tools include open file count as a metric.  Currently the
only way to get this number is to enumerate the files in /proc/pid/fd.

The problem with the current approach is that it does many things people
generally don't care about when they need one number for a metric.  In our
tests for cadvisor, which reports open file counts per cgroup, we observed
that reading the number of open files is slow.  Out of 35.23% of CPU time
spent in `proc_readfd_common`, we see 29.43% spent in `proc_fill_cache`,
which is responsible for filling dentry info.  Some of this extra time is
spinlock contention, but it's a contention for the lock we don't want to
take to begin with.

We considered putting the number of open files in /proc/pid/status. 
Unfortunately, counting the number of fds involves iterating the
open_files bitmap, which has a linear complexity in proportion with the
number of open files (bitmap slots really, but it's close).  We don't want
to make /proc/pid/status any slower, so instead we put this info in
/proc/pid/fd as a size member of the stat syscall result.  Previously the
reported number was zero, so there's very little risk of breaking
anything, while still providing a somewhat logical way to count the open
files with a fallback if it's zero.

RFC for this patch included iterating open fds under RCU.  Thanks to Frank
Hofmann for the suggestion to use the bitmap instead.

Previously:

```
$ sudo stat /proc/1/fd | head -n2
  File: /proc/1/fd
  Size: 0         	Blocks: 0          IO Block: 1024   directory
```

With this patch:

```
$ sudo stat /proc/1/fd | head -n2
  File: /proc/1/fd
  Size: 65        	Blocks: 0          IO Block: 1024   directory
```

Correctness check:

```
$ sudo ls /proc/1/fd | wc -l
65
```

I added the docs for /proc/<pid>/fd while I'm at it.

[ivan@cloudflare.com: use bitmap_weight() to count the bits]
  Link: https://lkml.kernel.org/r/20221018045844.37697-1-ivan@cloudflare.com
[akpm@linux-foundation.org: include linux/bitmap.h for bitmap_weight()]
[ivan@cloudflare.com: return errno from proc_fd_getattr() instead of setting negative size]
  Link: https://lkml.kernel.org/r/20221024173140.30673-1-ivan@cloudflare.com
Link: https://lkml.kernel.org/r/20220922224027.59266-1-ivan@cloudflare.com
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Ivan Babrou <ivan@cloudflare.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:07 -08:00
Jianglei Nie
12b9d301ff proc/vmcore: fix potential memory leak in vmcore_init()
Patch series "Some minor cleanup patches resent".

The first three patches trivial clean up patches.

And for the patch "kexec: replace crash_mem_range with range", I got a
ibm-p9wr ppc64le system to test, it works well.


This patch (of 4):

elfcorehdr_alloc() allocates a memory chunk for elfcorehdr_addr with
kzalloc().  If is_vmcore_usable() returns false, elfcorehdr_addr is a
predefined value.  If parse_crash_elf_headers() gets some error and
returns a negetive value, the elfcorehdr_addr should be released with
elfcorehdr_free().

Fix it by calling elfcorehdr_free() when parse_crash_elf_headers() fails.

Link: https://lkml.kernel.org/r/20220929042936.22012-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20220929042936.22012-2-bhe@redhat.com
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chen Lifu <chenlifu@huawei.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Li Chen <lchen@ambarella.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:06 -08:00
Joseph Qi
b270f492dc ocfs2/dlm: use bitmap API instead of hand-writing it
Use bitmap_zero/bitmap_copy/bitmap_qeual directly for bitmap operations.

Link: https://lkml.kernel.org/r/20221007124846.186453-3-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:06 -08:00
Joseph Qi
6d4a93b680 ocfs2: use bitmap API in fill_node_map
Pass bits directly into fill_node_map helper and use bitmap API directly
to simplify code.

Link: https://lkml.kernel.org/r/20221007124846.186453-2-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:06 -08:00
Joseph Qi
71dd5d651b ocfs2/cluster: use bitmap API instead of hand-writing it
Use bitmap_zero/bitmap_copy/bitmap_equal directly for bitmap operations.

Link: https://lkml.kernel.org/r/20221007124846.186453-1-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:06 -08:00
Oleksandr Natalenko
8603b6f586 core_pattern: add CPU specifier
Statistically, in a large deployment regular segfaults may indicate a CPU
issue.

Currently, it is not possible to find out what CPU the segfault happened
on.  There are at least two attempts to improve segfault logging with this
regard, but they do not help in case the logs rotate.

Hence, lets make sure it is possible to permanently record a CPU the task
ran on using a new core_pattern specifier.

Link: https://lkml.kernel.org/r/20220903064330.20772-1-oleksandr@redhat.com
Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
Suggested-by: Renaud Métrich <rmetrich@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Grzegorz Halat <ghalat@redhat.com>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Joel Savitz <jsavitz@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:06 -08:00
Linus Torvalds
9761070d14 Fix a number of bug fixes, including some regressions, the most
serious of which was one which would cause online resizes to fail with
 file systems with metadata checksums enabled.  Also fix a warning
 caused by the newly added fortify string checker, plus some bugs that
 were found using fuzzed file systems.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmNnSCYACgkQ8vlZVpUN
 gaNbBgf/QsOe7KCrr/X7mK7SFgbNY+jsmvagPV0SvAg9Uc0P3EkmXE0NcNcZOAUx
 mgNBYNNS+QGKtdqHBy8p1kNgcbFAR/OJZ7rFD3XUnB/N+XKZSgimhNUx+IaEX7Dx
 XidK5cPcKEZlbfuqxwkIfvaqC9v3XcpFpHicA/uDTPe4kZ8VhJQk294M5EuMA8lQ
 wumDFsf/1sN4osJH7eHMZk/e3iFN8fwrpCgvwJ56zzW7UWSl8jJrq9kxHo43iijY
 82DbRCdsVrdTPaD5gJSvcggLgMpUu+yoA1UbwiUlR1AtmaFfDg+rfIZs1ooyCdHl
 QLQ3RlXdkfHTwAYBFFApzR55MhPakQ==
 =zw2b
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix a number of bugs, including some regressions, the most serious of
  which was one which would cause online resizes to fail with file
  systems with metadata checksums enabled.

  Also fix a warning caused by the newly added fortify string checker,
  plus some bugs that were found using fuzzed file systems"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix fortify warning in fs/ext4/fast_commit.c:1551
  ext4: fix wrong return err in ext4_load_and_init_journal()
  ext4: fix warning in 'ext4_da_release_space'
  ext4: fix BUG_ON() when directory entry has invalid rec_len
  ext4: update the backup superblock's at the end of the online resize
2022-11-06 10:30:29 -08:00
Linus Torvalds
90153f928b 3 cifs/smb3 fixes, one for symlink handling and two fix multichannel issues with iterating channels, including for oplock breaks when leases are disabled
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmNnPqoACgkQiiy9cAdy
 T1FXpgv/fgkH48LGjRH+Pd86oZchCagEMF8Zfy59SKiNWwcuS43B+sjwrotbwT4r
 Nmq1LYHmGgy0b63L9L+DSwO6mxOEvm3ryJ2vxInsG1Rsebw0oSBxolPOHjYLWHKH
 +BsMGxLEVHWMHzFzrNJC1Fp9oGgmD6tNmdDBxBw471UK1AURfc7tg/70MmDm7lDx
 cTLE40Fu+ni3OZ22YL0jYIgHWkk0S1r+/lFNYLvxrZF+D7zRhnVCALbY60L/a4/T
 /nLViWlHAKp9UUlCJTJOXyfVV2PVkF2JEUCIPcfTvYNvDFMGLH/mLLNd6iSq4EgX
 HE811XfZ8HrfL+T2oTHcgNo6CkCZBtdw2zV/RivRDojxHYy/soYv0p0B54c37V3i
 x89/tc1KYxHNR27W+0dxT8D66cRkez0Sb4f/BKdhHfl0WaAmVpfl41XGQ3pihm/C
 Nb8nj/b16R9lqm27Zgu8Vy3p1LSF0d3tn/UDxIP3unoyQWEHHY7oiBlMJ6uc6qls
 faQSx8tv
 =ESTZ
 -----END PGP SIGNATURE-----

Merge tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "One symlink handling fix and two fixes foir multichannel issues with
  iterating channels, including for oplock breaks when leases are
  disabled"

* tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix use-after-free on the link name
  cifs: avoid unnecessary iteration of tcp sessions
  cifs: always iterate smb sessions using primary channel
2022-11-06 10:19:39 -08:00
Theodore Ts'o
0d043351e5 ext4: fix fortify warning in fs/ext4/fast_commit.c:1551
With the new fortify string system, rework the memcpy to avoid this
warning:

memcpy: detected field-spanning write (size 60) of single field "&raw_inode->i_generation" at fs/ext4/fast_commit.c:1551 (size 4)

Cc: stable@kernel.org
Fixes: 54d9469bc5 ("fortify: Add run-time WARN for cross-field memcpy()")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-06 01:07:59 -04:00
Jason Yan
9f2a1d9fb3 ext4: fix wrong return err in ext4_load_and_init_journal()
The return value is wrong in ext4_load_and_init_journal(). The local
variable 'err' need to be initialized before goto out. The original code
in __ext4_fill_super() is fine because it has two return values 'ret'
and 'err' and 'ret' is initialized as -EINVAL. After we factor out
ext4_load_and_init_journal(), this code is broken. So fix it by directly
returning -EINVAL in the error handler path.

Cc: stable@kernel.org
Fixes: 9c1dd22d74 ("ext4: factor out ext4_load_and_init_journal()")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221025040206.3134773-1-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-06 01:07:59 -04:00
Ye Bin
1b8f787ef5 ext4: fix warning in 'ext4_da_release_space'
Syzkaller report issue as follows:
EXT4-fs (loop0): Free/Dirty block details
EXT4-fs (loop0): free_blocks=0
EXT4-fs (loop0): dirty_blocks=0
EXT4-fs (loop0): Block reservation details
EXT4-fs (loop0): i_reserved_data_blocks=0
EXT4-fs warning (device loop0): ext4_da_release_space:1527: ext4_da_release_space: ino 18, to_free 1 with only 0 reserved data blocks
------------[ cut here ]------------
WARNING: CPU: 0 PID: 92 at fs/ext4/inode.c:1528 ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1524
Modules linked in:
CPU: 0 PID: 92 Comm: kworker/u4:4 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1528
RSP: 0018:ffffc900015f6c90 EFLAGS: 00010296
RAX: 42215896cd52ea00 RBX: 0000000000000000 RCX: 42215896cd52ea00
RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000
RBP: 1ffff1100e907d96 R08: ffffffff816aa79d R09: fffff520002bece5
R10: fffff520002bece5 R11: 1ffff920002bece4 R12: ffff888021fd2000
R13: ffff88807483ecb0 R14: 0000000000000001 R15: ffff88807483e740
FS:  0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555569ba628 CR3: 000000000c88e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ext4_es_remove_extent+0x1ab/0x260 fs/ext4/extents_status.c:1461
 mpage_release_unused_pages+0x24d/0xef0 fs/ext4/inode.c:1589
 ext4_writepages+0x12eb/0x3be0 fs/ext4/inode.c:2852
 do_writepages+0x3c3/0x680 mm/page-writeback.c:2469
 __writeback_single_inode+0xd1/0x670 fs/fs-writeback.c:1587
 writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1870
 wb_writeback+0x41f/0x7b0 fs/fs-writeback.c:2044
 wb_do_writeback fs/fs-writeback.c:2187 [inline]
 wb_workfn+0x3cb/0xef0 fs/fs-writeback.c:2227
 process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
 kthread+0x266/0x300 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
 </TASK>

Above issue may happens as follows:
ext4_da_write_begin
  ext4_create_inline_data
    ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
    ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
__ext4_ioctl
  ext4_ext_migrate -> will lead to eh->eh_entries not zero, and set extent flag
ext4_da_write_begin
  ext4_da_convert_inline_data_to_extent
    ext4_da_write_inline_data_begin
      ext4_da_map_blocks
        ext4_insert_delayed_block
	  if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
	    if (!ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk))
	      ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk)); -> will return 1
	       allocated = true;
          ext4_es_insert_delayed_block(inode, lblk, allocated);
ext4_writepages
  mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); -> return -ENOSPC
  mpage_release_unused_pages(&mpd, give_up_on_write); -> give_up_on_write == 1
    ext4_es_remove_extent
      ext4_da_release_space(inode, reserved);
        if (unlikely(to_free > ei->i_reserved_data_blocks))
	  -> to_free == 1  but ei->i_reserved_data_blocks == 0
	  -> then trigger warning as above

To solve above issue, forbid inode do migrate which has inline data.

Cc: stable@kernel.org
Reported-by: syzbot+c740bb18df70ad00952e@syzkaller.appspotmail.com
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221018022701.683489-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-06 01:07:59 -04:00
Luís Henriques
17a0bc9bd6 ext4: fix BUG_ON() when directory entry has invalid rec_len
The rec_len field in the directory entry has to be a multiple of 4.  A
corrupted filesystem image can be used to hit a BUG() in
ext4_rec_len_to_disk(), called from make_indexed_dir().

 ------------[ cut here ]------------
 kernel BUG at fs/ext4/ext4.h:2413!
 ...
 RIP: 0010:make_indexed_dir+0x53f/0x5f0
 ...
 Call Trace:
  <TASK>
  ? add_dirent_to_buf+0x1b2/0x200
  ext4_add_entry+0x36e/0x480
  ext4_add_nondir+0x2b/0xc0
  ext4_create+0x163/0x200
  path_openat+0x635/0xe90
  do_filp_open+0xb4/0x160
  ? __create_object.isra.0+0x1de/0x3b0
  ? _raw_spin_unlock+0x12/0x30
  do_sys_openat2+0x91/0x150
  __x64_sys_open+0x6c/0xa0
  do_syscall_64+0x3c/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

The fix simply adds a call to ext4_check_dir_entry() to validate the
directory entry, returning -EFSCORRUPTED if the entry is invalid.

CC: stable@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216540
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Link: https://lore.kernel.org/r/20221012131330.32456-1-lhenriques@suse.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-06 01:07:49 -04:00
ChenXiaoSong
542228db2f cifs: fix use-after-free on the link name
xfstests generic/011 reported use-after-free bug as follows:

  BUG: KASAN: use-after-free in __d_alloc+0x269/0x859
  Read of size 15 at addr ffff8880078933a0 by task dirstress/952

  CPU: 1 PID: 952 Comm: dirstress Not tainted 6.1.0-rc3+ #77
  Call Trace:
   __dump_stack+0x23/0x29
   dump_stack_lvl+0x51/0x73
   print_address_description+0x67/0x27f
   print_report+0x3e/0x5c
   kasan_report+0x7b/0xa8
   kasan_check_range+0x1b2/0x1c1
   memcpy+0x22/0x5d
   __d_alloc+0x269/0x859
   d_alloc+0x45/0x20c
   d_alloc_parallel+0xb2/0x8b2
   lookup_open+0x3b8/0x9f9
   open_last_lookups+0x63d/0xc26
   path_openat+0x11a/0x261
   do_filp_open+0xcc/0x168
   do_sys_openat2+0x13b/0x3f7
   do_sys_open+0x10f/0x146
   __se_sys_creat+0x27/0x2e
   __x64_sys_creat+0x55/0x6a
   do_syscall_64+0x40/0x96
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

  Allocated by task 952:
   kasan_save_stack+0x1f/0x42
   kasan_set_track+0x21/0x2a
   kasan_save_alloc_info+0x17/0x1d
   __kasan_kmalloc+0x7e/0x87
   __kmalloc_node_track_caller+0x59/0x155
   kstrndup+0x60/0xe6
   parse_mf_symlink+0x215/0x30b
   check_mf_symlink+0x260/0x36a
   cifs_get_inode_info+0x14e1/0x1690
   cifs_revalidate_dentry_attr+0x70d/0x964
   cifs_revalidate_dentry+0x36/0x62
   cifs_d_revalidate+0x162/0x446
   lookup_open+0x36f/0x9f9
   open_last_lookups+0x63d/0xc26
   path_openat+0x11a/0x261
   do_filp_open+0xcc/0x168
   do_sys_openat2+0x13b/0x3f7
   do_sys_open+0x10f/0x146
   __se_sys_creat+0x27/0x2e
   __x64_sys_creat+0x55/0x6a
   do_syscall_64+0x40/0x96
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

  Freed by task 950:
   kasan_save_stack+0x1f/0x42
   kasan_set_track+0x21/0x2a
   kasan_save_free_info+0x1c/0x34
   ____kasan_slab_free+0x1c1/0x1d5
   __kasan_slab_free+0xe/0x13
   __kmem_cache_free+0x29a/0x387
   kfree+0xd3/0x10e
   cifs_fattr_to_inode+0xb6a/0xc8c
   cifs_get_inode_info+0x3cb/0x1690
   cifs_revalidate_dentry_attr+0x70d/0x964
   cifs_revalidate_dentry+0x36/0x62
   cifs_d_revalidate+0x162/0x446
   lookup_open+0x36f/0x9f9
   open_last_lookups+0x63d/0xc26
   path_openat+0x11a/0x261
   do_filp_open+0xcc/0x168
   do_sys_openat2+0x13b/0x3f7
   do_sys_open+0x10f/0x146
   __se_sys_creat+0x27/0x2e
   __x64_sys_creat+0x55/0x6a
   do_syscall_64+0x40/0x96
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

When opened a symlink, link name is from 'inode->i_link', but it may be
reset to a new value when revalidate the dentry. If some processes get the
link name on the race scenario, then UAF will happen on link name.

Fix this by implementing 'get_link' interface to duplicate the link name.

Fixes: 76894f3e2f ("cifs: improve symlink handling for smb2+")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-04 23:36:54 -05:00
Shyam Prasad N
23d9b9b757 cifs: avoid unnecessary iteration of tcp sessions
In a few places, we do unnecessary iterations of
tcp sessions, even when the server struct is provided.

The change avoids it and uses the server struct provided.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-04 23:34:40 -05:00
Shyam Prasad N
8abcaeaed3 cifs: always iterate smb sessions using primary channel
smb sessions and tcons currently hang off primary channel only.
Secondary channels have the lists as empty. Whenever there's a
need to iterate sessions or tcons, we should use the list in the
corresponding primary channel.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-04 23:34:02 -05:00
Linus Torvalds
64c3dd0b98 Bug fixes for 6.1-rc4:
- Fix a UAF bug during log recovery.
 - Fix memory leaks when mount fails.
 - Detect corrupt bestfree information in a directory block.
 - Fix incorrect return value type for the dax page fault handlers.
 - Fix fortify complaints about memcpy of xfs log item objects.
 - Strengthen inadequate validation of recovered log items.
 - Fix incorrectly declared flex array in EFI log item structs.
 - Log corrupt log items for debugging purposes.
 - Fix infinite loop problems in the refcount code if the refcount btree
   node block keys are corrupt.
 - Fix infinite loop problems in the refcount code if the refcount btree
   records suffer MSB bitflips.
 - Add more sanity checking to continued defer ops to prevent overflows
   from one AG to the next or off EOFS.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmNhT68ACgkQ+H93GTRK
 tOs5Bw/+NSyPZ7jYVa3mXYKRsqMU/nqAGnNK4D4uS8gRlJTBolpC8Vs4fpBTQzV8
 3JN8F/AUIQJOCxt5a81tlsPSYgEsxuIqion1olh3Z6ln4wN0su3rj0E3h+CTtgV8
 xf3axdre4uC2xYhmKiDTD4ezLqylnRmsK1nNLbFzRnnJrYN+FiiJB7BefuJkbEzI
 HRTAJPo3oxsCDinkkyQhZ8CjD7ZenYuhgc4jFmVSLqNjULkF2kDyHgLCfojq+p3E
 G6WsuJ9fonMXlt2WV7k3tKektHIll8+ile6+zuPSjOH+WHo4/jWIjvUsg0X+M3DS
 jemPFNgpS6jSJy3qbPJoDej8XlV0FV4VzsCh2a/YaGa1Outl8V9ZhMyt9tc8LWzF
 3Z1KkywsBqzK9m9yDlokmGPq71kCEQ+OMQSSlELEf6q7HHUf6yr3MyA5tXKqzJod
 DYFYoX70EoPAKk47gFI5EIYrzuTFx7PRugUUSU09e0wmjSswH7RjNur+Ya1eHhYc
 VUe6gUluuAkTFHhEjk+8mTg1iUlg92YdzL7pKSoeAlQczz1ZwQhE9W0ul1/z07d4
 F4DXi6CtmM38e7XsX0CKmZ0ins9QmSDJheCKmE3kdLYY9PpzQpgtlq4kqjUP5eJw
 XZwB6cUS4pXw2zf4tW1qQ5pe13umfN6/VqymagG4fKWfAwj8s9o=
 =1IfG
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.1-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "Dave and I had thought that this would be a very quiet cycle, but we
  thought wrong.

  At first there were the usual trickle of minor bugfixes, but then
  Zorro pulled -rc1 and noticed complaints about the stronger memcpy
  checks w.r.t. flex arrays.

  Analyzing how to fix that revealed a bunch of validation gaps in
  validating ondisk log items during recovery, and then a customer hit
  an infinite loop in the refcounting code on a corrupt filesystem.

  So. This largeish batch of fixes addresses all those problems, I hope.

  Summary:

   - Fix a UAF bug during log recovery

   - Fix memory leaks when mount fails

   - Detect corrupt bestfree information in a directory block

   - Fix incorrect return value type for the dax page fault handlers

   - Fix fortify complaints about memcpy of xfs log item objects

   - Strengthen inadequate validation of recovered log items

   - Fix incorrectly declared flex array in EFI log item structs

   - Log corrupt log items for debugging purposes

   - Fix infinite loop problems in the refcount code if the refcount
     btree node block keys are corrupt

   - Fix infinite loop problems in the refcount code if the refcount
     btree records suffer MSB bitflips

   - Add more sanity checking to continued defer ops to prevent
     overflows from one AG to the next or off EOFS"

* tag 'xfs-6.1-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits)
  xfs: rename XFS_REFC_COW_START to _COWFLAG
  xfs: fix uninitialized list head in struct xfs_refcount_recovery
  xfs: fix agblocks check in the cow leftover recovery function
  xfs: check record domain when accessing refcount records
  xfs: remove XFS_FIND_RCEXT_SHARED and _COW
  xfs: refactor domain and refcount checking
  xfs: report refcount domain in tracepoints
  xfs: track cow/shared record domains explicitly in xfs_refcount_irec
  xfs: refactor refcount record usage in xchk_refcountbt_rec
  xfs: dump corrupt recovered log intent items to dmesg consistently
  xfs: move _irec structs to xfs_types.h
  xfs: actually abort log recovery on corrupt intent-done log items
  xfs: check deferred refcount op continuation parameters
  xfs: refactor all the EFI/EFD log item sizeof logic
  xfs: create a predicate to verify per-AG extents
  xfs: fix memcpy fortify errors in EFI log format copying
  xfs: make sure aglen never goes negative in xfs_refcount_adjust_extents
  xfs: fix memcpy fortify errors in RUI log format copying
  xfs: fix memcpy fortify errors in CUI log format copying
  xfs: fix memcpy fortify errors in BUI log format copying
  ...
2022-11-04 15:05:42 -07:00
Linus Torvalds
7f7bac08d9 fuse fixes for 6.1-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCY2PdvgAKCRDh3BK/laaZ
 POCdAQDzAm/UqHlkyLXfrlcCJxkcC7nt65SBQ+OhDa6SAytYpgD9GXeuYUxZ0iU9
 TJpnwMAex/GeutxLXUwjjBRNxA9CkQY=
 =qovn
 -----END PGP SIGNATURE-----

Merge tag 'fuse-fixes-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
 "Fix two rarely triggered but long-standing issues"

* tag 'fuse-fixes-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: add file_modified() to fallocate
  fuse: fix readdir cache race
2022-11-03 13:41:29 -07:00
Linus Torvalds
f2f32f8af2 for-6.1-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmNj2yMACgkQxWXV+ddt
 WDsRPg/+Mgp4lLF6WCUhWNbO7K7EdJ+YEikDr7/35TTUcnpqZ6oBrWiHwwcG4d2S
 V7eQLf/yId5zVfSD+aZEOSz8gC6Mh+0CujVdj09BYuDl7fDIEjFaoH38JsAhANFO
 uUaqxzgZw2feWpwiEF9P2iwZD8VqUMAELjASjBBZVMs6WCpM6SDQRPDj/IkfI2BN
 qgtKB7Im9VYBN92eIKlg6+MQCwuMMXKZRQH3dkPfYGJYQMDRyYrDxoeVWSAf9pGX
 Xvb3mEUZEcPQmE6ue78Ny0OGXX2sh7Mvz4cEFBJvFUPi99Iu6TluVBgN0akuMTwZ
 oZbV/1Abs+KV+yOICAhE/u7mKkLPsfRZeR4Ly8qjIlMUN12r1MR1BuGOJj750nsi
 LLBohtfQ+BQYpEOrJ32MbdxXy6/CBinC6Xqz+J3M+F/AMYREPLaND7Co5YkgWyT4
 pViRpgxLV+plP5bizbiXtnXI1h4OMBRx7idAZmeBNFtquHSzgf9psUz+sHI8Wvr2
 tAI+6n7RSnUDG/N+p0cJSqZf4RZWevjVJrUS4pko56t9ixK/xPkyVFbYLIdcd3bC
 N83tDgNtdBuyuFw3f2Ye+f0BxBhpZx6getQW2W9mb+6ylN5nyHFWmQpDGO5sDec0
 KJRR3w8vQ/0+64P2JhjFbYW55CzpmB279qGxemsnGakDweEcs+o=
 =Ltzp
 -----END PGP SIGNATURE-----

Merge tag 'for-6.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A batch of error handling fixes for resource leaks, fixes for nowait
  mode in combination with direct and buffered IO:

   - direct IO + dsync + nowait could miss a sync of the file after
     write, add handling for this combination

   - buffered IO + nowait should not fail with ENOSPC, only blocking IO
     could determine that

   - error handling fixes:
      - fix inode reserve space leak due to nowait buffered write
      - check the correct variable after allocation (direct IO submit)
      - fix inode list leak during backref walking
      - fix ulist freeing in self tests"

* tag 'for-6.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix inode reserve space leak due to nowait buffered write
  btrfs: fix nowait buffered write returning -ENOSPC
  btrfs: remove pointless and double ulist frees in error paths of qgroup tests
  btrfs: fix ulist leaks in error paths of qgroup self tests
  btrfs: fix inode list leak during backref walking at find_parent_nodes()
  btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
  btrfs: fix lost file sync on direct IO write with nowait and dsync iocb
  btrfs: fix a memory allocation failure test in btrfs_submit_direct
2022-11-03 11:12:48 -07:00
Linus Torvalds
31fc92fc93 NFS Client Bugfixes for Linux 6.1-rc
Bugfixes:
   * Fix some coccicheck warnings
   * Avoid memcpy() run-time warning
   * Fix up various state reclaim / RECLAIM_COMPLETE errors
   * Fix a null pointer dereference in sysfs
   * Fix LOCK races
   * Fix gss_unwrap_resp_integ() crasher
   * Fix zero length clones
   * Fix memleak when allocate slot fails
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmNil9kACgkQ18tUv7Cl
 QOvr2BAA3piO9HMIWIqCbewSIeotzzjdeYSh5qZ7GqRmsz7/KSvN28+dxaXlJVs1
 Vi646NHKsp5rkqXK10su+AjtDDER3P9ybOlZyNkwz6AzkAbpBIreKOqd7AV/mJ0d
 kZv8rdJSaDUlsAjnCcaTyjAr9qdT2olI6gSdPXdVjBkbbNcxtygxAToA0Bw1tTBr
 pP7pSYXbdbl1tZYe5fuvZdbhVRLggrcIYpvSrSho05iFHz5MZIc7g50uvr13Tv4Y
 A0tZg0YCHoxKcAvTjh2M7pjEOzCvBGP9an3me260PljCm+AwFXTQLBLAvHeGm7D5
 sflS60T5rlLBwqvZXa4efXvhWJJTnkQxDLrUKCgoUgLAVuzYrq6oTRUtOgBHnl18
 mj8MR3EHh/t4Y+c7AURK+wBzBaxg02ltUYWVjUT0k1+pDzaFVjnNzEvX+1Nj3Rm/
 Ib4D8zsditwHuug7A95ALNhwLjxBYqJS3b8okn0vIvpKxvLa6jjvXXN2ggDOUQWY
 wfKVa7A3dBmKBWh/uu5s/P5q6pTxYdc9fZUaJZoEXwjYcGXVpfUqeaQGl/IMv4Xp
 Qir8nlcEPGGU4eD8Byl2Fr01NsnHDNDD8QdvJcI+mqy7p1ZPOrqiXYckZdjPIcz2
 4EpjY+IDoOlnPW9FWq+EeyuZVc60rvun4qHfMsf54MGRT8qSaoI=
 =iGEB
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.1-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:

 - Fix some coccicheck warnings

 - Avoid memcpy() run-time warning

 - Fix up various state reclaim / RECLAIM_COMPLETE errors

 - Fix a null pointer dereference in sysfs

 - Fix LOCK races

 - Fix gss_unwrap_resp_integ() crasher

 - Fix zero length clones

 - Fix memleak when allocate slot fails

* tag 'nfs-for-6.1-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  nfs4: Fix kmemleak when allocate slot failed
  NFSv4.2: Fixup CLONE dest file size for zero-length count
  SUNRPC: Fix crasher in gss_unwrap_resp_integ()
  NFSv4: Retry LOCK on OLD_STATEID during delegation return
  SUNRPC: Fix null-ptr-deref when xps sysfs alloc failed
  NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot
  NFSv4.1: Handle RECLAIM_COMPLETE trunking errors
  NFSv4: Fix a potential state reclaim deadlock
  NFS: Avoid memcpy() run-time warning for struct sockaddr overflows
  nfs: Remove redundant null checks before kfree
2022-11-02 11:18:13 -07:00
Filipe Manana
eb81b682b1 btrfs: fix inode reserve space leak due to nowait buffered write
During a nowait buffered write, if we fail to balance dirty pages we exit
btrfs_buffered_write() without releasing the delalloc space reserved for
an extent, resulting in leaking space from the inode's block reserve.

So fix that by releasing the delalloc space for the extent when balancing
dirty pages fails.

Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/all/202210111304.d369bc32-yujie.liu@intel.com
Fixes: 965f47aeb5 ("btrfs: make btrfs_buffered_write nowait compatible")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-02 17:44:45 +01:00
Filipe Manana
a348c8d4f6 btrfs: fix nowait buffered write returning -ENOSPC
If we are doing a buffered write in NOWAIT context and we can't reserve
metadata space due to -ENOSPC, then we should return -EAGAIN so that we
retry the write in a context allowed to block and do metadata reservation
with flushing, which might succeed this time due to the allowed flushing.

Returning -ENOSPC while in NOWAIT context simply makes some writes fail
with -ENOSPC when they would likely succeed after switching from NOWAIT
context to blocking context. That is unexpected behaviour and even fio
complains about it with a warning like this:

  fio: io_u error on file /mnt/sdi/task_0.0.0: No space left on device: write offset=1535705088, buflen=65536
  fio: pid=592630, err=28/file:io_u.c:1846, func=io_u error, error=No space left on device

The fio's job config is this:

   [global]
   bs=64K
   ioengine=io_uring
   iodepth=1
   size=2236962133
   nr_files=1
   filesize=2236962133
   direct=0
   runtime=10
   fallocate=posix
   io_size=2236962133
   group_reporting
   time_based

   [task_0]
   rw=randwrite
   directory=/mnt/sdi
   numjobs=4

So fix this by returning -EAGAIN if we are in NOWAIT context and the
metadata reservation failed with -ENOSPC.

Fixes: 304e45acdb ("btrfs: plumb NOWAIT through the write path")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-02 17:44:42 +01:00
Filipe Manana
d0ea17aec1 btrfs: remove pointless and double ulist frees in error paths of qgroup tests
Several places in the qgroup self tests follow the pattern of freeing the
ulist pointer they passed to btrfs_find_all_roots() if the call to that
function returned an error. That is pointless because that function always
frees the ulist in case it returns an error.

Also In some places like at test_multiple_refs(), after a call to
btrfs_qgroup_account_extent() we also leave "old_roots" and "new_roots"
pointing to ulists that were freed, because btrfs_qgroup_account_extent()
has freed those ulists, and if after that the next call to
btrfs_find_all_roots() fails, we call ulist_free() on the "old_roots"
ulist again, resulting in a double free.

So remove those calls to reduce the code size and avoid double ulist
free in case of an error.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-02 17:44:30 +01:00
Filipe Manana
d37de92b38 btrfs: fix ulist leaks in error paths of qgroup self tests
In the test_no_shared_qgroup() and test_multiple_refs() qgroup self tests,
if we fail to add the tree ref, remove the extent item or remove the
extent ref, we are returning from the test function without freeing the
"old_roots" ulist that was allocated by the previous calls to
btrfs_find_all_roots(). Fix that by calling ulist_free() before returning.

Fixes: 442244c963 ("btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism.")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-02 17:43:32 +01:00
Filipe Manana
92876eec38 btrfs: fix inode list leak during backref walking at find_parent_nodes()
During backref walking, at find_parent_nodes(), if we are dealing with a
data extent and we get an error while resolving the indirect backrefs, at
resolve_indirect_refs(), or in the while loop that iterates over the refs
in the direct refs rbtree, we end up leaking the inode lists attached to
the direct refs we have in the direct refs rbtree that were not yet added
to the refs ulist passed as argument to find_parent_nodes(). Since they
were not yet added to the refs ulist and prelim_release() does not free
the lists, on error the caller can only free the lists attached to the
refs that were added to the refs ulist, all the remaining refs get their
inode lists never freed, therefore leaking their memory.

Fix this by having prelim_release() always free any attached inode list
to each ref found in the rbtree, and have find_parent_nodes() set the
ref's inode list to NULL once it transfers ownership of the inode list
to a ref added to the refs ulist passed to find_parent_nodes().

Fixes: 86d5f99442 ("btrfs: convert prelimary reference tracking to use rbtrees")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-02 17:43:28 +01:00
Filipe Manana
5614dc3a47 btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
During backref walking, at resolve_indirect_refs(), if we get an error
we jump to the 'out' label and call ulist_free() on the 'parents' ulist,
which frees all the elements in the ulist - however that does not free
any inode lists that may be attached to elements, through the 'aux' field
of a ulist node, so we end up leaking lists if we have any attached to
the unodes.

Fix this by calling free_leaf_list() instead of ulist_free() when we exit
from resolve_indirect_refs(). The static function free_leaf_list() is
moved up for this to be possible and it's slightly simplified by removing
unnecessary code.

Fixes: 3301958b7c ("Btrfs: add inodes before dropping the extent lock in find_all_leafs")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-02 17:43:25 +01:00
Linus Torvalds
6eafb4a13d Fixes:
- Fix a loop that occurs when using multiple net namespaces
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmNhj5EACgkQM2qzM29m
 f5fqNxAAwlrhR83lzzcM4xt8woUnhnlyUjbVF38lVFLV7SJQC0Q2y4BktORxK1se
 GPKkWF5vn188xCwvhGZwFYdR2dL3z3GmUkOX9MOWWJwjAVkcACj5lVZcOzdSq2Ny
 iXKTym/6zTqp2rc0rjXQnaXLwUUHo3uNZe6qtMpY8tezwYkN9EG3ZWNcSgtFGdwA
 4accYxYu4p3J0BGig4rq0R3tjFf3Ya2u9igCdvBrObzBRNyYpoVlyYpXRoK0f1mp
 PhWk+9qtEBD5qqddj5ZgQtkZt8GSIHxJvlyyYnvv/YvSqZ26e3zjkS9tDVLPdTss
 6RiaKz8iKYEOHAtABfqikJMoPGU51fg5auY4gmm4DgeYO9HTQmQXvpHBZEuejTKt
 Gv4CVOV7ziQtSl5EwOLO5d1CiHWA9u57PYrzQeHf7+Y1kCHmB9dy35LztG+3LaNJ
 r357EyGaGhXD4tpad4xZAl9soo2DUy2BWIr1CvbwvLaveV3oAu/svPUAvCWXRPH9
 /PDfVmAOo1t4yYvMIsx/gJn//Wv0qBtnLsCaby34el4NF5eSTRaYTT+LTUNPLd/j
 oVwf0FPEyp7lHXNH+rrjCn91YrjY+1qnVLkrf1TbpC9XcemONe3lNnl/X5IjSNqS
 BiJXS1Xe1qLeiU+vRsxH8gN/+vr4PDkebm/M371rs7ymL5pfrEM=
 =Mss6
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix a loop that occurs when using multiple net namespaces

* tag 'nfsd-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix net-namespace logic in __nfsd_file_cache_purge
2022-11-01 15:05:03 -07:00
Jeff Layton
d3aefd2b29 nfsd: fix net-namespace logic in __nfsd_file_cache_purge
If the namespace doesn't match the one in "net", then we'll continue,
but that doesn't cause another rhashtable_walk_next call, so it will
loop infinitely.

Fixes: ce502f81ba ("NFSD: Convert the filecache to use rhashtable")
Reported-by: Petr Vorel <pvorel@suse.cz>
Link: https://lore.kernel.org/ltp/Y1%2FP8gDAcWC%2F+VR3@pevik/
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-01 17:27:27 -04:00
Linus Torvalds
5aaef24b5c for-6.1-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmNfzNwACgkQxWXV+ddt
 WDuC6Q//a72PAq1sjwvQqAcr+OOe3PWnmlwYZCnXxiab5c74Kc7rDhDZcO3m/Qt5
 3YTwgK5FT4Y0AI8RN1NXx3+UOAYCWp/TGeBdbPHg35XIYKAnCh4pfql84Uiw1Awz
 HbqmSTma7sqVdRMehkKCkd7w4YoyAAsDdyXFQlSFm4ah9WHFZDswBc+m6xQZuWvU
 QVQS6wUTxkxuBZp0UComWGBNHiDeDZbga7VqO8UHPYOB394IV2mYP6fh8l0oB/BS
 bfKgsHjV9e0S0Ul0oPVADCGCiJcTbdnw3IA+Cje7MSgZ3kds/4Bo5IJWT5QRb94A
 yDAFpxc+t3+FgpoKS3/tZK7imXwgpXueiT2bBj+BjDDWD2VUVVBG4QmXYIW6tuqY
 vtEFw9+NCAvS2gRetHyXxQshYh/QW//+AZSkuI6/fuPSM+lRG5E0lnDxqrZiOMIo
 e6SJOGH3tCmtusL5VSXIQ8DPaLI9PBg4OXChytwmLHwPIusbQOvD5sTDpd99UezB
 dLXqZOGGScAc11HU1AFyZfAxTBybUgUxX/xCviJtf7ZOWKdcwiFrzSJOL5upSPz3
 8qZTVjrD71mJlEa0Z8wj0Utuu4Psecp0GN+fs5JJxmqsFO0cYApU17OqPZ22+yEV
 RU26YNpqurYVarHVER4WxyXYraBYd1Cr6s6bFVDnuZynfiCOYIw=
 =3tvc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes and regression fixes:

   - fix a corner case when handling tree-mod-log chagnes in reallocated
     notes

   - fix crash on raid0 filesystems created with <5.4 mkfs.btrfs that
     could lead to division by zero

   - add missing super block checksum verification after thawing
     filesystem

   - handle one more case in send when dealing with orphan files

   - fix parameter type mismatch for generation when reading dentry

   - improved error handling in raid56 code

   - better struct bio packing after recent cleanups"

* tag 'for-6.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: don't use btrfs_chunk::sub_stripes from disk
  btrfs: fix type of parameter generation in btrfs_get_dentry
  btrfs: send: fix send failure of a subcase of orphan inodes
  btrfs: make thaw time super block check to also verify checksum
  btrfs: fix tree mod log mishandling of reallocated nodes
  btrfs: reorder btrfs_bio for better packing
  btrfs: raid56: avoid double freeing for rbio if full_stripe_write() failed
  btrfs: raid56: properly handle the error when unable to find the missing stripe
2022-10-31 12:28:29 -07:00
Darrick J. Wong
8b972158af xfs: rename XFS_REFC_COW_START to _COWFLAG
We've been (ab)using XFS_REFC_COW_START as both an integer quantity and
a bit flag, even though it's *only* a bit flag.  Rename the variable to
reflect its nature and update the cast target since we're not supposed
to be comparing it to xfs_agblock_t now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:22 -07:00
Darrick J. Wong
c1ccf967bf xfs: fix uninitialized list head in struct xfs_refcount_recovery
We're supposed to initialize the list head of an object before adding it
to another list.  Fix that, and stop using the kmem_{alloc,free} calls
from the Irix days.

Fixes: 174edb0e46 ("xfs: store in-progress CoW allocations in the refcount btree")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:22 -07:00
Darrick J. Wong
f1fdc82078 xfs: fix agblocks check in the cow leftover recovery function
As we've seen, refcount records use the upper bit of the rc_startblock
field to ensure that all the refcount records are at the right side of
the refcount btree.  This works because an AG is never allowed to have
more than (1U << 31) blocks in it.  If we ever encounter a filesystem
claiming to have that many blocks, we absolutely do not want reflink
touching it at all.

However, this test at the start of xfs_refcount_recover_cow_leftovers is
slightly incorrect -- it /should/ be checking that agblocks isn't larger
than the XFS_MAX_CRC_AG_BLOCKS constant, and it should check that the
constant is never large enough to conflict with that CoW flag.

Note that the V5 superblock verifier has not historically rejected
filesystems where agblocks >= XFS_MAX_CRC_AG_BLOCKS, which is why this
ended up in the COW recovery routine.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
f62ac3e0ac xfs: check record domain when accessing refcount records
Now that we've separated the startblock and CoW/shared extent domain in
the incore refcount record structure, check the domain whenever we
retrieve a record to ensure that it's still in the domain that we want.
Depending on the circumstances, a change in domain either means we're
done processing or that we've found a corruption and need to fail out.

The refcount check in xchk_xref_is_cow_staging is redundant since
_get_rec has done that for a long time now, so we can get rid of it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
68d0f38917 xfs: remove XFS_FIND_RCEXT_SHARED and _COW
Now that we have an explicit enum for shared and CoW staging extents, we
can get rid of the old FIND_RCEXT flags.  Omit a couple of conversions
that disappear in the next patches.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
f492135df0 xfs: refactor domain and refcount checking
Create a helper function to ensure that CoW staging extent records have
a single refcount and that shared extent records have more than 1
refcount.  We'll put this to more use in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
571423a162 xfs: report refcount domain in tracepoints
Now that we've broken out the startblock and shared/cow domain in the
incore refcount extent record structure, update the tracepoints to
report the domain.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
9a50ee4f8d xfs: track cow/shared record domains explicitly in xfs_refcount_irec
Just prior to committing the reflink code into upstream, the xfs
maintainer at the time requested that I find a way to shard the refcount
records into two domains -- one for records tracking shared extents, and
a second for tracking CoW staging extents.  The idea here was to
minimize mount time CoW reclamation by pushing all the CoW records to
the right edge of the keyspace, and it was accomplished by setting the
upper bit in rc_startblock.  We don't allow AGs to have more than 2^31
blocks, so the bit was free.

Unfortunately, this was a very late addition to the codebase, so most of
the refcount record processing code still treats rc_startblock as a u32
and pays no attention to whether or not the upper bit (the cow flag) is
set.  This is a weakness is theoretically exploitable, since we're not
fully validating the incoming metadata records.

Fuzzing demonstrates practical exploits of this weakness.  If the cow
flag of a node block key record is corrupted, a lookup operation can go
to the wrong record block and start returning records from the wrong
cow/shared domain.  This causes the math to go all wrong (since cow
domain is still implicit in the upper bit of rc_startblock) and we can
crash the kernel by tricking xfs into jumping into a nonexistent AG and
tripping over xfs_perag_get(mp, <nonexistent AG>) returning NULL.

To fix this, start tracking the domain as an explicit part of struct
xfs_refcount_irec, adjust all refcount functions to check the domain
of a returned record, and alter the function definitions to accept them
where necessary.

Found by fuzzing keys[2].cowflag = add in xfs/464.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
5a8c345ca8 xfs: refactor refcount record usage in xchk_refcountbt_rec
Consolidate the open-coded xfs_refcount_irec fields into an actual
struct and use the existing _btrec_to_irec to decode the ondisk record.
This will reduce code churn in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
9e7e2436c1 xfs: move _irec structs to xfs_types.h
Structure definitions for incore objects do not belong in the ondisk
format header.  Move them to the incore types header where they belong.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:20 -07:00
Darrick J. Wong
8edbe0cf8b xfs: check deferred refcount op continuation parameters
If we're in the middle of a deferred refcount operation and decide to
roll the transaction to avoid overflowing the transaction space, we need
to check the new agbno/aglen parameters that we're about to record in
the new intent.  Specifically, we need to check that the new extent is
completely within the filesystem, and that continuation does not put us
into a different AG.

If the keys of a node block are wrong, the lookup to resume an
xfs_refcount_adjust_extents operation can put us into the wrong record
block.  If this happens, we might not find that we run out of aglen at
an exact record boundary, which will cause the loop control to do the
wrong thing.

The previous patch should take care of that problem, but let's add this
extra sanity check to stop corruption problems sooner than later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:20 -07:00