iomap_want_unshare_iter currently sits in fs/iomap/buffered-io.c, which
depends on CONFIG_BLOCK. It is also in used in fs/dax.c whіch has no
such dependency. Given that it is a trivial check turn it into an inline
in include/linux/iomap.h to fix the DAX && !BLOCK build.
Fixes: 6ef6a0e821 ("iomap: share iomap_unshare_iter predicate code with fsdax")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241015041350.118403-1-hch@lst.de
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
In the normal case, when we excute `echo 0 > /proc/fs/nfsd/threads`, the
function `nfs4_state_destroy_net` in `nfs4_state_shutdown_net` will
release all resources related to the hashed `nfs4_client`. If the
`nfsd_client_shrinker` is running concurrently, the `expire_client`
function will first unhash this client and then destroy it. This can
lead to the following warning. Additionally, numerous use-after-free
errors may occur as well.
nfsd_client_shrinker echo 0 > /proc/fs/nfsd/threads
expire_client nfsd_shutdown_net
unhash_client ...
nfs4_state_shutdown_net
/* won't wait shrinker exit */
/* cancel_work(&nn->nfsd_shrinker_work)
* nfsd_file for this /* won't destroy unhashed client1 */
* client1 still alive nfs4_state_destroy_net
*/
nfsd_file_cache_shutdown
/* trigger warning */
kmem_cache_destroy(nfsd_file_slab)
kmem_cache_destroy(nfsd_file_mark_slab)
/* release nfsd_file and mark */
__destroy_client
====================================================================
BUG nfsd_file (Not tainted): Objects remaining in nfsd_file on
__kmem_cache_shutdown()
--------------------------------------------------------------------
CPU: 4 UID: 0 PID: 764 Comm: sh Not tainted 6.12.0-rc3+ #1
dump_stack_lvl+0x53/0x70
slab_err+0xb0/0xf0
__kmem_cache_shutdown+0x15c/0x310
kmem_cache_destroy+0x66/0x160
nfsd_file_cache_shutdown+0xac/0x210 [nfsd]
nfsd_destroy_serv+0x251/0x2a0 [nfsd]
nfsd_svc+0x125/0x1e0 [nfsd]
write_threads+0x16a/0x2a0 [nfsd]
nfsctl_transaction_write+0x74/0xa0 [nfsd]
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
====================================================================
BUG nfsd_file_mark (Tainted: G B W ): Objects remaining
nfsd_file_mark on __kmem_cache_shutdown()
--------------------------------------------------------------------
dump_stack_lvl+0x53/0x70
slab_err+0xb0/0xf0
__kmem_cache_shutdown+0x15c/0x310
kmem_cache_destroy+0x66/0x160
nfsd_file_cache_shutdown+0xc8/0x210 [nfsd]
nfsd_destroy_serv+0x251/0x2a0 [nfsd]
nfsd_svc+0x125/0x1e0 [nfsd]
write_threads+0x16a/0x2a0 [nfsd]
nfsctl_transaction_write+0x74/0xa0 [nfsd]
vfs_write+0x1a5/0x6d0
ksys_write+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
To resolve this issue, cancel `nfsd_shrinker_work` using synchronous
mode in nfs4_state_shutdown_net.
Fixes: 7c24fa2250 ("NFSD: replace delayed_work with work_struct for nfsd_client_shrinker")
Signed-off-by: Yang Erkun <yangerkun@huaweicloud.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
As Allison reported [1], currently get_tree_bdev() will store
"Can't lookup blockdev" error message. Although it makes sense for
pure bdev-based fses, this message may mislead users who try to use
EROFS file-backed mounts since get_tree_nodev() is used as a fallback
then.
Add get_tree_bdev_flags() to specify extensible flags [2] and
GET_TREE_BDEV_QUIET_LOOKUP to silence "Can't lookup blockdev" message
since it's misleading to EROFS file-backed mounts now.
[1] https://lore.kernel.org/r/CAOYeF9VQ8jKVmpy5Zy9DNhO6xmWSKMB-DO8yvBB0XvBE7=3Ugg@mail.gmail.com
[2] https://lore.kernel.org/r/ZwUkJEtwIpUA4qMz@infradead.org
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241009033151.2334888-1-hsiangkao@linux.alibaba.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
This fixes a fsck bug on a very old filesystem (pre mainline merge).
Fixes: 72350ee0ea ("bcachefs: Kill snapshot arg to fsck_write_inode()")
Reported-by: Marcin Mirosław <marcin@mejor.pl>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
kvmalloc() doesn't support allocations > INT_MAX, but vmalloc() does -
the limit should be lifted, but we can work around this for now.
A user with a 75 TB filesystem reported the following journal replay
error:
https://github.com/koverstreet/bcachefs/issues/769
In journal replay we have to sort and dedup all the keys from the
journal, which means we need a large contiguous allocation. Given that
the user has 128GB of ram, the 2GB limit on allocation size has become
far too small.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Fix a bug where mount was failing with -ERESTARTSYS:
https://github.com/koverstreet/bcachefs/issues/741
We only want the interruptible wait when called from fsync.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- fix for multiple slabs created with the same name
- enable multipage folios
- theorical fix to also look for opened fids by inode if none
was found by dentry
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmcS81AACgkQq06b7GqY
5nACpBAAtXOGRjg+dushCwUVKBlnI3oTwE2G+ywnphNZg2A0emlMOxos7x1OTiM3
Fu0b10MCUWHIXo4jD6ALVPWITJTfjiXR8s90Q/ozypcIXXhkDDShhV31b2h6Iplr
YyKyjEehDFRiS7rqWC2a9mce99sOpwdQRmnssnWbjYvpJ4imFbl+50Z1I5Nc/Omu
j2y02eMuikiWF/shKj0Dx1mmpZ4InSv3kvlM+V2D2YdWKNonGZe/xFZhid95LXmr
Upt55R8k9qR2pn4VU22eKP6c34DIZGDlrcQdPUCNP5QuaAdGZov3TjNQdjE1bJmF
E2QdxvUNfvvHqlvaRrlWa27uMgXMcy7QV3LEKwmo3tmaYVw2PDMRbFXc9zQdxy91
zqXjjGasnwzE8ca36y79vZjFTHAyY5VK/3cHCL3ai+ysu4UL3k2QgmVegREG/xKk
G8Nz4UO/R6s8Wc2VqxKJdZS5NMLlADS+Aes0PG+9AxQz7iR9Ktgwrw39KDxMi+Lm
PeH3Gz2rP9+EPoa3usoBQtvvvmJKM/Wb9qdPW9vTtRbRJ7bVclJoizFoLMA/TiW1
Jru+HYGBO75s8RynwEDLMiJhkjZWHfVgDjPsY6YsGVH8W2gOcJ7egQ2J2EsuurN3
tzKz4uQilV+VeDuWs8pWKrX/c3Y3KpSYV+oayg7Je7LoTlQBmU8=
=VG4t
-----END PGP SIGNATURE-----
Merge tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux
Pull 9p fixes from Dominique Martinet:
"Mashed-up update that I sat on too long:
- fix for multiple slabs created with the same name
- enable multipage folios
- theorical fix to also look for opened fids by inode if none was
found by dentry"
[ Enabling multi-page folios should have been done during the merge
window, but it's a one-liner, and the actual meat of the enablement
is in netfs and already in use for other filesystems... - Linus ]
* tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux:
9p: Avoid creating multiple slab caches with the same name
9p: Enable multipage folios
9p: v9fs_fid_find: also lookup by inode if not found dentry
There is a race between laundromat handling of revoked delegations
and a client sending free_stateid operation. Laundromat thread
finds that delegation has expired and needs to be revoked so it
marks the delegation stid revoked and it puts it on a reaper list
but then it unlock the state lock and the actual delegation revocation
happens without the lock. Once the stid is marked revoked a racing
free_stateid processing thread does the following (1) it calls
list_del_init() which removes it from the reaper list and (2) frees
the delegation stid structure. The laundromat thread ends up not
calling the revoke_delegation() function for this particular delegation
but that means it will no release the lock lease that exists on
the file.
Now, a new open for this file comes in and ends up finding that
lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
up trying to derefence a freed delegation stateid. Leading to the
followint use-after-free KASAN warning:
kernel: ==================================================================
kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
kernel:
kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
kernel: Call trace:
kernel: dump_backtrace+0x98/0x120
kernel: show_stack+0x1c/0x30
kernel: dump_stack_lvl+0x80/0xe8
kernel: print_address_description.constprop.0+0x84/0x390
kernel: print_report+0xa4/0x268
kernel: kasan_report+0xb4/0xf8
kernel: __asan_report_load8_noabort+0x1c/0x28
kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
kernel: nfsd4_open+0xa08/0xe80 [nfsd]
kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
kernel: svc_process+0x3d4/0x7e0 [sunrpc]
kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
kernel: nfsd+0x270/0x400 [nfsd]
kernel: kthread+0x288/0x310
kernel: ret_from_fork+0x10/0x20
This patch proposes a fixed that's based on adding 2 new additional
stid's sc_status values that help coordinate between the laundromat
and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()).
First to make sure, that once the stid is marked revoked, it is not
removed by the nfsd4_free_stateid(), the laundromat take a reference
on the stateid. Then, coordinating whether the stid has been put
on the cl_revoked list or we are processing FREE_STATEID and need to
make sure to remove it from the list, each check that state and act
accordingly. If laundromat has added to the cl_revoke list before
the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove
it from the list. If nfsd4_free_stateid() finds that operations arrived
before laundromat has placed it on cl_revoke list, it marks the state
freed and then laundromat will no longer add it to the list.
Also, for nfsd4_delegreturn() when looking for the specified stid,
we need to access stid that are marked removed or freeable, it means
the laundromat has started processing it but hasn't finished and this
delegreturn needs to return nfserr_deleg_revoked and not
nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the
lack of it will leave this stid on the cl_revoked list indefinitely.
Fixes: 2d4a532d38 ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock")
CC: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmcSdmYACgkQiiy9cAdy
T1EnnAwAoNbY+odLB9atHIuaBftpyINrhzRrzpwTfYNtPKUPGxxGk2fiP29YqMLb
OF4jnC87E3P/xhydoZHXXe3kKBQFVMAkJZKHiZBvJd+brk/EadfQnNmIio1pwOGh
zFNxSujFtsM/1HU/ZoI2kaHzrqj5KxWKWFytZ6umd8C3NyKK9Lo/lcqUBKv8MpJy
XXkMBh+7HGKRfDQlU+n6NQ5+dqFL5xDjTXlm9dM8LXuInKy5oKTGnRhLA7OA8lt7
EenFo8joy0IpXUByHt+ksQ8P88NCnU2h9kGp1UrGrBPh90+MokRr9GAcH8twK8jt
/bpL4yzAwuk1TAg+L9mSLT2OtWYsDpsQZmsBMbxBZGr2qmtjwgbxSgjf6DNiJZgn
jz15nFsuEsU5AbX4EAE67fwRWAo9AmQFyOOcYgkiIWOFHaRU6D/2NzCxCDZ+mfpy
Z5f7dF/sA158iY4wmB5BrQpFamxzpLADz6Qy4NA9hXjEKsbyFAuf22EjE64ruxZ4
8nMB3buh
=peum
-----END PGP SIGNATURE-----
Merge tag 'v6.12-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix possible double free setting xattrs
- Fix slab out of bounds with large ioctl payload
- Remove three unused functions, and an unused variable that could be
confusing
* tag 'v6.12-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Remove unused functions
smb/client: Fix logically dead code
smb: client: fix OOBs when building SMB2_IOCTL request
smb: client: fix possible double free in smb2_set_ea()
* Fix integer overflow in xrep_bmap
* Fix stale dealloc punching for COW IO
Signed-off-by: Carlos Maiolino <cem@kernel.org>
-----BEGIN PGP SIGNATURE-----
iJUEABMJAB0WIQQMHYkcUKcy4GgPe2RGdaER5QtfpgUCZw5LIwAKCRBGdaER5Qtf
puRlAYDezbvs1dDSkKIGOt3inGdLptNAu4qniXBUkbYI9BzmtIVDueWP4Wo0dV3d
gu3xrWQBfjFXdmEuBlwLuAFrp07AN18BVMj+DWCiEShsPHSoSPcF/IrDiz4BHvGv
MKYq9CywFw==
=Gj9b
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
- Fix integer overflow in xrep_bmap
- Fix stale dealloc punching for COW IO
* tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: punch delalloc extents from the COW fork for COW writes
xfs: set IOMAP_F_SHARED for all COW fork allocations
xfs: share more code in xfs_buffered_write_iomap_begin
xfs: support the COW fork in xfs_bmap_punch_delalloc_range
xfs: IOMAP_ZERO and IOMAP_UNSHARE already hold invalidate_lock
xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof
xfs: factor out a xfs_file_write_zero_eof helper
iomap: move locking out of iomap_write_delalloc_release
iomap: remove iomap_file_buffered_write_punch_delalloc
iomap: factor out a iomap_last_written_block helper
xfs: fix integer overflow in xrep_bmap
Building the kernel with W=1 generates the following warning:
fs/proc/fd.c:81: warning: This comment starts with '/**',
but isn't a kernel-doc comment.
Use a normal comment for the helper function proc_fdinfo_permission().
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20241018102705.92237-2-thorsten.blum@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
hash_check_key() checks and repairs the hash table btrees: dirents and
xattrs are open addressing hash tables.
We recently had a corruption reported where the hash type on an inode
somehow got flipped, which made the existing dirents invisible and
allowed new ones to be created with the same name.
Now, hash_check_key() can repair duplicates: it will delete one of them,
if it has an xattr or dangling dirent, but if it has two valid dirents
one of them gets renamed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Different versions of the same inode (same inode number, different
snapshot ID) must have the same hash seed and type - lookups require
this, since they see keys from different snapshots simultaneously.
To repair we only need to make the inodes consistent, hash_check_key()
will do the rest.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This helped with discovering some filesystem corruption fsck has having
trouble with: the str_hash type had gotten flipped on one snapshot's
version of an inode.
All versions of a given inode number have the same hash seed and hash
type, since lookups will be done with a single hash/seed and type and
see dirents/xattrs from multiple snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The options parse in get_tree will split the options buffer, it will
get the empty string for last one by strsep(). After commit
ea0eeb89b1d5 ("bcachefs: reject unknown mount options") is merged,
unknown mount options is not allowed (here is empty string), and this
causes this errors. This can be reproduced just by the following steps:
bcachefs format /dev/loop
mount -t bcachefs -o metadata_target=loop1 /dev/loop1 /mnt/bcachefs/
Fixes: ea0eeb89b1d5 ("bcachefs: reject unknown mount options")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When call show_options in bcachefs, the options buffer is appeneded
to the seq variable. In fact, it requires an additional comma to be
appended first. This will affect the remount process when reading
existing mount options.
Fixes: 9305cf91d05e ("bcachefs: bch2_opts_to_text()")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Found by generic/299: When we have to truncate a write due to -ENOSPC,
we may have to read in the folio we're writing to if we're now no longer
doing a complete write to a !uptodate folio.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_folio_reservation_get_partial(), on partial success, will now
return a reservation that's aligned to the filesystem blocksize.
This is a partial fix for fstests generic/299 - fio verify is badly
behaved in the presence of short writes that aren't aligned to its
blocksize.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_disk_reservation_put() zeroes out the reservation - oops.
This fixes a disk reservation leak when getting a quota reservation
returned an error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Using commit_do() to call alloc_sectors_start_trans() breaks when we're
randomly injecting transaction restarts - the restart in the commit
causes us to leak the lock that alloc_sectorS_start_trans() takes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_bucket_io_time_reset() doesn't need to succeed, which is why it
didn't previously retry on transaction restart - but we're now treating
these as errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is ugly:
We may discover in alloc_write_key that the data type we calculated is
wrong, because BCH_DATA_need_discard is checked/set elsewhere, and the
disk accounting counters we calculated need to be updated.
But bch2_alloc_key_to_dev_counters(..., BTREE_TRIGGER_gc) is not safe
w.r.t. transaction restarts, so we need to propagate the fixup back to
our gc state in case we take a transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
this one is fairly harmless since the invalidate worker will just run
again later if it needs to, but still worth fixing
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This should be impossible to hit in practice; the first lookup within a
transaction won't return a restart due to lock ordering, but we're
adding fault injection for transaction restarts and shaking out bugs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It is the usual shower of unrelated singletons - please see the individual
changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZxGY5wAKCRDdBJ7gKXxA
js6RAQC16zQ7WRV091i79cEi1C5648NbZjMCU626hZjuyfbzKgEA2v8PYtjj9w2e
UGLxMY+PYZki2XNEh75Sikdkiyl9Vgg=
=xcWT
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"28 hotfixes. 13 are cc:stable. 23 are MM.
It is the usual shower of unrelated singletons - please see the
individual changelogs for details"
* tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: add regression test for spanning store bug
maple_tree: correct tree corruption on spanning store
mm/mglru: only clear kswapd_failures if reclaimable
mm/swapfile: skip HugeTLB pages for unuse_vma
selftests: mm: fix the incorrect usage() info of khugepaged
MAINTAINERS: add Jann as memory mapping/VMA reviewer
mm: swap: prevent possible data-race in __try_to_reclaim_swap
mm: khugepaged: fix the incorrect statistics when collapsing large file folios
MAINTAINERS: kasan, kcov: add bugzilla links
mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs
Docs/damon/maintainer-profile: add missing '_' suffixes for external web links
maple_tree: check for MA_STATE_BULK on setting wr_rebalance
mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
mm: remove unused stub for can_swapin_thp()
mailmap: add an entry for Andy Chiu
MAINTAINERS: add memory mapping/VMA co-maintainers
fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization
...
When btrfs reserves an extent and does not use it (e.g, by an error), it
calls btrfs_free_reserved_extent() to free the reserved extent. In the
process, it calls btrfs_add_free_space() and then it accounts the region
bytes as block_group->zone_unusable.
However, it leaves the space_info->bytes_zone_unusable side not updated. As
a result, ENOSPC can happen while a space_info reservation succeeded. The
reservation is fine because the freed region is not added in
space_info->bytes_zone_unusable, leaving that space as "free". OTOH,
corresponding block group counts it as zone_unusable and its allocation
pointer is not rewound, we cannot allocate an extent from that block group.
That will also negate space_info's async/sync reclaim process, and cause an
ENOSPC error from the extent allocation process.
Fix that by returning the space to space_info->bytes_zone_unusable.
Ideally, since a bio is not submitted for this reserved region, we should
return the space to free space and rewind the allocation pointer. But, it
needs rework on extent allocation handling, so let it work in this way for
now.
Fixes: 169e0da91a ("btrfs: zoned: track unusable bytes for zones")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
afs_wake_up_async_call() can incur lock recursion. The problem is that it
is called from AF_RXRPC whilst holding the ->notify_lock, but it tries to
take a ref on the afs_call struct in order to pass it to a work queue - but
if the afs_call is already queued, we then have an extraneous ref that must
be put... calling afs_put_call() may call back down into AF_RXRPC through
rxrpc_kernel_shutdown_call(), however, which might try taking the
->notify_lock again.
This case isn't very common, however, so defer it to a workqueue. The oops
looks something like:
BUG: spinlock recursion on CPU#0, krxrpcio/7001/1646
lock: 0xffff888141399b30, .magic: dead4ead, .owner: krxrpcio/7001/1646, .owner_cpu: 0
CPU: 0 UID: 0 PID: 1646 Comm: krxrpcio/7001 Not tainted 6.12.0-rc2-build3+ #4351
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
<TASK>
dump_stack_lvl+0x47/0x70
do_raw_spin_lock+0x3c/0x90
rxrpc_kernel_shutdown_call+0x83/0xb0
afs_put_call+0xd7/0x180
rxrpc_notify_socket+0xa0/0x190
rxrpc_input_split_jumbo+0x198/0x1d0
rxrpc_input_data+0x14b/0x1e0
? rxrpc_input_call_packet+0xc2/0x1f0
rxrpc_input_call_event+0xad/0x6b0
rxrpc_input_packet_on_conn+0x1e1/0x210
rxrpc_input_packet+0x3f2/0x4d0
rxrpc_io_thread+0x243/0x410
? __pfx_rxrpc_io_thread+0x10/0x10
kthread+0xcf/0xe0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x24/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/1394602.1729162732@warthog.procyon.org.uk
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
When copying a namespace we won't have added the new copy into the
namespace rbtree until after the copy succeeded. Calling free_mnt_ns()
will try to remove the copy from the rbtree which is invalid. Simply
free the namespace skeleton directly.
Link: https://lore.kernel.org/r/20241016-adapter-seilwinde-83c508a7bde1@brauner
Fixes: 1901c92497 ("fs: keep an index of current mount namespaces")
Tested-by: Brad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org # v6.11+
Reported-by: Brad Spengler <spender@grsecurity.net>
Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Christian Brauner <brauner@kernel.org>
In the I/O locking code borrowed from NFS into netfslib, i_rwsem is held
locked across a buffered write - but this causes a performance regression
in cifs as it excludes buffered reads for the duration (cifs didn't use any
locking for buffered reads).
Mitigate this somewhat by downgrading the i_rwsem to a read lock across the
buffered write. This at least allows parallel reads to occur whilst
excluding other writes, DIO, truncate and setattr.
Note that this shouldn't be a problem for a buffered write as a read
through an mmap can circumvent i_rwsem anyway.
Also note that we might want to make this change in NFS also.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/1317958.1729096113@warthog.procyon.org.uk
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Trond Myklebust <trondmy@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
show show_smap_vma_flags() has been a using misspelled initializer in
mnemonics[] - it needed to initialize 2 element array of char and it used
NUL-padded 2 character string literals (i.e. 3-element initializer).
This has been spotted by gcc-15[*]; prior to that gcc quietly dropped the
3rd eleemnt of initializers. To fix this we are increasing the size of
mnemonics[] (from mnemonics[BITS_PER_LONG][2] to
mnemonics[BITS_PER_LONG][3]) to accomodate the NUL-padded string literals.
This also helps us in simplyfying the logic for printing of the flags as
instead of printing each character from the mnemonics[], we can just print
the mnemonics[] using seq_printf.
[*]: fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
917 | [0 ... (BITS_PER_LONG-1)] = "??",
| ^~~~
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
...
Stephen pointed out:
: The C standard explicitly allows for a string initializer to be too long
: due to the NUL byte at the end ... so this warning may be overzealous.
but let's make the warning go away anwyay.
Link: https://lkml.kernel.org/r/20241005063700.2241027-1-brahmajit.xyz@gmail.com
Link: https://lkml.kernel.org/r/20241003093040.47c08382@canb.auug.org.au
Signed-off-by: Brahmajit Das <brahmajit.xyz@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Syzbot reported that a task hang occurs in vcs_open() during a fuzzing
test for nilfs2.
The root cause of this problem is that in nilfs_find_entry(), which
searches for directory entries, ignores errors when loading a directory
page/folio via nilfs_get_folio() fails.
If the filesystem images is corrupted, and the i_size of the directory
inode is large, and the directory page/folio is successfully read but
fails the sanity check, for example when it is zero-filled,
nilfs_check_folio() may continue to spit out error messages in bursts.
Fix this issue by propagating the error to the callers when loading a
page/folio fails in nilfs_find_entry().
The current interface of nilfs_find_entry() and its callers is outdated
and cannot propagate error codes such as -EIO and -ENOMEM returned via
nilfs_find_entry(), so fix it together.
Link: https://lkml.kernel.org/r/20241004033640.6841-1-konishi.ryusuke@gmail.com
Fixes: 2ba466d74e ("nilfs2: directory entry operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: Lizhi Xu <lizhi.xu@windriver.com>
Closes: https://lkml.kernel.org/r/20240927013806.3577931-1-lizhi.xu@windriver.com
Reported-by: syzbot+8a192e8d090fa9a31135@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8a192e8d090fa9a31135
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmcPxtAACgkQxWXV+ddt
WDu9lA//WfB88fwEKnqBYDRo6aiSMIAzLDuXkJ9i8d7rcjZO1OIZkEnMOsxhvTcZ
KxgjNjkgzoTyUwoAUlG+ZpvMeSNMhBdr2NFkXmYzN9oanFE4zplpZiWx6tGSApRU
0ilngjXBsr8p03HmB88Yb05DVYQ2elMP6Jx3VETDBa0CNyp4//tGKzusNhZdA7KM
XLZmkKRk3ZKabNo+p2J5t8UGJCl2L18U0o/EphfSkODKadUnsBbAPZUt2EGQCZwv
uZhDFAUkgTFBkeRO7JwTfDrNi51M4zwmh+kEduzg4Ny4TdFb1UapU7K1N330WMru
4Qa953Met9I4NB/kKI+fZP1lN4NGuD2qEU6yoZVSy4UiqRp1gEg8kOUfVGFbNJa1
VFYcwdrBad0I4PjnQc5bpZVjzqJT5wWiZxjlWrB7VyIfdmnvQxe5h4DBwBhN5FJr
+MEtuY2QNFygjDAZ5z0Ss8hegqI+FYi562Cjy9QRLhb3qGD8STF2BIChaILIn3oA
UVJUlUP6CUmCu1RZRMFB4/WkeHO46FmZxJErGfFeXqJInThf0/rdSZOQgIP0JsUq
N8FINEgXFAMCkK1PT7MNvAYkSP0tR7B0JjGKcSlGS3v3F0URCNGvHSiqbLedAtXT
lc1MdXTZxub8h6xhIvgY1j7HRAFGrunn7LD6MIKRWX1SZPWwAGI=
=DEUA
-----END PGP SIGNATURE-----
Merge tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- regression fix: dirty extents tracked in xarray for qgroups must be
adjusted for 32bit platforms
- fix potentially freeing uninitialized name in fscrypt structure
- fix warning about unneeded variable in a send callback
* tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix uninitialized pointer free on read_alloc_one_name() error
btrfs: send: cleanup unneeded return variable in changed_verity()
btrfs: fix uninitialized pointer free in add_inode_ref()
btrfs: use sector numbers as keys for the dirty extents xarray
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmcPUXEACgkQiiy9cAdy
T1Gm7QwAlPW5//Cb4B0gpjzRcUws51IZ4yFhp4IQWmsd0RqdjZ4TxSCOPF3u3HR3
0OPxyLdbUn6h5g0S2ayzqomHx2VBOQTjgyuMtaTWzokToMNu8kqvxK1MTslkBior
9YEHUz9+5f0OJ+JBGNUzjfy4Plygr5y09udaLfqIknuY8+SeuooxNNUNfkIvrP7C
JsSAWJznN9VMpKJmszYc4ntyTiz1XVXyyjJmjhRQ27ah8LUghqZ0mamgigTS5UFa
U7eYBDfs6+9i5Lvkd4bJPdGyov9g/EPViLURZMfNaz3+p0TfosN8s2UZuhHC+zuv
BDQ+wHGRqzmteZspLanrGBt9y9svHXp1CD7MwqWeGR3GhKsfsxCMJpE931fBhsxM
vlJdd/xCs128fv48AvNyHA9abN0U1FpskOJhOzjDgvhKqDoIQ4TCC7QFDEttsPRv
ZiQmyOCPyZZY28EmfoltU4CFcMIwKQ81nPUSOJFgKmHBbSpc+Qtnv5QgRHZCzj7n
StJfaIMv
=WhJj
-----END PGP SIGNATURE-----
Merge tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- fix race between session setup and session logoff
- add supplementary group support
* tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: add support for supplementary groups
ksmbd: fix user-after-free from session log off
Syzbot reported that after nilfs2 reads a corrupted file system image
and degrades to read-only, the BUG_ON check for the buffer delay flag
in submit_bh_wbc() may fail, causing a kernel bug.
This is because the buffer delay flag is not cleared when clearing the
buffer state flags to discard a page/folio or a buffer head. So, fix
this.
This became necessary when the use of nilfs2's own page clear routine
was expanded. This state inconsistency does not occur if the buffer
is written normally by log writing.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Link: https://lore.kernel.org/r/20241015213300.7114-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e269 ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Reported-by: syzbot+985ada84bf055a575c07@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=985ada84bf055a575c07
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
yangyun reported that libfuse test test_copy_file_range() copies zero
bytes from a newly written file when fuse passthrough is enabled.
The reason is that extending passthrough write is not updating the fuse
inode size and when vfs_copy_file_range() observes a zero size inode,
it returns without calling the filesystem copy_file_range() method.
Fix this by adjusting the fuse inode size after an extending passthrough
write.
This does not provide cache coherency of fuse inode attributes and
backing inode attributes, but it should prevent situations where fuse
inode size is too small, causing read/copy to be wrongly shortened.
Reported-by: yangyun <yangyun50@huawei.com>
Closes: https://github.com/libfuse/libfuse/issues/1048
Fixes: 57e1176e60 ("fuse: implement read/write passthrough")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
cifs_ses_find_chan() has been unused since commit
f486ef8e20 ("cifs: use the chans_need_reconnect bitmap for reconnect status")
cifs_read_page_from_socket() has been unused since commit
d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
cifs_chan_in_reconnect() has been unused since commit
bc962159e8 ("cifs: avoid race conditions with parallel reconnects")
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
The if condition in collect_sample: can never be satisfied
because of a logical contradiction. The indicated dead code
may have performed some action; that action will never occur.
Fixes: 94ae8c3fee ("smb: client: compress: LZ77 code improvements cleanup")
Signed-off-by: Advait Dhamorikar <advaitdhamorikar@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When using encryption, either enforced by the server or when using
'seal' mount option, the client will squash all compound request buffers
down for encryption into a single iov in smb2_set_next_command().
SMB2_ioctl_init() allocates a small buffer (448 bytes) to hold the
SMB2_IOCTL request in the first iov, and if the user passes an input
buffer that is greater than 328 bytes, smb2_set_next_command() will
end up writing off the end of @rqst->iov[0].iov_base as shown below:
mount.cifs //srv/share /mnt -o ...,seal
ln -s $(perl -e "print('a')for 1..1024") /mnt/link
BUG: KASAN: slab-out-of-bounds in
smb2_set_next_command.cold+0x1d6/0x24c [cifs]
Write of size 4116 at addr ffff8881148fcab8 by task ln/859
CPU: 1 UID: 0 PID: 859 Comm: ln Not tainted 6.12.0-rc3 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.3-2.fc40 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
? smb2_set_next_command.cold+0x1d6/0x24c [cifs]
print_report+0x156/0x4d9
? smb2_set_next_command.cold+0x1d6/0x24c [cifs]
? __virt_addr_valid+0x145/0x310
? __phys_addr+0x46/0x90
? smb2_set_next_command.cold+0x1d6/0x24c [cifs]
kasan_report+0xda/0x110
? smb2_set_next_command.cold+0x1d6/0x24c [cifs]
kasan_check_range+0x10f/0x1f0
__asan_memcpy+0x3c/0x60
smb2_set_next_command.cold+0x1d6/0x24c [cifs]
smb2_compound_op+0x238c/0x3840 [cifs]
? kasan_save_track+0x14/0x30
? kasan_save_free_info+0x3b/0x70
? vfs_symlink+0x1a1/0x2c0
? do_symlinkat+0x108/0x1c0
? __pfx_smb2_compound_op+0x10/0x10 [cifs]
? kmem_cache_free+0x118/0x3e0
? cifs_get_writable_path+0xeb/0x1a0 [cifs]
smb2_get_reparse_inode+0x423/0x540 [cifs]
? __pfx_smb2_get_reparse_inode+0x10/0x10 [cifs]
? rcu_is_watching+0x20/0x50
? __kmalloc_noprof+0x37c/0x480
? smb2_create_reparse_symlink+0x257/0x490 [cifs]
? smb2_create_reparse_symlink+0x38f/0x490 [cifs]
smb2_create_reparse_symlink+0x38f/0x490 [cifs]
? __pfx_smb2_create_reparse_symlink+0x10/0x10 [cifs]
? find_held_lock+0x8a/0xa0
? hlock_class+0x32/0xb0
? __build_path_from_dentry_optional_prefix+0x19d/0x2e0 [cifs]
cifs_symlink+0x24f/0x960 [cifs]
? __pfx_make_vfsuid+0x10/0x10
? __pfx_cifs_symlink+0x10/0x10 [cifs]
? make_vfsgid+0x6b/0xc0
? generic_permission+0x96/0x2d0
vfs_symlink+0x1a1/0x2c0
do_symlinkat+0x108/0x1c0
? __pfx_do_symlinkat+0x10/0x10
? strncpy_from_user+0xaa/0x160
__x64_sys_symlinkat+0xb9/0xf0
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f08d75c13bb
Reported-by: David Howells <dhowells@redhat.com>
Fixes: e77fe73c7e ("cifs: we can not use small padding iovs together with encryption")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Clang static checker(scan-build) warning:
fs/smb/client/smb2ops.c:1304:2: Attempt to free released memory.
1304 | kfree(ea);
| ^~~~~~~~~
There is a double free in such case:
'ea is initialized to NULL' -> 'first successful memory allocation for
ea' -> 'something failed, goto sea_exit' -> 'first memory release for ea'
-> 'goto replay_again' -> 'second goto sea_exit before allocate memory
for ea' -> 'second memory release for ea resulted in double free'.
Re-initialie 'ea' to NULL near to the replay_again label, it can fix this
double free problem.
Fixes: 4f1fffa237 ("cifs: commands that are retried should have replay flag set")
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
- New metadata version inode_has_child_snapshots
This fixes bugs with handling of unlinked inodes + snapshots, in
particular when an inode is reattached after taking a snapshot;
deleted inodes now get correctly cleaned up across snapshots.
- Disk accounting rewrite fixes
- validation fixes for when a device has been removed
- fix journal replay failing with "journal_reclaim_would_deadlock"
- Some more small fixes for erasure coding + device removal
- Assorted small syzbot fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmcNw4UACgkQE6szbY3K
bnbSzBAAmSCCQCqRwnFSp4OdNSlBK9q1e5WsbKOqHgtoXZU/mOUBe/5bnPPqm6Mg
GkTc7FqVOs/95/rEDKXw2LneFgxRrt8MriJCUdXZvV5fC2R4Kdl0TkwABtMtm2Ae
wp37n6iQO81j4uZHfOj67RzC2NRo7dMdun5HnQPRBTKzyuDaZXqwjMmF2LmaeODh
oiBFUvD5nFBo5XvXPABBin6xpdquHO+6ZWf6SFD4+iRe11NrJAOAIS/crJvxsFfr
I/X152Z+gzKPE+NhANKMxlHyNnVGo7iHUqhUjVuI4SSaXb9Ap6k4sXgfoIzncR17
GA5qWtaNS1W72+awT3R2EaF9Tqi+Vng2RVfxxQ04giImnBq0eziOjlZ26enOE0LU
0ZZrBFzqpItqYbNnzPissHuKb1mAQGPWy6kxoGIrqDKbichA7lzyWDz2lgEE85Sx
E1mvHwYbKhUuLC4c4460hueGVUgMWmjqM3E8oex+oNDpauPB+/bnYkcgZEG2RBla
+ZlDL28fg4fxtqlUrOQeonQ1RecGNdRMJz7xiGnkYU9rQpUuv8QwFiBZGAbLP6zn
6fbFZGxS/pO95sY7GmAtKz7ZgKxJQCzII4s+Oht5AgOvoBlPjAiol1UbwYadYQxz
HKF+WBaPC9z/L6JjP+gx+uUzTWRIfBmhHylhWbKr4vLGfx3Jc1g=
=Rkq2
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
- New metadata version inode_has_child_snapshots
This fixes bugs with handling of unlinked inodes + snapshots, in
particular when an inode is reattached after taking a snapshot;
deleted inodes now get correctly cleaned up across snapshots.
- Disk accounting rewrite fixes
- validation fixes for when a device has been removed
- fix journal replay failing with "journal_reclaim_would_deadlock"
- Some more small fixes for erasure coding + device removal
- Assorted small syzbot fixes
* tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefs: (27 commits)
bcachefs: Fix sysfs warning in fstests generic/730,731
bcachefs: Handle race between stripe reuse, invalidate_stripe_to_dev
bcachefs: Fix kasan splat in new_stripe_alloc_buckets()
bcachefs: Add missing validation for bch_stripe.csum_granularity_bits
bcachefs: Fix missing bounds checks in bch2_alloc_read()
bcachefs: fix uaf in bch2_dio_write_done()
bcachefs: Improve check_snapshot_exists()
bcachefs: Fix bkey_nocow_lock()
bcachefs: Fix accounting replay flags
bcachefs: Fix invalid shift in member_to_text()
bcachefs: Fix bch2_have_enough_devs() for BCH_SB_MEMBER_INVALID
bcachefs: __wait_for_freeing_inode: Switch to wait_bit_queue_entry
bcachefs: Check if stuck in journal_res_get()
closures: Add closure_wait_event_timeout()
bcachefs: Fix state lock involved deadlock
bcachefs: Fix NULL pointer dereference in bch2_opt_to_text
bcachefs: Release transaction before wake up
bcachefs: add check for btree id against max in try read node
bcachefs: Disk accounting device validation fixes
bcachefs: bch2_inode_or_descendents_is_open()
...
When ->iomap_end is called on a short write to the COW fork it needs to
punch stale delalloc data from the COW fork and not the data fork.
Ensure that IOMAP_F_NEW is set for new COW fork allocations in
xfs_buffered_write_iomap_begin, and then use the IOMAP_F_SHARED flag
in xfs_buffered_write_delalloc_punch to decide which fork to punch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Change to always set xfs_buffered_write_iomap_begin for COW fork
allocations even if they don't overlap existing data fork extents,
which will allow the iomap_end callback to detect if it has to punch
stale delalloc blocks from the COW fork instead of the data fork. It
also means we sample the sequence counter for both the data and the COW
fork when writing to the COW fork, which ensures we properly revalidate
when only COW fork changes happens.
This is essentially a revert of commit 72a048c105 ("xfs: only set
IOMAP_F_SHARED when providing a srcmap to a write"). This is fine because
the problem that the commit fixed has now been dealt with in iomap by
only looking at the actual srcmap and not the fallback to the write
iomap.
Note that the direct I/O path was never changed and has always set
IOMAP_F_SHARED for all COW fork allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Introduce a local iomap_flags variable so that the code allocating new
delalloc blocks in the data fork can fall through to the found_imap
label and reuse the code to unlock and fill the iomap.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs_buffered_write_iomap_begin can also create delallocate reservations
that need cleaning up, prepare for that by adding support for the COW
fork in xfs_bmap_punch_delalloc_range.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
All XFS callers of iomap_zero_range and iomap_file_unshare already hold
invalidate_lock, so we can't take it again in
iomap_file_buffered_write_punch_delalloc.
Use the passed in flags argument to detect if we're called from a zero
or unshare operation and don't take the lock again in this case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs_file_write_zero_eof is the only caller of xfs_zero_range that does
not take XFS_MMAPLOCK_EXCL (aka the invalidate lock). Currently that
is actually the right thing, as an error in the iomap zeroing code will
also take the invalidate_lock to clean up, but to fix that deadlock we
need a consistent locking pattern first.
The only extra thing that XFS_MMAPLOCK_EXCL will lock out are read
pagefaults, which isn't really needed here, but also not actively
harmful.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Split a helper from xfs_file_write_checks that just deal with the
post-EOF zeroing to keep the code readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
XFS (which currently is the only user of iomap_write_delalloc_release)
already holds invalidate_lock for most zeroing operations. To be able
to avoid a deadlock it needs to stop taking the lock, but doing so
in iomap would leak XFS locking details into iomap.
To avoid this require the caller to hold invalidate_lock when calling
iomap_write_delalloc_release instead of taking it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Currently iomap_file_buffered_write_punch_delalloc can be called from
XFS either with the invalidate lock held or not. To fix this while
keeping the locking in the file system and not the iomap library
code we'll need to life the locking up into the file system.
To prepare for that, open code iomap_file_buffered_write_punch_delalloc
in the only caller, and instead export iomap_write_delalloc_release.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Split out a pice of logic from iomap_file_buffered_write_punch_delalloc
that is useful for all iomap_end implementations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
This includes an urgent fix to resolve DIO read performance regression caused by
0cac51185e ("f2fs: fix to avoid racing in between read and OPU dio write").
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmcNSyoACgkQQBSofoJI
UNJBKg/6AgfkxWeY4Vb5h43nnmwnv/eJZZRIhsFMSz+vWEI1wfKeCkUys0Q1GGZ3
RPZl+ZT4vJ2FDjLRwZFfCi31nT/qIAlOnm0GtlDczsccBh91RrzFFKDHcwvfOd0/
NLpdqsDt2tuurf6zjbhtx5Paoyr7KvMg4+sSfRqV+7nvXmLImMH7ahRGiB5Eh4HP
gDNpQ7tk+D2+ZHBU40PUSYXooikFYznGuHk5JjpKnVCAsK8F0u9nA35ZeSlkkUCM
8MGS+zHEpEqD/wZWlrwUWhmXmHLuNUbJh6X3pPNYxe0s/+ymHo99zKRw5HHKKybK
FFZYbSWXrTNS+2SS2NUhUxp3CpPV0N6IGM+i7UkYo4DMu/MG7skVrOjkLx7NQvSf
8/8B8g6LQt32JzlNvCrZcjStEgdxbXzFaJZH71952C7dp8mnhc4LkgkYkADnjsa3
/+L+nazVgX6YaXZ9Ny2TY3gMF/gyJHp7LzZGeOdeqKNWYTclnrkEzGJ3Eg9aE2vz
yymcz2P7nWYFklIfnRUPAYnvUwleBysYkHsw5z3wrqX6TjW5IW8fcozN9dIIdOTC
2AVBGhi931xnJSj4AU9in+p+s9qfPP6bHN/C/5PuL04UQ7+y+pjUhmwJxUzugVoq
T4xpp1BFj/3SQ8SUQX4nVTdHqR5uKfHvFKyAe6/0345w6bQCQtI=
=7KMu
-----END PGP SIGNATURE-----
Merge tag 'f2fs-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fix from Jaegeuk Kim:
"An urgent fix to resolve DIO read performance regression caused by
'f2fs: fix to avoid racing in between read and OPU dio write'"
* tag 'f2fs-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: allow parallel DIO reads
- Make sure only regular inodes can be used for file-backed mounts;
- Two minor codebase cleanups.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmcNHqARHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qo1XA/+MFbobJ4bWxJQKnouLlCiFQ5C1xEFbVn2
HasHLfrMIcdz/n/S3Ib4Ayi+9W0zM2Ekq9EG+fuOBqjZP17+EOj3e7OPtVVPNwx0
u2GbD9zNCliZg9PigCfPO+6oImt6l/Mytmx+7bELqbMywAy7JNCNesJuyycsTcja
o1I3dNNUZdppilohXPIENTRLjBlOuGBaZdUXDih0LqB+Pb0jgXTP6JfD88h1MLFw
xBbhqQ1A/GgyESfsMpZFn2xvFIocLBCIAdAehi9M1AiEwCTjGkTZ66WW3H6Es/Zp
vcC9KjHJoGGCXxZf8mnoQHQo/WqQuNUPc2BVf9iExzCo0nwRArcTbAu5Bskqg0LF
c+a7FrrxhODz8ioxOOiMUqG4b3/qGkzlk6w5a/t7IRrmFtmcXmPWZ14aI8qpy7o/
CW3iPUoF/zEsmmFvOgJtHwy3g+bC8KhDvz3fqFIDSSMjSKjqb4cPYSe/L5MyhwED
wmLgp1uYjEyR0uuqqUp93FEYIbHuO5HpPRT5crLczRIoYn7bXRhjNWLCTmzlLqrj
yDAQsrngK99BQ7g0FTQ/OV9si/HRRGsusZmCkeCb6KnRNIvml4X9/WXKc1ioOFk/
3MSaxlQlTXzCCctjVCDNn9GfD/yR1cXu2sUpGSEnP1ssLG4ARyXGVfoeSw7gJ4xn
C5lm9SOmkzU=
=0gFG
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"The main one fixes a syzbot issue due to the invalid inode type out of
file-backed mounts. The others are minor cleanups without actual logic
changes.
Summary:
- Make sure only regular inodes can be used for file-backed mounts
- Two minor codebase cleanups"
* tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: get rid of kaddr in `struct z_erofs_maprecorder`
erofs: get rid of z_erofs_try_to_claim_pcluster()
erofs: ensure regular inodes for file-backed mounts
sysfs warns if we're removing a symlink from a directory that's no
longer in sysfs; this is triggered by fstests generic/730, which
simulates hot removal of a block device.
This patch is however not a correct fix, since checking
kobj->state_in_sysfs on a kobj owned by another subsystem is racy.
A better fix would be to add the appropriate check to
sysfs_remove_link() - and sysfs_create_link() as well.
But kobject_add_internal()/kobject_del() do not as of today have locking
that would support that.
Note that the block/holder.c code appears to be subject to this race as
well.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When creating a new stripe, we may reuse an existing stripe that has
some empty and some nonempty blocks.
Generally, the existing stripe won't change underneath us - except for
block sector counts, which we copy to the new key in
ec_stripe_key_update.
But the device removal path can now invalidate stripe pointers to a
device, and that can race with stripe reuse.
Change ec_stripe_key_update() to check for and resolve this
inconsistency.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We were checking that the alloc key was for a valid device, but not a
valid bucket.
This is the upgrade path from versions prior to bcachefs being mainlined.
Reported-by: syzbot+a1b59c8e1a3f022fd301@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmcMBf4ACgkQiiy9cAdy
T1Hf7Qv/f/TEXZisWIGshUpIerxOAWmN70bTw4sNID9ge8mVWwtVJBs57rlSjPTc
97Jj95urqnKEAGk/KC8qntp5QCMBQAeBFILigZph2c7vqEXPQy0dpbDUEUFuRN2G
mq0wn7IcJZcPJmhZGx9JJeteHk/24drJRSM+jyklwI2Rmev6Y6dlsv4JyMuvP7iI
YuCdbN7rYXsRBkpnK5AbiWCRdxwQMiMuGsppNQyBVSZKkt/g+8R16Z6WKxSbkaZf
XajVsywhlP5Bg9HRAk/YTPK4enKVi8ISp9qfS9EuinwM/VFzEnXnYrec/fiD0Ukg
rEemM7iF/YQdQq/2q8gm5KpoOjnLbaew+Zb+OoWyXMK7RJygD79+uMHn3v1cdi7B
BWCgbQQ7KiRi6rOo0Xzz8Rmw3L4+DHjTvIbh46jz90qQyuumR2hUSa7cPl2ATO4l
lxA50Q8xPE1i0Cfob1w/XHlrfmWMyovtHSKDvaeOMclp/VAHDfS6nB0x/ngyY8UH
ii2czaDd
=uI8y
-----END PGP SIGNATURE-----
Merge tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Two fixes for Windows symlink handling"
* tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix creating native symlinks pointing to current or parent directory
cifs: Improve creating native symlinks pointing to directory
Check if we have snapshot_trees or subvolumes that refer to the snapshot
node being reconstructed, and use them.
With this, the kill_btree_root test that blows away the snapshots btree
now passes, and we're able to successfully reconstruct.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
BCH_TRANS_COMMIT_journal_reclaim without BCH_WATERMARK_reclaim means
"return an error if low on journal space" - but accounting replay must
succeed.
Fixes https://github.com/koverstreet/bcachefs/issues/656
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Localio Bugfixes:
* Remove duplicated include in localio.c
* Fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
* Fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
* Fix nfsd_file tracepoints to handle NULL rqstp pointers
Other Bugfixes:
* Fix program selection loop in svc_process_common
* Fix integer overflow in decode_rc_list()
* Prevent NULL-pointer dereference in nfs42_complete_copies()
* Fix CB_RECALL performance issues when using a large number of delegations
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmcJjjQACgkQ18tUv7Cl
QOvgJw/6A33s+pjyBVLIKT6oMCPkUJeQ4Rhg9Je0Qw/ji0eFkT4Eyd65kRz3T9M/
qRrCfWaUd2dTYcbKQyhuGTlEfICZa9R4I0/Ztk9yvf9xcd1xFXKzTkFekGUVeHQA
OcngDu9psFxhvyzKI8nAHs1ephX/T7TywvTKANMRbeRCYYvVkytAt9YeVMigYZa5
dnchoUdGUdL6B6RXCU/Qhf0A1uYyA4hkk/FTBCPgv+kYx5pnjFq0y/yIIHDzCR3I
+yE1ss3EpVTQgt2Ca/cmDyYXsa7G8G51U7cS5AeIoXfsf1EGtTujowWcBY4oqFEC
ixx58fQe48AqwsP5XDZn8gnsuYH9snnw5rIB0IVqq55/a+XLMupHayyf/iziMV3s
JWgT4gKDyFca2pT+bJ8iWweU+ecRYxKGnh2NydyBiqowogsHZm4uKh0vELvqqkBd
RIjCyIiQVhYBII2jqpjRnxrqhGUT5XO99NQdQIGV0bUjCEP4YAjY4ChfEVcWXhnB
ppyBP+r8N5O77NcVqsVQS26U0/jb9K30LyYl9VT43ank3d+VVtHA5ZqnUflWtwuc
2XiGDvXW9mIvbVraWIZXUNVy39bzRclDf5bx4jeYLnKCMym81rkEIBOvBKQKZTrl
v+1Nhaj+fSw+rFSUm0KPqms0UDiT0Ol7ltu84ifadYqubbSEbqU=
=QBvR
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"Localio Bugfixes:
- remove duplicated include in localio.c
- fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
- fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
- fix nfsd_file tracepoints to handle NULL rqstp pointers
Other Bugfixes:
- fix program selection loop in svc_process_common
- fix integer overflow in decode_rc_list()
- prevent NULL-pointer dereference in nfs42_complete_copies()
- fix CB_RECALL performance issues when using a large number of
delegations"
* tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: remove revoked delegation from server's delegation list
nfsd/localio: fix nfsd_file tracepoints to handle NULL rqstp
nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()
SUNRPC: Fix integer overflow in decode_rc_list()
sunrpc: fix prog selection loop in svc_process_common
nfs: Remove duplicated include in localio.c
The function read_alloc_one_name() does not initialize the name field of
the passed fscrypt_str struct if kmalloc fails to allocate the
corresponding buffer. Thus, it is not guaranteed that
fscrypt_str.name is initialized when freeing it.
This is a follow-up to the linked patch that fixes the remaining
instances of the bug introduced by commit e43eec81c5 ("btrfs: use
struct qstr instead of name and namelen pairs").
Link: https://lore.kernel.org/linux-btrfs/20241009080833.1355894-1-jroi.martin@gmail.com/
Fixes: e43eec81c5 ("btrfs: use struct qstr instead of name and namelen pairs")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Roi Martin <jroi.martin@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As all changed_* functions need to return something, just return 0
directly here, as the verity status is passed via the context.
Reported by LKP: fs/btrfs/send.c:6877:5-8: Unneeded variable: "ret". Return "0" on line 6883
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202410092305.WbyqspH8-lkp@intel.com/
Signed-off-by: Christian Heusel <christian@heusel.eu>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The add_inode_ref() function does not initialize the "name" struct when
it is declared. If any of the following calls to "read_one_inode()
returns NULL,
dir = read_one_inode(root, parent_objectid);
if (!dir) {
ret = -ENOENT;
goto out;
}
inode = read_one_inode(root, inode_objectid);
if (!inode) {
ret = -EIO;
goto out;
}
then "name.name" would be freed on "out" before being initialized.
out:
...
kfree(name.name);
This issue was reported by Coverity with CID 1526744.
Fixes: e43eec81c5 ("btrfs: use struct qstr instead of name and namelen pairs")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Roi Martin <jroi.martin@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are using the logical address ("bytenr") of an extent as the key for
qgroup records in the dirty extents xarray. This is a problem because the
xarrays use "unsigned long" for keys/indices, meaning that on a 32 bits
platform any extent starting at or beyond 4G is truncated, which is a too
low limitation as virtually everyone is using storage with more than 4G of
space. This means a "bytenr" of 4G gets truncated to 0, and so does 8G and
16G for example, resulting in incorrect qgroup accounting.
Fix this by using sector numbers as keys instead, that is, using keys that
match the logical address right shifted by fs_info->sectorsize_bits, which
is what we do for the fs_info->buffer_radix that tracks extent buffers
(radix trees also use an "unsigned long" type for keys). This also makes
the index space more dense which helps optimize the xarray (as mentioned
at Documentation/core-api/xarray.rst).
Fixes: 3cce39a8ca ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Even though system user has a supplementary group, It gets
NT_STATUS_ACCESS_DENIED when attempting to create file or directory.
This patch add KSMBD_EVENT_LOGIN_REQUEST_EXT/RESPONSE_EXT netlink events
to get supplementary groups list. The new netlink event doesn't break
backward compatibility when using old ksmbd-tools.
Co-developed-by: Atte Heikkilä <atteh.mailbox@gmail.com>
Signed-off-by: Atte Heikkilä <atteh.mailbox@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
This fixes a regression which prevents parallel DIO reads.
Fixes: 0cac51185e ("f2fs: fix to avoid racing in between read and OPU dio write")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The variable declaration in this function predates the merge of the
nrext64 (aka 64-bit extent counters) feature, which means that the
variable declaration type is insufficient to avoid an integer overflow.
Fix that by redeclaring the variable to be xfs_extnum_t.
Coverity-id: 1630958
Fixes: 8f71bede8e ("xfs: repair inode fork block mapping data structures")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmcH4qAACgkQxWXV+ddt
WDtDig//czntfO+iRvDERZWTIB6vdVExfLd3r3ZNYlO1pIvgCuvqx3iYva+0ZhGW
8A+gcRax7cz0jCaxDp/+5lIRfdNxZH6/LwjZsDgU8Ly7himeRmwhtn2fCgNeiH/K
bUl92+ZMo2vwqTKXYa3xF1g3Hz6cRXVW7gJrMwNhb1hpPTGx+lgYJU02m/Io/vjK
1jcrZ84OEPIOY5uiAoDyO2hgsT/zVEeuuOiSTpKSzrghPbo0vmjLiYJ5T+CE5Uw3
u3w7/Fqnw49NwucqtncvyFFDXY9EWNuQhowi3hqJgOYTInqwwJigIpQV0hDDwYxb
ohGUGjazGfAEf/cy1jZXMbwCVgg8/Nj9x0eDKKhfs19VYUbMkEYQ8LKRTUlCeBwS
H/2AmqpqHEEO+tPY3P+w6MVwkNho8JNpWPdP5OzJs7XrD067IViOjD06HPM/k5ci
TU3zp9NYvgHVtmfZK1Aqsg9OYVhI1klVXejmlAzOLxejRPWXK/1hBw3kXbC6I+k1
50l0Yh1dgEnclMI3yWsKoj8IYUAkh2eudt0pNsot4a5vICMY++NVS2eukdz5UcEz
ix7hcpYcCcmzoOaelyEgmdAncWVGJT5w2Nzy85YaOp+Z1C65Ywb41utU+sSY+swB
kZfwl9vrsfu754vX7UKBherCvvYo+Lnj3GeX8Oe+1LoT2BP0TPk=
=lTqc
-----END PGP SIGNATURE-----
Merge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- update fstrim loop and add more cancellation points, fix reported
delayed or blocked suspend if there's a huge chunk queued
- fix error handling in recent qgroup xarray conversion
- in zoned mode, fix warning printing device path without RCU
protection
- again fix invalid extent xarray state (6252690f7e), lost due to
refactoring
* tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix clear_dirty and writeback ordering in submit_one_sector()
btrfs: zoned: fix missing RCU locking in error message when loading zone info
btrfs: fix missing error handling when adding delayed ref with qgroups enabled
btrfs: add cancellation points to trim loops
btrfs: split remaining space to discard in chunks
- Fix NFSD bring-up / shutdown
- Fix a UAF when releasing a stateid
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmcHx6IACgkQM2qzM29m
f5ecExAAheSpSP9D+1n3hKeOlNfhY8FUzd41Arn4NYV+jIBJbtx94/FlSNOxA0mp
Ovm1I8uAxy4TR8TLt7tsxfT7JDOStKwXFl3QlOUZT/+uyyJr7/q5R959R3oMiccR
Rfrpj6j2yYWrI8qGDGHca4Vv2bDSxr4mzztWwDe0SHSsjwf4OAcv+XF5vcfZ/CJN
Bxulb9WfNU8XvdFcRDHokMfk6jiY6/+FCTwX8ckvbVEG6gHT8+CRYSUJ05j0LJGo
xKZV913NgzcuV7PH0vq6vExJE6+rEPt/ejDAT5FM5yeNe+WJ4RTDgsYyIr9iLbHF
mWB9M4NnP+EZhejtOCbZ9RZjjKro09ilEPpqILuuGQPtcHSeWmhNbFz0kwLe+zYZ
CdtjnPZhjB0ITWgZ1HCtoJ8k/ZcMa7iiM/kApMLGr9fVj8/BHHFzS95PK7K/Fqur
FLdhvo6CzZCnRd16e2kqWsG7wO2lPWcz4NWTf9wxIG5GCunXoVCEnK1VfHvnldbH
BIFXZ+ib5qnL2i3Qmz7bQxmfIp5ryZnNx1mF0OM8imR9K/rsnARd7JfQ99lpMy8D
mD4coZVTMMk/Zg9zuH8k5GBzB2zXXqgngp4IJIxqrKR7/AsuSU3R7r+O9CWN91GQ
GKpRtMn/rVUg81jxDr3qoKquyxONoyVrVXAKsj1PgUSQdjUJgqU=
=Rud7
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix NFSD bring-up / shutdown
- Fix a UAF when releasing a stateid
* tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix possible badness in FREE_STATEID
nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed
NFSD: Mark filecache "down" if init fails
* A few small typo fixes
* fstests xfs/538 DEBUG-only fix
* Performance fix on blockgc on COW'ed files,
by skipping trims on cowblock inodes currently
opened for write
* Prevent cowblocks to be freed under dirty pagecache
during unshare
* Update MAINTAINERS file to quote the new maintainer
Signed-off-by: Carlos Maiolino <cem@kernel.org>
-----BEGIN PGP SIGNATURE-----
iJUEABMJAB0WIQQMHYkcUKcy4GgPe2RGdaER5QtfpgUCZwY6mgAKCRBGdaER5Qtf
poE8AYCZzMJr9wMrs2RsWRnaEhMRJNZIPQmSKXgHAK3mV5AbXtdHRc8yGVNHf+mW
Nh0fwAkBf1Ix0VJWkXOSFHZI9O2lLRsCogbNjFhwYF0MHZch2/mq1Wa4Tj1SDlfg
Ny2PJBNHyA==
=OkRo
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
- A few small typo fixes
- fstests xfs/538 DEBUG-only fix
- Performance fix on blockgc on COW'ed files, by skipping trims on
cowblock inodes currently opened for write
- Prevent cowblocks to be freed under dirty pagecache during unshare
- Update MAINTAINERS file to quote the new maintainer
* tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix a typo
xfs: don't free cowblocks from under dirty pagecache on unshare
xfs: skip background cowblock trims on inodes open for write
xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
xfs: don't ifdef around the exact minlen allocations
xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
xfs: return bool from xfs_attr3_leaf_add
xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
xfs: scrub: convert comma to semicolon
xfs: Remove empty declartion in header file
MAINTAINERS: add Carlos Maiolino as XFS release manager
While we do currently return -EFAULT in this case, it seems prudent to
follow the behaviour of other syscalls like clone3. It seems quite
unlikely that anyone depends on this error code being EFAULT, but we can
always revert this if it turns out to be an issue.
Cc: stable@vger.kernel.org # v5.6+
Fixes: fddb5d430a ("open: introduce openat2(2) syscall")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/r/20241010-extensible-structs-check_fields-v3-3-d2833dfe6edd@cyphar.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
There is racy issue between smb2 session log off and smb2 session setup.
It will cause user-after-free from session log off.
This add session_lock when setting SMB2_SESSION_EXPIRED and referece
count to session struct not to free session while it is being used.
Cc: stable@vger.kernel.org # v5.15+
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-25282
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
are MM.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZwcILgAKCRDdBJ7gKXxA
jjnMAQDRl+UfscRUeMippi7wnL3ee6MKyhhZVOhoxP24uB7yBwD/Ulq4oE+mLHml
YTlK/wj5qTZIsdxGaBzM1yifqp3L7gU=
=lFmJ
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"12 hotfixes, 5 of which are c:stable. All singletons, about half of
which are MM"
* tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: zswap: delete comments for "value" member of 'struct zswap_entry'.
CREDITS: sort alphabetically by name
secretmem: disable memfd_secret() if arch cannot set direct map
.mailmap: update Fangrui's email
mm/huge_memory: check pmd_special() only after pmd_present()
resource, kunit: fix user-after-free in resource_test_region_intersects()
fs/proc/kcore.c: allow translation of physical memory addresses
selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
device-dax: correct pgoff align in dax_set_mapping()
kthread: unpark only parked kthread
Revert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"
bcachefs: do not use PF_MEMALLOC_NORECLAIM
Like how we already do when the allocator seems to be stuck, check if
we're waiting too long for a journal reservation and print some debug
info.
This is specifically to track down
https://github.com/koverstreet/bcachefs/issues/656
which is showing up in userspace where we don't have sysfs/debugfs to
get the journal debug info.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch adds a bounds check to the bch2_opt_to_text function to prevent
NULL pointer dereferences when accessing the opt->choices array. This
ensures that the index used is within valid bounds before dereferencing.
The new version enhances the readability.
Reported-and-tested-by: syzbot+37186860aa7812b331d5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=37186860aa7812b331d5
Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We will get this if we wake up first:
Kernel panic - not syncing: btree_node_write_done leaked btree_trans
since there are still transactions waiting for cycle detectors after
BTREE_NODE_write_in_flight is cleared.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Fix failure to validate that accounting replicas entries point to
valid devices: this wasn't a real bug since they'd be cleaned up by
GC, but is still something we should know about
- Fix failure to validate that dev_data_type entries point to valid
devices: this does fix a real bug, since bch2_accounting_read() would
then try to copy the counters to that device and pop an inconsistent
error when the device didn't exist
- Remove accounting entries that are zeroed or invalid: if we're not
validating them we need to get rid of them: they might not exist in
the superblock, so we need the to trigger the superblock mark path
when they're readded.
This fixes the replication.ktest rereplicate test, which was failing
with "superblock not marked for replicas..."
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
fsck can now correctly check if inodes in interior snapshot nodes are
open/in use.
- Tweak the vfs inode rhashtable so that the subvolume ID isn't hashed,
meaning inums in different subvolumes will hash to the same slot. Note
that this is a hack, and will cause problems if anyone ever has the
same file in many different snapshots open all at the same time.
- Then check if any of those subvolumes is a descendent of the snapshot
ID being checked
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
There's an inherent race in taking a snapshot while an unlinked file is
open, and then reattaching it in the child snapshot.
In the interior snapshot node the file will appear unlinked, as though
it should be deleted - it's not referenced by anything in that snapshot
- but we can't delete it, because the file data is referenced by the
child snapshot.
This was being handled incorrectly with
propagate_key_to_snapshot_leaves() - but that doesn't resolve the
fundamental inconsistency of "this file looks like it should be deleted
according to normal rules, but - ".
To fix this, we need to fix the rule for when an inode is deleted. The
previous rule, ignoring snapshots (there was no well-defined rule
for with snapshots) was:
Unlinked, non open files are deleted, either at recovery time or
during online fsck
The new rule is:
Unlinked, non open files, that do not exist in child snapshots, are
deleted.
To make this work transactionally, we add a new inode flag,
BCH_INODE_has_child_snapshot; it overrides BCH_INODE_unlinked when
considering whether to delete an inode, or put it on the deleted list.
For transactional consistency, clearing it handled by the inode trigger:
when deleting an inode we check if there are parent inodes which can now
have the BCH_INODE_has_child_snapshot flag cleared.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When /proc/kcore is read an attempt to read the first two pages results in
HW-specific page swap on s390 and another (so called prefix) pages are
accessed instead. That leads to a wrong read.
Allow architecture-specific translation of memory addresses using
kc_xlate_dev_mem_ptr() and kc_unxlate_dev_mem_ptr() callbacks similarily
to /dev/mem xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() callbacks. That
way an architecture can deal with specific physical memory ranges.
Re-use the existing /dev/mem callback implementation on s390, which
handles the described prefix pages swapping correctly.
For other architectures the default callback is basically NOP. It is
expected the condition (vaddr == __va(__pa(vaddr))) always holds true for
KCORE_RAM memory type.
Link: https://lkml.kernel.org/r/20240930122119.1651546-1-agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "remove PF_MEMALLOC_NORECLAIM" v3.
This patch (of 2):
bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
inode to achieve GFP_NOWAIT semantic while holding locks. If this
allocation fails it will drop locks and use GFP_NOFS allocation context.
We would like to drop PF_MEMALLOC_NORECLAIM because it is really
dangerous to use if the caller doesn't control the full call chain with
this flag set. E.g. if any of the function down the chain needed
GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
cause unexpected failure.
While this is not the case in this particular case using the scoped gfp
semantic is not really needed bacause we can easily pus the allocation
context down the chain without too much clutter.
[akpm@linux-foundation.org: fix kerneldoc warnings]
Link: https://lkml.kernel.org/r/20240926172940.167084-1-mhocko@kernel.org
Link: https://lkml.kernel.org/r/20240926172940.167084-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz> # For vfs changes
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After the delegation is returned to the NFS server remove it
from the server's delegations list to reduce the time it takes
to scan this list.
Network trace captured while running the below script shows the
time taken to service the CB_RECALL increases gradually due to
the overhead of traversing the delegation list in
nfs_delegation_find_inode_server.
The NFS server in this test is a Solaris server which issues
CB_RECALL when receiving the all-zero stateid in the SETATTR.
mount=/mnt/data
for i in $(seq 1 20)
do
echo $i
mkdir $mount/testtarfile$i
time tar -C $mount/testtarfile$i -xf 5000_files.tar
done
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
* Patch to handle code-points with the Ignorable property as regular
character instead of treating them as an empty string. (Me)
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQS3XO7QfvpFoONBhH1OwQgI3t8RJgUCZwbNtQAKCRBOwQgI3t8R
JrlrAP4yCrZCp4YPlXO6oQGfS9RIeYpmcMzGmp1IAeqlzpB5qwD/YS53kiAzF4qV
+eD2fl/O4qNhZcWqBZKSH4shZBbXJAg=
=XCsY
-----END PGP SIGNATURE-----
Merge tag 'unicode-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode fix from Gabriel Krisman Bertazi:
- Handle code-points with the Ignorable property as regular character
instead of treating them as an empty string (me)
* tag 'unicode-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
unicode: Don't special case ignorable code points
We don't need to handle them separately. Instead, just let them
decompose/casefold to themselves.
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
This commit is a replay of commit 6252690f7e ("btrfs: fix invalid
mapping of extent xarray state"). We need to call
btrfs_folio_clear_dirty() before btrfs_set_range_writeback(), so that
xarray DIRTY tag is cleared.
With a refactoring commit 8189197425 ("btrfs: refactor
__extent_writepage_io() to do sector-by-sector submission"), it screwed
up and the order is reversed and causing the same hang. Fix the ordering
now in submit_one_sector().
Fixes: 8189197425 ("btrfs: refactor __extent_writepage_io() to do sector-by-sector submission")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At btrfs_load_zone_info() we have an error path that is dereferencing
the name of a device which is a RCU string but we are not holding a RCU
read lock, which is incorrect.
Fix this by using btrfs_err_in_rcu() instead of btrfs_err().
The problem is there since commit 08e11a3db0 ("btrfs: zoned: load zone's
allocation offset"), back then at btrfs_load_block_group_zone_info() but
then later on that code was factored out into the helper
btrfs_load_zone_info() by commit 09a46725cc ("btrfs: zoned: factor out
per-zone logic from btrfs_load_block_group_zone_info").
Fixes: 08e11a3db0 ("btrfs: zoned: load zone's allocation offset")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Fix a typo in comments.
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
fallocate unshare mode explicitly breaks extent sharing. When a
command completes, it checks the data fork for any remaining shared
extents to determine whether the reflink inode flag and COW fork
preallocation can be removed. This logic doesn't consider in-core
pagecache and I/O state, however, which means we can unsafely remove
COW fork blocks that are still needed under certain conditions.
For example, consider the following command sequence:
xfs_io -fc "pwrite 0 1k" -c "reflink <file> 0 256k 1k" \
-c "pwrite 0 32k" -c "funshare 0 1k" <file>
This allocates a data block at offset 0, shares it, and then
overwrites it with a larger buffered write. The overwrite triggers
COW fork preallocation, 32 blocks by default, which maps the entire
32k write to delalloc in the COW fork. All but the shared block at
offset 0 remains hole mapped in the data fork. The unshare command
redirties and flushes the folio at offset 0, removing the only
shared extent from the inode. Since the inode no longer maps shared
extents, unshare purges the COW fork before the remaining 28k may
have written back.
This leaves dirty pagecache backed by holes, which writeback quietly
skips, thus leaving clean, non-zeroed pagecache over holes in the
file. To verify, fiemap shows holes in the first 32k of the file and
reads return different data across a remount:
$ xfs_io -c "fiemap -v" <file>
<file>:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
...
1: [8..511]: hole 504
...
$ xfs_io -c "pread -v 4k 8" <file>
00001000: cd cd cd cd cd cd cd cd ........
$ umount <mnt>; mount <dev> <mnt>
$ xfs_io -c "pread -v 4k 8" <file>
00001000: 00 00 00 00 00 00 00 00 ........
To avoid this problem, make unshare follow the same rules used for
background cowblock scanning and never purge the COW fork for inodes
with dirty pagecache or in-flight I/O.
Fixes: 46afb0628b ("xfs: only flush the unshared range in xfs_reflink_unshare")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
New:
implement fallocate for compressed files;
add support for the compression attribute;
optimize large writes to sparse files.
Fixed:
fix several potential deadlock scenarios;
fix various internal bugs detected by syzbot;
add checks before accessing NTFS structures during parsing;
correct the format of output messages.
Refactored:
replace fsparam_flag_no with fsparam_flag in options parser;
remove unused functions and macros.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmcFRkgACgkQqbAzH4Mk
B7aQdQ//VT6lzBaxSZ0dm3xs/ygQYE3ter1OczAMQxh70O5px+Z0li5WCXQQe31W
I74MmAy2G7C+JdzMkSc4F7+pFF21DO+1WQjODl9rP7yoHiLteiLFkMI+P5tHuFFT
VPupAiCDQgm6QCtW2wofp/WR0ooxpLzC/yQsILrhUXJHdEY21oHLy/MQSOhVnv2k
VeFdWA70+YS+JEQ+YyZBSeGeMVMQG9VROA57dZ+eAy1vgBUJGcKhYp9v4eiWG1uT
A8rnV81FiHu+49OrE+ZwStguHkxu30dvlHuOqKW0pLW9/XoKCFf9w0WqLOGrAxLs
lKznVgQZ521+QW+lNURMHBZt/bqQxQQInuOOH7jwLyBh0tmKoZlIU9Vw4V3yrm3t
9NELb0FR/FeH02O89nMHl0SeaxDJvgFHDnfoOShuHaffur61d4RjChxqB4QcibOl
Ibvtns71Uo89ngPIXdeTJK+7XZQVmAZHBXXYL+Rwr84Yv0tyn2q8DbrqzmphiYlR
wbLyVBGqtx0ZcN/geuFLiVt2W/HSMPocXqrxv791ALX6cryL+CZdolQP7vWjt4Vh
rKoaRZYN8fdcfT+BIYy95Kz0k12T7ColOoMfxVwugxK9ypF6tH6BJBwYXjCW4MOv
AhkszJcsISxMKidsbzZA9xESM6M3sLHcW0swjAwWWRhc5zbSe6E=
=j71V
-----END PGP SIGNATURE-----
Merge tag 'ntfs3_for_6.12' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"New:
- implement fallocate for compressed files
- add support for the compression attribute
- optimize large writes to sparse files
Fixes:
- fix several potential deadlock scenarios
- fix various internal bugs detected by syzbot
- add checks before accessing NTFS structures during parsing
- correct the format of output messages
Refactoring:
- replace fsparam_flag_no with fsparam_flag in options parser
- remove unused functions and macros"
* tag 'ntfs3_for_6.12' of https://github.com/Paragon-Software-Group/linux-ntfs3: (25 commits)
fs/ntfs3: Format output messages like others fs in kernel
fs/ntfs3: Additional check in ntfs_file_release
fs/ntfs3: Fix general protection fault in run_is_mapped_full
fs/ntfs3: Sequential field availability check in mi_enum_attr()
fs/ntfs3: Additional check in ni_clear()
fs/ntfs3: Fix possible deadlock in mi_read
ntfs3: Change to non-blocking allocation in ntfs_d_hash
fs/ntfs3: Remove unused al_delete_le
fs/ntfs3: Rename ntfs3_setattr into ntfs_setattr
fs/ntfs3: Replace fsparam_flag_no -> fsparam_flag
fs/ntfs3: Add support for the compression attribute
fs/ntfs3: Implement fallocate for compressed files
fs/ntfs3: Make checks in run_unpack more clear
fs/ntfs3: Add rough attr alloc_size check
fs/ntfs3: Stale inode instead of bad
fs/ntfs3: Refactor enum_rstbl to suppress static checker
fs/ntfs3: Fix sparse warning in ni_fiemap
fs/ntfs3: Fix warning possible deadlock in ntfs_set_state
fs/ntfs3: Fix sparse warning for bigendian
fs/ntfs3: Separete common code for file_read/write iter/splice
...
When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
if we fail to insert the qgroup record we don't error out, we ignore it.
In fact we treat it as if there was no error and there was already an
existing record - we don't distinguish between the cases where
btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
existed and we can free the given record, and the case where it returns
a negative error value, meaning the insertion into the xarray that is
used to track records failed.
Effectively we end up ignoring that we are lacking qgroup record in the
dirty extents xarray, resulting in incorrect qgroup accounting.
Fix this by checking for errors and return them to the callers.
Fixes: 3cce39a8ca ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are reports that system cannot suspend due to running trim because
the task responsible for trimming the device isn't able to finish in
time, especially since we have a free extent discarding phase, which can
trim a lot of unallocated space. There are no limits on the trim size
(unlike the block group part).
Since trime isn't a critical call it can be interrupted at any time,
in such cases we stop the trim, report the amount of discarded bytes and
return an error.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
mostly empty although we will do the split according to our super block
locations, the last super block ends at 256G, we can submit a huge
discard for the range [256G, 8T), causing a large delay.
Split the space left to discard based on BTRFS_MAX_DISCARD_CHUNK_SIZE in
preparation of introduction of cancellation points to trim. The value
of the chunk size is arbitrary, it can be higher or derived from actual
device capabilities but we can't easily read that using
bio_discard_limit().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The code that copies data from srcmap to iomap in dax_unshare_iter is
very very broken, which bfoster's recent fsx changes have exposed.
If the pos and len passed to dax_file_unshare are not aligned to an
fsblock boundary, the iter pos and length in the _iter function will
reflect this unalignment.
dax_iomap_direct_access always returns a pointer to the start of the
kmapped fsdax page, even if its pos argument is in the middle of that
page. This is catastrophic for data integrity when iter->pos is not
aligned to a page, because daddr/saddr do not point to the same byte in
the file as iter->pos. Hence we corrupt user data by copying it to the
wrong place.
If iter->pos + iomap_length() in the _iter function not aligned to a
page, then we fail to copy a full block, and only partially populate the
destination block. This is catastrophic for data confidentiality
because we expose stale pmem contents.
Fix both of these issues by aligning copy_pos/copy_len to a page
boundary (remember, this is fsdax so 1 fsblock == 1 base page) so that
we always copy full blocks.
We're not done yet -- there's no call to invalidate_inode_pages2_range,
so programs that have the file range mmap'd will continue accessing the
old memory mapping after the file metadata updates have completed.
Be careful with the return value -- if the unshare succeeds, we still
need to return the number of bytes that the iomap iter thinks we're
operating on.
Cc: ruansy.fnst@fujitsu.com
Fixes: d984648e42 ("fsdax,xfs: port unshare to fsdax")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/172796813328.1131942.16777025316348797355.stgit@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Remove the code in dax_unshare_iter that zeroes the destination memory
because it's not necessary.
If srcmap is unwritten, we don't have to do anything because that
unwritten extent came from the regular file mapping, and unwritten
extents cannot be shared. The same applies to holes.
Furthermore, zeroing to unshare a mapping is just plain wrong because
unsharing means copy on write, and we should be copying data.
This is effectively a revert of commit 13dd4e0462 ("fsdax: unshare:
zero destination if srcmap is HOLE or UNWRITTEN")
Cc: ruansy.fnst@fujitsu.com
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/172796813311.1131942.16033376284752798632.stgit@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The predicate code that iomap_unshare_iter uses to decide if it's really
needs to unshare a file range mapping should be shared with the fsdax
version, because right now they're opencoded and inconsistent.
Note that we simplify the predicate logic a bit -- we no longer allow
unsharing of inline data mappings, but there aren't any filesystems that
allow shared inline data currently.
This is a fix in the sense that it should have been ported to fsdax.
Fixes: b53fdb215d ("iomap: improve shared block detection in iomap_unshare_iter")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/172796813294.1131942.15762084021076932620.stgit@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
It doesn't make sense to allocate a COW extent when unsharing a hole
because holes cannot be shared.
Fixes: 1f1397b721 ("xfs: don't allocate into the data fork for an unshare request")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/172796813277.1131942.5486112889531210260.stgit@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
netfslib currently defers dropping the ref on the folios it obtains during
readahead to after it has started I/O on the basis that we can do it whilst
we wait for the I/O to complete, but this runs the risk of the I/O
collection racing with this in future.
Furthermore, Matthew Wilcox strongly suggests that the refs should be
dropped immediately, as readahead_folio() does (netfslib is using
__readahead_batch() which doesn't drop the refs).
Fixes: ee4cdf7ba8 ("netfs: Speed up buffered reading")
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/3771538.1728052438@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
The background blockgc scanner runs on a 5m interval by default and
trims preallocation (post-eof and cow fork) from inodes that are
otherwise idle. Idle effectively means that iolock can be acquired
without blocking and that the inode has no dirty pagecache or I/O in
flight.
This simple mechanism and heuristic has worked fairly well for
post-eof speculative preallocations. Support for reflink and COW
fork preallocations came sometime later and plugged into the same
mechanism, with similar heuristics. Some recent testing has shown
that COW fork preallocation may be notably more sensitive to blockgc
processing than post-eof preallocation, however.
For example, consider an 8GB reflinked file with a COW extent size
hint of 1MB. A worst case fully randomized overwrite of this file
results in ~8k extents of an average size of ~1MB. If the same
workload is interrupted a couple times for blockgc processing
(assuming the file goes idle), the resulting extent count explodes
to over 100k extents with an average size <100kB. This is
significantly worse than ideal and essentially defeats the COW
extent size hint mechanism.
While this particular test is instrumented, it reflects a fairly
reasonable pattern in practice where random I/Os might spread out
over a large period of time with varying periods of (in)activity.
For example, consider a cloned disk image file for a VM or container
with long uptime and variable and bursty usage. A background blockgc
scan that races and processes the image file when it happens to be
clean and idle can have a significant effect on the future
fragmentation level of the file, even when still in use.
To help combat this, update the heuristic to skip cowblocks inodes
that are currently opened for write access during non-sync blockgc
scans. This allows COW fork preallocations to persist for as long as
possible unless otherwise needed for functional purposes (i.e. a
sync scan), the file is idle and closed, or the inode is being
evicted from cache. While here, update the comments to help
distinguish performance oriented heuristics from the logic that
exists to maintain functional correctness.
Suggested-by: Darrick Wong <djwong@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Currently the debug-only xfs_bmap_exact_minlen_extent_alloc allocation
variant fails to drop into the lowmode last resort allocator, and
thus can sometimes fail allocations for which the caller has a
transaction block reservation.
Fix this by using xfs_bmap_btalloc_low_space to do the actual allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs_bmap_exact_minlen_extent_alloc duplicates the args setup in
xfs_bmap_btalloc. Switch to call it from xfs_bmap_btalloc after
doing the basic setup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Exact minlen allocations only exist as an error injection tool for debug
builds. Currently this is implemented using ifdefs, which means the code
isn't even compiled for non-XFS_DEBUG builds. Enhance the compile test
coverage by always building the code and use the compilers' dead code
elimination to remove it from the generated binary instead.
The only downside is that the alloc_minlen_only field is unconditionally
added to struct xfs_alloc_args now, but by moving it around and packing
it tightly this doesn't actually increase the size of the structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Userdata and metadata allocations end up in the same allocation helpers.
Remove the separate xfs_bmap_alloc_userdata function to make this more
clear.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Just like xfs_attr3_leaf_split, xfs_attr_node_try_addname can return
-ENOSPC both for an actual failure to allocate a disk block, but also
to signal the caller to convert the format of the attr fork. Use magic
1 to ask for the conversion here as well.
Note that unlike the similar issue in xfs_attr3_leaf_split, this one was
only found by code review.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs_attr3_leaf_split propagates the need for an extra btree split as
-ENOSPC to it's only caller, but the same return value can also be
returned from xfs_da_grow_inode when it fails to find free space.
Distinguish the two cases by returning 1 for the extra split case instead
of overloading -ENOSPC.
This can be triggered relatively easily with the pending realtime group
support and a file system with a lot of small zones that use metadata
space on the main device. In this case every about 5-10th run of
xfs/538 runs into the following assert:
ASSERT(oldblk->magic == XFS_ATTR_LEAF_MAGIC);
in xfs_attr3_leaf_split caused by an allocation failure. Note that
the allocation failure is caused by another bug that will be fixed
subsequently, but this commit at least sorts out the error handling.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs_attr3_leaf_add only has two potential return values, indicating if the
entry could be added or not. Replace the errno return with a bool so that
ENOSPC from it can't easily be confused with a real ENOSPC.
Remove the return value from the xfs_attr3_leaf_add_work helper entirely,
as it always return 0.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs_attr_leaf_try_add is only called by xfs_attr_leaf_addname, and
merging the two will simplify a following error handling fix.
To facilitate this move the remote block state save/restore helpers up in
the file so that they don't need forward declarations now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Use !try_cmpxchg instead of cmpxchg (*ptr, old, new) != old in
xlog_cil_insert_pcp_aggregate(). x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg.
Also, try_cmpxchg implicitly assigns old *ptr value to "old" when
cmpxchg fails. There is no need to re-read the value in the loop.
Note that the value from *ptr should be read using READ_ONCE to
prevent the compiler from merging, refetching or reordering the read.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Replace a comma between expression statements by a semicolon.
Signed-off-by: Yan Zhen <yanzhen@vivo.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
The definition of xfs_attr_use_log_assist() has been removed since
commit d9c61ccb3b ("xfs: move xfs_attr_use_log_assist out of xfs_log.c").
So, Remove the empty declartion in header files.
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Calling 'ln -s . symlink' or 'ln -s .. symlink' creates symlink pointing to
some object name which ends with U+F029 unicode codepoint. This is because
trailing dot in the object name is replaced by non-ASCII unicode codepoint.
So Linux SMB client currently is not able to create native symlink pointing
to current or parent directory on Windows SMB server which can be read by
either on local Windows server or by any other SMB client which does not
implement compatible-reverse character replacement.
Fix this problem in cifsConvertToUTF16() function which is doing that
character replacement. Function comment already says that it does not need
to handle special cases '.' and '..', but after introduction of native
symlinks in reparse point form, this handling is needed.
Note that this change depends on the previous change
"cifs: Improve creating native symlinks pointing to directory".
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
SMB protocol for native symlinks distinguish between symlink to directory
and symlink to file. These two symlink types cannot be exchanged, which
means that symlink of file type pointing to directory cannot be resolved at
all (and vice-versa).
Windows follows this rule for local filesystems (NTFS) and also for SMB.
Linux SMB client currenly creates all native symlinks of file type. Which
means that Windows (and some other SMB clients) cannot resolve symlinks
pointing to directory created by Linux SMB client.
As Linux system does not distinguish between directory and file symlinks,
its API does not provide enough information for Linux SMB client during
creating of native symlinks.
Add some heuristic into the Linux SMB client for choosing the correct
symlink type during symlink creation. Check if the symlink target location
ends with slash, or last path component is dot or dot-dot, and check if the
target location on SMB share exists and is a directory. If at least one
condition is truth then create a new SMB symlink of directory type.
Otherwise create it as file type symlink.
This change improves interoperability with Windows systems. Windows systems
would be able to resolve more SMB symlinks created by Linux SMB client
which points to existing directory.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
BCH_INODE_i_size_dirty dates from before we had logged operations for
truncate (as well as finsert) - it hasn't been needed since before
bcachefs was mainlined.
BCH_INODE_i_sectors_dirty hasn't been needed since we started always
updating i_sectors transactionally - it's been unused for even longer.
BCH_INODE_backptr_untrusted also hasn't been used since prior to
mainlining; when unlinking a hardling, we zero out the backpointer
fields if they're for the dirent being removed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we find an unreachable inode, we now reattach it in the oldest
version that needs to be reattached (thus avoiding redundant work
reattaching every single version), and we now fix up inode -> dirent
backpointers in newer versions as needed - or white out the reattaching
dirent in newer versions, if the newer version isn't supposed to be
reattached.
This results in the second verify fsck now passing cleanly after
repairing on a user-provided filesystem image with thousands of
different snapshots.
Reported-by: Christopher Snowhill <chris@kode54.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With inode backpointers, we can write a very simple
check_unreachable_inodes() pass that only looks for non-unlinked inodes
that are missing backpointers, and reattaches them.
This simplifies check_directory_structure() so that it's now only
checking for directory structure loops,
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
A lot of little fixes, bigger ones include:
- bcachefs's __wait_on_freeing_inode() was broken in rc1 due to vfs
changes, now fixed along with another lost wakeup
- fragmentation LRU fixes; fsck now repairs successfully (this is the
data structure copygc uses); along with some nice simplification.
- Rework logged op error handling, so that if logged op replay errors
(due to another filesystem error) we delete the logged op instead of
going into an infinite loop)
- Various small filesystem connectivitity repair fixes
The final part of this patch series, fixing snapshots + unlinked file
handling, is now out on the list - I'm giving that part of the series
more time for user testing.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmcBhkIACgkQE6szbY3K
bnYt8RAAqZo6RcN91sgz6xGsJkUvE6DS4Rtj1J4vlVAmuiIa5NUhRhqFnS6j8V9A
AWZw63JwTizrglbLk4Z4knfiViT4GOeiKX4sttaJk7cLW7bxwCUddlho1G5Q7q0I
PFurYevqG1ltcl5oZpD6LhZiqEhndQI3XnkpEvKsmoXy9TSB4KEqaU8Y+cewjq4q
KCFuxTBhbmatxP9eTGuDhd6uWw5h0EVDGQyMitEcSutIaernGlSsBQ8gZ5n9dWSd
lP91qFT5iypmCMo9Arf8Fq1YBvOpV6P91eq8YPa4A3sKDfzHn3CCzsSyjUiGK0RM
Wcl+kNwqYJa7Fwtb7aGgTVhaMkqLzPTI+XYye3FXrXjJ6B0JKpl2QvvDoFhDxop9
ZPb57QyRgRBtOvofvFz8fWQOr67n+HNvaMbeG1iwGvqm6/MrgdSLsN6OaRh80uAE
5P0qX7rwTTOfJj5T6dKLxr3KuXKXNrM5AAIG0MjOMsha232+XUAZvofYNmqx7BMi
juJvqZc9/GXrcXqdPTYDyBs4UXDkwHsKdr744ooZ64VNiIYFs6eTvXp7V0XuajYH
ExLrEEjhO2UGPM5N9R9jw9AMsEhJstexgylHQsiiADtdi+jY4LKa/NZAJSJQQC+C
QQyE3Q7ZCpzRPiGPkkpIY/D7IRoIHL2H+LhbXV/K3oMGdbA7hS4=
=XnG4
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-10-05' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"A lot of little fixes, bigger ones include:
- bcachefs's __wait_on_freeing_inode() was broken in rc1 due to vfs
changes, now fixed along with another lost wakeup
- fragmentation LRU fixes; fsck now repairs successfully (this is the
data structure copygc uses); along with some nice simplification.
- Rework logged op error handling, so that if logged op replay errors
(due to another filesystem error) we delete the logged op instead
of going into an infinite loop)
- Various small filesystem connectivitity repair fixes"
* tag 'bcachefs-2024-10-05' of git://evilpiepirate.org/bcachefs:
bcachefs: Rework logged op error handling
bcachefs: Add warn param to subvol_get_snapshot, peek_inode
bcachefs: Kill snapshot arg to fsck_write_inode()
bcachefs: Check for unlinked, non-empty dirs in check_inode()
bcachefs: Check for unlinked inodes with dirents
bcachefs: Check for directories with no backpointers
bcachefs: Kill alloc_v4.fragmentation_lru
bcachefs: minor lru fsck fixes
bcachefs: Mark more errors AUTOFIX
bcachefs: Make sure we print error that causes fsck to bail out
bcachefs: bkey errors are only AUTOFIX during read
bcachefs: Create lost+found in correct snapshot
bcachefs: Fix reattach_inode()
bcachefs: Add missing wakeup to bch2_inode_hash_remove()
bcachefs: Fix trans_commit disk accounting revert
bcachefs: Fix bch2_inode_is_open() check
bcachefs: Fix return type of dirent_points_to_inode_nowarn()
bcachefs: Fix bad shift in bch2_read_flag_list()
When multiple FREE_STATEIDs are sent for the same delegation stateid,
it can lead to a possible either use-after-free or counter refcount
underflow errors.
In nfsd4_free_stateid() under the client lock we find a delegation
stateid, however the code drops the lock before calling nfs4_put_stid(),
that allows another FREE_STATE to find the stateid again. The first one
will proceed to then free the stateid which leads to either
use-after-free or decrementing already zeroed counter.
Fixes: 3f29cc82a8 ("nfsd: split sc_status out of sc_type")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
fast commits.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmcAYaIACgkQ8vlZVpUN
gaO8fQf+IOHRpyqVBXd2WwDJ05vBJAbbaKdhaI1r6gFPF1AzyIZ+lpCnEV9f9B2u
IrDZaigb/CiJxO9mLRLrp0z/O9s6Z/+NjNdVFkoynJo2Crfr5mi07DXR9EQ7XkvT
WYcT0IXg7xVswOzVosgBOUA76yMLJwtuTPp7YsAmCmp5EksMk1LSk1VzUEJcKRgJ
BHSWSn2BqHxwMStB5L0eBnYtdaW8l1RLeWlTPdcPeCrD7UrYCXThvxfyz2KdztBz
YfjcQEZhY6OD/uKdNFE6oQc54kL8RpqGDpF3YGMBixiZ8h99PpGUUz3bB5VLTfco
WlxQwOD9dofPQ5Yh+s5icY8q67XXow==
=Cnp+
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix some ext4 bugs and regressions relating to oneline resize and fast
commits"
* tag 'ext4_for_linus-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix off by one issue in alloc_flex_gd()
ext4: mark fc as ineligible using an handle in ext4_xattr_set()
ext4: use handle to mark fc as ineligible in __track_dentry_update()
Initially it was thought that we just wanted to ignore errors from
logged op replay, but it turns out we do need to catch -EROFS, or we'll
go into an infinite loop.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
These shouldn't always be fatal errors - logged op resume, in
particular, and we want it as a parameter there.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It was initially believed that it would be better to be explicit about
the snapshot we're updating when writing inodes in fsck; however, it
turns out that passing around the snapshot separately is more error
prone and we're usually updating the inode in the same snapshow we read
it from.
This is different from normal filesystem paths, where we do the update
in the snapshot of the subvolume we're in.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>