linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-13 14:24:11 +08:00

Author	SHA1	Message	Date
Naohiro Aota	b1934cd606	btrfs: zoned: handle broken write pointer on zones Btrfs rejects to mount a FS if it finds a block group with a broken write pointer (e.g, unequal write pointers on two zones of RAID1 block group). Since such case can happen easily with a power-loss or crash of a system, we need to handle the case more gently. Handle such block group by making it unallocatable, so that there will be no writes into it. That can be done by setting the allocation pointer at the end of allocating region (= block_group->zone_capacity). Then, existing code handle zone_unusable properly. Having proper zone_capacity is necessary for the change. So, set it as fast as possible. We cannot handle RAID0 and RAID10 case like this. But, they are anyway unable to read because of a missing stripe. Fixes: `265f7237dd` ("btrfs: zoned: allow DUP on meta-data block groups") Fixes: `568220fa96` ("btrfs: zoned: support RAID0/1/10 on top of raid stripe tree") CC: stable@vger.kernel.org # 6.1+ Reported-by: HAN Yuwei <hrx@bupt.moe> Cc: Xuefer <xuefer@gmail.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-09-02 23:39:34 +02:00
Fedor Pchelkin	c346c62976	btrfs: qgroup: don't use extent changeset when not needed The local extent changeset is passed to clear_record_extent_bits() where it may have some additional memory dynamically allocated for ulist. When qgroup is disabled, the memory is leaked because in this case the changeset is not released upon __btrfs_qgroup_release_data() return. Since the recorded contents of the changeset are not used thereafter, just don't pass it. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Reported-by: syzbot+81670362c283f3dd889c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/000000000000aa8c0c060ade165e@google.com Fixes: `af0e2aab3b` ("btrfs: qgroup: flush reservations during quota disable") CC: stable@vger.kernel.org # 6.10+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: David Sterba <dsterba@suse.com>	2024-09-02 20:18:08 +02:00
Ryusuke Konishi	6576dd6695	nilfs2: fix state management in error path of log writing function After commit `a694291a62` ("nilfs2: separate wait function from nilfs_segctor_write") was applied, the log writing function nilfs_segctor_do_construct() was able to issue I/O requests continuously even if user data blocks were split into multiple logs across segments, but two potential flaws were introduced in its error handling. First, if nilfs_segctor_begin_construction() fails while creating the second or subsequent logs, the log writing function returns without calling nilfs_segctor_abort_construction(), so the writeback flag set on pages/folios will remain uncleared. This causes page cache operations to hang waiting for the writeback flag. For example, truncate_inode_pages_final(), which is called via nilfs_evict_inode() when an inode is evicted from memory, will hang. Second, the NILFS_I_COLLECTED flag set on normal inodes remain uncleared. As a result, if the next log write involves checkpoint creation, that's fine, but if a partial log write is performed that does not, inodes with NILFS_I_COLLECTED set are erroneously removed from the "sc_dirty_files" list, and their data and b-tree blocks may not be written to the device, corrupting the block mapping. Fix these issues by uniformly calling nilfs_segctor_abort_construction() on failure of each step in the loop in nilfs_segctor_do_construct(), having it clean up logs and segment usages according to progress, and correcting the conditions for calling nilfs_redirty_inodes() to ensure that the NILFS_I_COLLECTED flag is cleared. Link: https://lkml.kernel.org/r/20240814101119.4070-1-konishi.ryusuke@gmail.com Fixes: `a694291a62` ("nilfs2: separate wait function from nilfs_segctor_write") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-09-01 17:59:00 -07:00
Ryusuke Konishi	5787fcaab9	nilfs2: fix missing cleanup on rollforward recovery error In an error injection test of a routine for mount-time recovery, KASAN found a use-after-free bug. It turned out that if data recovery was performed using partial logs created by dsync writes, but an error occurred before starting the log writer to create a recovered checkpoint, the inodes whose data had been recovered were left in the ns_dirty_files list of the nilfs object and were not freed. Fix this issue by cleaning up inodes that have read the recovery data if the recovery routine fails midway before the log writer starts. Link: https://lkml.kernel.org/r/20240810065242.3701-1-konishi.ryusuke@gmail.com Fixes: `0f3e1c7f23` ("nilfs2: recovery functions") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-09-01 17:59:00 -07:00
Ryusuke Konishi	6834082589	nilfs2: protect references to superblock parameters exposed in sysfs The superblock buffers of nilfs2 can not only be overwritten at runtime for modifications/repairs, but they are also regularly swapped, replaced during resizing, and even abandoned when degrading to one side due to backing device issues. So, accessing them requires mutual exclusion using the reader/writer semaphore "nilfs->ns_sem". Some sysfs attribute show methods read this superblock buffer without the necessary mutual exclusion, which can cause problems with pointer dereferencing and memory access, so fix it. Link: https://lkml.kernel.org/r/20240811100320.9913-1-konishi.ryusuke@gmail.com Fixes: `da7141fb78` ("nilfs2: add /sys/fs/nilfs2/<device> group") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-09-01 17:59:00 -07:00
Kent Overstreet	7f12a963b6	bcachefs: fix rebalance accounting Fixes: `49aa783039` ("bcachefs: Fix rebalance_work accounting") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-01 15:54:40 -04:00
Baokun Li	72a6e22c60	fscache: delete fscache_cookie_lru_timer when fscache exits to avoid UAF The fscache_cookie_lru_timer is initialized when the fscache module is inserted, but is not deleted when the fscache module is removed. If timer_reduce() is called before removing the fscache module, the fscache_cookie_lru_timer will be added to the timer list of the current cpu. Afterwards, a use-after-free will be triggered in the softIRQ after removing the fscache module, as follows: ================================================================== BUG: unable to handle page fault for address: fffffbfff803c9e9 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 21ffea067 P4D 21ffea067 PUD 21ffe6067 PMD 110a7c067 PTE 0 Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G W 6.11.0-rc3 #855 Tainted: [W]=WARN RIP: 0010:__run_timer_base.part.0+0x254/0x8a0 Call Trace: <IRQ> tmigr_handle_remote_up+0x627/0x810 __walk_groups.isra.0+0x47/0x140 tmigr_handle_remote+0x1fa/0x2f0 handle_softirqs+0x180/0x590 irq_exit_rcu+0x84/0xb0 sysvec_apic_timer_interrupt+0x6e/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 RIP: 0010:default_idle+0xf/0x20 default_idle_call+0x38/0x60 do_idle+0x2b5/0x300 cpu_startup_entry+0x54/0x60 start_secondary+0x20d/0x280 common_startup_64+0x13e/0x148 </TASK> Modules linked in: [last unloaded: netfs] ================================================================== Therefore delete fscache_cookie_lru_timer when removing the fscahe module. Fixes: `12bb21a29c` ("fscache: Implement cookie user counting and resource pinning") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240826112056.2458299-1-libaokun@huaweicloud.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-01 10:30:25 +02:00
Linus Torvalds	6b9ffc4595	four cifs.ko client fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmbTuKMACgkQiiy9cAdy T1GsHwwAnrVfxJ+ZiAH0wbfyFcgRLOAePeADcedn4QWQaPbmyjqqQbHfiwRwDa5X sICpnxCS+3MM9aahA7G4FOZNle/DexmFUODScESmYMfdqt4hMGzGbi9KhA4l7TY8 rcewHNpbAiPW3S0y/VtOBoXXskURMEL6+KCaBwE3u990jimJtCxPie4PQbfI/V6O 4Qjqc8qjryPo70ru4g72h/LfJdaDKxV/JYymDyhhu5/Gf7PPbv0QKZ9hhxhpc6Y4 81IcJ7S4JnLA8V9nrglrbV3ymvOCXNH0UQRHOa4Hc6H7MmrVj1aE5nu0/nfgVaOh iaaKfuuv6ItDQBWqUg6tHqM8DSPONJkbhuFkXqL/rOmrl7B0G5T1UBlt3ZqNZEy5 bEX1VCqCDQRsr1nUCxC7t5r03teXeNq59nWg/JWBBbLohWLp4Dw4eKW0xlKyo3VT Oxho3E8DnVXRu8MdTF/OeFJllp71KY3ujt2wm8uu+f5H45vz9mBN0UEUAx6hoh3c SsxufLuG =l4NV -----END PGP SIGNATURE----- Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - copy_file_range fix - two read fixes including read past end of file rc fix and read retry crediting fix - falloc zero range fix * tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region cifs: Fix copy offload to flush destination region netfs, cifs: Fix handling of short DIO read cifs: Fix lack of credit renegotiation on read retry	2024-09-01 15:49:26 +12:00
Linus Torvalds	a4c763129f	bcachefs fixes for 6.11-rc6 - Fix a rare data corruption in the rebalance path, caught as a nonce inconsistency on encrypted filesystems - Revert lockless buffered write path - Mark more errors as autofix -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmbT0+8ACgkQE6szbY3K bnafeg/+KQroY9Ig1Rn9qSnVKZOjkyDeqRq8sgvfOI5exDyuqcTgM69HU6HJbzzk wCFwVNoscx0PMMrHMLtnVKohevGnATHXqCMz0tZ1YIslFlPsHlQToYfDmae3keZQ ZX6crRCxIGxXUfx5VVf8tPn02ZFEqTkilHoZteCzp24w5d6dpjtlJwYzCJ5k+gTK 1lDcQp9IerwbbbFAvg0yu3BObTG6t2aHvtE0rHJ8gzlsVeDvxhnYRPRi4QJ5lar+ Zwpcp48559j4dl3lYh6y7rU4UfHEecxSu0blKF79D8h0u4dxzu0szyDZiZluVK84 uEI4/hNVDmL6W75mRbkjzzbwJqBdgIB35FomaziJ7Z2VFlaZf5YPWWRQE28NcMD6 nKGMtEc/ryFQKffqTHupAtp9cTZBXEQE9mZGcqWLX8mr7ClVztJLmJUCvicwAwBC sUKzhWiD6HgpAJYsDvukHNJEUGN/NBa4lp3x2lUu13n0zHRZkqY0+3b9EkDrO1KE 24ueRbD3l6g1SIRZmvCjiFCSSlOm5wpqzEYKrQndAyU3fXai/mCCncFT/fqs2zJs nH7TCR9pGvW3ln0GuyZyc8+lgcdZegPalAWLHtpNzy9xQWxbn19O4mCmRGhWCbKF irtL7Pn3+EKuUnhagIOp/ImDIH9po9yX9h5PmVndeJ9Dl6YhOF0= =LTM8 -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs Push bcachefs fixes from Kent Overstreet: "The data corruption in the buffered write path is troubling; inode lock should not have been able to cause that... - Fix a rare data corruption in the rebalance path, caught as a nonce inconsistency on encrypted filesystems - Revert lockless buffered write path - Mark more errors as autofix" * tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs: bcachefs: Mark more errors as autofix bcachefs: Revert lockless buffered IO path bcachefs: Fix bch2_extents_match() false positive bcachefs: Fix failure to return error in data_update_index_update()	2024-09-01 15:23:20 +12:00
Kent Overstreet	3d3020c461	bcachefs: Mark more errors as autofix errors that are known to always be safe to fix should be autofix: this should be most errors even at this point, but that will need some thorough review. note that errors are still logged in the superblock, so we'll still know that they happened. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-31 19:27:01 -04:00
Kent Overstreet	e3e6940940	bcachefs: Revert lockless buffered IO path We had a report of data corruption on nixos when building installer images. https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334 It seems that writes are being dropped, but only when issued by QEMU, and possibly only in snapshot mode. It's undetermined if it's write calls are being dropped or dirty folios. Further testing, via minimizing the original patch to just the change that skips the inode lock on non appends/truncates, reveals that it really is just not taking the inode lock that causes the corruption: it has nothing to do with the other logic changes for preserving write atomicity in corner cases. It's also kernel config dependent: it doesn't reproduce with the minimal kernel config that ktest uses, but it does reproduce with nixos's distro config. Bisection the kernel config initially pointer the finger at page migration or compaction, but it appears that was erroneous; we haven't yet determined what kernel config option actually triggers it. Sadly it appears this will have to be reverted since we're getting too close to release and my plate is full, but we'd _really_ like to fully debug it. My suspicion is that this patch is exposing a preexisting bug - the inode lock actually covers very little in IO paths, and we have a different lock (the pagecache add lock) that guards against races with truncate here. Fixes: `7e64c86cdc` ("bcachefs: Buffered write path now can avoid the inode lock") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-31 19:26:08 -04:00
Linus Torvalds	6a2fcc51a7	nfsd-6.11 fixes: - One more write delegation fix -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmbTO2QACgkQM2qzM29m f5d6Jg/+L8ltg5iGzdgwZYoOrlhS7sz4Y/BcOViNZ25we0J+kFaauycCyMCG9wS1 o1NXAZ8d1lvDTZI8Bw7rzWl1IS2mjfg1NX8t5MhVUxrkus40jjwip9VPYRegQhBT WZ/ggaudZinc/+i2toR7eY3wJe/PqOWeML4XWbx//tinfLnlC62UKMudOvaXk3B8 8y0nGWQaJEuaZuFuA9FFOs7MHgR50rSevOdk90avBqFYBVvq2wA6ZvKw0TbH47Q6 BbELVbIqlFOSfui/w+DQXqGm7SYMOUkaLsPLspXXlDBR0myjORlQ8Ch6alaWp9pd 2yAGlYNalTJVlJt/2Uqu4USPZuUK9Ijd+2TNg1ObCdRFzpRVmQDU/wzv8A0DWNdI MbiwX2ckwUt3u2nh+DHWagSKcuxcRR908YwEHs3/rAmcZDSWiZdJtDZ3NiBKNZrD KHYdEOl5rl5P7bi6VcaR8gYREbKiq6BISo7ru3Ix7ImIQD87a/x393/tkOutw8bM VfIEYcnsbqlTs07KVUZ2jcIziFrttPmh5rs8qfDHsk899bzR1CBkQedwZAUD0Ghu dmvKebXSoLc2sWli5CcrfkWxkjRuIuSQMOPnY9RrRFFaNXBYC3JA7EUWsvbXsX0x WSuZPlS9Jv6bCdgvBMAIjTA/uxShLeEf33GIcKK9iI0mASKwXHY= =uoNK -----END PGP SIGNATURE----- Merge tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - One more write delegation fix * tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease	2024-09-01 06:55:47 +12:00
Linus Torvalds	0efdc09796	Bug fixes for 6.11-rc6: * Do not call out v1 inodes with non-zero di_nlink field as being corrupt. * Change xfs_finobt_count_blocks() to count "free inode btree" blocks rather than "inode btree" blocks. * Don't report the number of trimmed bytes via FITRIM because the underlying storage isn't required to do anything and failed discard IOs aren't reported to the caller anyway. * Fix incorrect setting of rm_owner field in an rmap query. * Report missing disk offset range in an fsmap query. * Obtain m_growlock when extending realtime section of the filesystem. * Reset rootdir extent size hint after extending realtime section of the filesystem. Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZs3OYgAKCRAH7y4RirJu 9OF/AP9MXSSmBHmTfpqJZbKCI9j+EvAGyucbITi32ZBnbnNnKgEAr5FrueGcKS98 H/FxMeNbSWZp0s5hUYsXsACtdo75YgE= =prEp -----END PGP SIGNATURE----- Merge tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Chandan Babu: - Do not call out v1 inodes with non-zero di_nlink field as being corrupt - Change xfs_finobt_count_blocks() to count "free inode btree" blocks rather than "inode btree" blocks - Don't report the number of trimmed bytes via FITRIM because the underlying storage isn't required to do anything and failed discard IOs aren't reported to the caller anyway - Fix incorrect setting of rm_owner field in an rmap query - Report missing disk offset range in an fsmap query - Obtain m_growlock when extending realtime section of the filesystem - Reset rootdir extent size hint after extending realtime section of the filesystem * tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: reset rootdir extent size hint after growfsrt xfs: take m_growlock when running growfsrt xfs: Fix missing interval for missing_owner in xfs fsmap xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code xfs: Fix the owner setting issue for rmap query in xfs fsmap xfs: don't bother reporting blocks trimmed via FITRIM xfs: xfs_finobt_count_blocks() walks the wrong btree xfs: fix folio dirtying for XFILE_ALLOC callers xfs: fix di_onlink checking for V1/V2 inodes	2024-09-01 06:48:37 +12:00
NeilBrown	40927f3d09	nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease It is not safe to dereference fl->c.flc_owner without first confirming fl->fl_lmops is the expected manager. nfsd4_deleg_getattr_conflict() tests fl_lmops but largely ignores the result and assumes that flc_owner is an nfs4_delegation anyway. This is wrong. With this patch we restore the "!= &nfsd_lease_mng_ops" case to behave as it did before the change mentioned below. This is the same as the current code, but without any reference to a possible delegation. Fixes: `c5967721e1` ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-08-30 10:48:29 -04:00
Dan Carpenter	844436e045	ksmbd: Unlock on in ksmbd_tcp_set_interfaces() Unlock before returning an error code if this allocation fails. Fixes: `0626e6641f` ("cifsd: add server handler for central processing and tranport layers") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-29 20:28:37 -05:00
Namjae Jeon	78c5a6f1f6	ksmbd: unset the binding mark of a reused connection Steve French reported null pointer dereference error from sha256 lib. cifs.ko can send session setup requests on reused connection. If reused connection is used for binding session, conn->binding can still remain true and generate_preauth_hash() will not set sess->Preauth_HashValue and it will be NULL. It is used as a material to create an encryption key in ksmbd_gen_smb311_encryptionkey. ->Preauth_HashValue cause null pointer dereference error from crypto_shash_update(). BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 8 PID: 429254 Comm: kworker/8:39 Hardware name: LENOVO 20MAS08500/20MAS08500, BIOS N2CET69W (1.52 ) Workqueue: ksmbd-io handle_ksmbd_work [ksmbd] RIP: 0010:lib_sha256_base_do_update.isra.0+0x11e/0x1d0 [sha256_ssse3] <TASK> ? show_regs+0x6d/0x80 ? __die+0x24/0x80 ? page_fault_oops+0x99/0x1b0 ? do_user_addr_fault+0x2ee/0x6b0 ? exc_page_fault+0x83/0x1b0 ? asm_exc_page_fault+0x27/0x30 ? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3] ? lib_sha256_base_do_update.isra.0+0x11e/0x1d0 [sha256_ssse3] ? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3] ? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3] _sha256_update+0x77/0xa0 [sha256_ssse3] sha256_avx2_update+0x15/0x30 [sha256_ssse3] crypto_shash_update+0x1e/0x40 hmac_update+0x12/0x20 crypto_shash_update+0x1e/0x40 generate_key+0x234/0x380 [ksmbd] generate_smb3encryptionkey+0x40/0x1c0 [ksmbd] ksmbd_gen_smb311_encryptionkey+0x72/0xa0 [ksmbd] ntlm_authenticate.isra.0+0x423/0x5d0 [ksmbd] smb2_sess_setup+0x952/0xaa0 [ksmbd] __process_request+0xa3/0x1d0 [ksmbd] __handle_ksmbd_work+0x1c4/0x2f0 [ksmbd] handle_ksmbd_work+0x2d/0xa0 [ksmbd] process_one_work+0x16c/0x350 worker_thread+0x306/0x440 ? __pfx_worker_thread+0x10/0x10 kthread+0xef/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x44/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Fixes: `f5a544e3ba` ("ksmbd: add support for SMB3 multichannel") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-29 20:28:36 -05:00
Thorsten Blum	8d8d244726	smb: Annotate struct xattr_smb_acl with __counted_by() Add the __counted_by compiler attribute to the flexible array member entries to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-29 20:28:36 -05:00
Linus Torvalds	1b5fe53681	execve fix for v6.11-rc6 - binfmt_elf_fdpic: fix AUXV size with ELF_HWCAP2 (Max Filippov) -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZtDUTAAKCRA2KwveOeQk u/G1AP95bAt6g/+da7pGzS3KwdCZVUNL36kaZvoj8zH7ShVMkQD9GotercPaISh1 PURnWKPYUrMxaHGUxRc0IOXBRPTeXgo= =FRmR -----END PGP SIGNATURE----- Merge tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve fix from Kees Cook: - binfmt_elf_fdpic: fix AUXV size with ELF_HWCAP2 (Max Filippov) * tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined	2024-08-30 12:32:53 +12:00
Stephen Brennan	04c8abae1b	dcache: keep dentry_hashtable or d_hash_shift even when not used The runtime constant feature removes all the users of these variables, allowing the compiler to optimize them away. It's quite difficult to extract their values from the kernel text, and the memory saved by removing them is tiny, and it was never the point of this optimization. Since the dentry_hashtable is a core data structure, it's valuable for debugging tools to be able to read it easily. For instance, scripts built on drgn, like the dentrycache script[1], rely on it to be able to perform diagnostics on the contents of the dcache. Annotate it as used, so the compiler doesn't discard it. Link: `3afc56146f/drgn_tools/dentry.py (L325-L355)` [1] Fixes: `e3c92e8171` ("runtime constants: add x86 architecture support") Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-08-30 12:25:50 +12:00
Bernd Schubert	3ab394b363	fuse: disable the combination of passthrough and writeback cache Current design and handling of passthrough is without fuse caching and with that FUSE_WRITEBACK_CACHE is conflicting. Fixes: `7dc4e97a4f` ("fuse: introduce FUSE_PASSTHROUGH capability") Cc: stable@kernel.org # v6.9 Signed-off-by: Bernd Schubert <bschubert@ddn.com> Acked-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-29 11:43:01 +02:00
David Howells	91d1dfae46	cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region Under certain conditions, the range to be cleared by FALLOC_FL_ZERO_RANGE may only be buffered locally and not yet have been flushed to the server. For example: xfs_io -f -t -c "pwrite -S 0x41 0 4k" \ -c "pwrite -S 0x42 4k 4k" \ -c "fzero 0 4k" \ -c "pread -v 0 8k" /xfstest.test/foo will write two 4KiB blocks of data, which get buffered in the pagecache, and then fallocate() is used to clear the first 4KiB block on the server - but we don't flush the data first, which means the EOF position on the server is wrong, and so the FSCTL_SET_ZERO_DATA RPC fails (and xfs_io ignores the error), but then when we try to read it, we see the old data. Fix this by preflushing any part of the target region that above the server's idea of the EOF position to force the server to update its EOF position. Note, however, that we don't want to simply expand the file by moving the EOF before doing the FSCTL_SET_ZERO_DATA[] because someone else might see the zeroed region or if the RPC fails we then have to try to clean it up or risk getting corruption. [] And we have to move the EOF first otherwise FSCTL_SET_ZERO_DATA won't do what we want. This fixes the generic/008 xfstest. [!] Note: A better way to do this might be to split the operation into two parts: we only do FSCTL_SET_ZERO_DATA for the part of the range below the server's EOF and then, if that worked, invalidate the buffered pages for the part above the range. Fixes: `6b69040247` ("cifs/smb3: Fix data inconsistent when zero file range") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <stfrench@microsoft.com> cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> cc: Pavel Shilovsky <pshilov@microsoft.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-28 16:52:17 -05:00
Linus Torvalds	a18093afa3	nfsd-6.11 fixes: - Fix a number of crashers - Update email address for an NFSD reviewer -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmbPLJQACgkQM2qzM29m f5edLBAAveHRyP1XKgLIFtIX40GxK029NEHT2vq9ageaLFsFGylAaAgXSB5rUXER 3FfhOrNuAr/ZAV2bqm/qnA3yDuh/OuD9MIxEA+C2cNJ4bjEKXaqq5iePTK+AeNGH cnvE3UlI9rQkflpuZxbliJZm+u+mxaKnMO2riFbZeKrQh5C6Mn6z4fXp88CZj65U 7oDeONqyjMEtkjJPutzJZr0gJbPjeGgZlrsVgMMg/nki3y+Fal6Rt0hDO2u9evV9 3zTyJ7S7yUnsZ0b2JTx061EfJLd7KFmefWG4UKRKYk1XtiDbHt/cUSi4/QBA0EWw 6VK5aJWUF2OwUpkAYohU0o4/qApoce1raR0cpwrRwzLINDdwPTkfz9L8dpjRPJ68 ubUhWP7D/xASD5RhSrbH6lG8XY3ISXkRk2knwKXFtJtq2uIz2Gxc6F1gzzPE/whR N6JxdiMrUaKoO4qiwOtrXwiYACR9+qsVSW/QxNG1xdQhNXyLb5L6uUSe384MPybw nVIkrOdwO4uors6DzkFoHI/Au2QrTi9pzn53PK01RkhrJKUIS3lMPBnPbf5RAIQN EowYaJNSm52Px+omXzZzHoOgI0h0P3UWiuXoui1Zcy/xCT6uEbj5QtKxM6iKSXyd 4KU3DnC+9nr6+3ld47MFn38f7gmPNrvB56XMCDQKmr9x9OmtB5k= =fRqF -----END PGP SIGNATURE----- Merge tag 'nfsd-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a number of crashers - Update email address for an NFSD reviewer * tag 'nfsd-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: fs/nfsd: fix update of inode attrs in CB_GETATTR nfsd: fix potential UAF in nfsd4_cb_getattr_release nfsd: hold reference to delegation when updating it for cb_getattr MAINTAINERS: Update Olga Kornievskaia's email address nfsd: prevent panic for nfsv4.0 closed files in nfs4_show_open nfsd: ensure that nfsd4_fattr_args.context is zeroed out	2024-08-29 06:20:44 +12:00
Linus Torvalds	2840526875	for-6.11-rc5-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmbPBfoACgkQxWXV+ddt WDvzwxAAlL4L2EA3UhFJNHXqvDfGUKldyNv3pn9p6fT5fsSrJGMiVVbF0BDTtrw7 ZekH/ELxNaGCndR7qPScahez9Qll9JiRHVx68yBuanAqjCU7UfE6omKG5wo82+NX vjNNovKcyXEdErsD6TFVpv7abErEEHmVVUxKjKB4nJxux6OuvZZDzmN0pm4WrmIm 226az0nxPi+MQjROKMT3qcyccxDxUzDLRnzCunkHSJzojBC3KZimx707/rZSQi05 w4HVL1QoxfhKLRA5qXWp27YWbz78UehFv68HlAZLabnJq6khWdHaVUpfkfdJurn9 j/ZAFlb8vKWFRIh2WQE3twJ2nJTW/h2zvnMV0NbXFz3LGlG6krOPYuFARD0WUNAI OdZANhi1YJQTlgBjiKYU+bP6dN5hA49TFL6/LGNrgAHJzVw8Mf7ovsG4L9gwYRoU D9Ed2IFZj3SYJj92u8/1VHD0yejg8tZNBYT6fiSBdY+Da6q5rZ2fvxxJIHv7m3A6 crkm5APdBLGCCQRm5Vnqy/K8PSf2vu5moLebS81pKsJWxA7/d8jLcnMhHfoBMw8X LpLxpsXnaatSUYysclmzaRazpChvTYbE4z5qAYc0kUNI/wbx/q/A42ixOaGnsPWX pnfDs6jb6VHxT1bPgDjipFUJAeB/og8C6nyX4Tuq53gmWuR6q2k= =+IpE -----END PGP SIGNATURE----- Merge tag 'for-6.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix use-after-free when submitting bios for read, after an error and partially submitted bio the original one is freed while it can be still be accessed again - fix fstests case btrfs/301, with enabled quotas wait for delayed iputs when flushing delalloc - fix periodic block group reclaim, an unitialized value can be returned if there are no block groups to reclaim - fix build warning (-Wmaybe-uninitialized) * tag 'for-6.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix uninitialized return value from btrfs_reclaim_sweep() btrfs: fix a use-after-free when hitting errors inside btrfs_submit_chunk() btrfs: initialize last_extent_end to fix -Wmaybe-uninitialized warning in extent_fiemap() btrfs: run delayed iputs when flushing delalloc	2024-08-29 06:17:46 +12:00
Joanne Koong	f7790d6778	fuse: update stats for pages in dropped aux writeback list In the case where the aux writeback list is dropped (e.g. the pages have been truncated or the connection is broken), the stats for its pages and backing device info need to be updated as well. Fixes: `e2653bd53a` ("fuse: fix leaked aux requests") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Cc: <stable@vger.kernel.org> # v5.1 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-28 18:10:29 +02:00
Miklos Szeredi	76a51ac00c	fuse: clear PG_uptodate when using a stolen page Originally when a stolen page was inserted into fuse's page cache by fuse_try_move_page(), it would be marked uptodate. Then fuse_readpages_end() would call SetPageUptodate() again on the already uptodate page. Commit `413e8f014c` ("fuse: Convert fuse_readpages_end() to use folio_end_read()") changed that by replacing the SetPageUptodate() + unlock_page() combination with folio_end_read(), which does mostly the same, except it sets the uptodate flag with an xor operation, which in the above scenario resulted in the uptodate flag being cleared, which in turn resulted in EIO being returned on the read. Fix by clearing PG_uptodate instead of setting it in fuse_try_move_page(), conforming to the expectation of folio_end_read(). Reported-by: Jürg Billeter <j@bitron.ch> Debugged-by: Matthew Wilcox <willy@infradead.org> Fixes: `413e8f014c` ("fuse: Convert fuse_readpages_end() to use folio_end_read()") Cc: <stable@vger.kernel.org> # v6.10 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-28 18:10:29 +02:00
yangyun	3002240d16	fuse: fix memory leak in fuse_create_open The memory of struct fuse_file is allocated but not freed when get_create_ext return error. Fixes: `3e2b6fdbdc` ("fuse: send security context of inode on file") Cc: stable@vger.kernel.org # v5.17 Signed-off-by: yangyun <yangyun50@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-28 18:10:29 +02:00
Joanne Koong	97f30876c9	fuse: check aborted connection before adding requests to pending list for resending There is a race condition where inflight requests will not be aborted if they are in the middle of being re-sent when the connection is aborted. If fuse_resend has already moved all the requests in the fpq->processing lists to its private queue ("to_queue") and then the connection starts and finishes aborting, these requests will be added to the pending queue and remain on it indefinitely. Fixes: `760eac73f9` ("fuse: Introduce a new notification type for resend pending requests") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Cc: <stable@vger.kernel.org> # v6.9 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-28 18:10:29 +02:00
Jann Horn	b18915248a	fuse: use unsigned type for getxattr/listxattr size truncation The existing code uses min_t(ssize_t, outarg.size, XATTR_LIST_MAX) when parsing the FUSE daemon's response to a zero-length getxattr/listxattr request. On 32-bit kernels, where ssize_t and outarg.size are the same size, this is wrong: The min_t() will pass through any size values that are negative when interpreted as signed. fuse_listxattr() will then return this userspace-supplied negative value, which callers will treat as an error value. This kind of bug pattern can lead to fairly bad security bugs because of how error codes are used in the Linux kernel. If a caller were to convert the numeric error into an error pointer, like so: struct foo *func(...) { int len = fuse_getxattr(..., NULL, 0); if (len < 0) return ERR_PTR(len); ... } then it would end up returning this userspace-supplied negative value cast to a pointer - but the caller of this function wouldn't recognize it as an error pointer (IS_ERR_VALUE() only detects values in the narrow range in which legitimate errno values are), and so it would just be treated as a kernel pointer. I think there is at least one theoretical codepath where this could happen, but that path would involve virtio-fs with submounts plus some weird SELinux configuration, so I think it's probably not a concern in practice. Cc: stable@vger.kernel.org # v4.9 Fixes: `63401ccdb2` ("fuse: limit xattr returned size") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-08-28 18:10:29 +02:00
David Howells	8101d6e112	cifs: Fix copy offload to flush destination region Fix cifs_file_copychunk_range() to flush the destination region before invalidating it to avoid potential loss of data should the copy fail, in whole or in part, in some way. Fixes: `7b2404a886` ("cifs: Fix flushing, invalidation and file size with copy_file_range()") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <stfrench@microsoft.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Matthew Wilcox <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-28 07:48:33 -05:00
David Howells	1da29f2c39	netfs, cifs: Fix handling of short DIO read Short DIO reads, particularly in relation to cifs, are not being handled correctly by cifs and netfslib. This can be tested by doing a DIO read of a file where the size of read is larger than the size of the file. When it crosses the EOF, it gets a short read and this gets retried, and in the case of cifs, the retry read fails, with the failure being translated to ENODATA. Fix this by the following means: (1) Add a flag, NETFS_SREQ_HIT_EOF, for the filesystem to set when it detects that the read did hit the EOF. (2) Make the netfslib read assessment stop processing subrequests when it encounters one with that flag set. (3) Return rreq->transferred, the accumulated contiguous amount read to that point, to userspace for a DIO read. (4) Make cifs set the flag and clear the error if the read RPC returned ENODATA. (5) Make cifs set the flag and clear the error if a short read occurred without error and the read-to file position is now at the remote inode size. Fixes: `69c3c023af` ("cifs: Implement netfslib hooks") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-28 07:47:36 -05:00
David Howells	6a5dcd4877	cifs: Fix lack of credit renegotiation on read retry When netfslib asks cifs to issue a read operation, it prefaces this with a call to ->clamp_length() which cifs uses to negotiate credits, providing receive capacity on the server; however, in the event that a read op needs reissuing, netfslib doesn't call ->clamp_length() again as that could shorten the subrequest, leaving a gap. This causes the retried read to be done with zero credits which causes the server to reject it with STATUS_INVALID_PARAMETER. This is a problem for a DIO read that is requested that would go over the EOF. The short read will be retried, causing EINVAL to be returned to the user when it fails. Fix this by making cifs_req_issue_read() negotiate new credits if retrying (NETFS_SREQ_RETRYING now gets set in the read side as well as the write side in this instance). This isn't sufficient, however: the new credits might not be sufficient to complete the remainder of the read, so also add an additional field, rreq->actual_len, that holds the actual size of the op we want to perform without having to alter subreq->len. We then rely on repeated short reads being retried until we finish the read or reach the end of file and make a zero-length read. Also fix a couple of places where the subrequest start and length need to be altered by the amount so far transferred when being used. Fixes: `69c3c023af` ("cifs: Implement netfslib hooks") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-28 07:47:36 -05:00
Linus Torvalds	86987d84b9	four cifs.ko client fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmbOBa8ACgkQiiy9cAdy T1GHBAwAmSK9FTCg4x5tTRBwiS9VSrC3KQ2TwLrkXjeWXhJycZjsDQHnbHG68Q+t Sbq711RClpdvwMWLUqiryjd+VVqPzG/9jZLOPeeW7SIljyksUzxaQXGbGcquz57N hnZjrjyyquU5NhtOALyVeO4lNYboYTH+fETsrMoJIGNoI0yBHZSM/eRQO1heLRBn 629yKbqp0m/5/A/w3s1nKljO74sG//6LKDZld6es7tmxgku8TFNEqsI7SONw3pUg dYgM2kIPf4rwpqupfxSriylz0xlHIEmITn5wkvygS+TvcWXsG855TVLjtD1I6uX3 JYOZ9gfqubGNXkT5SbbfsmAOma8PBm54oT0UWwJVUj/5Ed3D9EyBl2jlqjNLQjSU qJ0/ha+AyFCn2vviPA+vVnHd5I2Y82JlI4VrwrSyHG5E/6UMHNQ2Do8GbA/eRdcp HeqR57V4VNzNzVfKCp4XygwBuifbXdRX+yrUdBPDZm+CMLPD6wGZdUQu4FaESYv6 i24UdVG9 =BDrz -----END PGP SIGNATURE----- Merge tag 'v6.11-rc5-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - two RDMA/smbdirect fixes and a minor cleanup - punch hole fix * tag 'v6.11-rc5-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix FALLOC_FL_PUNCH_HOLE support smb/client: fix rdma usage in smb2_async_writev() smb/client: remove unused rq_iter_size from struct smb_rqst smb/client: avoid dereferencing rdata=NULL in smb2_new_read_req()	2024-08-28 15:05:02 +12:00
Filipe Manana	ecb54277cb	btrfs: fix uninitialized return value from btrfs_reclaim_sweep() The return variable 'ret' at btrfs_reclaim_sweep() is never assigned if none of the space infos is reclaimable (for example if periodic reclaim is disabled, which is the default), so we return an undefined value. This can be fixed my making btrfs_reclaim_sweep() not return any value as well as do_reclaim_sweep() because: 1) do_reclaim_sweep() always returns 0, so we can make it return void; 2) The only caller of btrfs_reclaim_sweep() (btrfs_reclaim_bgs()) doesn't care about its return value, and in its context there's nothing to do about any errors anyway. Therefore remove the return value from btrfs_reclaim_sweep() and do_reclaim_sweep(). Fixes: `e4ca3932ae` ("btrfs: periodic block_group reclaim") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-08-27 16:42:09 +02:00
Darrick J. Wong	a24cae8fc1	xfs: reset rootdir extent size hint after growfsrt If growfsrt is run on a filesystem that doesn't have a rt volume, it's possible to change the rt extent size. If the root directory was previously set up with an inherited extent size hint and rtinherit, it's possible that the hint is no longer a multiple of the rt extent size. Although the verifiers don't complain about this, xfs_repair will, so if we detect this situation, log the root directory to clean it up. This is still racy, but it's better than nothing. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-27 18:32:14 +05:30
Darrick J. Wong	16e1fbdce9	xfs: take m_growlock when running growfsrt Take the grow lock when we're expanding the realtime volume, like we do for the other growfs calls. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-27 18:32:14 +05:30
Zizhi Wo	ca6448aed4	xfs: Fix missing interval for missing_owner in xfs fsmap In the fsmap query of xfs, there is an interval missing problem: [root@fedora ~]# xfs_io -c 'fsmap -vvvv' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:16 [0..7]: static fs metadata 0 (0..7) 8 1: 253:16 [8..23]: per-AG metadata 0 (8..23) 16 2: 253:16 [24..39]: inode btree 0 (24..39) 16 3: 253:16 [40..47]: per-AG metadata 0 (40..47) 8 4: 253:16 [48..55]: refcount btree 0 (48..55) 8 5: 253:16 [56..103]: per-AG metadata 0 (56..103) 48 6: 253:16 [104..127]: free space 0 (104..127) 24 ...... BUG: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 104 107' /mnt [root@fedora ~]# Normally, we should be able to get [104, 107), but we got nothing. The problem is caused by shifting. The query for the problem-triggered scenario is for the missing_owner interval (e.g. freespace in rmapbt/ unknown space in bnobt), which is obtained by subtraction (gap). For this scenario, the interval is obtained by info->last. However, rec_daddr is calculated based on the start_block recorded in key[1], which is converted by calling XFS_BB_TO_FSBT. Then if rec_daddr does not exceed info->next_daddr, which means keys[1].fmr_physical >> (mp)->m_blkbb_log <= info->next_daddr, no records will be displayed. In the above example, 104 >> (mp)->m_blkbb_log = 12 and 107 >> (mp)->m_blkbb_log = 12, so the two are reduced to 0 and the gap is ignored: before calculate ----------------> after shifting 104(st) 107(ed) 12(st/ed) \|---------\| \| sector size block size Resolve this issue by introducing the "end_daddr" field in xfs_getfsmap_info. This records \|key[1].fmr_physical + key[1].length\| at the granularity of sector. If the current query is the last, the rec_daddr is end_daddr to prevent missing interval problems caused by shifting. We only need to focus on the last query, because xfs disks are internally aligned with disk blocksize that are powers of two and minimum 512, so there is no problem with shifting in previous queries. After applying this patch, the above problem have been solved: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 104 107' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:16 [104..106]: free space 0 (104..106) 3 Fixes: `e89c041338` ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: limit the range of end_addr correctly] Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-27 18:32:14 +05:30
Darrick J. Wong	6b35cc8d92	xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code Use XFS_BUF_DADDR_NULL (instead of a magic sentinel value) to mean "this field is null" like the rest of xfs. Cc: wozizhi@huawei.com Fixes: `e89c041338` ("xfs: implement the GETFSMAP ioctl") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-27 18:32:08 +05:30
Linus Torvalds	3e9bff3bbe	vfs-6.11-rc6.fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZsxg5QAKCRCRxhvAZXjc olSiAQDvFvim4YtMmUDagC3yWTBsf+o3lYdAIuzNE0NtSn4vpAEAl/HVhQCaEDjv mcE3jokEsbvyXLnzs78PrY0Heua2mQg= =AHAd -----END PGP SIGNATURE----- Merge tag 'vfs-6.11-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "VFS: - Ensure that backing files uses file->f_ops->splice_write() for splice netfs: - Revert the removal of PG_private_2 from netfs_release_folio() as cephfs still relies on this - When AS_RELEASE_ALWAYS is set on a mapping the folio needs to always be invalidated during truncation - Fix losing untruncated data in a folio by making letting netfs_release_folio() return false if the folio is dirty - Fix trimming of streaming-write folios in netfs_inval_folio() - Reset iterator before retrying a short read - Fix interaction of streaming writes with zero-point tracker afs: - During truncation afs currently calls truncate_setsize() which sets i_size, expands the pagecache and truncates it. The first two operations aren't needed because they will have already been done. So call truncate_pagecache() instead and skip the redundant parts overlayfs: - Fix checking of the number of allowed lower layers so 500 layers can actually be used instead of just 499 - Add missing '\n' to pr_err() output - Pass string to ovl_parse_layer() and thus allow it to be used for Opt_lowerdir as well pidfd: - Revert blocking the creation of pidfds for kthread as apparently userspace relies on this. Specifically, it breaks systemd during shutdown romfs: - Fix romfs_read_folio() to use the correct offset with folio_zero_tail()" * tag 'vfs-6.11-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: netfs: Fix interaction of streaming writes with zero-point tracker netfs: Fix missing iterator reset on retry of short read netfs: Fix trimming of streaming-write folios in netfs_inval_folio() netfs: Fix netfs_release_folio() to say no if folio dirty afs: Fix post-setattr file edit to do truncation correctly mm: Fix missing folio invalidation calls during truncation ovl: ovl_parse_param_lowerdir: Add missed '\n' for pr_err ovl: fix wrong lowerdir number check for parameter Opt_lowerdir ovl: pass string to ovl_parse_layer() backing-file: convert to using fops->splice_write Revert "pidfd: prevent creation of pidfds for kthreads" romfs: fix romfs_read_folio() netfs, ceph: Partially revert "netfs: Replace PG_fscache by setting folio->private and marking dirty"	2024-08-27 16:57:35 +12:00
Kent Overstreet	d26935690c	bcachefs: Fix bch2_extents_match() false positive This was caught as a very rare nonce inconsistency, on systems with encryption and replication (and tiering, or some form of rebalance operation running): [Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path [Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd [Wed Jul 17 13:30:03 2024] k: u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd [Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0 [Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only bch2_extents_match() was reporting true for extents that did not actually point to the same data. bch2_extent_match() iterates over pairs of pointers, looking for pointers that point to the same location on disk (with matching generation numbers). However one or both extents may have been trimmed (or merged) and they might not have the same disk offset: it corrects for this by subtracting the key offset and the checksum entry offset. However, this failed when an extent was immediately partially overwritten, and the new overwrite was allocated the next adjacent disk space. Normally, with compression off, this would never cause a bug, since the new extent would have to be immediately after the old extent for the pointer offsets to match, and the rebalance index update path is not looking for an extent outside the range of the extent it moved. However with compression enabled, extents take up less space on disk than they do in the btree index space - and spuriously matching after partial overwrite is possible. To fix this, add a secondary check, that strictly checks that the regions pointed to on disk overlap. https://github.com/koverstreet/bcachefs/issues/717 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-26 20:33:12 -04:00
Kent Overstreet	66927b8928	bcachefs: Fix failure to return error in data_update_index_update() This fixes an assertion pop in io_write.c - if we don't return an error we're supposed to have completed all the btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-26 20:33:12 -04:00
Qu Wenruo	10d9d8c351	btrfs: fix a use-after-free when hitting errors inside btrfs_submit_chunk() [BUG] There is an internal report that KASAN is reporting use-after-free, with the following backtrace: BUG: KASAN: slab-use-after-free in btrfs_check_read_bio+0xa68/0xb70 [btrfs] Read of size 4 at addr ffff8881117cec28 by task kworker/u16:2/45 CPU: 1 UID: 0 PID: 45 Comm: kworker/u16:2 Not tainted 6.11.0-rc2-next-20240805-default+ #76 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 Workqueue: btrfs-endio btrfs_end_bio_work [btrfs] Call Trace: dump_stack_lvl+0x61/0x80 print_address_description.constprop.0+0x5e/0x2f0 print_report+0x118/0x216 kasan_report+0x11d/0x1f0 btrfs_check_read_bio+0xa68/0xb70 [btrfs] process_one_work+0xce0/0x12a0 worker_thread+0x717/0x1250 kthread+0x2e3/0x3c0 ret_from_fork+0x2d/0x70 ret_from_fork_asm+0x11/0x20 Allocated by task 20917: kasan_save_stack+0x37/0x60 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x7d/0x80 kmem_cache_alloc_noprof+0x16e/0x3e0 mempool_alloc_noprof+0x12e/0x310 bio_alloc_bioset+0x3f0/0x7a0 btrfs_bio_alloc+0x2e/0x50 [btrfs] submit_extent_page+0x4d1/0xdb0 [btrfs] btrfs_do_readpage+0x8b4/0x12a0 [btrfs] btrfs_readahead+0x29a/0x430 [btrfs] read_pages+0x1a7/0xc60 page_cache_ra_unbounded+0x2ad/0x560 filemap_get_pages+0x629/0xa20 filemap_read+0x335/0xbf0 vfs_read+0x790/0xcb0 ksys_read+0xfd/0x1d0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Freed by task 20917: kasan_save_stack+0x37/0x60 kasan_save_track+0x10/0x30 kasan_save_free_info+0x37/0x50 __kasan_slab_free+0x4b/0x60 kmem_cache_free+0x214/0x5d0 bio_free+0xed/0x180 end_bbio_data_read+0x1cc/0x580 [btrfs] btrfs_submit_chunk+0x98d/0x1880 [btrfs] btrfs_submit_bio+0x33/0x70 [btrfs] submit_one_bio+0xd4/0x130 [btrfs] submit_extent_page+0x3ea/0xdb0 [btrfs] btrfs_do_readpage+0x8b4/0x12a0 [btrfs] btrfs_readahead+0x29a/0x430 [btrfs] read_pages+0x1a7/0xc60 page_cache_ra_unbounded+0x2ad/0x560 filemap_get_pages+0x629/0xa20 filemap_read+0x335/0xbf0 vfs_read+0x790/0xcb0 ksys_read+0xfd/0x1d0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 [CAUSE] Although I cannot reproduce the error, the report itself is good enough to pin down the cause. The call trace is the regular endio workqueue context, but the free-by-task trace is showing that during btrfs_submit_chunk() we already hit a critical error, and is calling btrfs_bio_end_io() to error out. And the original endio function called bio_put() to free the whole bio. This means a double freeing thus causing use-after-free, e.g.: 1. Enter btrfs_submit_bio() with a read bio The read bio length is 128K, crossing two 64K stripes. 2. The first run of btrfs_submit_chunk() 2.1 Call btrfs_map_block(), which returns 64K 2.2 Call btrfs_split_bio() Now there are two bios, one referring to the first 64K, the other referring to the second 64K. 2.3 The first half is submitted. 3. The second run of btrfs_submit_chunk() 3.1 Call btrfs_map_block(), which by somehow failed Now we call btrfs_bio_end_io() to handle the error 3.2 btrfs_bio_end_io() calls the original endio function Which is end_bbio_data_read(), and it calls bio_put() for the original bio. Now the original bio is freed. 4. The submitted first 64K bio finished Now we call into btrfs_check_read_bio() and tries to advance the bio iter. But since the original bio (thus its iter) is already freed, we trigger the above use-after free. And even if the memory is not poisoned/corrupted, we will later call the original endio function, causing a double freeing. [FIX] Instead of calling btrfs_bio_end_io(), call btrfs_orig_bbio_end_io(), which has the extra check on split bios and do the proper refcounting for cloned bios. Furthermore there is already one extra btrfs_cleanup_bio() call, but that is duplicated to btrfs_orig_bbio_end_io() call, so remove that label completely. Reported-by: David Sterba <dsterba@suse.com> Fixes: `852eee62d3` ("btrfs: allow btrfs_submit_bio to split bios") CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-08-27 01:34:08 +02:00
Jeff Layton	7e8ae8486e	fs/nfsd: fix update of inode attrs in CB_GETATTR Currently, we copy the mtime and ctime to the in-core inode and then mark the inode dirty. This is fine for certain types of filesystems, but not all. Some require a real setattr to properly change these values (e.g. ceph or reexported NFS). Fix this code to call notify_change() instead, which is the proper way to effect a setattr. There is one problem though: In this case, the client is holding a write delegation and has sent us attributes to update our cache. We don't want to break the delegation for this since that would defeat the purpose. Add a new ATTR_DELEG flag that makes notify_change bypass the try_break_deleg call. Fixes: `c5967721e1` ("NFSD: handle GETATTR conflict with write delegation") Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-08-26 19:04:00 -04:00
Max Filippov	c6a09e342f	binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined create_elf_fdpic_tables() does not correctly account the space for the AUX vector when an architecture has ELF_HWCAP2 defined. Prior to the commit `10e29251be` ("binfmt_elf_fdpic: fix /proc/<pid>/auxv") it resulted in the last entry of the AUX vector being set to zero, but with that change it results in a kernel BUG. Fix that by adding one to the number of AUXV entries (nitems) when ELF_HWCAP2 is defined. Fixes: `10e29251be` ("binfmt_elf_fdpic: fix /proc/<pid>/auxv") Cc: stable@vger.kernel.org Reported-by: Greg Ungerer <gerg@kernel.org> Closes: https://lore.kernel.org/lkml/5b51975f-6d0b-413c-8b38-39a6a45e8821@westnet.com.au/ Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Greg Ungerer <gerg@kernel.org> Link: https://lore.kernel.org/r/20240826032745.3423812-1-jcmvbkbc@gmail.com Signed-off-by: Kees Cook <kees@kernel.org>	2024-08-26 13:00:38 -07:00
Jeff Layton	1116e0e372	nfsd: fix potential UAF in nfsd4_cb_getattr_release Once we drop the delegation reference, the fields embedded in it are no longer safe to access. Do that last. Fixes: `c5967721e1` ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-08-26 11:53:05 -04:00
Jeff Layton	da05ba23d4	nfsd: hold reference to delegation when updating it for cb_getattr Once we've dropped the flc_lock, there is nothing that ensures that the delegation that was found will still be around later. Take a reference to it while holding the lock and then drop it when we've finished with the delegation. Fixes: `c5967721e1` ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-08-26 11:52:40 -04:00
David Sterba	33f58a0480	btrfs: initialize last_extent_end to fix -Wmaybe-uninitialized warning in extent_fiemap() There's a warning (probably on some older compiler version): fs/btrfs/fiemap.c: warning: 'last_extent_end' may be used uninitialized in this function [-Wmaybe-uninitialized]: => 822:19 Initialize the variable to 0 although it's not necessary as it's either properly set or not used after an error. The called function is in the same file so this is a false alert but we want to fix all -Wmaybe-uninitialized reports. Link: https://lore.kernel.org/all/20240819070639.2558629-1-geert@linux-m68k.org/ Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David Sterba <dsterba@suse.com>	2024-08-26 16:58:13 +02:00
Zizhi Wo	68415b349f	xfs: Fix the owner setting issue for rmap query in xfs fsmap I notice a rmap query bug in xfs_io fsmap: [root@fedora ~]# xfs_io -c 'fsmap -vvvv' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:16 [0..7]: static fs metadata 0 (0..7) 8 1: 253:16 [8..23]: per-AG metadata 0 (8..23) 16 2: 253:16 [24..39]: inode btree 0 (24..39) 16 3: 253:16 [40..47]: per-AG metadata 0 (40..47) 8 4: 253:16 [48..55]: refcount btree 0 (48..55) 8 5: 253:16 [56..103]: per-AG metadata 0 (56..103) 48 6: 253:16 [104..127]: free space 0 (104..127) 24 ...... Bug: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 0 3' /mnt [root@fedora ~]# Normally, we should be able to get one record, but we got nothing. The root cause of this problem lies in the incorrect setting of rm_owner in the rmap query. In the case of the initial query where the owner is not set, __xfs_getfsmap_datadev() first sets info->high.rm_owner to ULLONG_MAX. This is done to prevent any omissions when comparing rmap items. However, if the current ag is detected to be the last one, the function sets info's high_irec based on the provided key. If high->rm_owner is not specified, it should continue to be set to ULLONG_MAX; otherwise, there will be issues with interval omissions. For example, consider "start" and "end" within the same block. If high->rm_owner == 0, it will be smaller than the founded record in rmapbt, resulting in a query with no records. The main call stack is as follows: xfs_ioc_getfsmap xfs_getfsmap xfs_getfsmap_datadev_rmapbt __xfs_getfsmap_datadev info->high.rm_owner = ULLONG_MAX if (pag->pag_agno == end_ag) xfs_fsmap_owner_to_rmap // set info->high.rm_owner = 0 because fmr_owner == -1ULL dest->rm_owner = 0 // get nothing xfs_getfsmap_datadev_rmapbt_query The problem can be resolved by simply modify the xfs_fsmap_owner_to_rmap function internal logic to achieve. After applying this patch, the above problem have been solved: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 0 3' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:16 [0..7]: static fs metadata 0 (0..7) 8 Fixes: `e89c041338` ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-26 09:52:27 +05:30
Darrick J. Wong	410e8a18f8	xfs: don't bother reporting blocks trimmed via FITRIM Don't bother reporting the number of bytes that we "trimmed" because the underlying storage isn't required to do anything(!) and failed discard IOs aren't reported to the caller anyway. It's not like userspace can use the reported value for anything useful like adjusting the offset parameter of the next call, and it's not like anyone ever wrote a manpage about FITRIM's out parameters. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-26 09:52:13 +05:30
Dave Chinner	95179935be	xfs: xfs_finobt_count_blocks() walks the wrong btree As a result of the factoring in commit `14dd46cf31` ("xfs: split xfs_inobt_init_cursor"), mount started taking a long time on a user's filesystem. For Anders, this made mount times regress from under a second to over 15 minutes for a filesystem with only 30 million inodes in it. Anders bisected it down to the above commit, but even then the bug was not obvious. In this commit, over 20 calls to xfs_inobt_init_cursor() were modified, and some we modified to call a new function named xfs_finobt_init_cursor(). If that takes you a moment to reread those function names to see what the rename was, then you have realised why this bug wasn't spotted during review. And it wasn't spotted on inspection even after the bisect pointed at this commit - a single missing "f" isn't the easiest thing for a human eye to notice.... The result is that xfs_finobt_count_blocks() now incorrectly calls xfs_inobt_init_cursor() so it is now walking the inobt instead of the finobt. Hence when there are lots of allocated inodes in a filesystem, mount takes a -long- time run because it now walks a massive allocated inode btrees instead of the small, nearly empty free inode btrees. It also means all the finobt space reservations are wrong, so mount could potentially given ENOSPC on kernel upgrade. In hindsight, commit `14dd46cf31` should have been two commits - the first to convert the finobt callers to the new API, the second to modify the xfs_inobt_init_cursor() API for the inobt callers. That would have made the bug very obvious during review. Fixes: `14dd46cf31` ("xfs: split xfs_inobt_init_cursor") Reported-by: Anders Blomdell <anders.blomdell@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-26 09:52:00 +05:30
Darrick J. Wong	5335affcff	xfs: fix folio dirtying for XFILE_ALLOC callers willy pointed out that folio_mark_dirty is the correct function to use to mark an xfile folio dirty because it calls out to the mapping's aops to mark it dirty. For tmpfs this likely doesn't matter much since it currently uses nop_dirty_folio, but let's use the abstractions properly. Reported-by: willy@infradead.org Fixes: `6907e3c00a` ("xfs: add file_{get,put}_folio") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-26 09:51:27 +05:30
Darrick J. Wong	e21fea4ac3	xfs: fix di_onlink checking for V1/V2 inodes "KjellR" complained on IRC that an old V4 filesystem suddenly stopped mounting after upgrading from 6.9.11 to 6.10.3, with the following splat when trying to read the rt bitmap inode: 00000000: 49 4e 80 00 01 02 00 01 00 00 00 00 00 00 00 00 IN.............. 00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 00 00 00 00 00 00 00 00 43 d2 a9 da 21 0f d6 30 ........C...!..0 00000030: 43 d2 a9 da 21 0f d6 30 00 00 00 00 00 00 00 00 C...!..0........ 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000050: 00 00 00 02 00 00 00 00 00 00 00 04 00 00 00 00 ................ 00000060: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ As Dave Chinner points out, this is a V1 inode with both di_onlink and di_nlink set to 1 and di_flushiter == 0. In other words, this inode was formatted this way by mkfs and hasn't been touched since then. Back in the old days of xfsprogs 3.2.3, I observed that libxfs_ialloc would set di_nlink, but if the filesystem didn't have NLINK, it would then set di_version = 1. libxfs_iflush_int later sees the V1 inode and copies the value of di_nlink to di_onlink without zeroing di_onlink. Eventually this filesystem must have been upgraded to support NLINK because 6.10 doesn't support !NLINK filesystems, which is how we tripped over this old behavior. The filesystem doesn't have a realtime section, so that's why the rtbitmap inode has never been touched. Fix this by removing the di_onlink/di_nlink checking for all V1/V2 inodes because this is a muddy mess. The V3 inode handling code has always supported NLINK and written di_onlink==0 so keep that check. The removal of the V1 inode handling code when we dropped support for !NLINK obscured this old behavior. Reported-by: kjell.m.randa@gmail.com Fixes: `40cb8613d6` ("xfs: check unused nlink fields in the ondisk inode") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-26 09:50:41 +05:30
Josef Bacik	2d34472610	btrfs: run delayed iputs when flushing delalloc We have transient failures with btrfs/301, specifically in the part where we do for i in $(seq 0 10); do write 50m to file rm -f file done Sometimes this will result in a transient quota error, and it's because sometimes we start writeback on the file which results in a delayed iput, and thus the rm doesn't actually clean the file up. When we're flushing the quota space we need to run the delayed iputs to make sure all the unlinks that we think have completed have actually completed. This removes the small window where we could fail to find enough space in our quota. CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-08-25 19:15:34 +02:00
David Howells	416871f4fb	cifs: Fix FALLOC_FL_PUNCH_HOLE support The cifs filesystem doesn't quite emulate FALLOC_FL_PUNCH_HOLE correctly (note that due to lack of protocol support, it can't actually implement it directly). Whilst it will (partially) invalidate dirty folios in the pagecache, it doesn't write them back first, and so the EOF marker on the server may be lower than inode->i_size. This presents a problem, however, as if the punched hole invalidates the tail of the locally cached dirty data, writeback won't know it needs to move the EOF over to account for the hole punch (which isn't supposed to move the EOF). We could just write zeroes over the punched out region of the pagecache and write that back - but this is supposed to be a deallocatory operation. Fix this by manually moving the EOF over on the server after the operation if the hole punched would corrupt it. Note that the FSCTL_SET_ZERO_DATA RPC and the setting of the EOF should probably be compounded to stop a third party interfering (or, at least, massively reduce the chance). This was reproducible occasionally by using fsx with the following script: truncate 0x0 0x375e2 0x0 punch_hole 0x2f6d3 0x6ab5 0x375e2 truncate 0x0 0x3a71f 0x375e2 mapread 0xee05 0xcf12 0x3a71f write 0x2078e 0x5604 0x3a71f write 0x3ebdf 0x1421 0x3a71f * punch_hole 0x379d0 0x8630 0x40000 * mapread 0x2aaa2 0x85b 0x40000 fallocate 0x1b401 0x9ada 0x40000 read 0x15f2 0x7d32 0x40000 read 0x32f37 0x7a3b 0x40000 * The second "write" should extend the EOF to 0x40000, and the "punch_hole" should operate inside of that - but that depends on whether the VM gets in and writes back the data first. If it doesn't, the file ends up 0x3a71f in size, not 0x40000. Fixes: `31742c5a33` ("enable fallocate punch hole ("fallocate -p") for SMB3") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-25 09:06:25 -05:00
Stefan Metzmacher	017d170174	smb/client: fix rdma usage in smb2_async_writev() rqst.rq_iter needs to be truncated otherwise we'll also send the bytes into the stream socket... This is the logic behind rqst.rq_npages = 0, which was removed in "cifs: Change the I/O paths to use an iterator rather than a page list" (`d08089f649`). Cc: stable@vger.kernel.org Fixes: `d08089f649` ("cifs: Change the I/O paths to use an iterator rather than a page list") Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-25 09:06:25 -05:00
Stefan Metzmacher	b608e2c318	smb/client: remove unused rq_iter_size from struct smb_rqst Reviewed-by: David Howells <dhowells@redhat.com> Fixes: `d08089f649` ("cifs: Change the I/O paths to use an iterator rather than a page list") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-25 09:06:25 -05:00
Stefan Metzmacher	c724b2ab6a	smb/client: avoid dereferencing rdata=NULL in smb2_new_read_req() This happens when called from SMB2_read() while using rdma and reaching the rdma_readwrite_threshold. Cc: stable@vger.kernel.org Fixes: `a6559cc1d3` ("cifs: split out smb3_use_rdma_offload() helper") Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-25 09:06:25 -05:00
Linus Torvalds	72bea05cb1	bcachefs fixes for 6.11-rc5, v2 - rhashtable conversion for vfs inodes - rcu_pending, btree key cache conversion + nocow deadlock fix + fix for new rebalance_work accounting -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmbKB9wACgkQE6szbY3K bnYMtw/+LSGV/eqwLwdeuABggU5gehWjxqkWF/uGE7fPP8pP0dJQnvCLRVKtAro2 0mJh6j+kM602fU5jH/W8WNn1h8J7dAdkyqI7P/D3ZgTdaCtxso+A0Nj95CYpNdWY ESX9CLUYSxtFatT/kfWvlRvqlSJBYo7WgNsV6tcnPdpC+Oki6Kwlq22iI+ma9Ty7 uDbgd05/R9KCSxaaV+9iojCsEq6h/tuFH8Z3f3SevA8H29odh5mt0UWNn05pf3mt rAnDUJ5TQVYubMIcbS6MhjVoLZ3AxOefkk4pctdbdmGSPJcssDeXvATn/wHYl6Fp +et1ECRU3Sc3dqcmT0RaTm/yxYytdtKA4HVxS4ELKbsIM2xU0Pjq3JQwKzHRwXDd a3r0WXa+LqHkBP37g0HhuxhxAECnbpUM9bvDivgGssVDLyxfMKUkhDsuzegjrHAF v5H08myk5maKvLv+dD6e23t0l1i9eB/bSsw1iNGOgZP4k9gsUlESvppFGw/10F+Q 1Y/qeSiNTG9kJyo9PQTOZ6rFVxrfaZ9NFP4EAXcWId81OsQHYY8XnE5XaJATxnwF MzCgNdmzuf67X6Q8fCeNCJtiZ5sCmbyENGd6hbyYFDg+R02p0NOM4ABVN6BBfXJ+ eHPyu2bvusIZt8MD6c7fOxyGsGdgLxIv/SkqLayZdxEaY3VvS2g= =ejxu -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-08-24' of git://evilpiepirate.org/bcachefs Pull bcachefs fixes from Kent Overstreet: - assorted syzbot fixes - some upgrade fixes for old (pre 1.0) filesystems - fix for moving data off a device that was switched to durability=0 after data had been written to it. - nocow deadlock fix - fix for new rebalance_work accounting * tag 'bcachefs-2024-08-24' of git://evilpiepirate.org/bcachefs: (28 commits) bcachefs: Fix rebalance_work accounting bcachefs: Fix failure to flush moves before sleeping in copygc bcachefs: don't use rht_bucket() in btree_key_cache_scan() bcachefs: add missing inode_walker_exit() bcachefs: clear path->should_be_locked in bch2_btree_key_cache_drop() bcachefs: Fix double assignment in check_dirent_to_subvol() bcachefs: Fix refcounting in discard path bcachefs: Fix compat issue with old alloc_v4 keys bcachefs: Fix warning in bch2_fs_journal_stop() fs/super.c: improve get_tree() error message bcachefs: Fix missing validation in bch2_sb_journal_v2_validate() bcachefs: Fix replay_now_at() assert bcachefs: Fix locking in bch2_ioc_setlabel() bcachefs: fix failure to relock in btree_node_fill() bcachefs: fix failure to relock in bch2_btree_node_mem_alloc() bcachefs: unlock_long() before resort in journal replay bcachefs: fix missing bch2_err_str() bcachefs: fix time_stats_to_text() bcachefs: Fix bch2_bucket_gens_init() bcachefs: Fix bch2_trigger_alloc assert ...	2024-08-25 17:20:48 +12:00
Linus Torvalds	780bdc1ba7	five ksmbd server fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmbJteoACgkQiiy9cAdy T1F+Pwv/RHXSnQD+jkFEfQCEgsZZOfWD0V74VZqm90N48gfB3giZw9mtV4I1jQzI 0+UerZjN7lHIDC4f6qp48TSEodHpprAxLfsg5JJN/OxDE+0MSbctTjLeHlduVzw6 iHEdaE3jWN0p4YZRdbyrUCaOoTEk9cKwiG7r2DjArNyQ8kClveeqrGfdZUDTHNkv IIs6CJ8PFo7dicpAIGPmMz1TGq5Lh2EFjZTYEweSSlyXUNKaWgz3BXBIXD4LwK6w mFjGPxGNBDorcvzHcOUZnrpfACB3WNOSPN/WK5sQL6LXGCx3sWtUvGxLFkxFwjSq D7gvo7qnBuycNyR03RfmWyXYx+2KzdYoAUGTNV114zMJskBC0QhIIF6JK+xZdPZX XHxbr4CRR7fsaZOur5MTWXEzVJxvC1irULKoBp7lvYpEoAV6yXpK3XegAHIASKUE /Cw9qikIvxrMg4BjWPP1JhbKRw92uL2ty4oO913hbnBsScS8jCystuNl6ataiXWq PN5rN4sy =bGOb -----END PGP SIGNATURE----- Merge tag '6.11-rc5-server-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: - query directory flex array fix - fix potential null ptr reference in open - fix error message in some open cases - two minor cleanups * tag '6.11-rc5-server-fixes' of git://git.samba.org/ksmbd: smb/server: update misguided comment of smb2_allocate_rsp_buf() smb/server: remove useless assignment of 'file_present' in smb2_open() smb/server: fix potential null-ptr-deref of lease_ctx_info in smb2_open() smb/server: fix return value of smb2_open() ksmbd: the buffer of smb2 query dir response has at least 1 byte	2024-08-25 12:15:04 +12:00
Kent Overstreet	49aa783039	bcachefs: Fix rebalance_work accounting rebalance_work was keying off of the presence of rebelance_opts in the extent - but that was incorrect, we keep those around after rebalance for indirect extents since the inode's options are not directly available Fixes: `20ac515a9c` ("bcachefs: bch_acct_rebalance_work") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-24 10:16:21 -04:00
Kent Overstreet	d3204616a6	bcachefs: Fix failure to flush moves before sleeping in copygc This fixes an apparent deadlock - rebalance would get stuck trying to take nocow locks because they weren't being released by copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-24 10:16:21 -04:00
David Howells	e00e99ba6c	netfs: Fix interaction of streaming writes with zero-point tracker When a folio that is marked for streaming write (dirty, but not uptodate, with partial content specified in the private data) is written back, the folio is effectively switched to the blank state upon completion of the write. This means that if we want to read it in future, we need to reread the whole folio. However, if the folio is above the zero_point position, when it is read back, it will just be cleared and the read skipped, leading to apparent local corruption. Fix this by increasing the zero_point to the end of the dirty data in the folio when clearing the folio state after writeback. This is analogous to the folio having ->release_folio() called upon it. This was causing the config.log generated by configuring a cpython tree on a cifs share to get corrupted because the scripts involved were appending text to the file in small pieces. Fixes: `288ace2f57` ("netfs: New writeback implementation") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/563286.1724500613@warthog.procyon.org.uk cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-24 16:09:17 +02:00
David Howells	950b03d0f6	netfs: Fix missing iterator reset on retry of short read Fix netfs_rreq_perform_resubmissions() to reset before retrying a short read, otherwise the wrong part of the output buffer will be used. Fixes: `92b6cc5d1e` ("netfs: Add iov_iters to (sub)requests to describe various buffers") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240823200819.532106-6-dhowells@redhat.com cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-24 16:09:17 +02:00
David Howells	cce6bfa6ca	netfs: Fix trimming of streaming-write folios in netfs_inval_folio() When netfslib writes to a folio that it doesn't have data for, but that data exists on the server, it will make a 'streaming write' whereby it stores data in a folio that is marked dirty, but not uptodate. When it does this, it attaches a record to folio->private to track the dirty region. When truncate() or fallocate() wants to invalidate part of such a folio, it will call into ->invalidate_folio(), specifying the part of the folio that is to be invalidated. netfs_invalidate_folio(), on behalf of the filesystem, must then determine how to trim the streaming write record. In a couple of cases, however, it does this incorrectly (the reduce-length and move-start cases are switched over and don't, in any case, calculate the value correctly). Fix this by making the logic tree more obvious and fixing the cases. Fixes: `9ebff83e64` ("netfs: Prep to use folio->private for write grouping and streaming write") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240823200819.532106-5-dhowells@redhat.com cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Pankaj Raghav <p.raghav@samsung.com> cc: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-24 16:09:16 +02:00
David Howells	7dfc8f0c61	netfs: Fix netfs_release_folio() to say no if folio dirty Fix netfs_release_folio() to say no (ie. return false) if the folio is dirty (analogous with iomap's behaviour). Without this, it will say yes to the release of a dirty page by split_huge_page_to_list_to_order(), which will result in the loss of untruncated data in the folio. Without this, the generic/075 and generic/112 xfstests (both fsx-based tests) fail with minimum folio size patches applied[1]. Fixes: `c1ec4d7c2e` ("netfs: Provide invalidate_folio and release_folio calls") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240815090849.972355-1-kernel@pankajraghav.com/ [1] Link: https://lore.kernel.org/r/20240823200819.532106-4-dhowells@redhat.com cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Pankaj Raghav <p.raghav@samsung.com> cc: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-24 16:09:16 +02:00
David Howells	a74ee0e878	afs: Fix post-setattr file edit to do truncation correctly At the end of an kAFS RPC operation, there is an "edit" phase (originally intended for post-directory modification ops to edit the local image) that the setattr VFS op uses to fix up the pagecache if the RPC that requested truncation of a file was successful. afs_setattr_edit_file() calls truncate_setsize() which sets i_size, expands the pagecache if needed and truncates the pagecache. The first two of those, however, are redundant as they've already been done by afs_setattr_success() under the io_lock and the first is also done under the callback lock (cb_lock). Fix afs_setattr_edit_file() to call truncate_pagecache() instead (which is called by truncate_setsize(), thereby skipping the redundant parts. Fixes: `100ccd18bb` ("netfs: Optimise away reads above the point at which there can be no data") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240823200819.532106-3-dhowells@redhat.com cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Pankaj Raghav <p.raghav@samsung.com> cc: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-24 16:09:16 +02:00
Christian Brauner	d10771d51b	Merge patch series "ovl: simplify ovl_parse_param_lowerdir()" Simplify and fix overlayfs layer parsing so the maximum of 500 layers can be used. * patches from https://lore.kernel.org/r/20240705011510.794025-1-chengzhihao1@huawei.com: ovl: ovl_parse_param_lowerdir: Add missed '\n' for pr_err ovl: fix wrong lowerdir number check for parameter Opt_lowerdir ovl: pass string to ovl_parse_layer() Link: https://lore.kernel.org/r/20240705011510.794025-1-chengzhihao1@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-24 16:00:46 +02:00
Linus Torvalds	60f0560f53	NFS Client Bugfixes for Linux 6.11-rc Bugfixes: * Fix rpcrdma refcounting in xa_alloc * Fix rpcrdma usage of XA_FLAGS_ALLOC * Fix requesting FATTR4_WORD2_OPEN_ARGUMENTS * Fix attribute bitmap decoder to handle a 3rd word * Add reschedule points when returning delegations to avoid soft lockups * Fix clearing layout segments in layoutreturn * Avoid unnecessary rescanning of the per-server delegation list -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmbIywQACgkQ18tUv7Cl QOtvHQ//VJO6iTh3kmONCru2ohxgAl7+qX+3HHbMR62S8AWCp0ujId0nir7CuxDa 49c+R03s36lGXkTX5x0d3Idhbv12a5Jdy21oZuJU+RCm6Z7MdNXdDl9HN/gXXqdl 0Z3Wk8r3Pi4/tgejau0i2zN2wXxVNKSQgovETnuI/BQLHupDvDy8Sd8lrIqqoiXY sffCiKSTbCFWg6JLEF1UWZZ1VtLUsDZRBQJD+67l1NbjSX/tiBsY0CquWcHjXAlY 2VGDXdFCZwsQyYuqNdMVh1Cr95hcT0F1YZLOT+vn+6b6rA+UbtmPlURt8iR4gBFo Fadpp5pRziYb9wyg/DgFABihB6PzcboIg5Lm0rx870WEuzxSs8NQeQ9sw5hJC797 At8C4I+cNLOaPU5nUEcG53+svEl9F2jDI2jFc8aa5zAW2hHAtpZhLVre5to0CDb/ hu/H+h2yvjJyfSB7kCdVqlU93PJM96P7F1KEdVYmkuXQQMhkZknntVnu41w7KOst SKy0iU29idlU7SFHvKYyc4URC63kTKLWmTZZn3uDJouwDRYudCRPFQpiwnoDJOY3 wRi3jRPhmZQXn7ArChQSPrHqjRVTu9Y0gUcrtAj4YODv+bkngxmZbG3J3IqZKw21 AkMiDZESyriKVOTurX0Fzaj63zHSrIc+TwyTTXwFCGpCbVCf5/k= =OnQe -----END PGP SIGNATURE----- Merge tag 'nfs-for-6.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: - Fix rpcrdma refcounting in xa_alloc - Fix rpcrdma usage of XA_FLAGS_ALLOC - Fix requesting FATTR4_WORD2_OPEN_ARGUMENTS - Fix attribute bitmap decoder to handle a 3rd word - Add reschedule points when returning delegations to avoid soft lockups - Fix clearing layout segments in layoutreturn - Avoid unnecessary rescanning of the per-server delegation list * tag 'nfs-for-6.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Avoid unnecessary rescanning of the per-server delegation list NFSv4: Fix clearing of layout segments in layoutreturn NFSv4: Add missing rescheduling points in nfs_client_return_marked_delegations nfs: fix bitmap decoder to handle a 3rd word nfs: fix the fetch of FATTR4_OPEN_ARGUMENTS rpcrdma: Trace connection registration and unregistration rpcrdma: Use XA_FLAGS_ALLOC instead of XA_FLAGS_ALLOC1 rpcrdma: Device kref is over-incremented on error from xa_alloc	2024-08-24 09:03:25 +08:00
Linus Torvalds	66ace9a8f9	four cifs.ko client fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmbIqhgACgkQiiy9cAdy T1EAPgwAnW+vu15huT1zQn2BtFcn85zdBGXL/avjbbMLDwNHj5Lpae+PbbRa4gZ0 VN6OQdq5Rt3Z2pJDfFZtFECKq4AN1Lxn1ur4wujBIzez3CxyFCXjDeS5/3lRP6c+ 0CiHVtRe7IgncGUnnhvwPhiG6/cjTNiXlImb6SgmFLP/0U7ZnWl5p3LmR7exfVY9 Fubqq3HF0UpxMUD3thM055ftqT/xP6RdrITX2K2Led+BlJAJm1x+0E//4nApQ2IX C3VeBRZTvQtBC+pay754BqSnfAifgVObF8cfswDMS4U7ImV5gS+CxSx4vlg4bF7o 2f32mZAXz9U3yMIBMjtBT/q/LbN28SRSjo1x35CJ9LCUK6IzARHiLZG/PVltK3Cj copuH3n5ZV0nGVdsv10Uheo3euFlrKKylPn8xAEhMsQzG7Q6ek/pT+avb+xl6MWf i8eOnMobCFiOEJtSk/uV23579wf8maVQM92M2rf2UO6K5eHIceOq0HGfSoeVV9dZ 1rgZb1D6 =8U5O -----END PGP SIGNATURE----- Merge tag 'v6.11-rc4-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - fix refcount leak (can cause rmmod fail) - fix byte range locking problem with cached reads - fix for mount failure if reparse point unrecognized - minor typo * tag 'v6.11-rc4-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb/client: fix typo: GlobalMid_Sem -> GlobalMid_Lock smb: client: ignore unhandled reparse tags smb3: fix problem unloading module due to leaked refcount on shutdown smb3: fix broken cached reads when posix locks	2024-08-24 08:50:21 +08:00
Zhihao Cheng	441e36ef5b	ovl: ovl_parse_param_lowerdir: Add missed '\n' for pr_err Add '\n' for pr_err in function ovl_parse_param_lowerdir(), which ensures that error message is displayed at once. Fixes: `b36a5780cb` ("ovl: modify layer parameter parsing") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20240705011510.794025-4-chengzhihao1@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-23 19:58:59 +02:00
Zhihao Cheng	ca76ac36bb	ovl: fix wrong lowerdir number check for parameter Opt_lowerdir The max count of lowerdir is OVL_MAX_STACK[500], which is broken by commit 37f32f526438("ovl: fix memory leak in ovl_parse_param()") for parameter Opt_lowerdir. Since commit 819829f0319a("ovl: refactor layer parsing helpers") and commit 24e16e385f22("ovl: add support for appending lowerdirs one by one") added check ovl_mount_dir_check() in function ovl_parse_param_lowerdir(), the 'ctx->nr' should be smaller than OVL_MAX_STACK, after commit 37f32f526438("ovl: fix memory leak in ovl_parse_param()") is applied, the 'ctx->nr' is updated before the check ovl_mount_dir_check(), which leads the max count of lowerdir to become 499 for parameter Opt_lowerdir. Fix it by replacing lower layers parsing code with the existing helper function ovl_parse_layer(). Fixes: `37f32f5264` ("ovl: fix memory leak in ovl_parse_param()") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20240705011510.794025-3-chengzhihao1@huawei.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-23 19:56:38 +02:00
Christian Brauner	7eff3453cb	ovl: pass string to ovl_parse_layer() So it can be used for parsing the Opt_lowerdir. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20240705011510.794025-2-chengzhihao1@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-23 19:56:38 +02:00
Olga Kornievskaia	a204501e17	nfsd: prevent panic for nfsv4.0 closed files in nfs4_show_open Prior to commit `3f29cc82a8` ("nfsd: split sc_status out of sc_type") states_show() relied on sc_type field to be of valid type before calling into a subfunction to show content of a particular stateid. From that commit, we split the validity of the stateid into sc_status and no longer changed sc_type to 0 while unhashing the stateid. This resulted in kernel oopsing for nfsv4.0 opens that stay around and in nfs4_show_open() would derefence sc_file which was NULL. Instead, for closed open stateids forgo displaying information that relies of having a valid sc_file. To reproduce: mount the server with 4.0, read and close a file and then on the server cat /proc/fs/nfsd/clients/2/states [ 513.590804] Call trace: [ 513.590925] _raw_spin_lock+0xcc/0x160 [ 513.591119] nfs4_show_open+0x78/0x2c0 [nfsd] [ 513.591412] states_show+0x44c/0x488 [nfsd] [ 513.591681] seq_read_iter+0x5d8/0x760 [ 513.591896] seq_read+0x188/0x208 [ 513.592075] vfs_read+0x148/0x470 [ 513.592241] ksys_read+0xcc/0x178 Fixes: `3f29cc82a8` ("nfsd: split sc_status out of sc_type") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-08-23 12:25:46 -04:00
Ed Tsai	996b37da1e	backing-file: convert to using fops->splice_write Filesystems may define their own splice write. Therefore, use the file fops instead of invoking iter_file_splice_write() directly. Signed-off-by: Ed Tsai <ed.tsai@mediatek.com> Link: https://lore.kernel.org/r/20240708072208.25244-1-ed.tsai@mediatek.com Fixes: `5ca7346861` ("fuse: implement splice read/write passthrough") Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-23 13:08:31 +02:00
Trond Myklebust	f92214e4c3	NFS: Avoid unnecessary rescanning of the per-server delegation list If the call to nfs_delegation_grab_inode() fails, we will not have dropped any locks that require us to rescan the list. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-08-22 17:01:10 -04:00
Trond Myklebust	d72b796311	NFSv4: Fix clearing of layout segments in layoutreturn Make sure that we clear the layout segments in cases where we see a fatal error, and also in the case where the layout is invalid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-08-22 17:01:10 -04:00
Trond Myklebust	a017ad1313	NFSv4: Add missing rescheduling points in nfs_client_return_marked_delegations We're seeing reports of soft lockups when iterating through the loops, so let's add rescheduling points. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-08-22 17:01:10 -04:00
Jeff Layton	95832998fb	nfs: fix bitmap decoder to handle a 3rd word It only decodes the first two words at this point. Have it decode the third word as well. Without this, the client doesn't send delegated timestamps in the CB_GETATTR response. With this change we also need to expand the on-stack bitmap in decode_recallany_args to 3 elements, in case the server sends a larger bitmap than expected. Fixes: `43df7110f4` ("NFSv4: Add CB_GETATTR support for delegated attributes") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-08-22 17:01:10 -04:00
Jeff Layton	cb78f9b7d0	nfs: fix the fetch of FATTR4_OPEN_ARGUMENTS The client doesn't properly request FATTR4_OPEN_ARGUMENTS in the initial SERVER_CAPS getattr. Add FATTR4_WORD2_OPEN_ARGUMENTS to the initial request. Fixes: `707f13b3d0` (NFSv4: Add support for the FATTR4_OPEN_ARGUMENTS attribute) Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-08-22 17:01:09 -04:00
ChenXiaoSong	5e51224d2a	smb/client: fix typo: GlobalMid_Sem -> GlobalMid_Lock The comments have typos, fix that to not confuse readers. Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-22 15:44:19 -05:00
Jeff Layton	f58bab6fd4	nfsd: ensure that nfsd4_fattr_args.context is zeroed out If nfsd4_encode_fattr4 ends up doing a "goto out" before we get to checking for the security label, then args.context will be set to uninitialized junk on the stack, which we'll then try to free. Initialize it early. Fixes: `f59388a579` ("NFSD: Add nfsd4_encode_fattr4_sec_label()") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-08-22 14:49:10 -04:00
Paulo Alcantara	ec68680411	smb: client: ignore unhandled reparse tags Just ignore reparse points that the client can't parse rather than bailing out and not opening the file or directory. Reported-by: Marc <1marc1@gmail.com> Closes: https://lore.kernel.org/r/CAMHwNVv-B+Q6wa0FEXrAuzdchzcJRsPKDDRrNaYZJd6X-+iJzw@mail.gmail.com Fixes: `539aad7f14` ("smb: client: introduce ->parse_reparse_point()") Tested-by: Anthony Nandaa (Microsoft) <profnandaa@gmail.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-22 12:37:16 -05:00
Steve French	15179cf280	smb3: fix problem unloading module due to leaked refcount on shutdown The shutdown ioctl can leak a refcount on the tlink which can prevent rmmod (unloading the cifs.ko) module from working. Found while debugging xfstest generic/043 Fixes: `69ca1f5755` ("smb3: add dynamic tracepoints for shutdown ioctl") Reviewed-by: Meetakshi Setiya <msetiya@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-22 12:36:57 -05:00
ChenXiaoSong	2b7e0573a4	smb/server: update misguided comment of smb2_allocate_rsp_buf() smb2_allocate_rsp_buf() will return other error code except -ENOMEM. Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-22 09:52:00 -05:00
ChenXiaoSong	0dd771b7d6	smb/server: remove useless assignment of 'file_present' in smb2_open() The variable is already true here. Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-22 09:52:00 -05:00
ChenXiaoSong	4e8771a366	smb/server: fix potential null-ptr-deref of lease_ctx_info in smb2_open() null-ptr-deref will occur when (req_op_level == SMB2_OPLOCK_LEVEL_LEASE) and parse_lease_state() return NULL. Fix this by check if 'lease_ctx_info' is NULL. Additionally, remove the redundant parentheses in parse_durable_handle_context(). Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-22 09:52:00 -05:00
ChenXiaoSong	2186a11653	smb/server: fix return value of smb2_open() In most error cases, error code is not returned in smb2_open(), __process_request() will not print error message. Fix this by returning the correct value at the end of smb2_open(). Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-22 09:52:00 -05:00
Namjae Jeon	ce61b605a0	ksmbd: the buffer of smb2 query dir response has at least 1 byte When STATUS_NO_MORE_FILES status is set to smb2 query dir response, ->StructureSize is set to 9, which mean buffer has 1 byte. This issue occurs because ->Buffer[1] in smb2_query_directory_rsp to flex-array. Fixes: `eb3e28c1e8` ("smb3: Replace smb2pdu 1-element arrays with flex-arrays") Cc: stable@vger.kernel.org # v6.1+ Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-22 09:52:00 -05:00
Kent Overstreet	a592cdf516	bcachefs: don't use rht_bucket() in btree_key_cache_scan() rht_bucket() does strange complicated things when a rehash is in progress. Instead, just skip scanning when a rehash is in progress: scanning is going to be more expensive (many more empty slots to cover), and some sort of infinite loop is being observed Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 10:04:41 -04:00
Kent Overstreet	3e878fe5a0	bcachefs: add missing inode_walker_exit() fix a small leak Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 10:04:41 -04:00
Kent Overstreet	87313ac1f1	bcachefs: clear path->should_be_locked in bch2_btree_key_cache_drop() bch2_btree_key_cache_drop() evicts the key cache entry - it's used when we're doing an update that bypasses the key cache, because for cache coherency reasons a key can't be in the key cache unless it also exists in the btree - i.e. creates have to bypass the cache. After evicting, the path no longer points to a key cache key, and relock() will always fail if should_be_locked is true. Prep for improving path->should_be_locked assertions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 03:12:57 -04:00
Yuesong Li	dedb2fe375	bcachefs: Fix double assignment in check_dirent_to_subvol() ret was assigned twice in check_dirent_to_subvol(). Reported by cocci. Signed-off-by: Yuesong Li <liyuesong@vivo.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 02:40:52 -04:00
Kent Overstreet	0b50b7313e	bcachefs: Fix refcounting in discard path bch_dev->io_ref does not protect against the filesystem going away; bch_fs->writes does. Thus the filesystem write ref needs to be the last ref we release. Reported-by: syzbot+9e0404b505e604f67e41@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 02:07:23 -04:00
Kent Overstreet	8ed823b192	bcachefs: Fix compat issue with old alloc_v4 keys we allow new fields to be added to existing key types, and new versions should treat them as being zeroed; this was not handled in alloc_v4_validate. Reported-by: syzbot+3b2968fa4953885dd66a@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 02:07:23 -04:00
Kent Overstreet	7f2de6947f	bcachefs: Fix warning in bch2_fs_journal_stop() j->last_empty_seq needs to match j->seq when the journal is empty Reported-by: syzbot+4093905737cf289b6b38@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 02:07:23 -04:00
Kent Overstreet	06f67437ab	fs/super.c: improve get_tree() error message seeing an odd bug where we fail to correctly return an error from .get_tree(): https://syzkaller.appspot.com/bug?extid=c0360e8367d6d8d04a66 we need to be able to distinguish between accidently returning a positive error (as implied by the log) and no error. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 02:07:23 -04:00
Kent Overstreet	bdbdd4759f	bcachefs: Fix missing validation in bch2_sb_journal_v2_validate() Reported-by: syzbot+47ecc948aadfb2ab3efc@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 02:07:23 -04:00
Kent Overstreet	cab18be695	bcachefs: Fix replay_now_at() assert Journal replay, in the slowpath where we insert keys in journal order, was inserting keys in the wrong order; keys from early repair come last. Reported-by: syzbot+2c4fcb257ce2b6a29d0e@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 02:07:23 -04:00
Kent Overstreet	6575b8c987	bcachefs: Fix locking in bch2_ioc_setlabel() Fixes: `7a254053a5` ("bcachefs: support FS_IOC_SETFSLABEL") Reported-by: syzbot+7e9efdfec27fbde0141d@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 02:07:23 -04:00
Kent Overstreet	5dbfc4ef72	bcachefs: fix failure to relock in btree_node_fill() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 02:07:23 -04:00
Kent Overstreet	3c5d0b72a8	bcachefs: fix failure to relock in bch2_btree_node_mem_alloc() We weren't always so strict about trans->locked state - but now we are, and new assertions are shaking some bugs out. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 02:07:23 -04:00

1 2 3 4 5 ...

93229 Commits