linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-24 20:54:10 +08:00

Author	SHA1	Message	Date
Kemeng Shi	438a35e72d	ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_() Remove unused parameter ngroup in ext4_mb_choose_next_group_(). Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20240105092102.496631-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-18 10:52:44 -05:00
Kemeng Shi	133de5a0d8	ext4: remove unused return value of __mb_check_buddy Remove unused return value of __mb_check_buddy. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240105092102.496631-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-18 10:52:44 -05:00
Baokun Li	c5f3a3821d	ext4: mark the group block bitmap as corrupted before reporting an error Otherwise unlocking the group in ext4_grp_locked_error may allow other processes to modify the core block bitmap that is known to be corrupt. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-9-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-18 10:50:24 -05:00
Baokun Li	832698373a	ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal() Places the logic for checking if the group's block bitmap is corrupt under the protection of the group lock to avoid allocating blocks from the group with a corrupted block bitmap. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-8-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-18 10:50:24 -05:00
Baokun Li	4530b3660d	ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found() Determine if the group block bitmap is corrupted before using ac_b_ex in ext4_mb_try_best_found() to avoid allocating blocks from a group with a corrupted block bitmap in the following concurrency and making the situation worse. ext4_mb_regular_allocator ext4_lock_group(sb, group) ext4_mb_good_group // check if the group bbitmap is corrupted ext4_mb_complex_scan_group // Scan group gets ac_b_ex but doesn't use it ext4_unlock_group(sb, group) ext4_mark_group_bitmap_corrupted(group) // The block bitmap was corrupted during // the group unlock gap. ext4_mb_try_best_found ext4_lock_group(ac->ac_sb, group) ext4_mb_use_best_found mb_mark_used // Allocating blocks in block bitmap corrupted group Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-7-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-18 10:50:24 -05:00
Baokun Li	993bf0f4c3	ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt Determine if bb_fragments is 0 instead of determining bb_free to eliminate the risk of dividing by zero when the block bitmap is corrupted. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-6-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-18 10:50:24 -05:00
Baokun Li	2331fd4a49	ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() After updating bb_free in mb_free_blocks, it is possible to return without updating bb_fragments because the block being freed is found to have already been freed, which leads to inconsistency between bb_free and bb_fragments. Since the group may be unlocked in ext4_grp_locked_error(), this can lead to problems such as dividing by zero when calculating the average fragment length. Hence move the update of bb_free to after the block double-free check guarantees that the corresponding statistics are updated only after the core block bitmap is modified. Fixes: `eabe0444df` ("ext4: speed-up releasing blocks on commit") CC: <stable@vger.kernel.org> # 3.10 Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-5-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-18 10:50:24 -05:00
Baokun Li	c9b528c357	ext4: regenerate buddy after block freeing failed if under fc replay This mostly reverts commit `6bd97bf273` ("ext4: remove redundant mb_regenerate_buddy()") and reintroduces mb_regenerate_buddy(). Based on code in mb_free_blocks(), fast commit replay can end up marking as free blocks that are already marked as such. This causes corruption of the buddy bitmap so we need to regenerate it in that case. Reported-by: Jan Kara <jack@suse.cz> Fixes: `6bd97bf273` ("ext4: remove redundant mb_regenerate_buddy()") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-18 10:50:24 -05:00
Baokun Li	172202152a	ext4: do not trim the group with corrupted block bitmap Otherwise operating on an incorrupted block bitmap can lead to all sorts of unknown problems. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-18 10:50:24 -05:00
Baokun Li	55583e899a	ext4: fix double-free of blocks due to wrong extents moved_len In ext4_move_extents(), moved_len is only updated when all moves are successfully executed, and only discards orig_inode and donor_inode preallocations when moved_len is not zero. When the loop fails to exit after successfully moving some extents, moved_len is not updated and remains at 0, so it does not discard the preallocations. If the moved extents overlap with the preallocated extents, the overlapped extents are freed twice in ext4_mb_release_inode_pa() and ext4_process_freed_data() (as described in commit `94d7c16cbb` ("ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT")), and bb_free is incremented twice. Hence when trim is executed, a zero-division bug is triggered in mb_update_avg_fragment_size() because bb_free is not zero and bb_fragments is zero. Therefore, update move_len after each extent move to avoid the issue. Reported-by: Wei Chen <harperchen1110@gmail.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2Q@mail.gmail.com Fixes: `fcf6b1b729` ("ext4: refactor ext4_move_extents code base") CC: <stable@vger.kernel.org> # 3.18 Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-18 10:48:48 -05:00
Yuezhang Mo	0991abedde	exfat: fix zero the unwritten part for dio read For dio read, bio will be leave in flight when a successful partial aio read have been setup, blockdev_direct_IO() will return -EIOCBQUEUED. In the case, iter->iov_offset will be not advanced, the oops reported by syzbot will occur if revert iter->iov_offset with iov_iter_revert(). The unwritten part had been zeroed by aio read, so there is no need to zero it in dio read. Reported-by: syzbot+fd404f6b03a58e8bc403@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fd404f6b03a58e8bc403 Fixes: `11a347fb6c` ("exfat: change to get file size from DataLength") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-01-18 23:01:51 +09:00
Linus Torvalds	09d1c6a80f	Generic: - Use memdup_array_user() to harden against overflow. - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures. - Clean up Kconfigs that all KVM architectures were selecting - New functionality around "guest_memfd", a new userspace API that creates an anonymous file and returns a file descriptor that refers to it. guest_memfd files are bound to their owning virtual machine, cannot be mapped, read, or written by userspace, and cannot be resized. guest_memfd files do however support PUNCH_HOLE, which can be used to switch a memory area between guest_memfd and regular anonymous memory. - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify per-page attributes for a given page of guest memory; right now the only attribute is whether the guest expects to access memory via guest_memfd or not, which in Confidential SVMs backed by SEV-SNP, TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM). x86: - Support for "software-protected VMs" that can use the new guest_memfd and page attributes infrastructure. This is mostly useful for testing, since there is no pKVM-like infrastructure to provide a meaningfully reduced TCB. - Fix a relatively benign off-by-one error when splitting huge pages during CLEAR_DIRTY_LOG. - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE. - Use more generic lockdep assertions in paths that don't actually care about whether the caller is a reader or a writer. - let Xen guests opt out of having PV clock reported as "based on a stable TSC", because some of them don't expect the "TSC stable" bit (added to the pvclock ABI by KVM, but never set by Xen) to be set. - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL. - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always flushes on nested transitions, i.e. always satisfies flush requests. This allows running bleeding edge versions of VMware Workstation on top of KVM. - Sanity check that the CPU supports flush-by-ASID when enabling SEV support. - On AMD machines with vNMI, always rely on hardware instead of intercepting IRET in some cases to detect unmasking of NMIs - Support for virtualizing Linear Address Masking (LAM) - Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state prior to refreshing the vPMU model. - Fix a double-overflow PMU bug by tracking emulated counter events using a dedicated field instead of snapshotting the "previous" counter. If the hardware PMC count triggers overflow that is recognized in the same VM-Exit that KVM manually bumps an event count, KVM would pend PMIs for both the hardware-triggered overflow and for KVM-triggered overflow. - Turn off KVM_WERROR by default for all configs so that it's not inadvertantly enabled by non-KVM developers, which can be problematic for subsystems that require no regressions for W=1 builds. - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL "features". - Don't force a masterclock update when a vCPU synchronizes to the current TSC generation, as updating the masterclock can cause kvmclock's time to "jump" unexpectedly, e.g. when userspace hotplugs a pre-created vCPU. - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths, partly as a super minor optimization, but mostly to make KVM play nice with position independent executable builds. - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on CONFIG_HYPERV as a minor optimization, and to self-document the code. - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation" at build time. ARM64: - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base granule sizes. Branch shared with the arm64 tree. - Large Fine-Grained Trap rework, bringing some sanity to the feature, although there is more to come. This comes with a prefix branch shared with the arm64 tree. - Some additional Nested Virtualization groundwork, mostly introducing the NV2 VNCR support and retargetting the NV support to that version of the architecture. - A small set of vgic fixes and associated cleanups. Loongarch: - Optimization for memslot hugepage checking - Cleanup and fix some HW/SW timer issues - Add LSX/LASX (128bit/256bit SIMD) support RISC-V: - KVM_GET_REG_LIST improvement for vector registers - Generate ISA extension reg_list using macros in get-reg-list selftest - Support for reporting steal time along with selftest s390: - Bugfixes Selftests: - Fix an annoying goof where the NX hugepage test prints out garbage instead of the magic token needed to run the test. - Fix build errors when a header is delete/moved due to a missing flag in the Makefile. - Detect if KVM bugged/killed a selftest's VM and print out a helpful message instead of complaining that a random ioctl() failed. - Annotate the guest printf/assert helpers with __printf(), and fix the various bugs that were lurking due to lack of said annotation. There are two non-KVM patches buried in the middle of guest_memfd support: fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure() mm: Add AS_UNMOVABLE to mark mapping as completely unmovable The first is small and mostly suggested-by Christian Brauner; the second a bit less so but it was written by an mm person (Vlastimil Babka). -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmWcMWkUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroO15gf/WLmmg3SET6Uzw9iEq2xo28831ZA+ 6kpILfIDGKozV5safDmMvcInlc/PTnqOFrsKyyN4kDZ+rIJiafJdg/loE0kPXBML wdR+2ix5kYI1FucCDaGTahskBDz8Lb/xTpwGg9BFLYFNmuUeHc74o6GoNvr1uliE 4kLZL2K6w0cSMPybUD+HqGaET80ZqPwecv+s1JL+Ia0kYZJONJifoHnvOUJ7DpEi rgudVdgzt3EPjG0y1z6MjvDBXTCOLDjXajErlYuZD3Ej8N8s59Dh2TxOiDNTLdP4 a4zjRvDmgyr6H6sz+upvwc7f4M4p+DBvf+TkWF54mbeObHUYliStqURIoA== =66Ws -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "Generic: - Use memdup_array_user() to harden against overflow. - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures. - Clean up Kconfigs that all KVM architectures were selecting - New functionality around "guest_memfd", a new userspace API that creates an anonymous file and returns a file descriptor that refers to it. guest_memfd files are bound to their owning virtual machine, cannot be mapped, read, or written by userspace, and cannot be resized. guest_memfd files do however support PUNCH_HOLE, which can be used to switch a memory area between guest_memfd and regular anonymous memory. - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify per-page attributes for a given page of guest memory; right now the only attribute is whether the guest expects to access memory via guest_memfd or not, which in Confidential SVMs backed by SEV-SNP, TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM). x86: - Support for "software-protected VMs" that can use the new guest_memfd and page attributes infrastructure. This is mostly useful for testing, since there is no pKVM-like infrastructure to provide a meaningfully reduced TCB. - Fix a relatively benign off-by-one error when splitting huge pages during CLEAR_DIRTY_LOG. - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE. - Use more generic lockdep assertions in paths that don't actually care about whether the caller is a reader or a writer. - let Xen guests opt out of having PV clock reported as "based on a stable TSC", because some of them don't expect the "TSC stable" bit (added to the pvclock ABI by KVM, but never set by Xen) to be set. - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL. - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always flushes on nested transitions, i.e. always satisfies flush requests. This allows running bleeding edge versions of VMware Workstation on top of KVM. - Sanity check that the CPU supports flush-by-ASID when enabling SEV support. - On AMD machines with vNMI, always rely on hardware instead of intercepting IRET in some cases to detect unmasking of NMIs - Support for virtualizing Linear Address Masking (LAM) - Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state prior to refreshing the vPMU model. - Fix a double-overflow PMU bug by tracking emulated counter events using a dedicated field instead of snapshotting the "previous" counter. If the hardware PMC count triggers overflow that is recognized in the same VM-Exit that KVM manually bumps an event count, KVM would pend PMIs for both the hardware-triggered overflow and for KVM-triggered overflow. - Turn off KVM_WERROR by default for all configs so that it's not inadvertantly enabled by non-KVM developers, which can be problematic for subsystems that require no regressions for W=1 builds. - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL "features". - Don't force a masterclock update when a vCPU synchronizes to the current TSC generation, as updating the masterclock can cause kvmclock's time to "jump" unexpectedly, e.g. when userspace hotplugs a pre-created vCPU. - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths, partly as a super minor optimization, but mostly to make KVM play nice with position independent executable builds. - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on CONFIG_HYPERV as a minor optimization, and to self-document the code. - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation" at build time. ARM64: - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base granule sizes. Branch shared with the arm64 tree. - Large Fine-Grained Trap rework, bringing some sanity to the feature, although there is more to come. This comes with a prefix branch shared with the arm64 tree. - Some additional Nested Virtualization groundwork, mostly introducing the NV2 VNCR support and retargetting the NV support to that version of the architecture. - A small set of vgic fixes and associated cleanups. Loongarch: - Optimization for memslot hugepage checking - Cleanup and fix some HW/SW timer issues - Add LSX/LASX (128bit/256bit SIMD) support RISC-V: - KVM_GET_REG_LIST improvement for vector registers - Generate ISA extension reg_list using macros in get-reg-list selftest - Support for reporting steal time along with selftest s390: - Bugfixes Selftests: - Fix an annoying goof where the NX hugepage test prints out garbage instead of the magic token needed to run the test. - Fix build errors when a header is delete/moved due to a missing flag in the Makefile. - Detect if KVM bugged/killed a selftest's VM and print out a helpful message instead of complaining that a random ioctl() failed. - Annotate the guest printf/assert helpers with __printf(), and fix the various bugs that were lurking due to lack of said annotation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits) x86/kvm: Do not try to disable kvmclock if it was not enabled KVM: x86: add missing "depends on KVM" KVM: fix direction of dependency on MMU notifiers KVM: introduce CONFIG_KVM_COMMON KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache RISC-V: KVM: selftests: Add get-reg-list test for STA registers RISC-V: KVM: selftests: Add steal_time test support RISC-V: KVM: selftests: Add guest_sbi_probe_extension RISC-V: KVM: selftests: Move sbi_ecall to processor.c RISC-V: KVM: Implement SBI STA extension RISC-V: KVM: Add support for SBI STA registers RISC-V: KVM: Add support for SBI extension registers RISC-V: KVM: Add SBI STA info to vcpu_arch RISC-V: KVM: Add steal-update vcpu request RISC-V: KVM: Add SBI STA extension skeleton RISC-V: paravirt: Implement steal-time support RISC-V: Add SBI STA extension definitions RISC-V: paravirt: Add skeleton for pv-time support RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr() ...	2024-01-17 13:03:37 -08:00
Linus Torvalds	0c6bc37255	This pull request contains updates for UBI and UBIFS: UBI: - Use in-tree fault injection framework and add new injection types - Fix for a memory leak in the block driver UBIFS: - kernel-doc fixes - Various minor fixes -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmWi8k0WHHJpY2hhcmRA c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wVMEEAClpCwGQ1zjViuDR+ly1etpd2VJ SVH687jQ5bj92joMbJuX1n3iucDKu22KNR6tuePtEWousKEjiP5MU5Vhj4qcEZJj ORwtLOhchF7EHokJ16O2zBTBjznQuSmy0TG8vB/4hKj1a9FHLYPoDpZ595i2ATIA sh4+jfTRiOviX1SWe3qP9Hwx/WBXJpNluNNosabaEkTPe6CEAqnw92Hsm8PC8WY0 0F9zKPbRTiu/Mt8PoF0YHo9pNsX0TikJMPj+QuBSOt3tK5PmPFttL6ce5Zal+wi3 Df+8Qqw2QPchMDesaeZHtknZkZWbxtWPk+1U7EaLUwb6lw7cyI9SPWtQFYS4Ot6r ieUW5mQt2arC6Yjj1u+pFLIvLJOYgg0kiPySvRiA4EKkAyTMBjQzeyf0XCVrgW2s UeBiQTz5LkL4soAo/aWDyny81RXJjtuMpn/+WAq4o36LZkG4aiGXh+ue5l5d9Mq5 Fh/MNyRA9le5STebrqqH7TBtiOwBG+ZJ9yqYffzya+756od6wsnemGfaZ/pPzzSe sp9MEYzrz4hhRvDHegKcIbxb+OUVFNJ1t5gdIUsZAqWARxcfYD9xeqyHVVhvFDjf UzQhZXfKgdnwp4zWHtSBRkDKCEMvxG8Nw3Rnp9ayZwxiQBBalRV6MV33g5RXRIis Xp+fCRu3gjlhBzlU6w== =5I24 -----END PGP SIGNATURE----- Merge tag 'ubifs-for-linus-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI and UBIFS updates from Richard Weinberger: "UBI: - Use in-tree fault injection framework and add new injection types - Fix for a memory leak in the block driver UBIFS: - kernel-doc fixes - Various minor fixes" * tag 'ubifs-for-linus-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: block: fix memleak in ubiblock_create() ubifs: fix kernel-doc warnings mtd: Add several functions to the fail_function list ubi: Reserve sufficient buffer length for the input mask ubi: Add six fault injection type for testing ubi: Split io_failures into write_failure and erase_failure ubi: Use the fault injection framework to enhance the fault injection capability ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path ubifs: Check @c->dirty_[n\|p]n_cnt and @c->nroot state under @c->lp_mutex ubifs: describe function parameters ubifs: auth.c: fix kernel-doc function prototype warning ubifs: use crypto_shash_tfm_digest() in ubifs_hmac_wkm()	2024-01-17 10:27:13 -08:00
Linus Torvalds	eebe75827b	fscrypt fix for 6.8-rc1 Fix a bug in my change to how f2fs frees its superblock info (which was part of changing the timing of fscrypt keyring destruction). -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZaH8VRQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK6ZKAP9cGzwa35300y5/ZwPQxdN7eIThjU0f dv3pUhd69LkZ8QD/QwFRxtjLOp0nx/nfUjwm2TBH44XjidFvPXb0nRCumgc= =SHQL -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt fix from Eric Biggers: "Fix a bug in my change to how f2fs frees its superblock info (which was part of changing the timing of fscrypt keyring destruction)" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: f2fs: fix double free of f2fs_sb_info	2024-01-17 10:23:34 -08:00
Linus Torvalds	c2459ce011	vfs-6.8-rc1.fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZaJ8xAAKCRCRxhvAZXjc ojs2AQCrK7pwncSszfIbQRK7SAHhZS/k4G3LQiQ8mt7VstcTlgD/TpbfnlIX6ONf g3NWgQ8Y/ifPDqQl2qnd9PK4zYVJswo= =ExMf -----END PGP SIGNATURE----- Merge tag 'vfs-6.8-rc1.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains two fixes for the current merge window. The listmount changes that you requested and a fix for a fsnotify performance regression: - The proposed listmount changes are currently under my authorship. I wasn't sure whether you'd wanted to be author as the patch wasn't signed off. If you do I'm happy if you just apply your own patch. I've tested the patch with my sh4 cross-build setup. And confirmed that a) the build failure with sh on current upstream is reproducible and that b) the proposed patch fixes the build failure. That should only leave the task of fixing put_user on sh. - The fsnotify regression was caused by moving one of the hooks out of the security hook in preparation for other fsnotify work. This meant that CONFIG_SECURITY would have compiled out the fsnotify hook before but didn't do so now. That lead to up to 6% performance regression in some io_uring workloads that compile all fsnotify and security checks out. Fix this by making sure that the relevant hooks are covered by the already existing CONFIG_FANOTIFY_ACCESS_PERMISSIONS where the relevant hook belongs" * tag 'vfs-6.8-rc1.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: fs: rework listmount() implementation fsnotify: compile out fsnotify permission hooks if !FANOTIFY_ACCESS_PERMISSIONS	2024-01-17 09:34:25 -08:00
Linus Torvalds	7f5e47f785	17 hotfixes. 10 address post-6.7 issues and the other 7 are cc:stable. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZaHe5gAKCRDdBJ7gKXxA jrAiAQCYZQuwsNVyGJUuPD/GGQzqVUZNpWcuYwMXXAi6dO5rSAD+LDeFviun2K52 uHCz4iRq5EwNLA+MbdHtAnQzr+e5CQ8= =Jjkw -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2024-01-12-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "For once not mostly MM-related. 17 hotfixes. 10 address post-6.7 issues and the other 7 are cc:stable" * tag 'mm-hotfixes-stable-2024-01-12-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: userfaultfd: avoid huge_zero_page in UFFDIO_MOVE MAINTAINERS: add entry for shrinker selftests: mm: hugepage-vmemmap fails on 64K page size systems mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval mailmap: switch email for Tanzir Hasan mailmap: add old address mappings for Randy kernel/crash_core.c: make __crash_hotplug_lock static efi: disable mirror feature during crashkernel kexec: do syscore_shutdown() in kernel_kexec mailmap: update entry for Manivannan Sadhasivam fs/proc/task_mmu: move mmu notification mechanism inside mm lock mm: zswap: switch maintainers to recently active developers and reviewers scripts/decode_stacktrace.sh: optionally use LLVM utilities kasan: avoid resetting aux_lock lib/Kconfig.debug: disable CONFIG_DEBUG_INFO_BTF for Hexagon MAINTAINERS: update LTP maintainers kdump: defer the insertion of crashkernel resources	2024-01-17 09:31:36 -08:00
Colin Ian King	8ca5d2641b	cifs: remove redundant variable tcon_exist The variable tcon_exist is being assigned however it is never read, the variable is redundant and can be removed. Cleans up clang scan build warning: warning: Although the value stored to 'tcon_exist' is used in the enclosing expression, the value is never actually readfrom 'tcon_exist' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-16 17:19:46 -06:00
Erick Archer	1057066009	eventfs: Use kcalloc() instead of kzalloc() As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, use the purpose specific kcalloc() function instead of the argument size * count in the kzalloc() function. [1] https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Link: https://lore.kernel.org/linux-trace-kernel/20240115181658.4562-1-erick.archer@gmx.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://github.com/KSPP/linux/issues/162 Signed-off-by: Erick Archer <erick.archer@gmx.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-01-16 17:52:33 -05:00
Steven Rostedt (Google)	852e46e239	eventfs: Do not create dentries nor inodes in iterate_shared The original eventfs code added a wrapper around the dcache_readdir open callback and created all the dentries and inodes at open, and increment their ref count. A wrapper was added around the dcache_readdir release function to decrement all the ref counts of those created inodes and dentries. But this proved to be buggy[1] for when a kprobe was created during a dir read, it would create a dentry between the open and the release, and because the release would decrement all ref counts of all files and directories, that would include the kprobe directory that was not there to have its ref count incremented in open. This would cause the ref count to go to negative and later crash the kernel. To solve this, the dentries and inodes that were created and had their ref count upped in open needed to be saved. That list needed to be passed from the open to the release, so that the release would only decrement the ref counts of the entries that were incremented in the open. Unfortunately, the dcache_readdir logic was already using the file->private_data, which is the only field that can be used to pass information from the open to the release. What was done was the eventfs created another descriptor that had a void pointer to save the dcache_readdir pointer, and it wrapped all the callbacks, so that it could save the list of entries that had their ref counts incremented in the open, and pass it to the release. The wrapped callbacks would just put back the dcache_readdir pointer and call the functions it used so it could still use its data[2]. But Linus had an issue with the "hijacking" of the file->private_data (unfortunately this discussion was on a security list, so no public link). Which we finally agreed on doing everything within the iterate_shared callback and leave the dcache_readdir out of it[3]. All the information needed for the getents() could be created then. But this ended up being buggy too[4]. The iterate_shared callback was not the right place to create the dentries and inodes. Even Christian Brauner had issues with that[5]. An attempt was to go back to creating the inodes and dentries at the open, create an array to store the information in the file->private_data, and pass that information to the other callbacks.[6] The difference between that and the original method, is that it does not use dcache_readdir. It also does not up the ref counts of the dentries and pass them. Instead, it creates an array of a structure that saves the dentry's name and inode number. That information is used in the iterate_shared callback, and the array is freed in the dir release. The dentries and inodes created in the open are not used for the iterate_share or release callbacks. Just their names and inode numbers. Linus did not like that either[7] and just wanted to remove the dentries being created in iterate_shared and use the hard coded inode numbers. [ All this while Linus enjoyed an unexpected vacation during the merge window due to lack of power. ] [1] https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.local.home/ [2] https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.local.home/ [3] https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis.org/ [4] https://lore.kernel.org/all/202401152142.bfc28861-oliver.sang@intel.com/ [5] https://lore.kernel.org/all/20240111-unzahl-gefegt-433acb8a841d@brauner/ [6] https://lore.kernel.org/all/20240116114711.7e8637be@gandalf.local.home/ [7] https://lore.kernel.org/all/20240116170154.5bf0a250@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.573784051@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Fixes: `493ec81a8f` ("eventfs: Stop using dcache_readdir() for getdents()") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401152142.bfc28861-oliver.sang@intel.com Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-01-16 17:48:19 -05:00
Steven Rostedt (Google)	53c41052ba	eventfs: Have the inodes all for files and directories all be the same The dentries and inodes are created in the readdir for the sole purpose of getting a consistent inode number. Linus stated that is unnecessary, and that all inodes can have the same inode number. For a virtual file system they are pretty meaningless. Instead use a single unique inode number for all files and one for all directories. Link: https://lore.kernel.org/all/20240116133753.2808d45e@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.412180363@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-01-16 16:27:47 -05:00
Konstantin Komarov	ddb17dc880	fs/ntfs3: Use kvfree to free memory allocated by kvmalloc Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2024-01-16 11:31:56 +03:00
David Howells	2b872b0f46	erofs: Don't use certain unnecessary folio_() functions Filesystems should use folio->index and folio->mapping, instead of folio_index(folio), folio_mapping() and folio_file_mapping() since they know that it's in the pagecache. Change this automagically with: perl -p -i -e 's/folio_mapping[(]([^)])[)]/\1->mapping/g' fs/erofs/.c perl -p -i -e 's/folio_file_mapping[(]([^)])[)]/\1->mapping/g' fs/erofs/.c perl -p -i -e 's/folio_index[(]([^)])[)]/\1->index/g' fs/erofs/*.c Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Yue Hu <huyue2@coolpad.com> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: linux-erofs@lists.ozlabs.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240115144635.1931422-1-hsiangkao@linux.alibaba.com	2024-01-15 23:52:52 +08:00
Al Viro	2a965d1b15	ceph: get rid of passing callbacks in __dentry_leases_walk() __dentry_leases_walk() gets a callback and calls it for a bunch of denties; there are exactly two callers and we already have a flag telling them apart - lwc->dir_lease. Seeing that indirect calls are costly these days, let's get rid of the callback and just call the right function directly. Has a side benefit of saner signatures... [ xiubli: a minor fix in the commit title ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-01-15 15:54:54 +01:00
Al Viro	f6fb21b22f	ceph: d_obtain_{alias,root}(ERR_PTR(...)) will do the right thing Clean up the code. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-01-15 15:40:51 +01:00
Wenchao Hao	0f4cf64eab	ceph: fix invalid pointer access if get_quota_realm return ERR_PTR This issue is reported by smatch that get_quota_realm() might return ERR_PTR but we did not handle it. It's not a immediate bug, while we still should address it to avoid potential bugs if get_quota_realm() is changed to return other ERR_PTR in future. Set ceph_snap_realm's pointer in get_quota_realm()'s to address this issue, the pointer would be set to NULL if get_quota_realm() failed to get struct ceph_snap_realm, so no ERR_PTR would happen any more. [ xiubli: minor code style clean up ] Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-01-15 15:40:51 +01:00
Xiubo Li	b36b03344f	ceph: remove duplicated code in ceph_netfs_issue_read() When allocating an osd request the libceph.ko will add the 'read_from_replica' flag by default. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-01-15 15:40:51 +01:00
Xiubo Li	6df89bf220	ceph: send oldest_client_tid when renewing caps Update the oldest_client_tid via the session renew caps msg to make sure that the MDSs won't pile up the completed request list in a very large size. [ idryomov: drop inapplicable comment ] Link: https://tracker.ceph.com/issues/63364 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-01-15 15:40:51 +01:00
Xiubo Li	66207de308	ceph: rename create_session_open_msg() to create_session_full_msg() Makes the create session msg helper to be more general and could be used by other ops. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-01-15 15:40:51 +01:00
Eric Biggers	9c896d6bc3	ceph: select FS_ENCRYPTION_ALGS if FS_ENCRYPTION The kconfig options for filesystems that support FS_ENCRYPTION are supposed to select FS_ENCRYPTION_ALGS. This is needed to ensure that required crypto algorithms get enabled as loadable modules or builtin as is appropriate for the set of enabled filesystems. Do this for CEPH_FS so that there aren't any missing algorithms if someone happens to have CEPH_FS as their only enabled filesystem that supports encryption. Cc: stable@vger.kernel.org Fixes: `f061feda6c` ("ceph: add fscrypt ioctls and ceph.fscrypt.auth vxattr") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-01-15 15:40:50 +01:00
Xiubo Li	b493ad718b	ceph: fix deadlock or deadcode of misusing dget() The lock order is incorrect between denty and its parent, we should always make sure that the parent get the lock first. But since this deadcode is never used and the parent dir will always be set from the callers, let's just remove it. Link: https://lore.kernel.org/r/20231116081919.GZ1957730@ZenIV Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-01-15 15:40:50 +01:00
Xiubo Li	aaefabc4a5	ceph: try to allocate a smaller extent map for sparse read In fscrypt case and for a smaller read length we can predict the max count of the extent map. And for small read length use cases this could save some memories. [ idryomov: squash into a single patch to avoid build break, drop redundant variable in ceph_alloc_sparse_ext_map() ] Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-01-15 15:40:50 +01:00
Venky Shankar	f48e0342a7	ceph: reinitialize mds feature bit even when session in open Following along the same lines as per the user-space fix. Right now this isn't really an issue with the ceph kernel driver because of the feature bit laginess, however, that can change over time (when the new snaprealm info type is ported to the kernel driver) and depending on the MDS version that's being upgraded can cause message decoding issues - so, fix that early on. Link: http://tracker.ceph.com/issues/63188 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-01-15 15:40:50 +01:00
Xiubo Li	cbcb358b74	ceph: skip reconnecting if MDS is not ready When MDS closed the session the kclient will send to reconnect to it immediately, but if the MDS just restarted and still not ready yet, such as still in the up:replay state and the sessionmap journal logs hasn't be replayed, the MDS will close the session. And then the kclient could remove the session and later when the mdsmap is in RECONNECT phrase it will skip reconnecting. But the MDS will wait until timeout and then evict the kclient. Just skip sending the reconnection request until the MDS is ready. Link: https://tracker.ceph.com/issues/62489 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-01-15 15:40:50 +01:00
Namjae Jeon	77bebd1864	ksmbd: only v2 leases handle the directory When smb2 leases is disable, ksmbd can send oplock break notification and cause wait oplock break ack timeout. It may appear like hang when accessing a directory. This patch make only v2 leases handle the directory. Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-14 22:14:56 -06:00
Namjae Jeon	38d20c6290	ksmbd: fix UAF issue in ksmbd_tcp_new_connection() The race is between the handling of a new TCP connection and its disconnection. It leads to UAF on `struct tcp_transport` in ksmbd_tcp_new_connection() function. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-22991 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-14 11:39:49 -06:00
Namjae Jeon	92e470163d	ksmbd: validate mech token in session setup If client send invalid mech token in session setup request, ksmbd validate and make the error if it is invalid. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-22890 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-14 11:39:49 -06:00
Gao Xiang	118a8cf504	erofs: fix inconsistent per-file compression format EROFS can select compression algorithms on a per-file basis, and each per-file compression algorithm needs to be marked in the on-disk superblock for initialization. However, syzkaller can generate inconsistent crafted images that use an unsupported algorithmtype for specific inodes, e.g. use MicroLZMA algorithmtype even it's not set in `sbi->available_compr_algs`. This can lead to an unexpected "BUG: kernel NULL pointer dereference" if the corresponding decompressor isn't built-in. Fix this by checking against `sbi->available_compr_algs` for each m_algorithmformat request. Incorrect !erofs_sb_has_compr_cfgs preset bitmap is now fixed together since it was harmless previously. Reported-by: <bugreport@ubisectech.com> Fixes: `8f89926290` ("erofs: get compression algorithms directly on mapping") Fixes: `622ceaddb7` ("erofs: lzma compression support") Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20240113150602.1471050-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-01-13 23:58:08 +08:00
Christian Brauner	ba5afb9a84	fs: rework listmount() implementation Linus pointed out that there's error handling and naming issues in the that we should rewrite: * Perform the access checks for the buffer before actually doing any work instead of doing it during the iteration. * Rename the arguments to listmount() and do_listmount() to clarify what the arguments are used for. * Get rid of the pointless ctr variable and overflow checking. * Get rid of the pointless speculation check. Link: https://lore.kernel.org/r/CAHk-=wjh6Cypo8WC-McXgSzCaou3UXccxB+7PVeSuGR8AjCphg@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-01-13 13:06:25 +01:00
Eric Biggers	c919330dd5	f2fs: fix double free of f2fs_sb_info kill_f2fs_super() is called even if f2fs_fill_super() fails. f2fs_fill_super() frees the struct f2fs_sb_info, so it must set sb->s_fs_info to NULL to prevent it from being freed again. Fixes: `275dca4630` ("f2fs: move release of block devices to after kill_block_super()") Reported-by: <syzbot+8f477ac014ff5b32d81f@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/lkml/0000000000006cb174060ec34502@google.com Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/linux-f2fs-devel/20240113005747.38887-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2024-01-12 18:55:09 -08:00
Linus Torvalds	052d534373	Description for this pull request: - Replace the internal table lookup algorithm with the hweight library and ffs of the bitops library. - Handle the two types of stream entry, valid data size(has been written) and data size separately.It will improves compatibility with two differently sized files created on Windows. -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmWgmxoWHGxpbmtpbmpl b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCERgD/4rHm1yG0ZlURvXiAwZwVOQMJoz 9Y8Gz3M1LsycJEN1uxNjSYfUe9LX/BlbXz5uIH8tVQjEEIbyl0RmJjITawVBHVbS Ps/UMDiQvT5DPqIwhrfTh9qxy0cRi7WBuKNAXRXSSVRx2mMWYIjNxT+8dIcD2FEG 63ojnoYi8RwuYuvCwo51coxf8/E7GY+WAYnC97hqtj2jSQ6gMjeDtiyYx1m5PfUN 32NdG1IYaiTstD7EU1lv1QNzLZx/Q9gBhi0jhDu1qc0fI+rS49p0zqop1TeEtsIf RD05XHZ8KRapChgoSvw+hb6CfZ7RanImFAHm6WnILqgFoY7uagUH1dn3oOJFgdLA OTwbEA/sQmnIdqg07Hhgf74OI9bu/kgP7g8/xrooqhO2SkYGLXDLgYFhEk08aEyE sp9fxtBfKhXUVHKafzkKtUmI+THl5W793aAfND5W+ahX2zDprwupzg/F7p4Sj3tJ GbvaRL/n1d/O1dhf/doTmfggH7TnDODS729w0HBSNJU+q6zrGluLRyqB3XsRFXng 7RlN8f4HSI6eFRVG7KTwxVcfwsedtPmNRKLg3PEMkXz5jb4wsw7tZUB3gAFwy9qf cZd7/+oU9qKEgrBRDJfJsFqq0IpzLCXDEZp00F5RregLIhWHZN4ghBrVU94ciDuT gxBgoSWrLqObnXmLVQ== =WFoc -----END PGP SIGNATURE----- Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Replace the internal table lookup algorithm with the hweight library and ffs of the bitops library. - Handle the two types of stream entry, valid data size (has been written) and data size separately. It improves compatibility with two differently sized files created on Windows. * tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: do not zero the extended part exfat: change to get file size from DataLength exfat: using ffs instead of internal logic exfat: using hweight instead of internal logic	2024-01-12 18:05:56 -08:00
Linus Torvalds	f16ab99c2e	fix buggered locking in bch2_ioctl_subvolume_destroy() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZaDougAKCRBZ7Krx/gZQ 60eJAQCtXa908kOFDjSSTetU6aBzWKcCCHszirjhXiTFJv1jTgD/TbvyGs4ku7Ri oI4nh1XX4QMVWsup1VETnnLAjt6DhAw= =fror -----END PGP SIGNATURE----- Merge tag 'pull-bcachefs-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull bcachefs locking fix from Al Viro: "Fix broken locking in bch2_ioctl_subvolume_destroy()" * tag 'pull-bcachefs-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bch2_ioctl_subvolume_destroy(): fix locking new helper: user_path_locked_at()	2024-01-12 18:04:01 -08:00
Muhammad Usama Anjum	4cccb6221c	fs/proc/task_mmu: move mmu notification mechanism inside mm lock Move mmu notification mechanism inside mm lock to prevent race condition in other components which depend on it. The notifier will invalidate memory range. Depending upon the number of iterations, different memory ranges would be invalidated. The following warning would be removed by this patch: WARNING: CPU: 0 PID: 5067 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:734 kvm_mmu_notifier_change_pte+0x860/0x960 arch/x86/kvm/../../../virt/kvm/kvm_main.c:734 There is no behavioural and performance change with this patch when there is no component registered with the mmu notifier. [akpm@linux-foundation.org: narrow the scope of `range', per Sean] Link: https://lkml.kernel.org/r/20240109112445.590736-1-usama.anjum@collabora.com Fixes: `52526ca7fd` ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Reported-by: syzbot+81227d2bd69e9dedb802@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/000000000000f6d051060c6785bc@google.com/ Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: Andrei Vagin <avagin@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-01-12 15:20:46 -08:00
Linus Torvalds	70d201a408	f2fs update for 6.8-rc1 In this series, we've some progress to support Zoned block device regarding to the power-cut recovery flow and enabling checkpoint=disable feature which is essential for Android OTA. Other than that, some patches touched sysfs entries and tracepoints which are minor, while several bug fixes on error handlers and compression flows are good to improve the overall stability. Enhancement: - enable checkpoint=disable for zoned block device - sysfs entries such as discard status, discard_io_aware, dir_level - tracepoints such as f2fs_vm_page_mkwrite(), f2fs_rename(), f2fs_new_inode() - use shared inode lock during f2fs_fiemap() and f2fs_seek_block() Bug fix: - address some power-cut recovery issues on zoned block device - handle errors and logics on do_garbage_collect(), f2fs_reserve_new_block(), f2fs_move_file_range(), f2fs_recover_xattr_data() - don't set FI_PREALLOCATED_ALL for partial write - fix to update iostat correctly in f2fs_filemap_fault() - fix to wait on block writeback for post_read case - fix to tag gcing flag on page during block migration - restrict max filesize for 16K f2fs - fix to avoid dirent corruption - explicitly null-terminate the xattr list There are also several clean-up patches to remove dead codes and better readability. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmWgMYcACgkQQBSofoJI UNJShxAAiYOXP7LPOAbPS1251BBgl8AIfs6u96hGTZkxOYsLHrBBbPbkWf3+nVbC JsBsVOe9K50rssK9kPg6XHPbmFGC8ERlyYcZTpONLfjtHOaQicbRnc//2qOvnCx8 JOKcMVkZyLU/HbOCoUW6mzNCQlOl0aAV8tRcb7jwAxT0HgpjHTHxej/62gRcPKzC 1E5w4iNTY//R97YGB36jPeGlKhbBZ7Ox1NM6AWadgE7B0j9rcYiBnPQllyeyaVVo XMCWRdl42tNMks2zgvU+vC41OrZ55bwLTQmVj3P1wnyKXig5/ZLQsrEcIGE+b2tP Mx+imCIRNYZqLwv5KYl6FU+KuLQGuZT1AjpP70Cb95WLyiYvVE6+xeiZg0fVTCEF 3Hg7lEqMtAEAh1NEmJyYmbiAm9KQ3vHyse9ix++tfm+Xvgqj8b2flmzAtIFKpCBV J+yFI+A55IYuYZt7gzPoZLkQL0tULPf80TKQrzwlnHNtZ6T6FK2Nunu+Urwf1/Th s5IulqHJZxHU/Bgd6yQZUVfDILcXTkqNCpO3+qLZMPZizlH1hXiJFTeVzS6mnGvZ sK2LL4rEJ8EhDHU1F0SJzCWJcuR8cQ/t2zKYUygo9LvHbtEM1bZwC1Bqfolt7NrU +pgiM2wnE9yjkPdfZN1JgYZDq0/lGvxPQ5NAc/5ERX71QonRyn8= =MQl3 -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs update from Jaegeuk Kim: "In this series, we've some progress to support Zoned block device regarding to the power-cut recovery flow and enabling checkpoint=disable feature which is essential for Android OTA. Other than that, some patches touched sysfs entries and tracepoints which are minor, while several bug fixes on error handlers and compression flows are good to improve the overall stability. Enhancements: - enable checkpoint=disable for zoned block device - sysfs entries such as discard status, discard_io_aware, dir_level - tracepoints such as f2fs_vm_page_mkwrite(), f2fs_rename(), f2fs_new_inode() - use shared inode lock during f2fs_fiemap() and f2fs_seek_block() Bug fixes: - address some power-cut recovery issues on zoned block device - handle errors and logics on do_garbage_collect(), f2fs_reserve_new_block(), f2fs_move_file_range(), f2fs_recover_xattr_data() - don't set FI_PREALLOCATED_ALL for partial write - fix to update iostat correctly in f2fs_filemap_fault() - fix to wait on block writeback for post_read case - fix to tag gcing flag on page during block migration - restrict max filesize for 16K f2fs - fix to avoid dirent corruption - explicitly null-terminate the xattr list There are also several clean-up patches to remove dead codes and better readability" * tag 'f2fs-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (33 commits) f2fs: show more discard status by sysfs f2fs: Add error handling for negative returns from do_garbage_collect f2fs: Constrain the modification range of dir_level in the sysfs f2fs: Use wait_event_freezable_timeout() for freezable kthread f2fs: fix to check return value of f2fs_recover_xattr_data f2fs: don't set FI_PREALLOCATED_ALL for partial write f2fs: fix to update iostat correctly in f2fs_filemap_fault() f2fs: fix to check compress file in f2fs_move_file_range() f2fs: fix to wait on block writeback for post_read case f2fs: fix to tag gcing flag on page during block migration f2fs: add tracepoint for f2fs_vm_page_mkwrite() f2fs: introduce f2fs_invalidate_internal_cache() for cleanup f2fs: update blkaddr in __set_data_blkaddr() for cleanup f2fs: introduce get_dnode_addr() to clean up codes f2fs: delete obsolete FI_DROP_CACHE f2fs: delete obsolete FI_FIRST_BLOCK_WRITTEN f2fs: Restrict max filesize for 16K f2fs f2fs: let's finish or reset zones all the time f2fs: check write pointers when checkpoint=disable f2fs: fix write pointers on zoned device after roll forward ...	2024-01-11 20:39:15 -08:00
Linus Torvalds	6a31658aa1	11 ksmbd server fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmWfZIYACgkQiiy9cAdy T1Gijgv+MH40buaJETR2FjRxzUiC92FFafGcX+fh5dLM22Sxb9AW+cNzOf/CSqNE 0AKpdhh+MVq0xiYwaXrrGGUfpUOZ3fwTHNJjpCt5o34b8U6IrHIah96noCRhDwQm pE1Loi5TyWYRYsCjajau+tIi9+lgROkJ9eM34bx2dkkDw5ng4MQAygqJDwDM1n+i N/O3vRqG3ZUrK5h7v9kVBdFZlkiiVm8cjNH86prfecnT6TTa/6QZoNd9kQAyvGe2 hWhW4J70y+H3JHcaBeGp11wcLcyBeFwBdqeo+os5EUPN/BbWaXuUqktBKvfnHIWA ucwrAMNVVK82JhudlkIGuYVgEOrLDrZsWjIJmDajFtJuO3Yo9VLkeZKcMRhPj45H kodPbCfPxNXg/y2fn1P3nCyiRSHHFb0QFEvK5JZV0Zwv5yPeziA/5e+6lu6OWNW7 VYQb2ZdOzuNW6aXHXnPRQQDSbVFsgD5dVxqYA9cugvtKAslUb9Q6z4kNkCFbYq/F xJ0VH0K5 =UhAE -----END PGP SIGNATURE----- Merge tag '6.8-rc-smb-server-fixes' of git://git.samba.org/ksmbd Pull smb server updates from Steve French: - memory allocation fix - three lease fixes, including important rename fix - read only share fix - thread freeze fix - three cleanup fixes (two kernel doc related) - locking fix in setting EAs - packet header validation fix * tag '6.8-rc-smb-server-fixes' of git://git.samba.org/ksmbd: ksmbd: Add missing set_freezable() for freezable kthread ksmbd: free ppace array on error in parse_dacl ksmbd: send lease break notification on FILE_RENAME_INFORMATION ksmbd: don't allow O_TRUNC open on read-only share ksmbd: vfs: fix all kernel-doc warnings ksmbd: auth: fix most kernel-doc warnings ksmbd: Remove usage of the deprecated ida_simple_xx() API ksmbd: don't increment epoch if current state and request state are same ksmbd: fix potential circular locking issue in smb2_set_ea() ksmbd: set v2 lease version on lease upgrade ksmbd: validate the zero field of packet header	2024-01-11 20:27:41 -08:00
Linus Torvalds	488926926a	misc cleanups (the part that hadn't been picked by individual fs trees) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZZ/BCAAKCRBZ7Krx/gZQ 68qqAQD6LtfYLDJGdJM+lNpyiG4BA7coYpPlJtmH7mzL+MbFPgEAnM7XsK6zyvza 3+rEggLM0UFWjg9Ln7Nlq035TeYtFwo= =w1mD -----END PGP SIGNATURE----- Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc filesystem updates from Al Viro: "Misc cleanups (the part that hadn't been picked by individual fs trees)" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: apparmorfs: don't duplicate kfree_link() orangefs: saner arguments passing in readdir guts ocfs2_find_match(): there's no such thing as NULL or negative ->d_parent reiserfs_add_entry(): get rid of pointless namelen checks __ocfs2_add_entry(), ocfs2_prepare_dir_for_insert(): namelen checks ext4_add_entry(): ->d_name.len is never 0 befs: d_obtain_alias(ERR_PTR(...)) will do the right thing affs: d_obtain_alias(ERR_PTR(...)) will do the right thing /proc/sys: use d_splice_alias() calling conventions to simplify failure exits hostfs: use d_splice_alias() calling conventions to simplify failure exits udf_fiiter_add_entry(): check for zero ->d_name.len is bogus... udf: d_obtain_alias(ERR_PTR(...)) will do the right thing... udf: d_splice_alias() will do the right thing on ERR_PTR() inode nfsd: kill stale comment about simple_fill_super() requirements bfs_add_entry(): get rid of pointless ->d_name.len checks nilfs2: d_obtain_alias(ERR_PTR(...)) will do the right thing... zonefs: d_splice_alias() will do the right thing on ERR_PTR() inode	2024-01-11 20:23:50 -08:00
Linus Torvalds	499aa1ca4e	dcache stuff for this cycle change of locking rules for __dentry_kill(), regularized refcounting rules in that area, assorted cleanups and removal of weird corner cases (e.g. now ->d_iput() on child is always called before the parent might hit __dentry_kill(), etc.) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZZ+sQQAKCRBZ7Krx/gZQ 6ybjAQDM5jiS93IUzfHjCWq0nVBX5YGbDAkZOeqxbmIdQb+2UAEA6elP5r0fBBcA seo3bry4DirQMDaA/Cjh4+8r71YSOQs= =7+Hk -----END PGP SIGNATURE----- Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: "Change of locking rules for __dentry_kill(), regularized refcounting rules in that area, assorted cleanups and removal of weird corner cases (e.g. now ->d_iput() on child is always called before the parent might hit __dentry_kill(), etc)" * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) dcache: remove unnecessary NULL check in dget_dlock() kill DCACHE_MAY_FREE __d_unalias() doesn't use inode argument d_alloc_parallel(): in-lookup hash insertion doesn't need an RCU variant get rid of DCACHE_GENOCIDE d_genocide(): move the extern into fs/internal.h simple_fill_super(): don't bother with d_genocide() on failure nsfs: use d_make_root() d_alloc_pseudo(): move setting ->d_op there from the (sole) caller kill d_instantate_anon(), fold __d_instantiate_anon() into remaining caller retain_dentry(): introduce a trimmed-down lockless variant __dentry_kill(): new locking scheme d_prune_aliases(): use a shrink list switch select_collect{,2}() to use of to_shrink_list() to_shrink_list(): call only if refcount is 0 fold dentry_kill() into dput() don't try to cut corners in shrink_lock_dentry() fold the call of retain_dentry() into fast_dput() Call retain_dentry() with refcount 0 dentry_kill(): don't bother with retain_dentry() on slow path ...	2024-01-11 20:11:35 -08:00
Linus Torvalds	bf4e7080ae	fix directory locking scheme on rename broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZZ+lyQAKCRBZ7Krx/gZQ 60MWAP94hTqeMIpjhsUIkrTnylrIFaiw4UCWFJzIRG1QQYKqCgD/XUaWI9np7dL6 0wR/j4CQSdJjiEFKUFE2pD3QoSuJYAQ= =+x0+ -----END PGP SIGNATURE----- Merge tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rename updates from Al Viro: "Fix directory locking scheme on rename This was broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that" * tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rename(): avoid a deadlock in the case of parents having no common ancestor kill lock_two_inodes() rename(): fix the locking of subdirectories f2fs: Avoid reading renamed directory if parent does not change ext4: don't access the source subdirectory content on same-directory rename ext2: Avoid reading renamed directory if parent does not change udf_rename(): only access the child content on cross-directory rename ocfs2: Avoid touching renamed directory if parent does not change reiserfs: Avoid touching renamed directory if parent does not change	2024-01-11 20:00:22 -08:00
Linus Torvalds	2f444347a8	minixfs kmap_local_page() switchover and related fixes - very similar to sysv series. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZZ+hqgAKCRBZ7Krx/gZQ 61nkAP0Ybu4QRGISg1w5uKlNY8uNO37wR51oZ8c6DrOHOqv7tQEA+tB2LGc0rDjp 5J6mMlmCQlJnXj4k3OjVch9S7xdsSA8= =0C5U -----END PGP SIGNATURE----- Merge tag 'pull-minix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull minixfs updates from Al Viro: "minixfs kmap_local_page() switchover and related fixes - very similar to sysv series" * tag 'pull-minix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: minixfs: switch to kmap_local_page() minixfs: Use dir_put_page() in minix_unlink() and minix_rename() minixfs: change the signature of dir_get_page() minixfs: use offset_in_page()	2024-01-11 19:54:18 -08:00
Linus Torvalds	5b9b41617b	Another moderately busy cycle for documentation, including: - The minimum Sphinx requirement has been raised to 2.4.4, following a warning that was added in 6.2. - Some reworking of the Documentation/process front page to, hopefully, make it more useful. - Various kernel-doc tweaks to, for example, make it deal properly with __counted_by annotations. - We have also restored a warning for documentation of nonexistent structure members that disappeared a while back. That had the delightful consequence of adding some 600 warnings to the docs build. A sustained effort by Randy, Vegard, and myself has addressed almost all of those, bringing the documentation back into sync with the code. The fixes are going through the appropriate maintainer trees. - Various improvements to the HTML rendered docs, including automatic links to Git revisions and a nice new pulldown to make translations easy to access. - Speaking of translations, more of those for Spanish and Chinese. ...plus the usual stream of documentation updates and typo fixes. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmWcRKMPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YTKIH/AxBt/3iWt40dPf18arZHLU6tdUbmg01ttef CNKWkniCmABGKc//KYDXvjZMRDt0YlrS0KgUzrb8nIQTBlZG40D+88EwjXE0HeGP xt1Fk7OPOiJEqBZ3HEe0PDVfOiA+4yR6CmDKklCJuKg77X9atklneBwPUw/cOASk CWj+BdbwPBiSNQv48Lp87rGusKwnH/g0MN2uS0z9MPr1DYjM1K8+ngZjGW24lZHt qs5yhP43mlZGBF/lwNJXQp/xhnKAqJ9XwylBX9Wmaoxaz9yyzNVsADGvROMudgzi 9YB+Jdy7Z0JSrVoLIRhUuDOv7aW8vk+8qLmGJt2aTIsqehbQ6pk= =fCtT -----END PGP SIGNATURE----- Merge tag 'docs-6.8' of git://git.lwn.net/linux Pull documentation update from Jonathan Corbet: "Another moderately busy cycle for documentation, including: - The minimum Sphinx requirement has been raised to 2.4.4, following a warning that was added in 6.2 - Some reworking of the Documentation/process front page to, hopefully, make it more useful - Various kernel-doc tweaks to, for example, make it deal properly with __counted_by annotations - We have also restored a warning for documentation of nonexistent structure members that disappeared a while back. That had the delightful consequence of adding some 600 warnings to the docs build. A sustained effort by Randy, Vegard, and myself has addressed almost all of those, bringing the documentation back into sync with the code. The fixes are going through the appropriate maintainer trees - Various improvements to the HTML rendered docs, including automatic links to Git revisions and a nice new pulldown to make translations easy to access - Speaking of translations, more of those for Spanish and Chinese ... plus the usual stream of documentation updates and typo fixes" * tag 'docs-6.8' of git://git.lwn.net/linux: (57 commits) MAINTAINERS: use tabs for indent of CONFIDENTIAL COMPUTING THREAT MODEL A reworked process/index.rst ring-buffer/Documentation: Add documentation on buffer_percent file Translated the RISC-V architecture boot documentation. Docs: remove mentions of fdformat from util-linux Docs/zh_CN: Fix the meaning of DEBUG to pr_debug() Documentation: move driver-api/dcdbas to userspace-api/ Documentation: move driver-api/isapnp to userspace-api/ Documentation/core-api : fix typo in workqueue Documentation/trace: Fixed typos in the ftrace FLAGS section kernel-doc: handle a void function without producing a warning scripts/get_abi.pl: ignore some temp files docs: kernel_abi.py: fix command injection scripts/get_abi: fix source path leak CREDITS, MAINTAINERS, docs/process/howto: Update man-pages' maintainer docs: translations: add translations links when they exist kernel-doc: Align quick help and the code MAINTAINERS: add reviewer for Spanish translations docs: ignore __counted_by attribute in structure definitions scripts: kernel-doc: Clarify missing struct member description ..	2024-01-11 19:46:52 -08:00
Qu Wenruo	173431b274	btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args Add extra sanity check for btrfs_ioctl_defrag_range_args::flags. This is not really to enhance fuzzing tests, but as a preparation for future expansion on btrfs_ioctl_defrag_range_args. In the future we're going to add new members, allowing more fine tuning for btrfs defrag. Without the -ENONOTSUPP error, there would be no way to detect if the kernel supports those new defrag features. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-01-12 02:04:19 +01:00
Omar Sandoval	3324d05478	btrfs: avoid copying BTRFS_ROOT_SUBVOL_DEAD flag to snapshot of subvolume being deleted Sweet Tea spotted a race between subvolume deletion and snapshotting that can result in the root item for the snapshot having the BTRFS_ROOT_SUBVOL_DEAD flag set. The race is: Thread 1 \| Thread 2 ----------------------------------------------\|---------- btrfs_delete_subvolume \| btrfs_set_root_flags(BTRFS_ROOT_SUBVOL_DEAD)\| \|btrfs_mksubvol \| down_read(subvol_sem) \| create_snapshot \| ... \| create_pending_snapshot \| copy root item from source down_write(subvol_sem) \| This flag is only checked in send and swap activate, which this would cause to fail mysteriously. create_snapshot() now checks the root refs to reject a deleted subvolume, so we can fix this by locking subvol_sem earlier so that the BTRFS_ROOT_SUBVOL_DEAD flag and the root refs are updated atomically. CC: stable@vger.kernel.org # 4.14+ Reported-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-01-12 02:00:21 +01:00
Omar Sandoval	7081929ab2	btrfs: don't abort filesystem when attempting to snapshot deleted subvolume If the source file descriptor to the snapshot ioctl refers to a deleted subvolume, we get the following abort: BTRFS: Transaction aborted (error -2) WARNING: CPU: 0 PID: 833 at fs/btrfs/transaction.c:1875 create_pending_snapshot+0x1040/0x1190 [btrfs] Modules linked in: pata_acpi btrfs ata_piix libata scsi_mod virtio_net blake2b_generic xor net_failover virtio_rng failover scsi_common rng_core raid6_pq libcrc32c CPU: 0 PID: 833 Comm: t_snapshot_dele Not tainted 6.7.0-rc6 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014 RIP: 0010:create_pending_snapshot+0x1040/0x1190 [btrfs] RSP: 0018:ffffa09c01337af8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff9982053e7c78 RCX: 0000000000000027 RDX: ffff99827dc20848 RSI: 0000000000000001 RDI: ffff99827dc20840 RBP: ffffa09c01337c00 R08: 0000000000000000 R09: ffffa09c01337998 R10: 0000000000000003 R11: ffffffffb96da248 R12: fffffffffffffffe R13: ffff99820535bb28 R14: ffff99820b7bd000 R15: ffff99820381ea80 FS: 00007fe20aadabc0(0000) GS:ffff99827dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000559a120b502f CR3: 00000000055b6000 CR4: 00000000000006f0 Call Trace: <TASK> ? create_pending_snapshot+0x1040/0x1190 [btrfs] ? __warn+0x81/0x130 ? create_pending_snapshot+0x1040/0x1190 [btrfs] ? report_bug+0x171/0x1a0 ? handle_bug+0x3a/0x70 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? create_pending_snapshot+0x1040/0x1190 [btrfs] ? create_pending_snapshot+0x1040/0x1190 [btrfs] create_pending_snapshots+0x92/0xc0 [btrfs] btrfs_commit_transaction+0x66b/0xf40 [btrfs] btrfs_mksubvol+0x301/0x4d0 [btrfs] btrfs_mksnapshot+0x80/0xb0 [btrfs] __btrfs_ioctl_snap_create+0x1c2/0x1d0 [btrfs] btrfs_ioctl_snap_create_v2+0xc4/0x150 [btrfs] btrfs_ioctl+0x8a6/0x2650 [btrfs] ? kmem_cache_free+0x22/0x340 ? do_sys_openat2+0x97/0xe0 __x64_sys_ioctl+0x97/0xd0 do_syscall_64+0x46/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 RIP: 0033:0x7fe20abe83af RSP: 002b:00007ffe6eff1360 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe20abe83af RDX: 00007ffe6eff23c0 RSI: 0000000050009417 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000000000 R09: 00007fe20ad16cd0 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffe6eff13c0 R14: 00007fe20ad45000 R15: 0000559a120b6d58 </TASK> ---[ end trace 0000000000000000 ]--- BTRFS: error (device vdc: state A) in create_pending_snapshot:1875: errno=-2 No such entry BTRFS info (device vdc: state EA): forced readonly BTRFS warning (device vdc: state EA): Skipping commit of aborted transaction. BTRFS: error (device vdc: state EA) in cleanup_transaction:2055: errno=-2 No such entry This happens because create_pending_snapshot() initializes the new root item as a copy of the source root item. This includes the refs field, which is 0 for a deleted subvolume. The call to btrfs_insert_root() therefore inserts a root with refs == 0. btrfs_get_new_fs_root() then finds the root and returns -ENOENT if refs == 0, which causes create_pending_snapshot() to abort. Fix it by checking the source root's refs before attempting the snapshot, but after locking subvol_sem to avoid racing with deletion. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-01-12 02:00:18 +01:00
Naohiro Aota	b18f3b60b3	btrfs: zoned: fix lock ordering in btrfs_zone_activate() The btrfs CI reported a lockdep warning as follows by running generic generic/129. WARNING: possible circular locking dependency detected 6.7.0-rc5+ #1 Not tainted ------------------------------------------------------ kworker/u5:5/793427 is trying to acquire lock: ffff88813256d028 (&cache->lock){+.+.}-{2:2}, at: btrfs_zone_finish_one_bg+0x5e/0x130 but task is already holding lock: ffff88810a23a318 (&fs_info->zone_active_bgs_lock){+.+.}-{2:2}, at: btrfs_zone_finish_one_bg+0x34/0x130 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&fs_info->zone_active_bgs_lock){+.+.}-{2:2}: ... -> #0 (&cache->lock){+.+.}-{2:2}: ... This is because we take fs_info->zone_active_bgs_lock after a block_group's lock in btrfs_zone_activate() while doing the opposite in other places. Fix the issue by expanding the fs_info->zone_active_bgs_lock's critical section and taking it before a block_group's lock. Fixes: `a7e1ac7bdc` ("btrfs: zoned: reserve zones for an active metadata/system block group") CC: stable@vger.kernel.org # 6.6 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-01-12 02:00:09 +01:00
Naohiro Aota	d967c914a6	btrfs: fix unbalanced unlock of mapping_tree_lock The error path of btrfs_get_chunk_map() releases fs_info->mapping_tree_lock. But, it is taken and released in btrfs_find_chunk_map(). So, there is no need to do so. Fixes: `7dc66abb5a` ("btrfs: use a dedicated data structure for chunk maps") Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-01-12 01:59:59 +01:00
Fedor Pchelkin	f03e274a8b	btrfs: ref-verify: free ref cache before clearing mount opt As clearing REF_VERIFY mount option indicates there were some errors in a ref-verify process, a ref cache is not relevant anymore and should be freed. btrfs_free_ref_cache() requires REF_VERIFY option being set so call it just before clearing the mount option. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Reported-by: syzbot+be14ed7728594dc8bd42@syzkaller.appspotmail.com Fixes: `fd708b81d9` ("Btrfs: add a extent ref verify tool") CC: stable@vger.kernel.org # 5.4+ Closes: https://lore.kernel.org/lkml/000000000000e5a65c05ee832054@google.com/ Reported-by: syzbot+c563a3c79927971f950f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/0000000000007fe09705fdc6086c@google.com/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-01-12 01:59:49 +01:00
Dmitry Antipov	6ff09b6b8c	btrfs: fix kvcalloc() arguments order in btrfs_ioctl_send() When compiling with gcc version 14.0.0 20231220 (experimental) and W=1, I've noticed the following warning: fs/btrfs/send.c: In function 'btrfs_ioctl_send': fs/btrfs/send.c:8208:44: warning: 'kvcalloc' sizes specified with 'sizeof' in the earlier argument and not in the later argument [-Wcalloc-transposed-args] 8208 \| sctx->clone_roots = kvcalloc(sizeof(*sctx->clone_roots), \| ^ Since 'n' and 'size' arguments of 'kvcalloc()' are multiplied to calculate the final size, their actual order doesn't affect the result and so this is not a bug. But it's still worth to fix it. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-01-12 01:59:45 +01:00
Naohiro Aota	02444f2ac2	btrfs: zoned: optimize hint byte for zoned allocator Writing sequentially to a huge file on btrfs on a SMR HDD revealed a decline of the performance (220 MiB/s to 30 MiB/s after 500 minutes). The performance goes down because of increased latency of the extent allocation, which is induced by a traversing of a lot of full block groups. So, this patch optimizes the ffe_ctl->hint_byte by choosing a block group with sufficient size from the active block group list, which does not contain full block groups. After applying the patch, the performance is maintained well. Fixes: `2eda57089e` ("btrfs: zoned: implement sequential extent allocation") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-01-12 01:59:43 +01:00
Naohiro Aota	b271fee9a4	btrfs: zoned: factor out prepare_allocation_zoned() Factor out prepare_allocation_zoned() for further extension. While at it, optimize the if-branch a bit. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-01-12 01:59:41 +01:00
Linus Torvalds	01d550f0fc	for-6.8/block-2024-01-08 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmWcIOIQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpn6hD/9oO7U75PuxUwYYHZ9Uzxpw6gQ0LEmeyJmE NQYCkfYHVq3IsgOdF7elI9v3qtr6v8V8CdB7cByrnn3DgwsMuiTKZZ0dK7vH37PO DX+/xn349e8oH7RdRo7f3m95g1YbHfpfnj0Rc4mjTDV72Jr/HlLTVgGTQg8DEnCR wBIFmeuBHHgeeLh87gsWLAP7ReReiy9V1uqpDFsko2/4BxRAM/8eedkwcAxD8aEy rd+dT/SBQj2cOdQMUeExT3gWjwzHh6ZHx3f1WCLK5fdck6BogH2hBUeri6F/H98L HoaXjBZYBTH68hB/mnO5I4g1ZlrVM74Vp7JPa3e1SFFtyEi6lsyrk2J3GoNh0E7r pXqH5kAcaJwBsBrbRGuvEyGbn9RLTaN5Gvseud0VE4oMruyodTniQaHXuIGackgz sMavMho4486EUWPaF7gIBdLNK1hO13w+IDZ4+3oBxhudMqdgZbk4iYpOCqQ7QY5G 2vkzAE/sZ+aVNXeaIQOI8dE5clBy8gJ+6+t8dm3DY1r1xdbcnU40iZ8/fri3h69r vHs9bpQnVWZF0gEyEflY1pkcAPpIkvMmWCR7Ehy5YCkIfa+qfSL05o3dicpWovLP N+gCtpkhTK2AvmUWsUMypMLRvoSOImyCIiobrr3qNBaUdgRP8xKfUa72RuRp8cGl Vrj5oAiE3w== =YAfp -----END PGP SIGNATURE----- Merge tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: "Pretty quiet round this time around. This contains: - NVMe updates via Keith: - nvme fabrics spec updates (Guixin, Max) - nvme target udpates (Guixin, Evan) - nvme attribute refactoring (Daniel) - nvme-fc numa fix (Keith) - MD updates via Song: - Fix/Cleanup RCU usage from conf->disks[i].rdev (Yu Kuai) - Fix raid5 hang issue (Junxiao Bi) - Add Yu Kuai as Reviewer of the md subsystem - Remove deprecated flavors (Song Liu) - raid1 read error check support (Li Nan) - Better handle events off-by-1 case (Alex Lyakas) - Efficiency improvements for passthrough (Kundan) - Support for mapping integrity data directly (Keith) - Zoned write fix (Damien) - rnbd fixes (Kees, Santosh, Supriti) - Default to a sane discard size granularity (Christoph) - Make the default max transfer size naming less confusing (Christoph) - Remove support for deprecated host aware zoned model (Christoph) - Misc fixes (me, Li, Matthew, Min, Ming, Randy, liyouhong, Daniel, Bart, Christoph)" * tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linux: (78 commits) block: Treat sequential write preferred zone type as invalid block: remove disk_clear_zoned sd: remove the !ZBC && blk_queue_is_zoned case in sd_read_block_characteristics drivers/block/xen-blkback/common.h: Fix spelling typo in comment blk-cgroup: fix rcu lockdep warning in blkg_lookup() blk-cgroup: don't use removal safe list iterators block: floor the discard granularity to the physical block size mtd_blkdevs: use the default discard granularity bcache: use the default discard granularity zram: use the default discard granularity null_blk: use the default discard granularity nbd: use the default discard granularity ubd: use the default discard granularity block: default the discard granularity to sector size bcache: discard_granularity should not be smaller than a sector block: remove two comments in bio_split_discard block: rename and document BLK_DEF_MAX_SECTORS loop: don't abuse BLK_DEF_MAX_SECTORS aoe: don't abuse BLK_DEF_MAX_SECTORS null_blk: don't cap max_hw_sectors to BLK_DEF_MAX_SECTORS ...	2024-01-11 13:58:04 -08:00
Linus Torvalds	3e7aeb78ab	Networking changes for 6.8. Core & protocols ---------------- - Analyze and reorganize core networking structs (socks, netdev, netns, mibs) to optimize cacheline consumption and set up build time warnings to safeguard against future header changes. This improves TCP performances with many concurrent connections up to 40%. - Add page-pool netlink-based introspection, exposing the memory usage and recycling stats. This helps indentify bad PP users and possible leaks. - Refine TCP/DCCP source port selection to no longer favor even source port at connect() time when IP_LOCAL_PORT_RANGE is set. This lowers the time taken by connect() for hosts having many active connections to the same destination. - Refactor the TCP bind conflict code, shrinking related socket structs. - Refactor TCP SYN-Cookie handling, as a preparation step to allow arbitrary SYN-Cookie processing via eBPF. - Tune optmem_max for 0-copy usage, increasing the default value to 128KB and namespecifying it. - Allow coalescing for cloned skbs coming from page pools, improving RX performances with some common configurations. - Reduce extension header parsing overhead at GRO time. - Add bridge MDB bulk deletion support, allowing user-space to request the deletion of matching entries. - Reorder nftables struct members, to keep data accessed by the datapath first. - Introduce TC block ports tracking and use. This allows supporting multicast-like behavior at the TC layer. - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and classifiers (RSVP and tcindex). - More data-race annotations. - Extend the diag interface to dump TCP bound-only sockets. - Conditional notification of events for TC qdisc class and actions. - Support for WPAN dynamic associations with nearby devices, to form a sub-network using a specific PAN ID. - Implement SMCv2.1 virtual ISM device support. - Add support for Batman-avd mulicast packet type. BPF --- - Tons of verifier improvements: - BPF register bounds logic and range support along with a large test suite - log improvements - complete precision tracking support for register spills - track aligned STACK_ZERO cases as imprecise spilled registers. It improves the verifier "instructions processed" metric from single digit to 50-60% for some programs - support for user's global BPF subprogram arguments with few commonly requested annotations for a better developer experience - support tracking of BPF_JNE which helps cases when the compiler transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like - several fixes - Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload. - Fix kCFI bugs in BPF all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y. - Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques. - Support uid/gid options when mounting bpffs. - Add a new kfunc which acquires the associated cgroup of a task within a specific cgroup v1 hierarchy where the latter is identified by its id. - Extend verifier to allow bpf_refcount_acquire() of a map value field obtained via direct load which is a use-case needed in sched_ext. - Add BPF link_info support for uprobe multi link along with bpftool integration for the latter. - Support for VLAN tag in XDP hints. - Remove deprecated bpfilter kernel leftovers given the project is developed in user-space (https://github.com/facebook/bpfilter). Misc ---- - Support for parellel TC self-tests execution. - Increase MPTCP self-tests coverage. - Updated the bridge documentation, including several so-far undocumented features. - Convert all the net self-tests to run in unique netns, to avoid random failures due to conflict and allow concurrent runs. - Add TCP-AO self-tests. - Add kunit tests for both cfg80211 and mac80211. - Autogenerate Netlink families documentation from YAML spec. - Add yml-gen support for fixed headers and recursive nests, the tool can now generate user-space code for all genetlink families for which we have specs. - A bunch of additional module descriptions fixes. - Catch incorrect freeing of pages belonging to a page pool. Driver API ---------- - Rust abstractions for network PHY drivers; do not cover yet the full C API, but already allow implementing functional PHY drivers in rust. - Introduce queue and NAPI support in the netdev Netlink interface, allowing complete access to the device <> NAPIs <> queues relationship. - Introduce notifications filtering for devlink to allow control application scale to thousands of instances. - Improve PHY validation, requesting rate matching information for each ethtool link mode supported by both the PHY and host. - Add support for ethtool symmetric-xor RSS hash. - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD platform. - Expose pin fractional frequency offset value over new DPLL generic netlink attribute. - Convert older drivers to platform remove callback returning void. - Add support for PHY package MMD read/write. New hardware / drivers ---------------------- - Ethernet: - Octeon CN10K devices - Broadcom 5760X P7 - Qualcomm SM8550 SoC - Texas Instrument DP83TG720S PHY - Bluetooth: - IMC Networks Bluetooth radio Removed ------- - WiFi: - libertas 16-bit PCMCIA support - Atmel at76c50x drivers - HostAP ISA/PCMCIA style 802.11b driver - zd1201 802.11b USB dongles - Orinoco ISA/PCMCIA 802.11b driver - Aviator/Raytheon driver - Planet WL3501 driver - RNDIS USB 802.11b driver Drivers ------- - Ethernet high-speed NICs: - Intel (100G, ice, idpf): - allow one by one port representors creation and removal - add temperature and clock information reporting - add get/set for ethtool's header split ringparam - add again FW logging - adds support switchdev hardware packet mirroring - iavf: implement symmetric-xor RSS hash - igc: add support for concurrent physical and free-running timers - i40e: increase the allowable descriptors - nVidia/Mellanox: - Preparation for Socket-Direct multi-dev netdev. That will allow in future releases combining multiple PFs devices attached to different NUMA nodes under the same netdev - Broadcom (bnxt): - TX completion handling improvements - add basic ntuple filter support - reduce MSIX vectors usage for MQPRIO offload - add VXLAN support, USO offload and TX coalesce completion for P7 - Marvell Octeon EP: - xmit-more support - add PF-VF mailbox support and use it for FW notifications for VFs - Wangxun (ngbe/txgbe): - implement ethtool functions to operate pause param, ring param, coalesce channel number and msglevel - Netronome/Corigine (nfp): - add flow-steering support - support UDP segmentation offload - Ethernet NICs embedded, slower, virtual: - Xilinx AXI: remove duplicate DMA code adopting the dma engine driver - stmmac: add support for HW-accelerated VLAN stripping - TI AM654x sw: add mqprio, frame preemption & coalescing - gve: add support for non-4k page sizes. - virtio-net: support dynamic coalescing moderation - nVidia/Mellanox Ethernet datacenter switches: - allow firmware upgrade without a reboot - more flexible support for bridge flooding via the compressed FID flooding mode - Ethernet embedded switches: - Microchip: - fine-tune flow control and speed configurations in KSZ8xxx - KSZ88X3: enable setting rmii reference - Renesas: - add jumbo frames support - Marvell: - 88E6xxx: add "eth-mac" and "rmon" stats support - Ethernet PHYs: - aquantia: add firmware load support - at803x: refactor the driver to simplify adding support for more chip variants - NXP C45 TJA11xx: Add MACsec offload support - Wifi: - MediaTek (mt76): - NVMEM EEPROM improvements - mt7996 Extremely High Throughput (EHT) improvements - mt7996 Wireless Ethernet Dispatcher (WED) support - mt7996 36-bit DMA support - Qualcomm (ath12k): - support for a single MSI vector - WCN7850: support AP mode - Intel (iwlwifi): - new debugfs file fw_dbg_clear - allow concurrent P2P operation on DFS channels - Bluetooth: - QCA2066: support HFP offload - ISO: more broadcast-related improvements - NXP: better recovery in case receiver/transmitter get out of sync Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmWdamsSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkGC4P/2xjLzdw22ckSssuE9ORbGko9SNjnqHk PQh1E+26BHiCg5KB8VvzMsL78E79MRNXEattSW+1g7dhCvln3oi+Vd0WkdRkgt35 98Iv18zLbbwFAJeyKvmLAPAkQkMLtVj19QILBBRrugF+egEZgVSE3JBcTAiKv2ZQ HzkabA171Ri6LpCcEEtY5XuaKvimGnGzF8YMFf8rX0wtqd2p5kbY9aMe47WAGxvU Vf9548XvH+A5yVH2/4/gujtUOpA/RHuhuCMb+oo0cZ+VCC1x9MGzoXzj6r87OTkf k2W1whNzcGoin92f+9Lk1JYMuiGKBH4QVaDdNXJnYFSJWPTE7RvRsPzYTSD4/GzK yEZbzSJXpy/2vDQm16NoAxl7evRs8Sorzkw4LQRviZHI/5SAkK2ZQiCK5CO8QSYy C1LELcV5kn6Foe24xWnrWLjAGug9oJnYoGPMU5gvPmFJMvUMXqm5rmbBgUWL5Rxw q1M6gVzabCyWUy6z2G2vaqW2ZntNVvCkdsLtIX0XZkcTzNoP0MA+TuhyGz4wbiuo PeyQp/mbGnDgCYggqKIA0YWrTVxkhFrKN520cbO8qXBQytV9oFbM/0/+C0/r/5WX pL1JVzLrh6l5ME7EIQfha8UOF9j8q4ueSwb40P3AR2NaZiDABM0zfUZ6+sx+91WF ucqPEcZB5cRE =1bW6 -----END PGP SIGNATURE----- Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "The most interesting thing is probably the networking structs reorganization and a significant amount of changes is around self-tests. Core & protocols: - Analyze and reorganize core networking structs (socks, netdev, netns, mibs) to optimize cacheline consumption and set up build time warnings to safeguard against future header changes This improves TCP performances with many concurrent connections up to 40% - Add page-pool netlink-based introspection, exposing the memory usage and recycling stats. This helps indentify bad PP users and possible leaks - Refine TCP/DCCP source port selection to no longer favor even source port at connect() time when IP_LOCAL_PORT_RANGE is set. This lowers the time taken by connect() for hosts having many active connections to the same destination - Refactor the TCP bind conflict code, shrinking related socket structs - Refactor TCP SYN-Cookie handling, as a preparation step to allow arbitrary SYN-Cookie processing via eBPF - Tune optmem_max for 0-copy usage, increasing the default value to 128KB and namespecifying it - Allow coalescing for cloned skbs coming from page pools, improving RX performances with some common configurations - Reduce extension header parsing overhead at GRO time - Add bridge MDB bulk deletion support, allowing user-space to request the deletion of matching entries - Reorder nftables struct members, to keep data accessed by the datapath first - Introduce TC block ports tracking and use. This allows supporting multicast-like behavior at the TC layer - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and classifiers (RSVP and tcindex) - More data-race annotations - Extend the diag interface to dump TCP bound-only sockets - Conditional notification of events for TC qdisc class and actions - Support for WPAN dynamic associations with nearby devices, to form a sub-network using a specific PAN ID - Implement SMCv2.1 virtual ISM device support - Add support for Batman-avd mulicast packet type BPF: - Tons of verifier improvements: - BPF register bounds logic and range support along with a large test suite - log improvements - complete precision tracking support for register spills - track aligned STACK_ZERO cases as imprecise spilled registers. This improves the verifier "instructions processed" metric from single digit to 50-60% for some programs - support for user's global BPF subprogram arguments with few commonly requested annotations for a better developer experience - support tracking of BPF_JNE which helps cases when the compiler transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like - several fixes - Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload - Fix kCFI bugs in BPF all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y - Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques - Support uid/gid options when mounting bpffs - Add a new kfunc which acquires the associated cgroup of a task within a specific cgroup v1 hierarchy where the latter is identified by its id - Extend verifier to allow bpf_refcount_acquire() of a map value field obtained via direct load which is a use-case needed in sched_ext - Add BPF link_info support for uprobe multi link along with bpftool integration for the latter - Support for VLAN tag in XDP hints - Remove deprecated bpfilter kernel leftovers given the project is developed in user-space (https://github.com/facebook/bpfilter) Misc: - Support for parellel TC self-tests execution - Increase MPTCP self-tests coverage - Updated the bridge documentation, including several so-far undocumented features - Convert all the net self-tests to run in unique netns, to avoid random failures due to conflict and allow concurrent runs - Add TCP-AO self-tests - Add kunit tests for both cfg80211 and mac80211 - Autogenerate Netlink families documentation from YAML spec - Add yml-gen support for fixed headers and recursive nests, the tool can now generate user-space code for all genetlink families for which we have specs - A bunch of additional module descriptions fixes - Catch incorrect freeing of pages belonging to a page pool Driver API: - Rust abstractions for network PHY drivers; do not cover yet the full C API, but already allow implementing functional PHY drivers in rust - Introduce queue and NAPI support in the netdev Netlink interface, allowing complete access to the device <> NAPIs <> queues relationship - Introduce notifications filtering for devlink to allow control application scale to thousands of instances - Improve PHY validation, requesting rate matching information for each ethtool link mode supported by both the PHY and host - Add support for ethtool symmetric-xor RSS hash - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD platform - Expose pin fractional frequency offset value over new DPLL generic netlink attribute - Convert older drivers to platform remove callback returning void - Add support for PHY package MMD read/write New hardware / drivers: - Ethernet: - Octeon CN10K devices - Broadcom 5760X P7 - Qualcomm SM8550 SoC - Texas Instrument DP83TG720S PHY - Bluetooth: - IMC Networks Bluetooth radio Removed: - WiFi: - libertas 16-bit PCMCIA support - Atmel at76c50x drivers - HostAP ISA/PCMCIA style 802.11b driver - zd1201 802.11b USB dongles - Orinoco ISA/PCMCIA 802.11b driver - Aviator/Raytheon driver - Planet WL3501 driver - RNDIS USB 802.11b driver Driver updates: - Ethernet high-speed NICs: - Intel (100G, ice, idpf): - allow one by one port representors creation and removal - add temperature and clock information reporting - add get/set for ethtool's header split ringparam - add again FW logging - adds support switchdev hardware packet mirroring - iavf: implement symmetric-xor RSS hash - igc: add support for concurrent physical and free-running timers - i40e: increase the allowable descriptors - nVidia/Mellanox: - Preparation for Socket-Direct multi-dev netdev. That will allow in future releases combining multiple PFs devices attached to different NUMA nodes under the same netdev - Broadcom (bnxt): - TX completion handling improvements - add basic ntuple filter support - reduce MSIX vectors usage for MQPRIO offload - add VXLAN support, USO offload and TX coalesce completion for P7 - Marvell Octeon EP: - xmit-more support - add PF-VF mailbox support and use it for FW notifications for VFs - Wangxun (ngbe/txgbe): - implement ethtool functions to operate pause param, ring param, coalesce channel number and msglevel - Netronome/Corigine (nfp): - add flow-steering support - support UDP segmentation offload - Ethernet NICs embedded, slower, virtual: - Xilinx AXI: remove duplicate DMA code adopting the dma engine driver - stmmac: add support for HW-accelerated VLAN stripping - TI AM654x sw: add mqprio, frame preemption & coalescing - gve: add support for non-4k page sizes. - virtio-net: support dynamic coalescing moderation - nVidia/Mellanox Ethernet datacenter switches: - allow firmware upgrade without a reboot - more flexible support for bridge flooding via the compressed FID flooding mode - Ethernet embedded switches: - Microchip: - fine-tune flow control and speed configurations in KSZ8xxx - KSZ88X3: enable setting rmii reference - Renesas: - add jumbo frames support - Marvell: - 88E6xxx: add "eth-mac" and "rmon" stats support - Ethernet PHYs: - aquantia: add firmware load support - at803x: refactor the driver to simplify adding support for more chip variants - NXP C45 TJA11xx: Add MACsec offload support - Wifi: - MediaTek (mt76): - NVMEM EEPROM improvements - mt7996 Extremely High Throughput (EHT) improvements - mt7996 Wireless Ethernet Dispatcher (WED) support - mt7996 36-bit DMA support - Qualcomm (ath12k): - support for a single MSI vector - WCN7850: support AP mode - Intel (iwlwifi): - new debugfs file fw_dbg_clear - allow concurrent P2P operation on DFS channels - Bluetooth: - QCA2066: support HFP offload - ISO: more broadcast-related improvements - NXP: better recovery in case receiver/transmitter get out of sync" * tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits) lan78xx: remove redundant statement in lan78xx_get_eee lan743x: remove redundant statement in lan743x_ethtool_get_eee bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer() bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel() bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter() tcp: Revert no longer abort SYN_SENT when receiving some ICMP Revert "mlx5 updates 2023-12-20" Revert "net: stmmac: Enable Per DMA Channel interrupt" ipvlan: Remove usage of the deprecated ida_simple_xx() API ipvlan: Fix a typo in a comment net/sched: Remove ipt action tests net: stmmac: Use interrupt mode INTM=1 for per channel irq net: stmmac: Add support for TX/RX channel interrupt net: stmmac: Make MSI interrupt routine generic dt-bindings: net: snps,dwmac: per channel irq net: phy: at803x: make read_status more generic net: phy: at803x: add support for cdt cross short test for qca808x net: phy: at803x: refactor qca808x cable test get status function net: phy: at803x: generalize cdt fault length function net: ethernet: cortina: Drop TSO support ...	2024-01-11 10:07:29 -08:00
Tejun Heo	e3977e0609	Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock" This reverts commit dad3fb67ca1cbef87ce700e83a55835e5921ce8a. The commit converted kernfs_idr_lock to an IRQ-safe raw_spinlock because it could be acquired while holding an rq lock through bpf_cgroup_from_id(). However, kernfs_idr_lock is held while doing GPF_NOWAIT allocations which involves acquiring an non-IRQ-safe and non-raw lock leading to the following lockdep warning: ============================= [ BUG: Invalid wait context ] 6.7.0-rc5-kzm9g-00251-g655022a45b1c #578 Not tainted ----------------------------- swapper/0/0 is trying to lock: dfbcd488 (&c->lock){....}-{3:3}, at: local_lock_acquire+0x0/0xa4 other info that might help us debug this: context-{5:5} 2 locks held by swapper/0/0: #0: dfbc9c60 (lock){+.+.}-{3:3}, at: local_lock_acquire+0x0/0xa4 #1: c0c012a8 (kernfs_idr_lock){....}-{2:2}, at: __kernfs_new_node.constprop.0+0x68/0x258 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc5-kzm9g-00251-g655022a45b1c #578 Hardware name: Generic SH73A0 (Flattened Device Tree) unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x90 dump_stack_lvl from __lock_acquire+0x3cc/0x168c __lock_acquire from lock_acquire+0x274/0x30c lock_acquire from local_lock_acquire+0x28/0xa4 local_lock_acquire from ___slab_alloc+0x234/0x8a8 ___slab_alloc from __slab_alloc.constprop.0+0x30/0x44 __slab_alloc.constprop.0 from kmem_cache_alloc+0x7c/0x148 kmem_cache_alloc from radix_tree_node_alloc.constprop.0+0x44/0xdc radix_tree_node_alloc.constprop.0 from idr_get_free+0x110/0x2b8 idr_get_free from idr_alloc_u32+0x9c/0x108 idr_alloc_u32 from idr_alloc_cyclic+0x50/0xb8 idr_alloc_cyclic from __kernfs_new_node.constprop.0+0x88/0x258 __kernfs_new_node.constprop.0 from kernfs_create_root+0xbc/0x154 kernfs_create_root from sysfs_init+0x18/0x5c sysfs_init from mnt_init+0xc4/0x220 mnt_init from vfs_caches_init+0x6c/0x88 vfs_caches_init from start_kernel+0x474/0x528 start_kernel from 0x0 Let's rever the commit. It's undesirable to spread out raw spinlock usage anyway and the problem can be solved by protecting the lookup path with RCU instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrea Righi <andrea.righi@canonical.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/CAMuHMdV=AKt+mwY7svEq5gFPx41LoSQZ_USME5_MEdWQze13ww@mail.gmail.com Link: https://lore.kernel.org/r/20240109214828.252092-2-tj@kernel.org Tested-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-01-11 11:51:27 +01:00
Darrick J. Wong	d61b40bf15	xfs: fix backwards logic in xfs_bmap_alloc_account We're only allocating from the realtime device if the inode is marked for realtime and we're /not/ allocating into the attr fork. Fixes: `5864346054` ("xfs: also use xfs_bmap_btalloc_accounting for RT allocations") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-01-11 10:34:01 +05:30
Linus Torvalds	a05aea98d4	sysctl-6.8-rc1 To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. On the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7 we had all arch/ and drivers/ modified to remove the sentinel. For v6.8-rc1 we get a few more updates for fs/ directory only. The kernel/ directory is left but we'll save that for v6.9-rc1 as those patches are still being reviewed. After that we then can expect also the removal of the no longer needed check for procname == NULL. Let us recap the purpose of this work: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files Thomas Weißschuh also sent a few cleanups, for v6.9-rc1 we expect to see further work by Thomas Weißschuh with the constificatin of the struct ctl_table. Due to Joel Granados's work, and to help bring in new blood, I have suggested for him to become a maintainer and he's accepted. So for v6.9-rc1 I look forward to seeing him sent you a pull request for further sysctl changes. This also removes Iurii Zaikin as a maintainer as he has moved on to other projects and has had no time to help at all. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmWdWDESHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinjJAP/jTNNoyzWisvrrvmXqR5txFGLOE+wW6x Xv9avuiM+DTHsH/wK8CkXEivwDqYNAZEHU7NEcolS5bJX/ddSRwN9b5aSVlCrUdX Ab4rXmpeSCNFp9zNszWJsDuBKIqjvsKw7qGleGtgZ2qAUHbbH30VROLWCggaee50 wU3icDLdwkasxrcMXy4Sq5dT5wYC4j/QelqBGIkYPT14Arl1im5zqPZ95gmO/s/6 mdicTAmq+hhAUfUBJBXRKtsvxY6CItxe55Q4fjpncLUJLHUw+VPVNoBKFWJlBwlh LO3liKFfakPSkil4/en+/+zuMByd0JBkIzIJa+Kk5kjpbHRhK0RkmU4+Y5G5spWN jjLfiv6RxInNaZ8EWQBMfjE95A7PmYDQ4TOH08+OvzdDIi6B0BB5tBGQpG9BnyXk YsLg1Uo4CwE/vn1/a9w0rhadjUInvmAryhb/uSJYFz/lmApLm2JUpY3/KstwGetb z+HmLstJb24Djkr6pH8DcjhzRBHeWQ5p0b4/6B+v1HqAUuEhdbyw1F2GrDywyF3R h/UOAaKLm1+ffdA246o9TejKiDU96qEzzXMaCzPKyestaRZuiyuYEMDhYbvtsMV5 zIdMJj5HQ+U1KHDv4IN99DEj7+/vjE3f4Sjo+POFpQeQ8/d+fxpFNqXVv449dgnb 6xEkkxsR0ElM =2qBt -----END PGP SIGNATURE----- Merge tag 'sysctl-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. In the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7 we had all arch/ and drivers/ modified to remove the sentinel. For v6.8-rc1 we get a few more updates for fs/ directory only. The kernel/ directory is left but we'll save that for v6.9-rc1 as those patches are still being reviewed. After that we then can expect also the removal of the no longer needed check for procname == NULL. Let us recap the purpose of this work: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files Thomas Weißschuh also sent a few cleanups, for v6.9-rc1 we expect to see further work by Thomas Weißschuh with the constificatin of the struct ctl_table. Due to Joel Granados's work, and to help bring in new blood, I have suggested for him to become a maintainer and he's accepted. So for v6.9-rc1 I look forward to seeing him sent you a pull request for further sysctl changes. This also removes Iurii Zaikin as a maintainer as he has moved on to other projects and has had no time to help at all" * tag 'sysctl-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: remove struct ctl_path sysctl: delete unused define SYSCTL_PERM_EMPTY_DIR coda: Remove the now superfluous sentinel elements from ctl_table array sysctl: Remove the now superfluous sentinel elements from ctl_table array fs: Remove the now superfluous sentinel elements from ctl_table array cachefiles: Remove the now superfluous sentinel element from ctl_table array sysclt: Clarify the results of selftest run sysctl: Add a selftest for handling empty dirs sysctl: Fix out of bounds access for empty sysctl registers MAINTAINERS: Add Joel Granados as co-maintainer for proc sysctl MAINTAINERS: remove Iurii Zaikin from proc sysctl	2024-01-10 17:44:36 -08:00
Linus Torvalds	78273df7f6	header cleanups for 6.8 The goal is to get sched.h down to a type only header, so the main thing happening in this patchset is splitting out various _types.h headers and dependency fixups, as well as moving some things out of sched.h to better locations. This is prep work for the memory allocation profiling patchset which adds new sched.h interdepencencies. Testing - it's been in -next, and fixes from pretty much all architectures have percolated in - nothing major. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmWfBwwACgkQE6szbY3K bnZPwBAAmuRojXaeWxi01IPIOehSGDe68vw44PR9glEMZvxdnZuPOdvE4/+245/L bRKU2WBCjBUokUbV9msIShwRkFTZAmEMPNfPAAsFMA+VXeDYHKB+ZRdwTggNAQ+I SG6fZgh5m0HsewCDxU8oqVHkjVq4fXn0cy+aL6xLEd9gu67GoBzX2pDieS2Kvy6j jnyoKTxFwb+LTQgph0P4EIpq5I2umAsdLwdSR8EJ+8e9NiNvMo1pI00Lx/ntAnFZ JftWUJcMy3TQ5u1GkyfQN9y/yThX1bZK5GvmHS9SJ2Dkacaus5d+xaKCHtRuFS1I 7C6b8PsNgRczUMumBXus44HdlNfNs1yU3lvVxFvBIPE1qC9pYRHrkWIXXIocXLLC oxTEJ6B2G3BQZVQgLIA4fOaxMVhmvKffi/aEZLi9vN9VVosd1a6XNKI6KbyRnXFp GSs9qDqszhn5I3GYNlDNQTc/8UsRlhPFgS6nS0By6QnvxtGi9QkU2tBRBsXvqwCy cLoCYIhc2tvugHvld70dz26umiJ4rnmxGlobStNoigDvIKAIUt1UmIdr1so8P8eH xehnL9ZcOX6xnANDL0AqMFFHV6I58CJynhFdUoXfVQf/DWLGX48mpi9LVNsYBzsI CAwVOAQ0UjGrpdWmJ9ueY/ABYqg9vRjzaDEXQ+MhAYO55CLaVsg= =3tyT -----END PGP SIGNATURE----- Merge tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs Pull header cleanups from Kent Overstreet: "The goal is to get sched.h down to a type only header, so the main thing happening in this patchset is splitting out various _types.h headers and dependency fixups, as well as moving some things out of sched.h to better locations. This is prep work for the memory allocation profiling patchset which adds new sched.h interdepencencies" * tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (51 commits) Kill sched.h dependency on rcupdate.h kill unnecessary thread_info.h include Kill unnecessary kernel.h include preempt.h: Kill dependency on list.h rseq: Split out rseq.h from sched.h LoongArch: signal.c: add header file to fix build error restart_block: Trim includes lockdep: move held_lock to lockdep_types.h sem: Split out sem_types.h uidgid: Split out uidgid_types.h seccomp: Split out seccomp_types.h refcount: Split out refcount_types.h uapi/linux/resource.h: fix include x86/signal: kill dependency on time.h syscall_user_dispatch.h: split out *_types.h mm_types_task.h: Trim dependencies Split out irqflags_types.h ipc: Kill bogus dependency on spinlock.h shm: Slim down dependencies workqueue: Split out workqueue_types.h ...	2024-01-10 16:43:55 -08:00
Linus Torvalds	999a36b52b	bcachefs updates for 6.8: - btree write buffer rewrite: instead of adding keys to the btree write buffer at transaction commit time, we know journal them with a different journal entry type and copy them from the journal to the write buffer just prior to journal write. This reduces the number of atomic operations on shared cachelines in the transaction commit path and is a signicant performance improvement on some workloads: multithreaded 4k random writes went from ~650k iops to ~850k iops. - Bring back optimistic spinning for six locks: the new implementation doesn't use osq locks; instead we add to the lock waitlist as normal, and then spin on the lock_acquired bit in the waitlist entry, _not_ the lock itself. - BCH_IOCTL_DEV_USAGE_V2, which allows for new data types - BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of fsck but without mounting: useful for transparently using the kernel version of fsck from 'bcachefs fsck' when the kernel version is a better match for the on disk filesystem. - BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported yet, but the passes that are supported are fully featured - errors may be corrected as normal. The new ioctls use the new 'thread_with_file' abstraction for kicking off a kthread that's tied to a file descriptor returned to userspace via the ioctl. - btree_paths within a btree_trans are now dynamically growable, instead of being limited to 64. This is important for the check_directory_structure phase of fsck, and also fixes some issues we were having with btree path overflow in the reflink btree. - Trigger refactoring; prep work for the upcoming disk space accounting rewrite - Numerous bugfixes :) -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmWe8PUACgkQE6szbY3K bnYw6g/9GAXfIGasTZZwK2XEr36RYtEFYMwd/m9V1ET0DH6d/MFH9G7tTYl52AQ4 k9cDFb0d2qdtNk2Rlml1lHFrxMzkp2Q7j9S4YcETrE+/Dir8ODVcJXrGeNTCMGmz B+C12mTOpWrzGMrioRgFZjWAnacsY3RP8NFRTT9HIJHO9UCP+xN5y++sX10C5Gwv 7UVWTaUwjkgdYWkR8RCKGXuG5cNNlRp4Y0eeK2XruG1iI9VAilir1glcD/YMOY8M vECQzmf2ZLGFS/tpnmqVhNbNwVWpTQMYassvKaisWNHLDUgskOoF8YfoYSH27t7F GBb1154O2ga6ea866677FDeNVlg386mGCTUy2xOhMpDL3zW+/Is+8MdfJI4MJP5R EwcjHnn2bk0C2kULbAohw0gnU42FulfvsLNnrfxCeygmZrDoOOCL1HpvnBG4vskc Fp6NK83l974QnyLdPsjr1yB2d2pgb+uMP1v76IukQi0IjNSAyvwSa5nloPTHRzpC j6e2cFpdtX+6vEu6KngXVKTblSEnwhVBTaTR37Lr8PX1sZqFS/+mjRDgg3HZa/GI u0fC0mQyVL9KjDs5LJGpTc/qs8J4mpoS5+dfzn38MI76dFxd5TYZKWVfILTrOtDF ugDnoLkMuYFdueKI2M3YzxXyaA7HBT+7McAdENuJJzJnEuSAZs0= =JvA2 -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs Pull bcachefs updates from Kent Overstreet: - btree write buffer rewrite: instead of adding keys to the btree write buffer at transaction commit time, we now journal them with a different journal entry type and copy them from the journal to the write buffer just prior to journal write. This reduces the number of atomic operations on shared cachelines in the transaction commit path and is a signicant performance improvement on some workloads: multithreaded 4k random writes went from ~650k iops to ~850k iops. - Bring back optimistic spinning for six locks: the new implementation doesn't use osq locks; instead we add to the lock waitlist as normal, and then spin on the lock_acquired bit in the waitlist entry, _not_ the lock itself. - New ioctls: - BCH_IOCTL_DEV_USAGE_V2, which allows for new data types - BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of fsck but without mounting: useful for transparently using the kernel version of fsck from 'bcachefs fsck' when the kernel version is a better match for the on disk filesystem. - BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported yet, but the passes that are supported are fully featured - errors may be corrected as normal. The new ioctls use the new 'thread_with_file' abstraction for kicking off a kthread that's tied to a file descriptor returned to userspace via the ioctl. - btree_paths within a btree_trans are now dynamically growable, instead of being limited to 64. This is important for the check_directory_structure phase of fsck, and also fixes some issues we were having with btree path overflow in the reflink btree. - Trigger refactoring; prep work for the upcoming disk space accounting rewrite - Numerous bugfixes :) * tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (226 commits) bcachefs: eytzinger0_find() search should be const bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent() bcachefs: fix simulateously upgrading & downgrading bcachefs: Restart recovery passes more reliably bcachefs: bch2_dump_bset() doesn't choke on u64s == 0 bcachefs: improve checksum error messages bcachefs: improve validate_bset_keys() bcachefs: print sb magic when relevant bcachefs: __bch2_sb_field_to_text() bcachefs: %pg is banished bcachefs: Improve would_deadlock trace event bcachefs: fsck_err()s don't need to manually check c->sb.version anymore bcachefs: Upgrades now specify errors to fix, like downgrades bcachefs: no thread_with_file in userspace bcachefs: Don't autofix errors we can't fix bcachefs: add missing bch2_latency_acct() call bcachefs: increase max_active on io_complete_wq bcachefs: add time_stats for btree_node_read_done() bcachefs: don't clear accessed bit in btree node fill bcachefs: Add an option to control btree node prefetching ...	2024-01-10 16:34:17 -08:00
Linus Torvalds	84e9a2d551	Various smb client fixes, most related to better handling special file types -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmWfB8kACgkQiiy9cAdy T1HnJwv/ZhSkyu7iSsP9pWmYaCvzKkT6OljIvixTMKilzVzLe24g+sqSSkDb+SSw 6W9v26AWJqjfzVYyoy8PtR+Erbp5CKoJifAQg+1KK/jJ38ApWCqTpOtRBuo/vEX+ OFdAolOaNr74QWpylInyg5GCyjarH78cvyctukj6ZNO419fX3xKQfIc8z0yE0Osp 0M4R/xy8MuuoKuEzDVuimzY6J9qPqQCsv2jrUDXRZry1bBR0JYl7mIxg3wPT+nBM aFFVybF1kKlGvb0clmymSpI4isJg1i5v03Vy3lot01CQxI5kBftE/CkMdVNxMPBh WDs2ZIXk3c9zsyWmI7ijhEaOVeaaOG8s1MQ/374bBRMeUIzcqGY5rn/gn4JEQbnh rjy9+8oH17+h76qCxHzoVZpuoE2hwHuifQ4KKesoApV3edbPDOzMGol4gA+7KvQD pJAfIDPSZFyA9kNOQiyoritAOG2YulQ+cMcOXG4HeIHJWqnbLiiIq0mt3sJCdvqI 5PiZDR81 =K+9M -----END PGP SIGNATURE----- Merge tag 'v6.8-rc-part1-smb-client' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: "Various smb client fixes, most related to better handling special file types: - Improve handling of special file types: - performance improvement (better compounding and better caching of readdir entries that are reparse points) - extend support for creating special files (sockets, fifos, block/char devices) - fix renaming and hardlinking of reparse points - extend support for creating symlinks with IO_REPARSE_TAG_SYMLINK - Multichannel logging improvement - Exception handling fix - Minor cleanups" * tag 'v6.8-rc-part1-smb-client' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module version number for cifs.ko cifs: remove unneeded return statement cifs: make cifs_chan_update_iface() a void function cifs: delete unnecessary NULL checks in cifs_chan_update_iface() cifs: get rid of dup length check in parse_reparse_point() smb: client: stop revalidating reparse points unnecessarily cifs: Pass unbyteswapped eof value into SMB2_set_eof() smb3: Improve exception handling in allocate_mr_list() cifs: fix in logging in cifs_chan_update_iface smb: client: handle special files and symlinks in SMB3 POSIX smb: client: cleanup smb2_query_reparse_point() smb: client: allow creating symlinks via reparse points smb: client: fix hardlinking of reparse points smb: client: fix renaming of reparse points smb: client: optimise reparse point querying smb: client: allow creating special files via reparse points smb: client: extend smb2_compound_op() to accept more commands smb: client: Fix minor whitespace errors and warnings	2024-01-10 16:23:30 -08:00
Linus Torvalds	587217f970	NFS Client Updates for Linux 6.8 New Features: * Always ask for type with READDIR * Remove nfs_writepage() Bugfixes: * Fix a suspicious RCU usage warning * Fix a blocklayoutdriver reference leak * Fix the block driver's calculation of layoutget size * Fix handling NFS4ERR_RETURNCONFLICT * Fix _xprt_switch_find_current_entry() * Fix v4.1 backchannel request timeouts * Don't add zero-length pnfs block devices * Use the parent cred in nfs_access_login_time() Cleanups: * A few improvements when dealing with referring calls from the server * Clean up various unused variables, struct fields, and function calls * Various tracepoint improvements -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmWfDp8ACgkQ18tUv7Cl QOsJ+Q/8DgrVmP3jwoM9Fu7JI/RnTQr9svk7zyrlyrQd3ywYqu6A1SC7lphcrzhy qxH55ykUuVgCB4kFqWPsU5yilJ8UzPncTOUObiBxN3pCU885Wckm4PJ9PNXtF9ct hc7+RpSTby/hYxiJABGVLgUADJ30rYBe6Y+KspSf+S1HvmgY1jbMPhEbVGpP2QBt zSF5pmnecZ748LGzSwSeW29WUZhvRPBL5B204EB4aq9SmPAhnAclnE7uhErQ1u8e Z6RVwSXv2j1FcM79F5xc/gAByCQhObGuMceFd0sAnx87RUttHi1fteVboz2gZxHB rawZQ9p9K9c7ayCu8disxKWTxNYAztvXDOs+Dnij+c3/2EpAmEUD53AXnXAz025b IbSnh6ggLlxoKLv1Lrwrli2d/Qi4TYTm2RSW/dY416pIhoO3aC6fv1a5tUnou9RX 3XpiiFeNoTixWswmS23AMT7BrJTWXY/+NX7AxFZUyPyJ8y9F2Ug8BCam8uAvTluf 80Dx0pB+7DRF19/ZkH0mUFU+2/+mlK/Ub0p+izSJhkhPSH5TwUTA7hvX6xb7yFtS OY4aTVD0rpTbSOvHOEI+F4tWBnw8onTobYMfRcuwNKYJCvuEh4mLLpn44QEJwW9M 3nHIzdE75Nz3deO+gg6Jo5JuiMwqvh7AEGsxIA64FnAi/xRCDi0= =9hVw -----END PGP SIGNATURE----- Merge tag 'nfs-for-6.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull nfs client updates from Anna Schumaker: "New Features: - Always ask for type with READDIR - Remove nfs_writepage() Bugfixes: - Fix a suspicious RCU usage warning - Fix a blocklayoutdriver reference leak - Fix the block driver's calculation of layoutget size - Fix handling NFS4ERR_RETURNCONFLICT - Fix _xprt_switch_find_current_entry() - Fix v4.1 backchannel request timeouts - Don't add zero-length pnfs block devices - Use the parent cred in nfs_access_login_time() Cleanups: - A few improvements when dealing with referring calls from the server - Clean up various unused variables, struct fields, and function calls - Various tracepoint improvements" * tag 'nfs-for-6.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (21 commits) NFSv4.1: Use the nfs_client's rpc timeouts for backchannel SUNRPC: Fixup v4.1 backchannel request timeouts rpc_pipefs: Replace one label in bl_resolve_deviceid() nfs: Remove writepage NFS: drop unused nfs_direct_req bytes_left pNFS: Fix the pnfs block driver's calculation of layoutget size nfs: print fileid in lookup tracepoints nfs: rename the nfs_async_rename_done tracepoint nfs: add new tracepoint at nfs4 revalidate entry point SUNRPC: fix _xprt_switch_find_current_entry logic NFSv4.1/pnfs: Ensure we handle the error NFS4ERR_RETURNCONFLICT NFSv4.1: if referring calls are complete, trust the stateid argument NFSv4: Track the number of referring calls in struct cb_process_state NFS: Use parent's objective cred in nfs_access_login_time() NFSv4: Always ask for type with READDIR pnfs/blocklayout: Don't add zero-length pnfs_block_dev blocklayoutdriver: Fix reference leak of pnfs_device_node SUNRPC: Fix a suspicious RCU usage warning SUNRPC: Create a helper function for accessing the rpc_clnt's xprt_switch SUNRPC: Remove unused function rpc_clnt_xprt_switch_put() ...	2024-01-10 16:13:57 -08:00
Linus Torvalds	0d19d9e146	Various ext4 bug fixes and cleanups for v6.8-rc1. The fixes are mostly in the fstrim and mballoc code paths. Also enable dioread_nolock in the case where the block size is less than the page size. (Dioread_nolock has been default in the bs == ps case for quite some time.) -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmWe6MMACgkQ8vlZVpUN gaM/gAf/e9j4yCAR/W23cICNh/9hw2U0HItEONZF7GDfySlGADL5dsOADe58jLY9 g8UwBpHptOcyxmMTYgdKPQ2YpUF+3Kd4oi2M1Q6CjeeBeRbwuzT4lMTeKrtMEgiz Ns8mqBgGX3DIXjcbkdO9QdLZPBj07djamAIQlWVLHAR2w6LPgiBhHebUSe+36Ufk xLaj5X2nkdTtPcN1EnlTYNR+zMLyAwXUsxKf44aUveRwiNAfLGBgY9yvFby7hC+6 ENCP1WsalvVnaI8mr9pgt1KTXIrElknA1bbiWJ9RZ5Y8Za+MEHxXBKpP/AStX8Nc WEo7a9tNB1AXU04+/SgVp9GAkXEViA== =Zk8h -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Various ext4 bug fixes and cleanups. The fixes are mostly in the fstrim and mballoc code paths. Also enable dioread_nolock in the case where the block size is less than the page size (dioread_nolock has been default in the bs == ps case for quite some time)" * tag 'ext4_for_linus-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix inconsistent between segment fstrim and full fstrim ext4: fallback to complex scan if aligned scan doesn't work ext4: convert ext4_da_do_write_end() to take a folio ext4: allow for the last group to be marked as trimmed ext4: move ext4_check_bdev_write_error() into nojournal mode jbd2: abort journal when detecting metadata writeback error of fs dev jbd2: remove unused 'JBD2_CHECKPOINT_IO_ERROR' and 'j_atomic_flags' jbd2: replace journal state flag by checking errseq jbd2: add errseq to detect client fs's bdev writeback error ext4: improving calculation of 'fe_{len\|start}' in mb_find_extent() ext4: clarify handling of unwritten bh in __ext4_block_zero_page_range() ext4: treat end of range as exclusive in ext4_zero_range() ext4: enable dioread_nolock as default for bs < ps case ext4: delete redundant calculations in ext4_mb_get_buddy_page_lock() ext4: reduce unnecessary memory allocation in alloc_flex_gd() ext4: avoid online resizing failures due to oversized flex bg ext4: remove unnecessary check from alloc_flex_gd() ext4: unify the type of flexbg_size to unsigned int	2024-01-10 16:09:14 -08:00
Linus Torvalds	6bd593bc74	unicode updates Other than the update to MAINTAINERS, this PR has only a fix to stop ecryptfs from inadvertently mounting case-insensitive filesystems that it cannot handle, which would otherwise caused post-mount failures. It has been on linux-next for the past month and a half. As a side note, the optimization to the case-insensitive comparison code that you suggested, and the negative dentry support are still on the list, and were postponed to the next release. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRIEdicmeMNCZKVCdo6u2Upsdk6RAUCZZxQsQAKCRA6u2Upsdk6 RDNtAP9QkDintyPet5GhZKvgUDuNkduMbjDymDv9aELCdhz3FAEAprzJOanEXjOY JInwn3pvZIcpQzwFEJtDt1HZEZLF0Q0= =sHKn -----END PGP SIGNATURE----- Merge tag 'unicode-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode Pull unicode updates from Gabriel Krisman Bertazi: "Other than the update to MAINTAINERS, this PR has only a fix to stop ecryptfs from inadvertently mounting case-insensitive filesystems that it cannot handle, which would otherwise caused post-mount failures" * tag 'unicode-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: MAINTAINERS: update unicode maintainer e-mail address ecryptfs: Reject casefold directory inodes	2024-01-10 16:06:58 -08:00
Linus Torvalds	120a201bd2	hardening updates for v6.8-rc1 - Introduce the param_unknown_fn type and other clean ups (Andy Shevchenko) - Various __counted_by annotations (Christophe JAILLET, Gustavo A. R. Silva, Kees Cook) - Add KFENCE test to LKDTM (Stephen Boyd) - Various strncpy() refactorings (Justin Stitt) - Fix qnx4 to avoid writing into the smaller of two overlapping buffers - Various strlcpy() refactorings -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmWcOsQWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJoiDD/9gNhalNG+6MNF5TDwSvO9X7pvL bQ6D3clByRxYjnJ4dMQ7p3s+rJ937uQt9PezIWHgRoldjQy3x7AJ5BxkhjeMlD2B YLbfdVYPy09X0Ewk1Efvfm/ta6tJpBGYF7Bc7LIneZrdQ6gemBpLW1PNZAFYzcWX oDjV+M1NytxaiF0aebxPZvZ1W+NGQ105Sxvj5MheDoezyO/j0CTe+ZYtCzFguFY0 8SPpR5FG4AFidb8GHd5Ndv0trVWjF1jat0FUFgEFOCE0fJNWLVR0Bbr2MtXiG7wL LF7IZ/Mn+mi+O3BmcD6JiaYf9EPlMUXCyqc8NvsnoWGqhWhWmQPCInZVrpplMUNK V/UHVMkmjDs4f/lAHBJoJHDK6fmOD+cAFaNMOltfErcjV4s+lEo6vHoiKl8hfPnH EzpQaK3funGroVYwTc35e07NrJJHCzqIUhZ0FJO7ByuOE2tIomiVo9Xy9gy54iCT qzC7zkrZ0MKqui4qiUY9FWayRRYLX4qNxELm4yie6Pzmk8943hNOaDofcyKWuZFC eqvhIkvqb4LasLrzCBk+ehA2KWSRmTrR6E9IygwbBXUTsvn2yj2RRYeAlGQNBTBZ adgSXQpRBmtKYqyihWLhP4QcunknEiQdDS3lS2qJmPH33Iv3jGH4yS6BNIBufMGL PoC2UxSfGd+YT079fw== =1Wxx -----END PGP SIGNATURE----- Merge tag 'hardening-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - Introduce the param_unknown_fn type and other clean ups (Andy Shevchenko) - Various __counted_by annotations (Christophe JAILLET, Gustavo A. R. Silva, Kees Cook) - Add KFENCE test to LKDTM (Stephen Boyd) - Various strncpy() refactorings (Justin Stitt) - Fix qnx4 to avoid writing into the smaller of two overlapping buffers - Various strlcpy() refactorings * tag 'hardening-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: qnx4: Use get_directory_fname() in qnx4_match() qnx4: Extract dir entry filename processing into helper atags_proc: Add __counted_by for struct buffer and use struct_size() tracing/uprobe: Replace strlcpy() with strscpy() params: Fix multi-line comment style params: Sort headers params: Use size_add() for kmalloc() params: Do not go over the limit when getting the string length params: Introduce the param_unknown_fn type lkdtm: Add kfence read after free crash type nvme-fc: replace deprecated strncpy with strscpy nvdimm/btt: replace deprecated strncpy with strscpy nvme-fabrics: replace deprecated strncpy with strscpy drm/modes: replace deprecated strncpy with strscpy_pad afs: Add __counted_by for struct afs_acl and use struct_size() VMCI: Annotate struct vmci_handle_arr with __counted_by i40e: Annotate struct i40e_qvlist_info with __counted_by HID: uhid: replace deprecated strncpy with strscpy samples: Replace strlcpy() with strscpy() SUNRPC: Replace strlcpy() with strscpy()	2024-01-10 11:03:52 -08:00
Ye Bin	68da4c44b9	ext4: fix inconsistent between segment fstrim and full fstrim Suppose we issue two FITRIM ioctls for ranges [0,15] and [16,31] with mininum length of trimmed range set to 8 blocks. If we have say a range of blocks 10-22 free, this range will not be trimmed because it straddles the boundary of the two FITRIM ranges and neither part is big enough. This is a bit surprising to some users that call FITRIM on smaller ranges of blocks to limit impact on the system. Also XFS trims all free space extents that overlap with the specified range so we are inconsistent among filesystems. Let's change ext4_try_to_trim_range() to consider for trimming the whole free space extent that straddles the end of specified range, not just the part of it within the range. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231216010919.1995851-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-10 13:53:36 -05:00
Ojaswin Mujoo	1f6bc02f18	ext4: fallback to complex scan if aligned scan doesn't work Currently in case the goal length is a multiple of stripe size we use ext4_mb_scan_aligned() to find the stripe size aligned physical blocks. In case we are not able to find any, we again go back to calling ext4_mb_choose_next_group() to search for a different suitable block group. However, since the linear search always begins from the start, most of the times we end up with the same BG and the cycle continues. With large fliesystems, the CPU can be stuck in this loop for hours which can slow down the whole system. Hence, until we figure out a better way to continue the search (rather than starting from beginning) in ext4_mb_choose_next_group(), lets just fallback to ext4_mb_complex_scan_group() in case aligned scan fails, as it is much more likely to find the needed blocks. Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/ee033f6dfa0a7f2934437008a909c3788233950f.1702455010.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-10 13:53:36 -05:00
Matthew Wilcox (Oracle)	4d5cdd757d	ext4: convert ext4_da_do_write_end() to take a folio There's nothing page-specific happening in ext4_da_do_write_end(); it's merely used for its refcount & lock, both of which are folio properties. Saves four calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231214053035.1018876-1-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-10 13:53:36 -05:00
Suraj Jitindar Singh	7c784d6248	ext4: allow for the last group to be marked as trimmed The ext4 filesystem tracks the trim status of blocks at the group level. When an entire group has been trimmed then it is marked as such and subsequent trim invocations with the same minimum trim size will not be attempted on that group unless it is marked as able to be trimmed again such as when a block is freed. Currently the last group can't be marked as trimmed due to incorrect logic in ext4_last_grp_cluster(). ext4_last_grp_cluster() is supposed to return the zero based index of the last cluster in a group. This is then used by ext4_try_to_trim_range() to determine if the trim operation spans the entire group and as such if the trim status of the group should be recorded. ext4_last_grp_cluster() takes a 0 based group index, thus the valid values for grp are 0..(ext4_get_groups_count - 1). Any group index less than (ext4_get_groups_count - 1) is not the last group and must have EXT4_CLUSTERS_PER_GROUP(sb) clusters. For the last group we need to calculate the number of clusters based on the number of blocks in the group. Finally subtract 1 from the number of clusters as zero based indexing is expected. Rearrange the function slightly to make it clear what we are calculating and returning. Reproducer: // Create file system where the last group has fewer blocks than // blocks per group $ mkfs.ext4 -b 4096 -g 8192 /dev/nvme0n1 8191 $ mount /dev/nvme0n1 /mnt Before Patch: $ fstrim -v /mnt /mnt: 25.9 MiB (27156480 bytes) trimmed // Group not marked as trimmed so second invocation still discards blocks $ fstrim -v /mnt /mnt: 25.9 MiB (27156480 bytes) trimmed After Patch: fstrim -v /mnt /mnt: 25.9 MiB (27156480 bytes) trimmed // Group marked as trimmed so second invocation DOESN'T discard any blocks fstrim -v /mnt /mnt: 0 B (0 bytes) trimmed Fixes: `45e4ab320c` ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213051635.37731-1-surajjs@amazon.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-10 13:53:17 -05:00
Linus Torvalds	72116efd63	pstore updates for v6.8-rc1 - Do not allow misconfigured ECC sizes (Sergey Shtylyov) - Allow for odd number of CPUs (Weichen Chen) - Refactor error handling to use cleanup.h -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmWcPVEWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJgjoD/wIWCiu4TWAziFlXy4Gmz2bFT+i bY7APft9SkCa9QENIohhKaNuDYSymjGfq+cupvZ3/erDdfjgwPAg/Cs8fiKnAdRY +sSFyDttcZu0Z9u7QB1TI2GG4E0MA/x9K001RwNzODj27yCj4mozuwoyfiuiTgHo Dclkl2p7b4SjXrWuh5tCSaaV3TO3af8rAseT63phoqBM0BwRwh7rza1A3LhDoeWY 27/uba919KwTfvBH+yqOtglsWIe9bBI+vr4J9OGb2DOdpWi3yhwe074mjCn5C/BR TpQDUT5moX0xsmdc4NaTKgyxWQ5EOa832TjNbPn5RMqaslVvnz6zYLCL+D1qYQvG Jasbg8qa8hqdxS+KxgPZTSfkmpYi80AxzBGngRrlXEMArLTW40dhmebXN5QiT0CP IKMYq7xuPiVN+GiZTl7hThqxFTOb5I6pbKDoIUFPCTIjJUcLTwM9y71dQ+XzJHKu GAHvzvzLSD2Y0BwaWedWinPjTqaBsOfeqecE77dIkMoFWa7Y0dx0BxySUT2dUKny 6Z28mMX6C9sf5ncdJLjcEXf0UDECfnuXw+1NJUwyaSBtlR56pWIk33YFWf1+u3Jn p6ZX6Jx6A77h4236A63zodSdna4NzuSQETmyqFvJOra8Gubidx2ggwL9EdxK4qHq tQJxHbxI4+vRpaOm7A== =TOWy -----END PGP SIGNATURE----- Merge tag 'pstore-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: - Do not allow misconfigured ECC sizes (Sergey Shtylyov) - Allow for odd number of CPUs (Weichen Chen) - Refactor error handling to use cleanup.h * tag 'pstore-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore: inode: Use cleanup.h for struct pstore_private pstore: inode: Use __free(pstore_iput) for inode allocations pstore: inode: Convert mutex usage to guard(mutex) pstore: inode: Convert kfree() usage to __free(kfree) pstore: ram_core: fix possible overflow in persistent_ram_init_ecc() pstore/ram: Fix crash when setting number of cpus to an odd number	2024-01-10 10:53:02 -08:00
Linus Torvalds	4d925f6057	overlayfs updates for 6.8 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmWeozwACgkQEVvVyTe/ 1WqnyA//U2Ka5ZIncs/hA5D03LMyuCh9qlH5qAGce5vrBTxogTlFuTGGKtsUuCB5 Y4GALO+fw8aWAowt5X1XfHD3TETLVbCshT7dYjKsKy/ojANCbgkCipXBudYx+l9m fllwTZyueK0UY14kCU2DAV5PYsI/XVVykk71GSMOMLCUfRJfDI7R0vBD0NaUd7Kz Wp/M6t0MnXX23nGUdgNoroZPPj3Ts/gK2MXID+QHXGaR2+M1B1lLKfSu6TcRDLtn tbe/ivaw4y1jj3jfFwMC7sSSDyIJeZh9tBB4Rvv2vsMiYU8zAC6Eg35eIbPONu42 pUMd0QQa79H3cyYEDtUzyskzur0Jry5azzb8JdQWipgVKFh5g3CHce2XAFlVjw2a 9RyCKg41A9LvdB5l/PvBtsxig2PzaYqE09rXAfUM7eLNFlOLbL99uc1WJbIFfG43 Czh9vPxsuJ5RkdwS7R0m4GYDw8+BKW6WjpaC+Eje4I8X1rAQK0H/BLTCxe2dLRB7 0neAg8e3g6NdisRSLOP74xoEn/dhijNP7ENOFF1EdP/BFPHL7+sRsV6XYwwBeUAc c6YsxeAPylm6gvIq/ESoRiY+e5QWvImHIWP+zB/cySYdT0fQHL9WjO6/uZW0ALuv oZugICSmZ15pYlACIU8iYztRkS19CJZrUV7Gbq4+AurUKP8kCEI= =2Ohx -----END PGP SIGNATURE----- Merge tag 'ovl-update-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: "This is a very small update with no bug fixes and no new features. The larger update of overlayfs for this cycle, the re-factoring of overlayfs code into generic backing_file helpers, was already merged via Christian. Summary: - Simplify/clarify some code No bug fixes here, just some changes following questions from Al about overlayfs code that could be a little more simple to follow. - Overlayfs documentation style fixes Mainly fixes for ReST formatting suggested by documentation developers" * tag 'ovl-update-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: overlayfs.rst: fix ReST formatting overlayfs.rst: use consistent feature names ovl: initialize ovl_copy_up_ctx.destname inside ovl_do_copy_up() ovl: remove redundant ofs->indexdir member	2024-01-10 10:48:22 -08:00
Linus Torvalds	0507d2526f	Changes since last update: - Add basic sub-page compressed data support; - Fix a memory leak on MicroLZMA and DEFLATE compression; - Fix a rare LZ4 inplace decompression issue on recent x86 CPUs; - Fix a KASAN issue reported by syzbot around crafted images; - Some cleanups. -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmWeiBMRHHhpYW5nQGtl cm5lbC5vcmcACgkQUXZn5Zlu5qpRiw/9EexUiFCsXGUQP9P4M7KXoTxYDrjVi8uN xjTQAame59JGGqzBivVAlUvP/zqdluafFvstEsINv3VoLzw+OLDHHbGVN3w/Jn2C Thilxul3shRyVhcUK/7d0lDagY32ggwYpqKc4Cr/6RiVHtQ7fnJBdsELFetSeI6d FcLQed/S4C3MgN0g/j9erj8j0Rizgk+yoLqglIECaxIxTbmhnZFXcLfRDWF/OoEy AdZ48qK5sIEBbVAhH/5sxXNod77wbwuTjpnzSaC+9PiAHgKGdl3W5Vf3SnckosmX WFbwszqk5JISS01vcNISLZg1U47a9vVd7CDis7lkbtU2LddhFerTmf3Xr6FIc+qJ hvsr+0djRbArF66DvYjWcoYueHkYh/kgTsYXsvmqheKtyNZJIrk6d0YS32+6XKth TGwX55WdWrLqhfwac509EFYKD7moYCXMTFaJh4zhqMiz5TX5eVLlRcoU3Uy57x3/ Q2UWnPuYiGFuWrhnYWNgn1n6KoQgb/tD9jjQ5D/i9AJI9aHydkoUFJdQTgxMv9FY lfdxp94Yo2+XjJ9BhSACgVkSnGzv89/9iUQ0Fps08rnc25rD4upiipqtAuqDWn6N gcEXC6oAOywdWdR5Y+yP/N3hIMYxn48X2gt875jyYMe0KTzIETIyPG4l3YhfitTN 0pBOcZBOQkw= =TiFo -----END PGP SIGNATURE----- Merge tag 'erofs-for-6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "In this cycle, we'd like to enable basic sub-page compressed data support for Android ecosystem (for vendors to try out 16k page size with 4k-block images in their compatibility mode) as well as container images (so that 4k-block images can be parsed on arm64 cloud servers using 64k page size.) In addition, there are several bugfixes and cleanups as usual. All commits have been in -next for a while and no potential merge conflict is observed. Summary: - Add basic sub-page compressed data support - Fix a memory leak on MicroLZMA and DEFLATE compression - Fix a rare LZ4 inplace decompression issue on recent x86 CPUs - Fix a KASAN issue reported by syzbot around crafted images - Some cleanups" * tag 'erofs-for-6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: make erofs_{err,info}() support NULL sb parameter erofs: avoid debugging output for (de)compressed data erofs: allow partially filled compressed bvecs erofs: enable sub-page compressed block support erofs: refine z_erofs_transform_plain() for sub-page block support erofs: fix ztailpacking for subpage compressed blocks erofs: fix up compacted indexes for block size < 4096 erofs: record `pclustersize` in bytes instead of pages erofs: support I/O submission for sub-page compressed blocks erofs: fix lz4 inplace decompression erofs: fix memory leak on short-lived bounced pages	2024-01-10 10:39:56 -08:00
Linus Torvalds	17b9e388c6	fscrypt updates for 6.8 Adjust the timing of the fscrypt keyring destruction, to prepare for btrfs's fscrypt support. Also document that CephFS supports fscrypt now. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZZx4UBQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK85+AQCBHoG6R5UuPqafoDtabcCpxRW/ZHdo WzOwjvHz1/tq5AEApogvjPI/3v2gelLnG9ZrXUBZMWZN6W0LQbH/k1VHjQ8= =nvWY -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: "Adjust the timing of the fscrypt keyring destruction, to prepare for btrfs's fscrypt support. Also document that CephFS supports fscrypt now" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fs: move fscrypt keyring destruction to after ->put_super f2fs: move release of block devices to after kill_block_super() fscrypt: document that CephFS supports fscrypt now fscrypt: update comment for do_remove_key() fscrypt.rst: update definition of struct fscrypt_context_v2	2024-01-10 10:24:49 -08:00
Linus Torvalds	49f4810356	NFSD 6.8 Release Notes The bulk of the patches for this release are clean-ups and minor bug fixes. There is one significant revert to mention: support for RDMA Read operations in the server's RPC-over-RDMA transport implementation has been fixed so it waits for Read completion in a way that avoids tying up an nfsd thread. This prevents a possible DoS vector if an RPC-over-RDMA client should become unresponsive during RDMA Read operations. As always I am grateful to NFSD contributors, reviewers, and testers. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmWdW34ACgkQM2qzM29m f5fKmw/+PcjoNDWR55kTmOo8j0h4HF8rhunvP2C50svnnsX63y1WKkLaxyAFN/Hl UFucJDQBjJvwi+PEbGOXcjkizuG5mhRBFvFIYDJYGWsE1s7B/v3E/Servvt1wSek UjoTjknYrqH6R3YfA8zBaWRJUXwvVQW3Bzo4mShrQK7He9/7nBHdUe0aWbAA9oW3 QgzKH/FzqCS03MvuxQv74KgBcl3diIrDaj041A3CtSnXzSKqwc3LaUAd5B4BL+oq GnxpV1rtZla50M4Ntddi+vSjUvHWZySQ1GEJj7rKLTwpGXkxM2NuMkGx676WR4Iv sYDX0fsica2elKbqJem8pk68qi6XEdZVAdoOHdgNJRClmYHby8xkrL/TYKiQZf42 IN9FogoVSZ+vSdI158Weim9+0Jqf+ffIh57ZtOyQQQAGZkdhB6GhcbdHJhQ9eOgB LAiAL7bsoWvDmBh5m9KnBmQYGpZoDUa6AT0bIvGD2O4/MdpHBkyT8Xwt+210nPOK mpBtxe5O8cUcg7A5/TwnVRg5jKp4CF8VWh2R8sGDhcYV8UfRthB38h4rHNhv4vxt l6ZUgmtTxrs1rCeh6aoiWTKXeQmI8meWlcet7cxw/axAsaTXkYPi5mslxF9f4O8u nQ8q7LuZQy2CKZO/t98STwx7s9OJcDOwcy51rnKK85TlCwnxFWg= =mIKg -----END PGP SIGNATURE----- Merge tag 'nfsd-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "The bulk of the patches for this release are clean-ups and minor bug fixes. There is one significant revert to mention: support for RDMA Read operations in the server's RPC-over-RDMA transport implementation has been fixed so it waits for Read completion in a way that avoids tying up an nfsd thread. This prevents a possible DoS vector if an RPC-over-RDMA client should become unresponsive during RDMA Read operations. As always I am grateful to NFSD contributors, reviewers, and testers" * tag 'nfsd-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits) nfsd: rename nfsd_last_thread() to nfsd_destroy_serv() SUNRPC: discard sv_refcnt, and svc_get/svc_put svc: don't hold reference for poolstats, only mutex. SUNRPC: remove printk when back channel request not found svcrdma: Implement multi-stage Read completion again svcrdma: Copy construction of svc_rqst::rq_arg to rdma_read_complete() svcrdma: Add back svcxprt_rdma::sc_read_complete_q svcrdma: Add back svc_rdma_recv_ctxt::rc_pages svcrdma: Clean up comment in svc_rdma_accept() svcrdma: Remove queue-shortening warnings svcrdma: Remove pointer addresses shown in dprintk() svcrdma: Optimize svc_rdma_cc_init() svcrdma: De-duplicate completion ID initialization helpers svcrdma: Move the svc_rdma_cc_init() call svcrdma: Remove struct svc_rdma_read_info svcrdma: Update the synopsis of svc_rdma_read_special() svcrdma: Update the synopsis of svc_rdma_read_call_chunk() svcrdma: Update synopsis of svc_rdma_read_multiple_chunks() svcrdma: Update synopsis of svc_rdma_copy_inline_range() svcrdma: Update the synopsis of svc_rdma_read_data_item() ...	2024-01-10 10:20:08 -08:00
Linus Torvalds	d8c8e595dc	dlm for 6.8 This set cleans up the interface between nfs lockd and dlm, which is handling nfs file locking for gfs2 and ocfs2. Very basic lockd functionality is fixed, in which the fl owner was using the lockd pid instead of the owner value from nfs. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEcGkeEvkvjdvlR90nOBtzx/yAaaoFAmWDZFwACgkQOBtzx/yA aapu/hAAx/9ahq4Vm+T7Lpw6wGEKISUi5djZlqrN7EddHcyAMFFX/41PkOez9KJT Rr4Mp+MBB6xjDDco4uVZxhnWJCI6RKExSB4N+eMx0Rhs09Ksf8UCtxTvKaDa18fr ZwPmGNpE/a3khTkwC5h/98m8kOyYIqSOL8/cR8zGytkHkgDiyv4VqD0cHAvwxR5a O8jQDtssXld6sF5GxhVQnLQiu0eVfFLlaaSsb28ju+yMPVOTDxmwNkP3eP+8d1le lcNp82+C7UmzO5Ds1/SgBIJZoej/xipz00BAlGH1oieD4xRLCbkoJSQsGxpkPwEI I1V8fd7zaFQ1VnDHMeMrjl46qjUQKkCfDK/v9BCvN5x8sCqaqUydMQ0mD/424NXe A/JgjAtloIhIOqmX/K/h4jioTrFlVevtTAr9Cv/sq31VX0+ALJVS3ccbhv68gjiW Cflef7Va53mXYfIAs6qc60/ArpvrPUG7Bna4aIb5iVJj4z/OOjnTxyZVOD3wJetY bs4w2dSrafX589EN/gIyKka3iOMcJS7wVsvRME9KYVikNbHgQrSpsixHPlLdjGq+ cHbozutVQYnhaGI608yMjPZ+rXu5jYEfAIQnI8FABbi4VR29+SnzxrZllMICUZ+Y pfRQ6YkiuBRy2HSbnwudemj6iSrPqZEts2GDkqj2LDfkMWeycKM= =UBeR -----END PGP SIGNATURE----- Merge tag 'dlm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set cleans up the interface between nfs lockd and dlm, which is handling nfs file locking for gfs2 and ocfs2. Very basic lockd functionality is fixed, in which the fl owner was using the lockd pid instead of the owner value from nfs" * tag 'dlm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: update format header reflect current format dlm: fix format seq ops type 4 dlm: implement EXPORT_OP_ASYNC_LOCK dlm: use FL_SLEEP to determine blocking vs non-blocking dlm: use fl_owner from lockd dlm: use kernel_connect() and kernel_bind()	2024-01-10 10:17:23 -08:00
Linus Torvalds	0c59ae1290	AFS fileserver rotation fix -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmWYJ6kACgkQ+7dXa6fL C2v6YBAAkDdqgWN96h2KOcd+El13Uxa3WNDjTHtzc0ZhjEDkzkU42sSF2yE0nerS 6kX18vibXC+TPnbBn1gOSGrVoFIC1kh/vUjrz/UQYfxXN19P8LE2wSdl+bC4nPT1 Qkrxkr+q4GSSJoYg9QUUAu0Hh2PvXMeDE/XyED6XiAkuDUbISO9yDeu+wo3wZM5L 1e8vRlg/2EQl2v1Crh5nC0tgJZbGULc2mCqi/rU5A9wdlKHFzwjU+2PTsbQNKE0m 0ueLblFeFRwBZpOfAUNNVAt3bwaSfhYpqUiiSldrU/JXhnx5CgY1kHzI3OPVedQt WMfp/epwO848i3qVM8dHJXc93NJeC3gTBK7gYRrH07MuK3Of1KRH3D8YBsE0/r0q NVcDQ6/eoni06CA8VMfSIEQ2+Q0m4xxUzAQURsOxRPY/FktzCKXMfpYTDZqbQfow SXrKmsPnMZe4DUnvdcTSU8B3+vybJH/JgEnZXRtCPOYNDSyMcPhKPG2ioOz4UV+M amQmpYfG4hzi1VmRrH57dwlXejBX16+zc9pLdZC5c0/phk3caYrJVMA8pwCOP4HM AvB5Yl6gH2aGj1kKjffL7nWnQ2QbD7VWUn98TqLPezOX7DwQHMMKvlfPnv6R87sy 0HMmj9VxCgOvGLOf1JdQoTxtb49ndM4Y5fPvKYK2awW5FkAacLM= =bHoG -----END PGP SIGNATURE----- Merge tag 'afs-fix-rotation-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull afs updates from David Howells: "The majority of the patches are aimed at fixing and improving the AFS filesystem's rotation over server IP addresses, but there are also some fixes from Oleg Nesterov for the use of read_seqbegin_or_lock(). - Fix fileserver probe handling so that the next round of probes doesn't break ongoing server/address rotation by clearing all the probe result tracking. This could occasionally cause the rotation algorithm to drop straight through, give a 'successful' result without actually emitting any RPC calls, leaving the reply buffer in an undefined state. Instead, detach the probe results into a separate struct and allocate a new one each time we start probing and update the pointer to it. Probes are also sent in order of address preference to try and improve the chance that the preferred one will complete first. - Fix server rotation so that it uses configurable address preferences across on the probes that have completed so far than ranking them by RTT as the latter doesn't necessarily give the best route. The preference list can be altered by writing into /proc/net/afs/addr_prefs. - Fix the handling of Read-Only (and Backup) volume callbacks as there is one per volume, not one per file, so if someone performs a command that, say, offlines the volume but doesn't change it, when it comes back online we don't spam the server with a status fetch for every vnode we're using. Instead, check the Creation timestamp in the VolSync record when prompted by a callback break. - Handle volume regression (ie. a RW volume being restored from a backup) by scrubbing all cache data for that volume. This is detected from the VolSync creation timestamp. - Adjust abort handling and abort -> error mapping to match better with what other AFS clients do. - Fix offline and busy volume state handling as they only apply to individual server instances and not entire volumes and the rotation algorithm should go and look at other servers if available. Also make it sleep briefly before each retry if all the volume instances are unavailable" * tag 'afs-fix-rotation-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (40 commits) afs: trace: Log afs_make_call(), including server address afs: Fix offline and busy message emission afs: Fix fileserver rotation afs: Overhaul invalidation handling to better support RO volumes afs: Parse the VolSync record in the reply of a number of RPC ops afs: Don't leave DONTUSE/NEWREPSITE servers out of server list afs: Fix comment in afs_do_lookup() afs: Apply server breaks to mmap'd files in the call processor afs: Move the vnode/volume validity checking code into its own file afs: Defer volume record destruction to a workqueue afs: Make it possible to find the volumes that are using a server afs: Combine the endpoint state bools into a bitmask afs: Keep a record of the current fileserver endpoint state afs: Dispatch vlserver probes in priority order afs: Dispatch fileserver probes in priority order afs: Mark address lists with configured priorities afs: Provide a way to configure address priorities afs: Remove the unimplemented afs_cmp_addr_list() afs: Add some more info to /proc/net/afs/servers rxrpc: Create a procfile to display outstanding client conn bundles ...	2024-01-10 10:11:01 -08:00
Linus Torvalds	032500abc5	Stability improvements. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmWcF7sACgkQNqiEXrVA jGQCOw//RdOt12/cDRZxIgdhCotNBNAde/BtG95Mv8KnmYKpnFAWx1piAhDxyVun 4oUSJ6fG9VBJeKogQnP1LDW3oBH3WZ3twM4lrTxHo0BOhuv/u/CXJkchR6LHpl20 9ewtbQ3MOBh2MHhXd2Siyc0sp14STk0F8yNGOJawM1lw5GiTMkZMb6BdGaoRSahF bqjjJnlOeDF0Znzg+CL5X2pWAsUa0oPXzFXuuEYySvTn9EkwPK0D/kllGiwqt7yq XIULsUd89Nj7hznUT+ylFzctwxIlM5DB3z81eP9rsf+R7dHkriBRCJVPBuxvKajJ MTKxzEBP0/I0v276T4DYABVdYZu9BL9dK/eSgo3lxjQnC5BoDMR/AHcP93NQknxf 2aNb8mt1Uq44Qh7iyrtJuB6OlfnlK5rVtYJcGgHfhEWL7Gf4y2UiJH7QspVwH7J5 KQwOmXvUpR5dHPbnEQIgYKM0LlAapQQ9jnzf2y4i/Z33l2EE/KNsI6+K5VG6oTDS RDYASqH0hd+P+7z+7Qhuwu2cFG8oRcvCQh6nEcrvq1md1WEIspt1frrNDEeDmw6o pvwflfjvD3126sqnlIkAOJlqvthERcA13mhvgj891IXlCSlth65IYM83i57MxCUd uZcDkYKsmXhUYo5MEM0dPfP3+LKml52XJqGjR7gOSZ8hm0tHo8k= =RXjP -----END PGP SIGNATURE----- Merge tag 'jfs-6.8' of github.com:kleikamp/linux-shaggy Pull jfs updates from David Kleikamp: "Stability improvements" * tag 'jfs-6.8' of github.com:kleikamp/linux-shaggy: jfs: Add missing set_freezable() for freezable kthread jfs: fix array-index-out-of-bounds in diNewExt jfs: fix shift-out-of-bounds in dbJoin jfs: fix uaf in jfs_evict_inode jfs: fix array-index-out-of-bounds in dbAdjTree jfs: fix slab-out-of-bounds Read in dtSearch UBSAN: array-index-out-of-bounds in dtSplitRoot FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTree	2024-01-10 10:04:36 -08:00
Linus Torvalds	bfed9a9294	gfs2 updates - Add support for non-blocking lookup (MAY_NOT_BLOCK / LOOKUP_RCU) - Various minor fixes and cleanups -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmWb00YUHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpb6w//Sj7bN2SsLlx131LPxnzGnu+LgQ7b vd9atU4+DSov2J/KpfX+arxiZSCcB/5FdatpeulSsczjtvvp/JyWuOQSudBlxA+N bUpRrzoLoIrm1rkemLLOpwHmP1WkmpjCsxRilheoXi9jqw3MROoN/ZIpUVfnaGBy NKWsK7rr1W0+nkKIColCRCfCujkJJ+s9Js8fsmOtOZA8+JYCdsZo7q7VzbhdGBFh IPLFEHiRmJIBjECvs76T3MtxkdYQElhsCacE8i9ozqPlDoBDdj1zKzYD2wrd5t0Z V49Ef6IKoezuxUob7f8ReHSOHUxc4kDxptJQsP6TI4bs+lBUTUBRtjlWiUwOwo2H MdklRpGaxt0aChHqSXRA5+eDURRvq4Ly42vXnYFdiiNofwGYWrsEc00PUEBr55kF 9DlEfl/GP2gisleqmNTW8OSPV+/WP46KG0f9uy5dDDCvXCw66wdu11LXsF7KQwFc CRcaXLAgbk+M3qi3XBykEoTvugFQ06s6CSty0zmyNwwGJEelgfXwQl0ISO6L/Qnb NJIurC20cwizlnRPvMT5MUqXMuwuE1mTMQdfOMACYsGMBkfXrObteK2EUPCfK0uv nHPD/RCfZxboXq9B7xdltEoFPsNfyipT2YfUASXQJ9txZLmKrU9ZP+rMc/Dmeekr cvog8NJ+HvzE7JM= =vN0N -----END PGP SIGNATURE----- Merge tag 'gfs2-v6.7-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Add support for non-blocking lookup (MAY_NOT_BLOCK / LOOKUP_RCU) - Various minor fixes and cleanups * tag 'gfs2-v6.7-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix freeze consistency check in log_write_header gfs2: Refcounting fix in gfs2_thaw_super gfs2: Minor gfs2_{freeze,thaw}_super cleanup gfs2: Use wait_event_freezable_timeout() for freezable kthread gfs2: Add missing set_freezable() for freezable kthread gfs2: Remove use of error flag in journal reads gfs2: Lift withdraw check out of gfs2_ail1_empty gfs2: Rename gfs2_withdrawn to gfs2_withdrawing_or_withdrawn gfs2: Mark withdraws as unlikely gfs2: Minor gfs2_ail1_empty cleanup gfs2: use is_subdir() gfs2: d_obtain_alias(ERR_PTR(...)) will do the right thing gfs2: Use GL_NOBLOCK flag for non-blocking lookups gfs2: Add GL_NOBLOCK flag gfs2: rgrp: fix kernel-doc warnings gfs2: fix kernel BUG in gfs2_quota_cleanup gfs2: Fix inode_go_instantiate description gfs2: Fix kernel NULL pointer dereference in gfs2_rgrp_dump	2024-01-10 09:36:40 -08:00
Linus Torvalds	affc5af36b	for-6.8-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmWYTmMACgkQxWXV+ddt WDvPRg/+KgS5LV3nNC0MguYcTMQxmgeutIgXZIMfeA3v6EnFS7nj8leP4EPc6+bj JPSkwj4u2vHVwpnTVuEAuJUXnmFY+Qu70nVy6bM2uOHOYTVBQ8zRVK4cErNNLWCp OekDaADR53RrZ/xprlQ7b7Ph0Ch2uq9OrpH50IcyquEsH1ffkxlqwyrvth4/8dxC 6zgsFHWrbtVKJf0DYoQPpjEPz5tpdQ+xHZwtmf1cNlUgI1objODr/ZTqXtZqTfw4 /GwrtDPbEri53K/qjgr0dDH7pBVqD6PtnbgoHfYkiizZ0G7UkmlaK6rZIurtATJb Yk/RCqCUp9tPC4yeFSewFMm1Y8Ae3rkUBG7rnYkvMmBspMqyh/kQAWSBimF5yk/y vFEdFTe9AbdvP19Nw0CqovLzaO6RrOXCL1usnFvCmBgvF5gZAv63ZW1njP3ZoNta wB8Rs6hxdRkph8Dk7yvYf54uUR+JyKqjHY6egg2qkKTjz0CSf6qQFyFZXpr81m97 gK4WN5SeP/P2ukRbBKKyzZ5IljUxZuVatvJa0tktd7kAbU26WLzofOJ7pX+iqimM F2G7gKGJZykLY1WPntXBp9Dg97Ras2O5iViQ7ZKwRdOx1yZS5zzTYlIznHBAmXbL UgXfVnpJH1xFdkvedNTn+Fz9BHNV1K2a2AT7VITj7sxz23z3aJA= =4sw3 -----END PGP SIGNATURE----- Merge tag 'for-6.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "There are no exciting changes for users, it's been mostly API conversions and some fixes or refactoring. The mount API conversion is a base for future improvements that would come with VFS. Metadata processing has been converted to folios, not yet enabling the large folios but it's one patch away once everything gets tested enough. Core changes: - convert extent buffers to folios: - direct API conversion where possible - performance can drop by a few percent on metadata heavy workloads, the folio sizes are not constant and the calculations add up in the item helpers - both regular and subpage modes - data cannot be converted yet, we need to port that to iomap and there are some other generic changes required - convert mount to the new API, should not be user visible: - options deprecated long time ago have been removed: inode_cache, recovery - the new logic that splits mount to two phases slightly changes timing of device scanning for multi-device filesystems - LSM options will now work (like for selinux) - convert delayed nodes radix tree to xarray, preserving the preload-like logic that still allows to allocate with GFP_NOFS - more validation of sysfs value of scrub_speed_max - refactor chunk map structure, reduce size and improve performance - extent map refactoring, smaller data structures, improved performance - reduce size of struct extent_io_tree, embedded in several structures - temporary pages used for compression are cached and attached to a shrinker, this may slightly improve performance - in zoned mode, remove redirty extent buffer tracking, zeros are written in case an out-of-order is detected and proper data are written to the actual write pointer - cleanups, refactoring, error message improvements, updated tests - verify and update branch name or tag - remove unwanted text" * tag 'for-6.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (89 commits) btrfs: pass btrfs_io_geometry into btrfs_max_io_len btrfs: pass struct btrfs_io_geometry to set_io_stripe btrfs: open code set_io_stripe for RAID56 btrfs: change block mapping to switch/case in btrfs_map_block btrfs: factor out block mapping for single profiles btrfs: factor out block mapping for RAID5/6 btrfs: reduce scope of data_stripes in btrfs_map_block btrfs: factor out block mapping for RAID10 btrfs: factor out block mapping for DUP profiles btrfs: factor out RAID1 block mapping btrfs: factor out block-mapping for RAID0 btrfs: re-introduce struct btrfs_io_geometry btrfs: factor out helper for single device IO check btrfs: migrate btrfs_repair_io_failure() to folio interfaces btrfs: migrate eb_bitmap_offset() to folio interfaces btrfs: migrate various end io functions to folios btrfs: migrate subpage code to folio interfaces btrfs: migrate get_eb_page_index() and get_eb_offset_in_page() to folios btrfs: don't double put our subpage reference in alloc_extent_buffer btrfs: cleanup metadata page pointer usage ...	2024-01-10 09:27:40 -08:00
Linus Torvalds	12958e9c4c	New code for 6.8: * New features/functionality * Online repair * Reserve disk space for online repairs. * Fix misinteraction between the AIL and btree bulkloader because of which the bulk load fails to queue a buffer for writeback if it happens to be on the AIL list. * Prevent transaction reservation overflows when reaping blocks during online repair. * Whenever possible, bulkloader now copies multiple records into a block. * Support repairing of 1. Per-AG free space, inode and refcount btrees. 2. Ondisk inodes. 3. File data and attribute fork mappings. * Verify the contents of 1. Inode and data fork of realtime bitmap file. 2. Quota files. * Introduce MF_MEM_PRE_REMOVE. This will be used to notify tasks about a pmem device being removed. * Bug fixes * Fix memory leak of recovered attri intent items. * Fix UAF during log intent recovery. * Fix realtime geometry integer overflows. * Prevent scrub from live locking in xchk_iget. * Prevent fs shutdown when removing files during low free disk space. * Prevent transaction reservation overflow when extending an RT device. * Prevent incorrect warning from being printed when extending a filesystem. * Fix an off-by-one error in xreap_agextent_binval. * Serialize access to perag radix tree during deletion operation. * Fix perag memory leak during growfs. * Allow allocation of minlen realtime extent when the maximum sized realtime free extent is minlen in size. * Cleanups * Remove duplicate boilerplate code spread across functionality associated with different log items. * Cleanup resblks interfaces. * Pass defer ops pointer to defer helpers instead of an enum. * Initialize di_crc in xfs_log_dinode to prevent KMSAN warnings. * Use static_assert() instead of BUILD_BUG_ON_MSG() to validate size of structures and structure member offsets. This is done in order to be able to share the code with userspace. * Move XFS documentation under a new directory specific to XFS. * Do not invoke deferred ops' ->create_done callback if the deferred operation does not have an intent item associated with it. * Remove duplicate inclusion of header files from scrub/health.c. * Refactor Realtime code. * Cleanup attr code. Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZZJQbwAKCRAH7y4RirJu 9JjkAP9Zg0QZNmAMsZwvgEBbuF/OnHKl4GmPA5uq0jPmSWCOqAEA0HjlOmuNfQWn 93fIw6CPbt+9QCluTYBwUisKLIJ/wgA= =qmO0 -----END PGP SIGNATURE----- Merge tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Chandan Babu: "New features/functionality: - Online repair: - Reserve disk space for online repairs - Fix misinteraction between the AIL and btree bulkloader because of which the bulk load fails to queue a buffer for writeback if it happens to be on the AIL list - Prevent transaction reservation overflows when reaping blocks during online repair - Whenever possible, bulkloader now copies multiple records into a block - Support repairing of 1. Per-AG free space, inode and refcount btrees 2. Ondisk inodes 3. File data and attribute fork mappings - Verify the contents of 1. Inode and data fork of realtime bitmap file 2. Quota files - Introduce MF_MEM_PRE_REMOVE. This will be used to notify tasks about a pmem device being removed Bug fixes: - Fix memory leak of recovered attri intent items - Fix UAF during log intent recovery - Fix realtime geometry integer overflows - Prevent scrub from live locking in xchk_iget - Prevent fs shutdown when removing files during low free disk space - Prevent transaction reservation overflow when extending an RT device - Prevent incorrect warning from being printed when extending a filesystem - Fix an off-by-one error in xreap_agextent_binval - Serialize access to perag radix tree during deletion operation - Fix perag memory leak during growfs - Allow allocation of minlen realtime extent when the maximum sized realtime free extent is minlen in size Cleanups: - Remove duplicate boilerplate code spread across functionality associated with different log items - Cleanup resblks interfaces - Pass defer ops pointer to defer helpers instead of an enum - Initialize di_crc in xfs_log_dinode to prevent KMSAN warnings - Use static_assert() instead of BUILD_BUG_ON_MSG() to validate size of structures and structure member offsets. This is done in order to be able to share the code with userspace - Move XFS documentation under a new directory specific to XFS - Do not invoke deferred ops' ->create_done callback if the deferred operation does not have an intent item associated with it - Remove duplicate inclusion of header files from scrub/health.c - Refactor Realtime code - Cleanup attr code" * tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (123 commits) xfs: use the op name in trace_xlog_intent_recovery_failed xfs: fix a use after free in xfs_defer_finish_recovery xfs: turn the XFS_DA_OP_REPLACE checks in xfs_attr_shortform_addname into asserts xfs: remove xfs_attr_sf_hdr_t xfs: remove struct xfs_attr_shortform xfs: use xfs_attr_sf_findname in xfs_attr_shortform_getvalue xfs: remove xfs_attr_shortform_lookup xfs: simplify xfs_attr_sf_findname xfs: move the xfs_attr_sf_lookup tracepoint xfs: return if_data from xfs_idata_realloc xfs: make if_data a void pointer xfs: fold xfs_rtallocate_extent into xfs_bmap_rtalloc xfs: simplify and optimize the RT allocation fallback cascade xfs: reorder the minlen and prod calculations in xfs_bmap_rtalloc xfs: remove XFS_RTMIN/XFS_RTMAX xfs: remove rt-wrappers from xfs_format.h xfs: factor out a xfs_rtalloc_sumlevel helper xfs: tidy up xfs_rtallocate_extent_exact xfs: merge the calls to xfs_rtallocate_range in xfs_rtallocate_block xfs: reflow the tail end of xfs_rtallocate_extent_block ...	2024-01-10 08:45:22 -08:00
Linus Torvalds	32720aca90	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmWb/aIACgkQnJ2qBz9k QNkV4AgAvsnFBOX3/jkZTA/2a+8AOJXx0uMdgupfMBzNkKXhhfWictaqWsiw7IzZ pypnHV1GL9dbFt7z+mEfApvXby5ZBY245Tn5Qx2rPUYZVAFDtLR4ELT4+9RBrrdJ DWxI8DgKetHeQLeP8neXVdU7ec/gP3J28lsYzqFAkJXA/ik13pWh16sLSZfYk7WX gmdSW+Ize0YIfPJdQYMy0sJLnzB4iOvv4CbwbZqAm7ZesXHuisSolfrnvLLPU5ju s7hf/dt46zakT6n2uYckhcv8Q7NnBpPnmgWRJI/utNtCAjLyZJJXbycUKFIGdBpm jn/oeDwiYb/tXIB39ISt4QNZodfv5Q== =rbo3 -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "fanotify changes allowing use of fanotify directory events even for filesystems such as FUSE which don't report proper fsid" * tag 'fsnotify_for_v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: allow "weak" fsid when watching a single filesystem fanotify: store fsid in mark instead of in connector	2024-01-10 08:38:33 -08:00
Linus Torvalds	9963327f8e	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmWb/RsACgkQnJ2qBz9k QNkHtQf+IhhWZPZhs2eLx5/TtAxf1ya9jaD2lbPZa5Cq2GFeR5FSLxN70zjlwyrD HYhWHRwfqEwIbptMwzf1rpbMqEs4U5PigdhjAKpJQjMVyWyKD49yt97KVbBmQDUg 1PUCqRq6RjFL1wQjW7sc5ZBslNU6RlSSEkejro8qLHoG1xQ+aDMNeZIBD11GpLha hGjXZRr64NDn2rqI71ZvNF3IxQRRl4HOOLN7YutHSYHHX5O4YFnhe9jxRgq1PDKU 1+ukKNu7kJMK+hCzK2+aKJ9OzzWyECG8PLI2F2fw54AmZDNRHrTA5recXUhFrPXg 4cHeh9kwphDlhmMu0+XIkN/P3wJHWA== =jnIR -----END PGP SIGNATURE----- Merge tag 'fs_for_v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull small quota cleanup from Jan Kara. * tag 'fs_for_v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: convert dquot_claim_space_nodirty() to return void	2024-01-10 08:32:04 -08:00
Chunhai Guo	aa12a790d3	erofs: make erofs_{err,info}() support NULL sb parameter Make erofs_err() and erofs_info() support NULL sb parameter for more general usage. Suggested-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Link: https://lore.kernel.org/r/20240103123202.3054718-1-guochunhai@vivo.com Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-01-10 19:59:39 +08:00
Gao Xiang	496530c7c1	erofs: avoid debugging output for (de)compressed data Syzbot reported a KMSAN warning, erofs: (device loop0): z_erofs_lz4_decompress_mem: failed to decompress -12 in[46, 4050] out[917] ===================================================== BUG: KMSAN: uninit-value in hex_dump_to_buffer+0xae9/0x10f0 lib/hexdump.c:194 .. print_hex_dump+0x13d/0x3e0 lib/hexdump.c:276 z_erofs_lz4_decompress_mem fs/erofs/decompressor.c:252 [inline] z_erofs_lz4_decompress+0x257e/0x2a70 fs/erofs/decompressor.c:311 z_erofs_decompress_pcluster fs/erofs/zdata.c:1290 [inline] z_erofs_decompress_queue+0x338c/0x6460 fs/erofs/zdata.c:1372 z_erofs_runqueue+0x36cd/0x3830 z_erofs_read_folio+0x435/0x810 fs/erofs/zdata.c:1843 The root cause is that the printed decompressed buffer may be filled incompletely due to decompression failure. Since they were once only used for debugging, get rid of them now. Reported-and-tested-by: syzbot+6c746eea496f34b3161d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/000000000000321c24060d7cfa1c@google.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231227151903.2900413-1-hsiangkao@linux.alibaba.com	2024-01-10 19:59:39 +08:00
Steve French	26ba1bf310	cifs: update internal module version number for cifs.ko From 2.46 to 2.47 Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 23:42:51 -06:00
Kevin Hao	8fb7b72392	ksmbd: Add missing set_freezable() for freezable kthread The kernel thread function ksmbd_conn_handler_loop() invokes the try_to_freeze() in its loop. But all the kernel threads are non-freezable by default. So if we want to make a kernel thread to be freezable, we have to invoke set_freezable() explicitly. Signed-off-by: Kevin Hao <haokexin@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 20:01:16 -06:00
Fedor Pchelkin	8cf9bedfc3	ksmbd: free ppace array on error in parse_dacl The ppace array is not freed if one of the init_acl_state() calls inside parse_dacl() fails. At the moment the function may fail only due to the memory allocation errors so it's highly unlikely in this case but nevertheless a fix is needed. Move ppace allocation after the init_acl_state() calls with proper error handling. Found by Linux Verification Center (linuxtesting.org). Fixes: `e2f34481b2` ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 19:27:36 -06:00
Linus Torvalds	a7e4c6cf5b	EFI updates for v6.8 - Fix a syzbot reported issue in efivarfs where concurrent accesses to the file system resulted in list corruption - Add support for accessing EFI variables via the TEE subsystem (and a trusted application in the secure world) instead of via EFI runtime firmware running in the OS's execution context - Avoid linker tricks to discover the image base on LoongArch -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZYVaHQAKCRAwbglWLn0t XPm/AQDzX9A6TND00eOLYYWw91kybHnzrVd8GRKOv2EIxGz33AEAgW6nXIJlBRax MBq6S/sXdyknuCC3sO7H9FexdD4BzQM= =MZUx -----END PGP SIGNATURE----- Merge tag 'efi-next-for-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: - Fix a syzbot reported issue in efivarfs where concurrent accesses to the file system resulted in list corruption - Add support for accessing EFI variables via the TEE subsystem (and a trusted application in the secure world) instead of via EFI runtime firmware running in the OS's execution context - Avoid linker tricks to discover the image base on LoongArch * tag 'efi-next-for-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: memmap: fix kernel-doc warnings efi/loongarch: Directly position the loaded image file efivarfs: automatically update super block flag efi: Add tee-based EFI variable driver efi: Add EFI_ACCESS_DENIED status code efi: expose efivar generic ops register function efivarfs: Move efivarfs list into superblock s_fs_info efivarfs: Free s_fs_info on unmount efivarfs: Move efivar availability check into FS context init efivarfs: force RO when remounting if SetVariable is not supported	2024-01-09 17:11:27 -08:00
Linus Torvalds	bd012f3a5b	ACPI updates for 6.8-rc1 - Add CSI-2 and DisCo for Imaging support to the ACPI device enumeration code (Sakari Ailus, Rafael J. Wysocki). - Adjust the cpufreq thermal reduction algorithm in the ACPI processor driver for Tegra241 (Srikar Srimath Tirumala, Arnd Bergmann). - Make acpi_proc_quirk_mwait_check() x86-specific (Rafael J. Wysocki). - Switch over ACPI to using a threaded interrupt handler for the SCI (Rafael J. Wysocki). - Allow ACPI Notify () handlers to run on all CPUs and clean up the ACPI interface for deferred events processing (Rafael J. Wysocki). - Switch over the ACPI EC driver to using a threaded handler for the dedicated IRQ on systems without the EC GPE (Rafael J. Wysocki). - Adjust code using ACPICA spinlocks and the ACPI EC driver spinlock to keep local interrupts on (Rafael J. Wysocki). - Adjust the USB4 _OSC handshake to correctly handle cases in which certain types of OS control are denied by the platform (Mika Westerberg). - Correct and clean up the generic function for parsing ACPI data-only tables with array structure (Yuntao Wang). - Modify acpi_dev_uid_match() to support different types of its second argument and adjust its users accordingly (Raag Jadav). - Clean up code related to acpi_evaluate_reference() and ACPI device lists (Rafael J. Wysocki). - Use generic ACPI helpers for evaluating trip point temperature objects in the ACPI thermal zone driver (Rafael J. Wysockii, Arnd Bergmann). - Add Thermal fast Sampling Period (_TFP) support to the ACPI thermal zone driver (Jeff Brasen). - Modify the ACPI LPIT table handling code to avoid u32 multiplication overflows in state residency computations (Nikita Kiryushin). - Drop an unused helper function from the ACPI backlight (video) driver and add a clarifying comment to it (Hans de Goede). - Update the ACPI backlight driver to avoid using uninitialized memory in some cases (Nikita Kiryushin). - Add ACPI backlight quirk for the Colorful X15 AT 23 laptop (Yuluo Qiu). - Add support for vendor-defined error types to the ACPI APEI error injection code (Avadhut Naik). - Adjust APEI to properly set MF_ACTION_REQUIRED on synchronous memory failure events, so they are handled differently from the asynchronous ones (Shuai Xue). - Fix NULL pointer dereference check in the ACPI extlog driver (Prarit Bhargava). - Adjust the ACPI extlog driver to clear the Extended Error Log status when RAS_CEC handled the error (Tony Luck). - Add IRQ override quirks for some Infinity laptops and for TongFang GMxXGxx (David McFarland, Hans de Goede). - Clean up the ACPI NUMA code and fix it to ensure that fake_pxm is not the same as one of the real pxm values (Yuntao Wang). - Fix the fractional clock divider flags in the ACPI LPSS (Intel SoC) driver so as to prevent miscalculation of the values in the clock divider (Andy Shevchenko). - Adjust comments in the ACPI watchdog driver to prevent kernel-doc from complaining during documentation builds (Randy Dunlap). - Make the ACPI button driver send wakeup key events to user space in addition to power button events on systems that can be woken up by the power button (Ken Xue). - Adjust pnpacpi_parse_allocated_vendor() to use memcpy() on a full structure field (Dmitry Antipov). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmWb8asSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxhsQP/jfRiEP7L9WUl66PdzxSWi1u7bVUZIbs z07ujAFdAbvpdM1WgWVq6mSzYewAqIm0A9Koabj7zKuG4VPh0Gjvq26jrK/et65m RJhC/qcnZ4h/2bELf9/JE7FIQMDWBGK8gNHBBXVQOZrQYIiBzJ2xyHJ4F0AvLVW6 GGuX/4mb00jlWGr6uot6qjBgLLxY0EowneLUuH4onEWrThoNWy7zbD34LSsKuljA a69UkQPetXbkX4XQYnt4K4BAnwjRQNU2DlUE9lpMtheTS70wilxrC+P0XaETeO7c NCm38X2aUv/hSwJ0BekBRdNEvG/WQsfRdOt9jWAkoCL3oDCZdOgfM6Eas7ZDLF2n RoxLk2O9UXFwaSSGBVgkRLPCVyWBNI6C8GXnVDN8f9hqIk+jmlsXaXghpzVlGS54 +ox6fjO81zJjEBxSP5ACCTNZq3BwwHhPhygtIkTO5JQ9SPn+WYCPM0C5Lcvzoj7A x7cdOguddhAi4ZWcoRo2cg7qN6vVaDgDgV+ylzh7q5N4cBY4edCJLzcFFuasriN4 j9/Uj/EgCafrnOhlTJz0iZkAbPZ6T/qa3qBfF948dtFRkztTsddmGA4xof90jfG9 /FLXL4wSiXK7jbFeUb1OCLOVANWpjHP3pM3gmnggiI3ApcweEGilhhbgVr7FuCG8 7qj78EUqNVbW =Ntzm -----END PGP SIGNATURE----- Merge tag 'acpi-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "From the new features standpoint, the most significant change here is the addition of CSI-2 and MIPI DisCo for Imaging support to the ACPI device enumeration code that will allow MIPI cameras to be enumerated through the platform firmware on systems using ACPI. Also significant is the switch-over to threaded interrupt handlers for the ACPI SCI and the dedicated EC interrupt (on systems where the former is not used) which essentially allows all ACPI code to run with local interrupts enabled. That should improve responsiveness significantly on systems where multiple GPEs are enabled and the handling of one SCI involves many I/O address space accesses which previously had to be carried out in one go with disabled interrupts on the local CPU. Apart from the above, the ACPI thermal zone driver will use the Thermal fast Sampling Period (_TFP) object if available, which should allow temperature changes to be followed more accurately on some systems, the ACPI Notify () handlers can run on all CPUs (not just on CPU0), which should generally speed up the processing of events signaled through the ACPI SCI, and the ACPI power button driver will trigger wakeup key events via the input subsystem (on systems where it is a system wakeup device) In addition to that, there are the usual bunch of fixes and cleanups. Specifics: - Add CSI-2 and DisCo for Imaging support to the ACPI device enumeration code (Sakari Ailus, Rafael J. Wysocki) - Adjust the cpufreq thermal reduction algorithm in the ACPI processor driver for Tegra241 (Srikar Srimath Tirumala, Arnd Bergmann) - Make acpi_proc_quirk_mwait_check() x86-specific (Rafael J. Wysocki) - Switch over ACPI to using a threaded interrupt handler for the SCI (Rafael J. Wysocki) - Allow ACPI Notify () handlers to run on all CPUs and clean up the ACPI interface for deferred events processing (Rafael J. Wysocki) - Switch over the ACPI EC driver to using a threaded handler for the dedicated IRQ on systems without the EC GPE (Rafael J. Wysocki) - Adjust code using ACPICA spinlocks and the ACPI EC driver spinlock to keep local interrupts on (Rafael J. Wysocki) - Adjust the USB4 _OSC handshake to correctly handle cases in which certain types of OS control are denied by the platform (Mika Westerberg) - Correct and clean up the generic function for parsing ACPI data-only tables with array structure (Yuntao Wang) - Modify acpi_dev_uid_match() to support different types of its second argument and adjust its users accordingly (Raag Jadav) - Clean up code related to acpi_evaluate_reference() and ACPI device lists (Rafael J. Wysocki) - Use generic ACPI helpers for evaluating trip point temperature objects in the ACPI thermal zone driver (Rafael J. Wysockii, Arnd Bergmann) - Add Thermal fast Sampling Period (_TFP) support to the ACPI thermal zone driver (Jeff Brasen) - Modify the ACPI LPIT table handling code to avoid u32 multiplication overflows in state residency computations (Nikita Kiryushin) - Drop an unused helper function from the ACPI backlight (video) driver and add a clarifying comment to it (Hans de Goede) - Update the ACPI backlight driver to avoid using uninitialized memory in some cases (Nikita Kiryushin) - Add ACPI backlight quirk for the Colorful X15 AT 23 laptop (Yuluo Qiu) - Add support for vendor-defined error types to the ACPI APEI error injection code (Avadhut Naik) - Adjust APEI to properly set MF_ACTION_REQUIRED on synchronous memory failure events, so they are handled differently from the asynchronous ones (Shuai Xue) - Fix NULL pointer dereference check in the ACPI extlog driver (Prarit Bhargava) - Adjust the ACPI extlog driver to clear the Extended Error Log status when RAS_CEC handled the error (Tony Luck) - Add IRQ override quirks for some Infinity laptops and for TongFang GMxXGxx (David McFarland, Hans de Goede) - Clean up the ACPI NUMA code and fix it to ensure that fake_pxm is not the same as one of the real pxm values (Yuntao Wang) - Fix the fractional clock divider flags in the ACPI LPSS (Intel SoC) driver so as to prevent miscalculation of the values in the clock divider (Andy Shevchenko) - Adjust comments in the ACPI watchdog driver to prevent kernel-doc from complaining during documentation builds (Randy Dunlap) - Make the ACPI button driver send wakeup key events to user space in addition to power button events on systems that can be woken up by the power button (Ken Xue) - Adjust pnpacpi_parse_allocated_vendor() to use memcpy() on a full structure field (Dmitry Antipov)" * tag 'acpi-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (56 commits) ACPI: resource: Add Infinity laptops to irq1_edge_low_force_override ACPI: button: trigger wakeup key events ACPI: resource: Add another DMI match for the TongFang GMxXGxx ACPI: EC: Use a spin lock without disabing interrupts ACPI: EC: Use a threaded handler for dedicated IRQ ACPI: OSL: Use spin locks without disabling interrupts ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events ACPI: utils: Introduce helper for _DEP list lookup ACPI: utils: Fix white space in struct acpi_handle_list definition ACPI: utils: Refine acpi_handle_list_equal() slightly ACPI: utils: Return bool from acpi_evaluate_reference() ACPI: utils: Rearrange in acpi_evaluate_reference() ACPI: arm64: export acpi_arch_thermal_cpufreq_pctg() ACPI: extlog: Clear Extended Error Log status when RAS_CEC handled the error ACPI: LPSS: Fix the fractional clock divider flags ACPI: NUMA: Fix the logic of getting the fake_pxm value ACPI: NUMA: Optimize the check for the availability of node values ACPI: NUMA: Remove unnecessary check in acpi_parse_gi_affinity() ACPI: watchdog: fix kernel-doc warnings ACPI: extlog: fix NULL pointer dereference check ...	2024-01-09 16:12:44 -08:00
Linus Torvalds	6c1dd1fe5d	integrity-v6.8 -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCZZ0pVhQcem9oYXJAbGlu dXguaWJtLmNvbQAKCRDLwZzRsCrn5RVMAQDm9J+iiY/2Af75vOTKIZXtGF6KsBpx 9b9ALPqPNZPgugD+PfwSbS+6rO8AItXE0Q2+FwtDaV8LxgSwK9vGeCHI2wM= =yinc -----END PGP SIGNATURE----- Merge tag 'integrity-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: - Add a new IMA/EVM maintainer and reviewer - Disable EVM on overlayfs The EVM HMAC and the original file signatures contain filesystem specific metadata (e.g. i_ino, i_generation and s_uuid), preventing the security.evm xattr from directly being copied up to the overlay. Further before calculating and writing out the overlay file's EVM HMAC, EVM must first verify the existing backing file's 'security.evm' value. For now until a solution is developed, disable EVM on overlayfs. - One bug fix and two cleanups * tag 'integrity-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: overlay: disable EVM evm: add support to disable EVM on unsupported filesystems evm: don't copy up 'security.evm' xattr MAINTAINERS: Add Eric Snowberg as a reviewer to IMA MAINTAINERS: Add Roberto Sassu as co-maintainer to IMA and EVM KEYS: encrypted: Add check for strsep ima: Remove EXPERIMENTAL from Kconfig ima: Reword IMA_KEYRINGS_PERMIT_SIGNED_BY_BUILTIN_OR_SECONDARY	2024-01-09 13:24:06 -08:00
Linus Torvalds	063a7ce32d	lsm/stable-6.8 PR 20240105 -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmWYKUIUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNyHw/+IKnqL1MZ5QS+/HtSzi4jCL47N9yZ OHLol6XswyEGHH9myKPPGnT5lVA93v98v4ty2mws7EJUSGZQQUntYBPbU9Gi40+B XDzYSRocoj96sdlKeOJMgaWo3NBRD9HYSoGPDNWZixy6m+bLPk/Dqhn3FabKf1lo 2qQSmstvChFRmVNkmgaQnBCAtWVqla4EJEL0EKX6cspHbuzRNTeJdTPn6Q/zOUVL O2znOZuEtSVpYS7yg3uJT0hHD8H0GnIciAcDAhyPSBL5Uk5l6gwJiACcdRfLRbgp QM5Z4qUFdKljV5XBCzYnfhhrx1df08h1SG84El8UK8HgTTfOZfYmawByJRWNJSQE TdCmtyyvEbfb61CKBFVwD7Tzb9/y8WgcY5N3Un8uCQqRzFIO+6cghHri5NrVhifp nPFlP4klxLHh3d7ZVekLmCMHbpaacRyJKwLy+f/nwbBEID47jpPkvZFIpbalat+r QaKRBNWdTeV+GZ+Yu0uWsI029aQnpcO1kAnGg09fl6b/dsmxeKOVWebir25AzQ++ a702S8HRmj80X+VnXHU9a64XeGtBH7Nq0vu0lGHQPgwhSx/9P6/qICEPwsIriRjR I9OulWt4OBPDtlsonHFgDs+lbnd0Z0GJUwYT8e9pjRDMxijVO9lhAXyglVRmuNR8 to2ByKP5BO+Vh8Y= =Py+n -----END PGP SIGNATURE----- Merge tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull security module updates from Paul Moore: - Add three new syscalls: lsm_list_modules(), lsm_get_self_attr(), and lsm_set_self_attr(). The first syscall simply lists the LSMs enabled, while the second and third get and set the current process' LSM attributes. Yes, these syscalls may provide similar functionality to what can be found under /proc or /sys, but they were designed to support multiple, simultaneaous (stacked) LSMs from the start as opposed to the current /proc based solutions which were created at a time when only one LSM was allowed to be active at a given time. We have spent considerable time discussing ways to extend the existing /proc interfaces to support multiple, simultaneaous LSMs and even our best ideas have been far too ugly to support as a kernel API; after +20 years in the kernel, I felt the LSM layer had established itself enough to justify a handful of syscalls. Support amongst the individual LSM developers has been nearly unanimous, with a single objection coming from Tetsuo (TOMOYO) as he is worried that the LSM_ID_XXX token concept will make it more difficult for out-of-tree LSMs to survive. Several members of the LSM community have demonstrated the ability for out-of-tree LSMs to continue to exist by picking high/unused LSM_ID values as well as pointing out that many kernel APIs rely on integer identifiers, e.g. syscalls (!), but unfortunately Tetsuo's objections remain. My personal opinion is that while I have no interest in penalizing out-of-tree LSMs, I'm not going to penalize in-tree development to support out-of-tree development, and I view this as a necessary step forward to support the push for expanded LSM stacking and reduce our reliance on /proc and /sys which has occassionally been problematic for some container users. Finally, we have included the linux-api folks on (all?) recent revisions of the patchset and addressed all of their concerns. - Add a new security_file_ioctl_compat() LSM hook to handle the 32-bit ioctls on 64-bit systems problem. This patch includes support for all of the existing LSMs which provide ioctl hooks, although it turns out only SELinux actually cares about the individual ioctls. It is worth noting that while Casey (Smack) and Tetsuo (TOMOYO) did not give explicit ACKs to this patch, they did both indicate they are okay with the changes. - Fix a potential memory leak in the CALIPSO code when IPv6 is disabled at boot. While it's good that we are fixing this, I doubt this is something users are seeing in the wild as you need to both disable IPv6 and then attempt to configure IPv6 labeled networking via NetLabel/CALIPSO; that just doesn't make much sense. Normally this would go through netdev, but Jakub asked me to take this patch and of all the trees I maintain, the LSM tree seemed like the best fit. - Update the LSM MAINTAINERS entry with additional information about our process docs, patchwork, bug reporting, etc. I also noticed that the Lockdown LSM is missing a dedicated MAINTAINERS entry so I've added that to the pull request. I've been working with one of the major Lockdown authors/contributors to see if they are willing to step up and assume a Lockdown maintainer role; hopefully that will happen soon, but in the meantime I'll continue to look after it. - Add a handful of mailmap entries for Serge Hallyn and myself. * tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (27 commits) lsm: new security_file_ioctl_compat() hook lsm: Add a __counted_by() annotation to lsm_ctx.ctx calipso: fix memory leak in netlbl_calipso_add_pass() selftests: remove the LSM_ID_IMA check in lsm/lsm_list_modules_test MAINTAINERS: add an entry for the lockdown LSM MAINTAINERS: update the LSM entry mailmap: add entries for Serge Hallyn's dead accounts mailmap: update/replace my old email addresses lsm: mark the lsm_id variables are marked as static lsm: convert security_setselfattr() to use memdup_user() lsm: align based on pointer length in lsm_fill_user_ctx() lsm: consolidate buffer size handling into lsm_fill_user_ctx() lsm: correct error codes in security_getselfattr() lsm: cleanup the size counters in security_getselfattr() lsm: don't yet account for IMA in LSM_CONFIG_COUNT calculation lsm: drop LSM_ID_IMA LSM: selftests for Linux Security Module syscalls SELinux: Add selfattr hooks AppArmor: Add selfattr hooks Smack: implement setselfattr and getselfattr hooks ...	2024-01-09 12:57:46 -08:00
Linus Torvalds	9f2a635235	Quite a lot of kexec work this time around. Many singleton patches in many places. The notable patch series are: - nilfs2 folio conversion from Matthew Wilcox in "nilfs2: Folio conversions for file paths". - Additional nilfs2 folio conversion from Ryusuke Konishi in "nilfs2: Folio conversions for directory paths". - IA64 remnant removal in Heiko Carstens's "Remove unused code after IA-64 removal". - Arnd Bergmann has enabled the -Wmissing-prototypes warning everywhere in "Treewide: enable -Wmissing-prototypes". This had some followup fixes: - Nathan Chancellor has cleaned up the hexagon build in the series "hexagon: Fix up instances of -Wmissing-prototypes". - Nathan also addressed some s390 warnings in "s390: A couple of fixes for -Wmissing-prototypes". - Arnd Bergmann addresses the same warnings for MIPS in his series "mips: address -Wmissing-prototypes warnings". - Baoquan He has made kexec_file operate in a top-down-fitting manner similar to kexec_load in the series "kexec_file: Load kernel at top of system RAM if required" - Baoquan He has also added the self-explanatory "kexec_file: print out debugging message if required". - Some checkstack maintenance work from Tiezhu Yang in the series "Modify some code about checkstack". - Douglas Anderson has disentangled the watchdog code's logging when multiple reports are occurring simultaneously. The series is "watchdog: Better handling of concurrent lockups". - Yuntao Wang has contributed some maintenance work on the crash code in "crash: Some cleanups and fixes". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZ2R6AAKCRDdBJ7gKXxA juCVAP4t76qUISDOSKugB/Dn5E4Nt9wvPY9PcufnmD+xoPsgkQD+JVl4+jd9+gAV vl6wkJDiJO5JZ3FVtBtC3DFA/xHtVgk= =kQw+ -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Quite a lot of kexec work this time around. Many singleton patches in many places. The notable patch series are: - nilfs2 folio conversion from Matthew Wilcox in 'nilfs2: Folio conversions for file paths'. - Additional nilfs2 folio conversion from Ryusuke Konishi in 'nilfs2: Folio conversions for directory paths'. - IA64 remnant removal in Heiko Carstens's 'Remove unused code after IA-64 removal'. - Arnd Bergmann has enabled the -Wmissing-prototypes warning everywhere in 'Treewide: enable -Wmissing-prototypes'. This had some followup fixes: - Nathan Chancellor has cleaned up the hexagon build in the series 'hexagon: Fix up instances of -Wmissing-prototypes'. - Nathan also addressed some s390 warnings in 's390: A couple of fixes for -Wmissing-prototypes'. - Arnd Bergmann addresses the same warnings for MIPS in his series 'mips: address -Wmissing-prototypes warnings'. - Baoquan He has made kexec_file operate in a top-down-fitting manner similar to kexec_load in the series 'kexec_file: Load kernel at top of system RAM if required' - Baoquan He has also added the self-explanatory 'kexec_file: print out debugging message if required'. - Some checkstack maintenance work from Tiezhu Yang in the series 'Modify some code about checkstack'. - Douglas Anderson has disentangled the watchdog code's logging when multiple reports are occurring simultaneously. The series is 'watchdog: Better handling of concurrent lockups'. - Yuntao Wang has contributed some maintenance work on the crash code in 'crash: Some cleanups and fixes'" * tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (157 commits) crash_core: fix and simplify the logic of crash_exclude_mem_range() x86/crash: use SZ_1M macro instead of hardcoded value x86/crash: remove the unused image parameter from prepare_elf_headers() kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE scripts/decode_stacktrace.sh: strip unexpected CR from lines watchdog: if panicking and we dumped everything, don't re-enable dumping watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting watchdog/hardlockup: adopt softlockup logic avoiding double-dumps kexec_core: fix the assignment to kimage->control_page x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init() lib/trace_readwrite.c:: replace asm-generic/io with linux/io nilfs2: cpfile: fix some kernel-doc warnings stacktrace: fix kernel-doc typo scripts/checkstack.pl: fix no space expression between sp and offset x86/kexec: fix incorrect argument passed to kexec_dprintk() x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs nilfs2: add missing set_freezable() for freezable kthread kernel: relay: remove relay_file_splice_read dead code, doesn't work docs: submit-checklist: remove all of "make namespacecheck" ...	2024-01-09 11:46:20 -08:00
Linus Torvalds	fb46e22a9e	Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series "maple_tree: add mt_free_one() and mt_attr() helpers" "Some cleanups of maple tree" - In the series "mm: use memmap_on_memory semantics for dax/kmem" Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series "Add folio_zero_tail() and folio_fill_tail()" "Make folio_start_writeback return void" "Fix fault handler's handling of poisoned tail pages" "Convert aops->error_remove_page to ->error_remove_folio" "Finish two folio conversions" "More swap folio conversions" - Kefeng Wang has also contributed folio-related work in the series "mm: cleanup and use more folio in page fault" - Jim Cromie has improved the kmemleak reporting output in the series "tweak kmemleak report format". - In the series "stackdepot: allow evicting stack traces" Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series "mm: page_alloc: fixes for high atomic reserve caluculations". - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series "samples: introduce cgroup events listeners". - Some mapletree maintanance work from Liam Howlett in the series "maple_tree: iterator state changes". - Nhat Pham has improved zswap's approach to writeback in the series "workload-specific and memory pressure-driven zswap writeback". - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series "mm/damon: let users feed and tame/auto-tune DAMOS" "selftests/damon: add Python-written DAMON functionality tests" "mm/damon: misc updates for 6.8" - Yosry Ahmed has improved memcg's stats flushing in the series "mm: memcg: subtree stats flushing and thresholds". - In the series "Multi-size THP for anonymous memory" Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series "More buffer_head cleanups". - Suren Baghdasaryan has done work on Andrea Arcangeli's series "userfaultfd move option". UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a "KSM Advisor", in the series "mm/ksm: Add ksm advisor". This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series "mm/zswap: dstmem reuse optimizations and cleanups". - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is "Clean up the writeback paths". - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series "kasan: save mempool stack traces". - Andrey also performed some KASAN maintenance work in the series "kasan: assorted clean-ups". - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series "mm/rmap: interface overhaul". - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series "mm/mglru: Kconfig cleanup". - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series "Remove some lruvec page accounting functions". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27 Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU= =0NHs -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve caluculations'. - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some mapletree maintanance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ...	2024-01-09 11:18:47 -08:00
Namjae Jeon	3fc74c65b3	ksmbd: send lease break notification on FILE_RENAME_INFORMATION Send lease break notification on FILE_RENAME_INFORMATION request. This patch fix smb2.lease.v2_epoch2 test failure. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 12:55:07 -06:00
Namjae Jeon	d592a9158a	ksmbd: don't allow O_TRUNC open on read-only share When file is changed using notepad on read-only share(read_only = yes in ksmbd.conf), There is a problem where existing data is truncated. notepad in windows try to O_TRUNC open(FILE_OVERWRITE_IF) and all data in file is truncated. This patch don't allow O_TRUNC open on read-only share and add KSMBD_TREE_CONN_FLAG_WRITABLE check in smb2_set_info(). Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 12:55:01 -06:00
Randy Dunlap	8d99c1131d	ksmbd: vfs: fix all kernel-doc warnings Fix all kernel-doc warnings in vfs.c: vfs.c:54: warning: Function parameter or member 'parent' not described in 'ksmbd_vfs_lock_parent' vfs.c:54: warning: Function parameter or member 'child' not described in 'ksmbd_vfs_lock_parent' vfs.c:54: warning: No description found for return value of 'ksmbd_vfs_lock_parent' vfs.c:372: warning: Function parameter or member 'fp' not described in 'ksmbd_vfs_read' vfs.c:372: warning: Excess function parameter 'fid' description in 'ksmbd_vfs_read' vfs.c:489: warning: Function parameter or member 'fp' not described in 'ksmbd_vfs_write' vfs.c:489: warning: Excess function parameter 'fid' description in 'ksmbd_vfs_write' vfs.c:555: warning: Function parameter or member 'path' not described in 'ksmbd_vfs_getattr' vfs.c:555: warning: Function parameter or member 'stat' not described in 'ksmbd_vfs_getattr' vfs.c:555: warning: Excess function parameter 'work' description in 'ksmbd_vfs_getattr' vfs.c:555: warning: Excess function parameter 'fid' description in 'ksmbd_vfs_getattr' vfs.c:555: warning: Excess function parameter 'attrs' description in 'ksmbd_vfs_getattr' vfs.c:572: warning: Function parameter or member 'p_id' not described in 'ksmbd_vfs_fsync' vfs.c:595: warning: Function parameter or member 'work' not described in 'ksmbd_vfs_remove_file' vfs.c:595: warning: Function parameter or member 'path' not described in 'ksmbd_vfs_remove_file' vfs.c:595: warning: Excess function parameter 'name' description in 'ksmbd_vfs_remove_file' vfs.c:633: warning: Function parameter or member 'work' not described in 'ksmbd_vfs_link' vfs.c:805: warning: Function parameter or member 'fp' not described in 'ksmbd_vfs_truncate' vfs.c:805: warning: Excess function parameter 'fid' description in 'ksmbd_vfs_truncate' vfs.c:846: warning: Excess function parameter 'size' description in 'ksmbd_vfs_listxattr' vfs.c:953: warning: Function parameter or member 'option' not described in 'ksmbd_vfs_set_fadvise' vfs.c:953: warning: Excess function parameter 'options' description in 'ksmbd_vfs_set_fadvise' vfs.c:1167: warning: Function parameter or member 'um' not described in 'ksmbd_vfs_lookup_in_dir' vfs.c:1203: warning: Function parameter or member 'work' not described in 'ksmbd_vfs_kern_path_locked' vfs.c:1641: warning: No description found for return value of 'ksmbd_vfs_init_kstat' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <sfrench@samba.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 12:52:33 -06:00
Randy Dunlap	b4068f1ef3	ksmbd: auth: fix most kernel-doc warnings Fix 12 of 17 kernel-doc warnings in auth.c: auth.c:221: warning: Function parameter or member 'conn' not described in 'ksmbd_auth_ntlmv2' auth.c:221: warning: Function parameter or member 'cryptkey' not described in 'ksmbd_auth_ntlmv2' auth.c:305: warning: Function parameter or member 'blob_len' not described in 'ksmbd_decode_ntlmssp_auth_blob' auth.c:305: warning: Function parameter or member 'conn' not described in 'ksmbd_decode_ntlmssp_auth_blob' auth.c:305: warning: Excess function parameter 'usr' description in 'ksmbd_decode_ntlmssp_auth_blob' auth.c:385: warning: Function parameter or member 'blob_len' not described in 'ksmbd_decode_ntlmssp_neg_blob' auth.c:385: warning: Function parameter or member 'conn' not described in 'ksmbd_decode_ntlmssp_neg_blob' auth.c:385: warning: Excess function parameter 'rsp' description in 'ksmbd_decode_ntlmssp_neg_blob' auth.c:385: warning: Excess function parameter 'sess' description in 'ksmbd_decode_ntlmssp_neg_blob' auth.c:413: warning: Function parameter or member 'conn' not described in 'ksmbd_build_ntlmssp_challenge_blob' auth.c:413: warning: Excess function parameter 'rsp' description in 'ksmbd_build_ntlmssp_challenge_blob' auth.c:413: warning: Excess function parameter 'sess' description in 'ksmbd_build_ntlmssp_challenge_blob' The other 5 are only present when a W=1 kernel build is done or when scripts/kernel-doc is run with -Wall. They are: auth.c:81: warning: No description found for return value of 'ksmbd_gen_sess_key' auth.c:385: warning: No description found for return value of 'ksmbd_decode_ntlmssp_neg_blob' auth.c:413: warning: No description found for return value of 'ksmbd_build_ntlmssp_challenge_blob' auth.c:577: warning: No description found for return value of 'ksmbd_sign_smb2_pdu' auth.c:628: warning: No description found for return value of 'ksmbd_sign_smb3_pdu' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <sfrench@samba.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 12:52:33 -06:00
Christophe JAILLET	ebfee7ad27	ksmbd: Remove usage of the deprecated ida_simple_xx() API ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). This is less verbose. Note that the upper limit of ida_simple_get() is exclusive, but the one of ida_alloc_range() is inclusive. So change a 0xFFFFFFFF into a 0xFFFFFFFE in order to keep the same behavior. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 12:52:33 -06:00
Namjae Jeon	b6e9a44e99	ksmbd: don't increment epoch if current state and request state are same If existing lease state and request state are same, don't increment epoch in create context. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 12:52:33 -06:00
Namjae Jeon	6fc0a265e1	ksmbd: fix potential circular locking issue in smb2_set_ea() smb2_set_ea() can be called in parent inode lock range. So add get_write argument to smb2_set_ea() not to call nested mnt_want_write(). Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 12:52:33 -06:00
Namjae Jeon	bb05367a66	ksmbd: set v2 lease version on lease upgrade If file opened with v2 lease is upgraded with v1 lease, smb server should response v2 lease create context to client. This patch fix smb2.lease.v2_epoch2 test failure. This test case assumes the following scenario: 1. smb2 create with v2 lease(R, LEASE1 key) 2. smb server return smb2 create response with v2 lease context(R, LEASE1 key, epoch + 1) 3. smb2 create with v1 lease(RH, LEASE1 key) 4. smb server return smb2 create response with v2 lease context(RH, LEASE1 key, epoch + 2) i.e. If same client(same lease key) try to open a file that is being opened with v2 lease with v1 lease, smb server should return v2 lease. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 12:52:32 -06:00
Li Nan	516b3eb8c8	ksmbd: validate the zero field of packet header The SMB2 Protocol requires that "The first byte of the Direct TCP transport packet header MUST be zero (0x00)"[1]. Commit `1c1bcf2d3e` ("ksmbd: validate smb request protocol id") removed the validation of this 1-byte zero. Add the validation back now. [1]: [MS-SMB2] - v20230227, page 30. https://winprotocoldoc.blob.core.windows.net/productionwindowsarchives/MS-SMB2/%5bMS-SMB2%5d-230227.pdf Fixes: `1c1bcf2d3e` ("ksmbd: validate smb request protocol id") Signed-off-by: Li Nan <linan122@huawei.com> Acked-by: Tom Talpey <tom@talpey.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-09 12:52:32 -06:00
David Howells	e2bdb5272f	netfs: Fix wrong #ifdef hiding wait netfs_writepages_begin() has the wait on the fscache folio conditional on CONFIG_NETFS_FSCACHE - which doesn't exist. Fix it to be conditional on CONFIG_FSCACHE instead. Fixes: `62c3b7481b` ("netfs: Provide a writepages implementation") Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20240109083257.GK132648@kernel.org/	2024-01-09 13:33:01 +00:00
David Howells	3d1d4aa0cc	cachefiles: Fix signed/unsigned mixup In __cachefiles_prepare_write(), the start and pos variables were made unsigned 64-bit so that the casts in the checking could be got rid of - which should be fine since absolute file offsets can't be negative, except that an error code may be obtained from vfs_llseek(), which would be negative. This breaks the error check. Fix this for now by reverting pos and start to be signed and putting back the casts. Unfortunately, the error value checks cannot be replaced with IS_ERR_VALUE() as long might be 32-bits. Fixes: `7097c96411` ("cachefiles: Fix __cachefiles_prepare_write()") Reported-by: Simon Horman <horms@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401071152.DbKqMQMu-lkp@intel.com/ Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> cc: Yiqun Leng <yqleng@linux.alibaba.com> cc: Jia Zhu <zhujia.zj@bytedance.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-erofs@lists.ozlabs.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org	2024-01-09 13:32:44 +00:00
Steve French	a3f763fdcb	cifs: remove unneeded return statement Return statement was not needed at end of cifs_chan_update_iface Suggested-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-08 22:37:10 -06:00
Dan Carpenter	8d606c311b	cifs: make cifs_chan_update_iface() a void function The return values for cifs_chan_update_iface() didn't match what the documentation said and nothing was checking them anyway. Just make it a void function. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-08 22:36:26 -06:00
Dan Carpenter	c3a11c0ec6	cifs: delete unnecessary NULL checks in cifs_chan_update_iface() We return early if "iface" is NULL so there is no need to check here. Delete those checks. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-08 18:43:19 -06:00
Kirill A. Shutemov	5e0a760b44	mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER commit `23baf831a3` ("mm, treewide: redefine MAX_ORDER sanely") has changed the definition of MAX_ORDER to be inclusive. This has caused issues with code that was not yet upstream and depended on the previous definition. To draw attention to the altered meaning of the define, rename MAX_ORDER to MAX_PAGE_ORDER. Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-01-08 15:27:15 -08:00
Linus Torvalds	5db8752c3b	vfs-6.8.iov_iter -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUzBQAKCRCRxhvAZXjc ot+3AQCZw1PBD4azVxFMWH76qwlAGoVIFug4+ogKU/iUa4VLygEA2FJh1vLJw5iI LpgBEIUTPVkwtzinAW94iJJo1Vr7NAI= =p6PB -----END PGP SIGNATURE----- Merge tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iov_iter cleanups from Christian Brauner: "This contains a minor cleanup. The patches drop an unused argument from import_single_range() allowing to replace import_single_range() with import_ubuf() and dropping import_single_range() completely" * tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iov_iter: replace import_single_range() with import_ubuf() iov_iter: remove unused 'iov' argument from import_single_range()	2024-01-08 11:43:04 -08:00
Gabriel Krisman Bertazi	cd72c7ef5f	ecryptfs: Reject casefold directory inodes Even though it seems to be able to resolve some names of case-insensitive directories, the lack of d_hash and d_compare means we end up with a broken state in the d_cache. Considering it was never a goal to support these two together, and we are preparing to use d_revalidate in case-insensitive filesystems, which would make the combination even more broken, reject any attempt to get a casefolded inode from ecryptfs. Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Eric Biggers <ebiggers@google.com>	2024-01-08 16:34:43 -03:00
Linus Torvalds	26458409a9	vfs-6.8.cachefiles -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZU0egAKCRCRxhvAZXjc onAqAP9s2ohvjE4QE2ad7svXOzNWKesGcyDyoEBwBpt3Yq8hvAEA+J4xiaMBlRAg FmBobDwtcvOzxL1q+BbB3IsmmuFrRww= =ZS18 -----END PGP SIGNATURE----- Merge tag 'vfs-6.8.cachefiles' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs cachefiles updates from Christian Brauner: "This contains improvements for on-demand cachefiles. If the daemon crashes and the on-demand cachefiles fd is unexpectedly closed in-flight requests and subsequent read operations associated with the fd will fail with EIO. This causes issues in various scenarios as this failure is currently unrecoverable. The work contained in this pull request introduces a failover mode and enables the daemon to recover in-flight requested-related objects. A restarted daemon will be able to process requests as usual. This requires that in-flight requests are stored during daemon crash or while the daemon is offline. In addition, a handle to /dev/cachefiles needs to be stored. This can be done by e.g., systemd's fdstore (cf. [1]) which enables the restarted daemon to recover state. Three new states are introduced in this patchset: (1) CLOSE Object is closed by the daemon. (2) OPEN Object is open and ready for processing. IOW, the open request has been handled successfully. (3) REOPENING Object has been previously closed and is now reopened due to a read request. A restarted daemon can recover the /dev/cachefiles fd from systemd's fdstore and writes "restore" to the device. This causes the object state to be reset from CLOSE to REOPENING and reinitializes the object. The daemon may now handle the open request. Any in-flight operations are restored and handled avoiding interruptions for users" Link: https://systemd.io/FILE_DESCRIPTOR_STORE [1] * tag 'vfs-6.8.cachefiles' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: cachefiles: add restore command to recover inflight ondemand read requests cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode cachefiles: resend an open request if the read request's object is closed cachefiles: extract ondemand info field from cachefiles_object cachefiles: introduce object ondemand state	2024-01-08 11:26:50 -08:00
Linus Torvalds	bb93c5ed45	vfs-6.8.rw -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUzXQAKCRCRxhvAZXjc ogOtAQDpqUp1zY4dV/dZisCJ5xarZTsSZ1AvgmcxZBtS0NhbdgEAshWvYGA9ryS/ ChL5jjtjjZDLhRA//reoFHTQIrdp2w8= =bF+R -----END PGP SIGNATURE----- Merge tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs rw updates from Christian Brauner: "This contains updates from Amir for read-write backing file helpers for stacking filesystems such as overlayfs: - Fanotify is currently in the process of introducing pre content events. Roughly, a new permission event will be added indicating that it is safe to write to the file being accessed. These events are used by hierarchical storage managers to e.g., fill the content of files on first access. During that work we noticed that our current permission checking is inconsistent in rw_verify_area() and remap_verify_area(). Especially in the splice code permission checking is done multiple times. For example, one time for the whole range and then again for partial ranges inside the iterator. In addition, we mostly do permission checking before we call file_start_write() except for a few places where we call it after. For pre-content events we need such permission checking to be done before file_start_write(). So this is a nice reason to clean this all up. After this series, all permission checking is done before file_start_write(). As part of this cleanup we also massaged the splice code a bit. We got rid of a few helpers because we are alredy drowning in special read-write helpers. We also cleaned up the return types for splice helpers. - Introduce generic read-write helpers for backing files. This lifts some overlayfs code to common code so it can be used by the FUSE passthrough work coming in over the next cycles. Make Amir and Miklos the maintainers for this new subsystem of the vfs" * tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits) fs: fix __sb_write_started() kerneldoc formatting fs: factor out backing_file_mmap() helper fs: factor out backing_file_splice_{read,write}() helpers fs: factor out backing_file_{read,write}_iter() helpers fs: prepare for stackable filesystems backing file helpers fsnotify: optionally pass access range in file permission hooks fsnotify: assert that file_start_write() is not held in permission hooks fsnotify: split fsnotify_perm() into two hooks fs: use splice_copy_file_range() inline helper splice: return type ssize_t from all helpers fs: use do_splice_direct() for nfsd/ksmbd server-side-copy fs: move file_start_write() into direct_splice_actor() fs: fork splice_file_range() from do_splice_direct() fs: create {sb,file}_write_not_started() helpers fs: create file_write_started() helper fs: create __sb_write_started() helper fs: move kiocb_start_write() into vfs_iocb_iter_write() fs: move permission hook out of do_iter_read() fs: move permission hook out of do_iter_write() fs: move file_start_write() into vfs_iter_write() ...	2024-01-08 11:11:51 -08:00
Linus Torvalds	8c9440fea7	vfs-6.8.mount -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZU0CgAKCRCRxhvAZXjc osncAQDSJK0frJL+72NqXxa4YNzivrnuw6fhp5iaDAEqxdm8ygEAoJWyh7Rmkt8G drAXWGyGnCYqv7UgC6axLyciid7TxQg= =vJuv -----END PGP SIGNATURE----- Merge tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: "This contains the work to retrieve detailed information about mounts via two new system calls. This is hopefully the beginning of the end of the saga that started with fsinfo() years ago. The LWN articles in [1] and [2] can serve as a summary so we can avoid rehashing everything here. At LSFMM in May 2022 we got into a room and agreed on what we want to do about fsinfo(). Basically, split it into pieces. This is the first part of that agreement. Specifically, it is concerned with retrieving information about mounts. So this only concerns the mount information retrieval, not the mount table change notification, or the extended filesystem specific mount option work. That is separate work. Currently mounts have a 32bit id. Mount ids are already in heavy use by libmount and other low-level userspace but they can't be relied upon because they're recycled very quickly. We agreed that mounts should carry a unique 64bit id by which they can be referenced directly. This is now implemented as part of this work. The new 64bit mount id is exposed in statx() through the new STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is returned. If it is raised and the kernel supports the new 64bit mount id the flag is raised in the result mask and the new 64bit mount id is returned. New and old mount ids do not overlap so they cannot be conflated. Two new system calls are introduced that operate on the 64bit mount id: statmount() and listmount(). A summary of the api and usage can be found on LWN as well (cf. [3]) but of course, I'll provide a summary here as well. Both system calls rely on struct mnt_id_req. Which is the request struct used to pass the 64bit mount id identifying the mount to operate on. It is extensible to allow for the addition of new parameters and for future use in other apis that make use of mount ids. statmount() mimicks the semantics of statx() and exposes a set flags that userspace may raise in mnt_id_req to request specific information to be retrieved. A statmount() call returns a struct statmount filled in with information about the requested mount. Supported requests are indicated by raising the request flag passed in struct mnt_id_req in the @mask argument in struct statmount. Currently we do support: - STATMOUNT_SB_BASIC: Basic filesystem info - STATMOUNT_MNT_BASIC Mount information (mount id, parent mount id, mount attributes etc) - STATMOUNT_PROPAGATE_FROM Propagation from what mount in current namespace - STATMOUNT_MNT_ROOT Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla) - STATMOUNT_MNT_POINT Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt) - STATMOUNT_FS_TYPE Name of the filesystem type as the magic number isn't enough due to submounts The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE are appended to the end of the struct. Userspace can use the offsets in @fs_type, @mnt_root, and @mnt_point to reference those strings easily. The struct statmount reserves quite a bit of space currently for future extensibility. This isn't really a problem and if this bothers us we can just send a follow-up pull request during this cycle. listmount() is given a 64bit mount id via mnt_id_req just as statmount(). It takes a buffer and a size to return an array of the 64bit ids of the child mounts of the requested mount. Userspace can thus choose to either retrieve child mounts for a mount in batches or iterate through the child mounts. For most use-cases it will be sufficient to just leave space for a few child mounts. But for big mount tables having an iterator is really helpful. Iterating through a mount table works by setting @param in mnt_id_req to the mount id of the last child mount retrieved in the previous listmount() call" Link: https://lwn.net/Articles/934469 [1] Link: https://lwn.net/Articles/829212 [2] Link: https://lwn.net/Articles/950569 [3] * tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: add selftest for statmount/listmount fs: keep struct mnt_id_req extensible wire up syscalls for statmount/listmount add listmount(2) syscall statmount: simplify string option retrieval statmount: simplify numeric option retrieval add statmount(2) syscall namespace: extract show_path() helper mounts: keep list of mounts in an rbtree add unique mount ID	2024-01-08 10:57:34 -08:00
Linus Torvalds	3f6984e730	vfs-6.8.super -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUx4wAKCRCRxhvAZXjc osaNAQC/c+xXVfiq/pFbuK9MQLna4RGZaGcG9k312YniXbHq0AD9HAf4aPcZwPy1 /wkD4pauj3UZ3f0xBSyazGBvAXyN0Qc= =iFAQ -----END PGP SIGNATURE----- Merge tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs super updates from Christian Brauner: "This contains the super work for this cycle including the long-awaited series by Jan to make it possible to prevent writing to mounted block devices: - Writing to mounted devices is dangerous and can lead to filesystem corruption as well as crashes. Furthermore syzbot comes with more and more involved examples how to corrupt block device under a mounted filesystem leading to kernel crashes and reports we can do nothing about. Add tracking of writers to each block device and a kernel cmdline argument which controls whether other writeable opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are allowed. Note that this effectively only prevents modification of the particular block device's page cache by other writers. The actual device content can still be modified by other means - e.g. by issuing direct scsi commands, by doing writes through devices lower in the storage stack (e.g. in case loop devices, DM, or MD are involved) etc. But blocking direct modifications of the block device page cache is enough to give filesystems a chance to perform data validation when loading data from the underlying storage and thus prevent kernel crashes. Syzbot can use this cmdline argument option to avoid uninteresting crashes. Also users whose userspace setup does not need writing to mounted block devices can set this option for hardening. We expect that this will be interesting to quite a few workloads. Btrfs is currently opted out of this because they still haven't merged patches we require for this to work from three kernel releases ago. - Reimplement block device freezing and thawing as holder operations on the block device. This allows us to extend block device freezing to all devices associated with a superblock and not just the main device. It also allows us to remove get_active_super() and thus another function that scans the global list of superblocks. Freezing via additional block devices only works if the filesystem chooses to use @fs_holder_ops for these additional devices as well. That currently only includes ext4 and xfs. Earlier releases switched get_tree_bdev() and mount_bdev() to use @fs_holder_ops. The remaining nilfs2 open-coded version of mount_bdev() has been converted to rely on @fs_holder_ops as well. So block device freezing for the main block device will continue to work as before. There should be no regressions in functionality. The only special case is btrfs where block device freezing for the main block device never worked because sb->s_bdev isn't set. Block device freezing for btrfs can be fixed once they can switch to @fs_holder_ops but that can happen whenever they're ready" * tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) block: Fix a memory leak in bdev_open_by_dev() super: don't bother with WARN_ON_ONCE() super: massage wait event mechanism ext4: Block writes to journal device xfs: Block writes to log device fs: Block writes to mounted block devices btrfs: Do not restrict writes to btrfs devices block: Add config option to not allow writing to mounted devices block: Remove blkdev_get_by_*() functions bcachefs: Convert to bdev_open_by_path() fs: handle freezing from multiple devices fs: remove dead check nilfs2: simplify device handling fs: streamline thaw_super_locked ext4: simplify device handling xfs: simplify device handling fs: simplify setup_bdev_super() calls blkdev: comment fs_holder_ops porting: document block device freeze and thaw changes fs: remove unused helper ...	2024-01-08 10:43:51 -08:00
Linus Torvalds	c604110e66	vfs-6.8.misc -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUxRQAKCRCRxhvAZXjc ov/QAQDzvge3oQ9MEymmOiyzzcF+HhAXBr+9oEsYJjFc1p0TsgEA61gXjZo7F1jY KBqd6znOZCR+Waj0kIVJRAo/ISRBqQc= =0bRl -----END PGP SIGNATURE----- Merge tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Add Jan Kara as VFS reviewer - Show correct device and inode numbers in proc/<pid>/maps for vma files on stacked filesystems. This is now easily doable thanks to the backing file work from the last cycles. This comes with selftests Cleanups: - Remove a redundant might_sleep() from wait_on_inode() - Initialize pointer with NULL, not 0 - Clarify comment on access_override_creds() - Rework and simplify eventfd_signal() and eventfd_signal_mask() helpers - Process aio completions in batches to avoid needless wakeups - Completely decouple struct mnt_idmap from namespaces. We now only keep the actual idmapping around and don't stash references to namespaces - Reformat maintainer entries to indicate that a given subsystem belongs to fs/ - Simplify fput() for files that were never opened - Get rid of various pointless file helpers - Rename various file helpers - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from last cycle - Make relatime_need_update() return bool - Use GFP_KERNEL instead of GFP_USER when allocating superblocks - Replace deprecated ida_simple_() calls with their current ida_() counterparts Fixes: - Fix comments on user namespace id mapping helpers. They aren't kernel doc comments so they shouldn't be using /** - s/Retuns/Returns/g in various places - Add missing parameter documentation on can_move_mount_beneath() - Rename i_mapping->private_data to i_mapping->i_private_data - Fix a false-positive lockdep warning in pipe_write() for watch queues - Improve __fget_files_rcu() code generation to improve performance - Only notify writer that pipe resizing has finished after setting pipe->max_usage otherwise writers are never notified that the pipe has been resized and hang - Fix some kernel docs in hfsplus - s/passs/pass/g in various places - Fix kernel docs in ntfs - Fix kcalloc() arguments order reported by gcc 14 - Fix uninitialized value in reiserfs" * tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits) reiserfs: fix uninit-value in comp_keys watch_queue: fix kcalloc() arguments order ntfs: dir.c: fix kernel-doc function parameter warnings fs: fix doc comment typo fs tree wide selftests/overlayfs: verify device and inode numbers in /proc/pid/maps fs/proc: show correct device and inode numbers in /proc/pid/maps eventfd: Remove usage of the deprecated ida_simple_xx() API fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation fs/hfsplus: wrapper.c: fix kernel-doc warnings fs: add Jan Kara as reviewer fs/inode: Make relatime_need_update return bool pipe: wakeup wr_wait after setting max_usage file: remove __receive_fd() file: stop exposing receive_fd_user() fs: replace f_rcuhead with f_task_work file: remove pointless wrapper file: s/close_fd_get_file()/file_close_fd()/g Improve __fget_files_rcu() code generation (and thus __fget_light()) file: massage cleanup of files that failed to open fs/pipe: Fix lockdep false-positive in watchqueue pipe_write() ...	2024-01-08 10:26:08 -08:00
Yuezhang Mo	f55c096f62	exfat: do not zero the extended part Since the read operation beyond the ValidDataLength returns zero, if we just extend the size of the file, we don't need to zero the extended part, but only change the DataLength without changing the ValidDataLength. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-01-08 21:57:22 +09:00
Yuezhang Mo	11a347fb6c	exfat: change to get file size from DataLength In stream extension directory entry, the ValidDataLength field describes how far into the data stream user data has been written, and the DataLength field describes the file size. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-01-08 21:57:22 +09:00
John Sanpe	34939ae005	exfat: using ffs instead of internal logic Replaced the internal table lookup algorithm with ffs of the bitops library with better performance. Use it to increase the single processing length of the exfat_find_free_bitmap function, from single-byte search to long type. Signed-off-by: John Sanpe <sanpeqf@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-01-08 21:57:21 +09:00
John Sanpe	7423546040	exfat: using hweight instead of internal logic Replace the internal table lookup algorithm with the hweight library, which has instruction set acceleration capabilities. Use it to increase the length of a single calculation of the exfat_find_free_bitmap function to the long type. Signed-off-by: John Sanpe <sanpeqf@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-01-08 21:57:21 +09:00
Paulo Alcantara	8a3c4e44c2	cifs: get rid of dup length check in parse_reparse_point() smb2_compound_op(SMB2_OP_GET_REPARSE) already checks if ioctl response has a valid reparse data buffer's length, so there's no need to check it again in parse_reparse_point(). In order to get rid of duplicate check, validate reparse data buffer's length also in cifs_query_reparse_point(). Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 21:18:00 -06:00
NeilBrown	17419aefcb	nfsd: rename nfsd_last_thread() to nfsd_destroy_serv() As this function now destroys the svc_serv, this is a better name. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:33 -05:00
NeilBrown	1e3577a452	SUNRPC: discard sv_refcnt, and svc_get/svc_put sv_refcnt is no longer useful. lockd and nfs-cb only ever have the svc active when there are a non-zero number of threads, so sv_refcnt mirrors sv_nrthreads. nfsd also keeps the svc active between when a socket is added and when the first thread is started, but we don't really need a refcount for that. We can simply not destroy the svc while there are any permanent sockets attached. So remove sv_refcnt and the get/put functions. Instead of a final call to svc_put(), call svc_destroy() instead. This is changed to also store NULL in the passed-in pointer to make it easier to avoid use-after-free situations. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:33 -05:00
NeilBrown	7b207ccd98	svc: don't hold reference for poolstats, only mutex. A future patch will remove refcounting on svc_serv as it is of little use. It is currently used to keep the svc around while the pool_stats file is open. Change this to get the pointer, protected by the mutex, only in seq_start, and the release the mutex in seq_stop. This means that if the nfsd server is stopped and restarted while the pool_stats file it open, then some pool stats info could be from the first instance and some from the second. This might appear odd, but is unlikely to be a problem in practice. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:33 -05:00
ChenXiaoSong	52e8910075	NFSv4, NFSD: move enum nfs_cb_opnum4 to include/linux/nfs4.h Callback operations enum is defined in client and server, move it to common header file. Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:26 -05:00
Dan Carpenter	3c86e615d1	nfsd: remove unnecessary NULL check We check "state" for NULL on the previous line so it can't be NULL here. No need to check again. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202312031425.LffZTarR-lkp@intel.com/ Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:26 -05:00
Chuck Lever	a2c91753a4	NFSD: Modify NFSv4 to use nfsd_read_splice_ok() Avoid the use of an atomic bitop, and prepare for adding a run-time switch for using splice reads. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:25 -05:00
Chuck Lever	c21fd7a8e8	NFSD: Replace RQ_SPLICE_OK in nfsd_read() RQ_SPLICE_OK is a bit of a layering violation. Also, a subsequent patch is going to provide a mechanism for always disabling splice reads. Splicing is an issue only for NFS READs, so refactor nfsd_read() to check the auth type directly instead of relying on an rq_flag setting. The new helper will be added into the NFSv4 read path in a subsequent patch. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:25 -05:00
Chuck Lever	a853ed5525	NFSD: Document lack of f_pos_lock in nfsd_readdir() Al Viro notes that normal system calls hold f_pos_lock when calling ->iterate_shared and ->llseek; however nfsd_readdir() does not take that mutex when calling these methods. It should be safe however because the struct file acquired by nfsd_readdir() is not visible to other threads. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:25 -05:00
Chuck Lever	d0ab8b649b	NFSD: Remove nfsd_drc_gc() tracepoint This trace point was for debugging the DRC's garbage collection. In the field it's just noise. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:25 -05:00
Chuck Lever	ce7df05508	NFSD: Make the file_delayed_close workqueue UNBOUND workqueue: nfsd_file_delayed_close [nfsd] hogged CPU for >13333us 8 times, consider switching to WQ_UNBOUND There's no harm in closing a cached file descriptor on another core. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:25 -05:00
Oleg Nesterov	f3734cc407	NFSD: use read_seqbegin() rather than read_seqbegin_or_lock() The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier() is wrong. "seq" is always even and thus "or_lock" has no effect, this code can never take ->writeverf_lock for writing. I guess this is fine, nfsd_copy_write_verifier() just copies 8 bytes and nfsd_reset_write_verifier() is supposed to be very rare operation so we do not need the adaptive locking in this case. Yet the code looks wrong and sub-optimal, it can use read_seqbegin() without changing the behaviour. [ cel: Note also that it eliminates this Sparse warning: fs/nfsd/nfssvc.c:360:6: warning: context imbalance in 'nfsd_copy_write_verifier' - different lock contexts for basic block ] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:24 -05:00
Jeff Layton	74fd48739d	nfsd: new Kconfig option for legacy client tracking We've had a number of attempts at different NFSv4 client tracking methods over the years, but now nfsdcld has emerged as the clear winner since the others (recoverydir and the usermodehelper upcall) are problematic. As a case in point, the recoverydir backend uses MD5 hashes to encode long form clientid strings, which means that nfsd repeatedly gets dinged on FIPS audits, since MD5 isn't considered secure. Its use of MD5 is not cryptographically significant, so there is no danger there, but allowing us to compile that out allows us to sidestep the issue entirely. As a prelude to eventually removing support for these client tracking methods, add a new Kconfig option that enables them. Mark it deprecated and make it default to N. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-07 17:54:24 -05:00
Paulo Alcantara	6d039984c1	smb: client: stop revalidating reparse points unnecessarily Query dir responses don't provide enough information on reparse points such as major/minor numbers and symlink targets other than reparse tags, however we don't need to unconditionally revalidate them only because they are reparse points. Instead, revalidate them only when their ctime or reparse tag has changed. For instance, Windows Server updates ctime of reparse points when their data have changed. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:06 -06:00
David Howells	6ebfede8d5	cifs: Pass unbyteswapped eof value into SMB2_set_eof() Change SMB2_set_eof() to take eof as CPU order rather than __le64 and pass it directly rather than by pointer. This moves the conversion down into SMB_set_eof() rather than all of its callers and means we don't need to undo it for the traceline. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:06 -06:00
Markus Elfring	96d566b6c9	smb3: Improve exception handling in allocate_mr_list() The kfree() function was called in one case by the allocate_mr_list() function during error handling even if the passed variable contained a null pointer. This issue was detected by using the Coccinelle software. Thus use another label. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:06 -06:00
Shyam Prasad N	516eea97f9	cifs: fix in logging in cifs_chan_update_iface Recently, cifs_chan_update_iface was modified to not remove an iface if a suitable replacement was not found. With that, there were two conditionals that were exactly the same. This change removes that extra condition check. Also, fixed a logging in the same function to indicate the correct message. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:06 -06:00
Paulo Alcantara	9c38568a75	smb: client: handle special files and symlinks in SMB3 POSIX Parse reparse points in SMB3 posix query info as they will be supported and required by the new specification. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:06 -06:00
Paulo Alcantara	3ded18a9e9	smb: client: cleanup smb2_query_reparse_point() Use smb2_compound_op() with SMB2_OP_GET_REPARSE to get reparse point. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:06 -06:00
Paulo Alcantara	514d793e27	smb: client: allow creating symlinks via reparse points Add support for creating symlinks via IO_REPARSE_TAG_SYMLINK reparse points in SMB2+. These are fully supported by most SMB servers and documented in MS-FSCC. Also have the advantage of requiring fewer roundtrips as their symlink targets can be parsed directly from CREATE responses on STATUS_STOPPED_ON_SYMLINK errors. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311260838.nx5mkj1j-lkp@intel.com/ Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:06 -06:00
Paulo Alcantara	5408990aa6	smb: client: fix hardlinking of reparse points The client was sending an SMB2_CREATE request without setting OPEN_REPARSE_POINT flag thus failing the entire hardlink operation. Fix this by setting OPEN_REPARSE_POINT in create options for SMB2_CREATE request when the source inode is a repase point. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:06 -06:00
Paulo Alcantara	7435d51b7e	smb: client: fix renaming of reparse points The client was sending an SMB2_CREATE request without setting OPEN_REPARSE_POINT flag thus failing the entire rename operation. Fix this by setting OPEN_REPARSE_POINT in create options for SMB2_CREATE request when the source inode is a repase point. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:06 -06:00
Paulo Alcantara	67ec9949b0	smb: client: optimise reparse point querying Reduce number of roundtrips to server when querying reparse points in ->query_path_info() by sending a single compound request of create+get_reparse+get_info+close. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:05 -06:00
Paulo Alcantara	102466f303	smb: client: allow creating special files via reparse points Add support for creating special files (e.g. char/block devices, sockets, fifos) via NFS reparse points on SMB2+, which are fully supported by most SMB servers and documented in MS-FSCC. smb2_get_reparse_inode() creates the file with a corresponding reparse point buffer set in @iov through a single roundtrip to the server. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311260746.HOJ039BV-lkp@intel.com/ Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:05 -06:00
Paulo Alcantara	3322960ce2	smb: client: extend smb2_compound_op() to accept more commands Make smb2_compound_op() accept up to MAX_COMPOUND(5) commands to be sent over a single compounded request. This will allow next commits to read and write reparse files through a single roundtrip to the server. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:05 -06:00
Pierre Mariani	0108ce08ae	smb: client: Fix minor whitespace errors and warnings Fixes no-op checkpatch errors and warnings. Signed-off-by: Pierre Mariani <pierre.mariani@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-01-07 15:46:05 -06:00
Randy Dunlap	ac8e9f64f5	ubifs: fix kernel-doc warnings Fix kernel-doc warnings found when using "W=1". file.c:1385: warning: Excess function parameter 'time' description in 'ubifs_update_time' and 9 warnings like this one: file.c:326: warning: No description found for return value of 'allocate_budget' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202312030417.66c5PwHj-lkp@intel.com/ Cc: Richard Weinberger <richard@nod.at> Cc: linux-mtd@lists.infradead.org Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> [rw: massaged patch to resolve conflict ] Signed-off-by: Richard Weinberger <richard@nod.at>	2024-01-06 23:49:50 +01:00
Zhihao Cheng	1e022216dc	ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path For error handling path in ubifs_symlink(), inode will be marked as bad first, then iput() is invoked. If inode->i_link is initialized by fscrypt_encrypt_symlink() in encryption scenario, inode->i_link won't be freed by callchain ubifs_free_inode -> fscrypt_free_inode in error handling path, because make_bad_inode() has changed 'inode->i_mode' as 'S_IFREG'. Following kmemleak is easy to be reproduced by injecting error in ubifs_jnl_update() when doing symlink in encryption scenario: unreferenced object 0xffff888103da3d98 (size 8): comm "ln", pid 1692, jiffies 4294914701 (age 12.045s) backtrace: kmemdup+0x32/0x70 __fscrypt_encrypt_symlink+0xed/0x1c0 ubifs_symlink+0x210/0x300 [ubifs] vfs_symlink+0x216/0x360 do_symlinkat+0x11a/0x190 do_syscall_64+0x3b/0xe0 There are two ways fixing it: 1. Remove make_bad_inode() in error handling path. We can do that because ubifs_evict_inode() will do same processes for good symlink inode and bad symlink inode, for inode->i_nlink checking is before is_bad_inode(). 2. Free inode->i_link before marking inode bad. Method 2 is picked, it has less influence, personally, I think. Cc: stable@vger.kernel.org Fixes: `2c58d548f5` ("fscrypt: cache decrypted symlink target in ->i_link") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2024-01-06 23:34:56 +01:00
Kent Overstreet	169de41985	bcachefs: eytzinger0_find() search should be const Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:46 -05:00
Kent Overstreet	f5d4481c3e	bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent() This is useful for btree ptrs as well, when we're just updating sectors_written. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:46 -05:00
Kent Overstreet	e7999235e6	bcachefs: fix simulateously upgrading & downgrading Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	72e2c920e4	bcachefs: Restart recovery passes more reliably Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	d04d272743	bcachefs: bch2_dump_bset() doesn't choke on u64s == 0 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	4819b66e29	bcachefs: improve checksum error messages new helpers: - bch2_csum_to_text() - bch2_csum_err_msg() standardize our checksum error messages a bit, and print out the checksums a bit more nicely. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	2d02bfb01b	bcachefs: improve validate_bset_keys() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	5e448c4893	bcachefs: print sb magic when relevant Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	5b88365660	bcachefs: __bch2_sb_field_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	1f5af5fc17	bcachefs: %pg is banished not portable to userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	c13fbb7de2	bcachefs: Improve would_deadlock trace event We now include backtraces for every thread involved in the cycle. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	074cbcdaee	bcachefs: fsck_err()s don't need to manually check c->sb.version anymore Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	15eaaa4c31	bcachefs: Upgrades now specify errors to fix, like downgrades Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	d641d4cae7	bcachefs: no thread_with_file in userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	a64a37338d	bcachefs: Don't autofix errors we can't fix Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	e9bc59f9df	bcachefs: add missing bch2_latency_acct() call Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	4798bd2443	bcachefs: increase max_active on io_complete_wq this definitely should _not_ be 1, and we don't actually want any concurrency limiting at all here - btree node read completions are getting blocked behind btree node write submissions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	c72e4d7a30	bcachefs: add time_stats for btree_node_read_done() Seeing weird latency issues in the btree node read path - add one bch2_btree_node_read_done(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	b819f30855	bcachefs: don't clear accessed bit in btree node fill Seeing strange performance issues that might be caused by memory pressure causing prefetched nodes to be evicted before they're used. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	49a5192c0e	bcachefs: Add an option to control btree node prefetching Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	8a0dda6fd6	bcachefs: kill useless return ret Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	f0431c5f47	bcachefs: Combine .trans_trigger, .atomic_trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	4f9ec59f8f	bcachefs: unify extent trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	5a82ec3fea	bcachefs: bch2_trigger_stripe_ptr() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	d55ddf6e7a	bcachefs: Online fsck can now fix errors BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors. Also; set fix_errors to ask by default when fsck is running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	1f34c21bc6	bcachefs: bch2_trigger_pointer() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	e4eb3e5ae4	bcachefs: unify stripe trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	f4f78779bb	bcachefs: move stripe triggers to ec.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	153d1c63c2	bcachefs: unify alloc trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	6820ac2cdc	bcachefs: move bch2_mark_alloc() to alloc_background.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	6cacd0c414	bcachefs: unify reservation trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	7bc4d18af4	bcachefs: unify reflink_p trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	08bc959010	bcachefs: unify inode trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	282e7c37eb	bcachefs: kill mem_trigger_run_overwrite_then_insert() now that type signatures are unified, redundant Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	c95e9ec486	bcachefs: BTREE_TRIGGER_TRANSACTIONAL New flag so that triggers can distinguish whether we're running transactional or atomic triggers (or gc) - unifying the callbacks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	089e311347	bcachefs: Kill BTREE_TRIGGER_NOATOMIC dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	ad00bce07d	bcachefs: mark now takes bkey_s Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	717296c34c	bcachefs: trans_mark now takes bkey_s Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	eff1f728be	bcachefs: Upgrading uses bch_sb.recovery_passes_required Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	96f37eabe7	bcachefs: factor out thread_with_file, thread_with_stdio thread_with_stdio now knows how to handle input - fsck can now prompt to fix errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	f60250de32	bcachefs: Fix printing of device durability BCH_MEMBER_DURABILITY() was not present initially; a value of 0 means use the default, nonzero means use v - 1. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	8feaebb0ae	bcachefs: __bch2_journal_key_to_wb -> bch2_journal_key_to_wb_slowpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	f412392f6e	bcachefs: __journal_keys_sort() refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	371650143d	bcachefs: wb_key_cmp -> wb_key_ref_cmp Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	89056f245b	bcachefs: track transaction durations Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	83322e8ca8	bcachefs: btree_trans always has stats reserve slot 0 for unknown (when we overflow), to avoid some branches Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	0d529663f0	bcachefs: Split brain detection Use the new bch_member->seq, sb->write_time fields to detect split brain and kick out devices when necessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	6b00de06f5	bcachefs: bch_member->seq Add new fields for split brain detection: - bch_member->seq, which tracks the sequence number of the last superblock write that happened to each member device - bch_sb->write_time, which tracks the time of the last superblock write, to allow detection of when two members have diverged but had the same number of superblock writes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	62719cf33c	bcachefs: Fix nochanges/read_only interaction nochanges means "we cannot issue writes at all"; it's possible to go into a pseudo read-write mode where we pin dirty metadata in memory, which is used for fsck in dry run mode and doing journal replay on a read only mount, but we do not want to allow an actual read-write mount in nochanges mode. But we do always want to allow early read-write, during recovery - this patch clarifies that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	5e32914514	bcachefs: Check journal entries for invalid keys in trans commit path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
David Howells	807c6d09cc	netfs: Fix the loop that unmarks folios after writing to the cache In the loop in netfs_rreq_unmark_after_write() that removes the PG_fscache from folios after they've been written to the cache, as soon as we remove the mark from a multipage folio, it can get split - and then we might see a fragment of folio again. Guard against this by advancing the 'unlocked' tracker to the index of the last page in the folio to avoid a double removal of the PG_fscache mark. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org	2024-01-05 23:13:48 +00:00
Linus Torvalds	0d3ac66ed8	nfsd-6.7 fixes: - Fix another regression in the NFSD administrative API -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmWYCHIACgkQM2qzM29m f5c7FRAAoXVRhP/vyua6PcfF8Je0bg83fhvqG8ppbLE6EjFzbMM7fZ9G5MudtY7/ w4PuDJ8rhQFl8oQTy004qkqV3JfZGo7buCumKoj8IA8m67cQ0WyB6Lz1fSW9BGHq cc0PWQR/z88laRvCPViW+jOsLx5f+VvvyEsDUvEYdG6xXtTVhDfNxYQFPc3bRfd0 Sf5cPBuBbMAlz9PZm/c276y46rkBP3+2ONx+zHz0k/ZR/EgJ0RoxmzZIrAUguCRy fAUW0xvrAuTMY6LMb/0YyKH6IOqYoImKWMGjhi0VFFG5AuxbmU7lLVo/S9tUlWMs aAcyEtyuVU1422dikF/FRtstswfz64fj+PqIwiq8qjt14WB+BX/tV1tjwAjuI2py YVt9T50FkhG42IG+Nkub7SXm6OuinfiUBcAilPMfBnSapT5l6BvVwrSXGIApcF87 NmQvRqjO4tEYE6/aYsUV/jZGQVYPTspvhv5kzXMQnL/6h/KlhrNLvTF9gTJFs+xM ahY7I2nVmXayoLfrweaWX/a1VNMeYLvVOUPqm/DFiWWw51TqkmD8CHQwCsBiR0i9 iVu10eh0x9gakPBDVFuCLG+BOL+vsUxfgbeAOJdHYC0s78YLRovyFsDTHxRNv7hW hDp+eblGUb+v5kNPbEZJKMmWXCwivOHnacc2EJyA1UcGRLL7I24= =TKjN -----END PGP SIGNATURE----- Merge tag 'nfsd-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix another regression in the NFSD administrative API * tag 'nfsd-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: drop the nfsd_put helper	2024-01-05 13:12:29 -08:00
Matthew Wilcox (Oracle)	bcd30d4cd9	buffer: fix unintended successful return If try_to_free_buffers() succeeded and then folio_alloc_buffers() failed, grow_dev_folio() would return success. This would be incorrect; memory allocation failure is supposed to result in a failure. It's a harmless bug; the caller will simply go around the loop one more time and grow_dev_folio() will correctly return a failure that time. But it was an unintended change and looks like a more serious bug than it is. While I'm in here, improve the commentary about why we return success even though we failed. Link: https://lkml.kernel.org/r/20240101093848.2017115-1-willy@infradead.org Fixes: `6d840a1877` ("buffer: return bool from grow_dev_folio()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-01-05 10:17:43 -08:00
Linus Torvalds	3eca89454a	Three important multichannel smb3 client fixes -----BEGIN PGP SIGNATURE----- iQGyBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmWWRDMACgkQiiy9cAdy T1E+tAv4g15Kh9JeSaO/x1oyujKSDnEALWGvriwj9EC6+Z+MzQQyyPxB4uUyRuSN uts/QgvpK4LW2jIhyTaBfU9xRfhJYup1B+vaBtij6NwKfnxYo1PJCdAj8yelAfCW 7mgqSiIGMRbRFz3jnFbb5vQS3H32Hpje0h32Rh3LiKrS6q+9exCJDHA6jqQ5JwvP /BrDeJ0vykyUjvjxcYQaeWZDkXEy7bysVVq+3qOEr9HSwX7Jv1f6JOLLpUoTTWx9 LPkBus02JOAWX6FUWewVdx1ar9cTafA54rzS3hrpT2lxjuMhSmL6v9Br9MCsAMnI oWMWfjyefAM6ss4m/pAD178WZ61f1urIRrTRJJUwcWA+XjXTF6dXXZvMFdzqjYW5 k6nMhDfFXzSzywwSFmlQwh/00NPVzX7DuGjgd3ZdJ7z6bvFvltwZtkRPMVK7zvHc X8LWAWYuaLku4yj3PJa3FiBrSYPsztTX3NLqQq+5vcxxTDCOcFduXComvvEosSiq 4UbhxHU= =0P5G -----END PGP SIGNATURE----- Merge tag '6.7-rc8-smb3-mchan-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: "Three important multichannel smb3 client fixes found in recent testing: - fix oops due to incorrect refcounting of interfaces after disabling multichannel - fix possible unrecoverable session state after disabling multichannel with active sessions - fix two places that were missing use of chan_lock" * tag '6.7-rc8-smb3-mchan-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: do not depend on release_iface for maintaining iface_list cifs: cifs_chan_is_iface_active should be called with chan_lock held cifs: after disabling multichannel, mark tcon for reconnect	2024-01-05 08:52:25 -08:00
Zhihao Cheng	c07a4dab24	ubifs: Check @c->dirty_[n\|p]n_cnt and @c->nroot state under @c->lp_mutex The checking of @c->nroot->flags and @c->dirty_[n\|p]n_cnt in function nothing_to_commit() is not atomic, which could be raced with modifying of lpt, for example: P1 P2 P3 run_gc ubifs_garbage_collect do_commit ubifs_return_leb ubifs_lpt_lookup_dirty dirty_cow_nnode do_commit nothing_to_commit if (test_bit(DIRTY_CNODE, &c->nroot->flags) // false test_and_set_bit(DIRTY_CNODE, &nnode->flags) c->dirty_nn_cnt += 1 ubifs_assert(c, c->dirty_nn_cnt == 0) // false ! Fetch a reproducer in Link: UBIFS error (ubi0:0 pid 2747): ubifs_assert_failed UBIFS assert failed: c->dirty_pn_cnt == 0, in fs/ubifs/commit.c Call Trace: ubifs_ro_mode+0x58/0x70 [ubifs] ubifs_assert_failed+0x6a/0x90 [ubifs] do_commit+0x5b7/0x930 [ubifs] ubifs_run_commit+0xc6/0x1a0 [ubifs] ubifs_sync_fs+0xd8/0x110 [ubifs] sync_filesystem+0xb4/0x120 do_syscall_64+0x6f/0x140 Fix it by checking @c->dirty_[n\|p]n_cnt and @c->nroot state with @c->lp_mutex locked. Fixes: `944fdef52c` ("UBIFS: do not start the commit if there is nothing to commit") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218162 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2024-01-05 17:03:41 +01:00
David Howells	92a714d727	netfs: Fix interaction between write-streaming and cachefiles culling An issue can occur between write-streaming (storing dirty data in partial non-uptodate pages) and a cachefiles object being culled to make space. The problem occurs because the cache object is only marked in use while there are files open using it. Once it has been released, it can be culled and the cookie marked disabled. At this point, a streaming write is permitted to occur (if the cache is active, we require pages to be prefetched and cached), but the cache can become active again before this gets flushed out - and then two effects can occur: (1) The cache may be asked to write out a region that's less than its DIO block size (assumed by cachefiles to be PAGE_SIZE) - and this causes one of two debugging statements to be emitted. (2) netfs_how_to_modify() gets confused because it sees a page that isn't allowed to be non-uptodate being uptodate and tries to prefetch it - leading to a warning that PG_fscache is set twice. Fix this by the following means: (1) Add a netfs_inode flag to disallow write-streaming to an inode and set it if we ever do local caching of that inode. It remains set for the lifetime of that inode - even if the cookie becomes disabled. (2) If the no-write-streaming flag is set, then make netfs_how_to_modify() always want to prefetch instead. (3) If netfs_how_to_modify() decides it wants to prefetch a folio, but that folio has write-streamed data in it, then it requires the folio be flushed first. (4) Export a counter of the number of times we wanted to prefetch a non-uptodate page, but found it had write-streamed data in it. (5) Export a counter of the number of times we cancelled a write to the cache because it didn't DIO align and remove the debug statements. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-erofs@lists.ozlabs.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org	2024-01-05 15:42:25 +00:00
Sascha Hauer	2ba5b48938	ubifs: describe function parameters With `16a26b20d2` ("ubifs: authentication: Add hashes to index nodes") insert_node() and insert_dent() got a new function parameter 'hash'. Add a description for this new parameter. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311051618.D7YUE1Rr-lkp@intel.com/ Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2024-01-05 16:35:25 +01:00
Randy Dunlap	19c2fcb4a4	ubifs: auth.c: fix kernel-doc function prototype warning Use the correct function name in the kernel-doc comment to prevent a kernel-doc warning: auth.c:30: warning: expecting prototype for ubifs_node_calc_hash(). Prototype was for __ubifs_node_calc_hash() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Closes: lore.kernel.org/r/202311052125.gE1Rylox-lkp@intel.com Cc: Richard Weinberger <richard@nod.at> Cc: linux-mtd@lists.infradead.org Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2024-01-05 16:34:39 +01:00
Eric Biggers	738fadaa54	ubifs: use crypto_shash_tfm_digest() in ubifs_hmac_wkm() Simplify ubifs_hmac_wkm() by using crypto_shash_tfm_digest() instead of an alloc+init+update+final sequence. This should also improve performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2024-01-05 16:29:53 +01:00
David Howells	4088e38947	netfs: Count DIO writes Provide a counter for DIO writes to match that for DIO reads. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org	2024-01-05 15:22:37 +00:00
David Howells	0e4d464cda	netfs: Mark netfs_unbuffered_write_iter_locked() static Mark netfs_unbuffered_write_iter_locked() static as it's only called from the file in which it is defined. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org	2024-01-05 15:20:34 +00:00
Zhihao Cheng	ada3fb86a3	ext4: move ext4_check_bdev_write_error() into nojournal mode Since JBD2 takes care of all metadata writeback errors of fs dev, ext4_check_bdev_write_error() is useful only in nojournal mode. Move it into '!ext4_handle_valid(handle)' branch. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-6-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:42:21 -05:00
Zhihao Cheng	b4e73e6126	jbd2: abort journal when detecting metadata writeback error of fs dev This is a replacement solution of commit `bc71726c72` ("ext4: abort the filesystem if failed to async write metadata buffer"), JBD2 can detect metadata writeback error of fs dev by 'j_fs_dev_wb_err'. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-5-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:42:21 -05:00
Zhihao Cheng	8a4fd33d87	jbd2: remove unused 'JBD2_CHECKPOINT_IO_ERROR' and 'j_atomic_flags' Since 'JBD2_CHECKPOINT_IO_ERROR' and j_atomic_flags' are not useful anymore after fs dev's errseq is imported into jbd2, just remove them. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-4-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:42:21 -05:00
Zhihao Cheng	62ec1707cb	jbd2: replace journal state flag by checking errseq Now JBD2 detects metadata writeback error of fs dev according to errseq. Replace journal state flag by checking errseq. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-3-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:42:21 -05:00
Zhihao Cheng	990b6b5b13	jbd2: add errseq to detect client fs's bdev writeback error Add errseq in journal, so that JBD2 can detect whether metadata is successfully written to fs bdev. This patch adds detection in recovery process to replace original solution(using local variable wb_err). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-2-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:42:21 -05:00
Gou Hao	2bf5eb2a7c	ext4: improving calculation of 'fe_{len\|start}' in mb_find_extent() After first execution of mb_find_order_for_block(): 'fe_start' is the value of 'block' passed in mb_find_extent(). 'fe_len' is the difference between the length of order-chunk and remainder of the block divided by order-chunk. And 'next' does not require initialization after above modifications. Signed-off-by: Gou Hao <gouhao@uniontech.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231113082617.11258-1-gouhao@uniontech.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:35:55 -05:00
Ojaswin Mujoo	c6bfd72409	ext4: clarify handling of unwritten bh in __ext4_block_zero_page_range() As an optimization, I was trying to work on exiting early from this function if dealing with unwritten extent since they anyways read 0. However, it was realised that there are certain code paths that can end up calling ext4_block_zero_page_range() for an unwritten bh that might still have data in pagecache. In this case, we can't exit early and we do require to process the bh and zero out the pagecache to ensure that a writeback can't kick in at a later time and flush the stale pagecache to disk. Since, adding the logic to exit early for unwritten bh was turning out to be much more nuanced and the current code already handles it well, just add a comment to explicitly document this behavior. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/d859b7ae5fe42e6626479b91ed9f4da3aae4c597.1698856309.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:28:47 -05:00
Ojaswin Mujoo	9257336914	ext4: treat end of range as exclusive in ext4_zero_range() The call to filemap_write_and_wait_range() assumes the range passed to be inclusive, so fix the call to make sure we follow that. Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/e503107a7c73a2b68dec645c5ad798c437717c45.1698856309.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:28:47 -05:00
Ojaswin Mujoo	e89fdcc425	ext4: enable dioread_nolock as default for bs < ps case dioread_nolock was originally disabled as a default option for bs < ps scenarios due to a data corruption issue. Since then, we've had some fixes in this area which address such issues. Enable dioread_nolock by default and remove the experimental warning message for bs < ps path. dioread for bs < ps has been tested on a 64k pagesize machine using: kvm-xfstest -C 3 -g auto with the following configs: 64k adv bigalloc_4k bigalloc_64k data_journal encrypt dioread_nolock dioread_nolock_4k ext3 ext3conv nojournal And no new regressions were seen compared to baseline kernel. Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20231101154717.531865-1-ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:28:47 -05:00
Gou Hao	f2fec3e99a	ext4: delete redundant calculations in ext4_mb_get_buddy_page_lock() 'blocks_per_page' is always 1 after 'if (blocks_per_page >= 2)', 'pnum' and 'block' are equal in this case. Signed-off-by: Gou Hao <gouhao@uniontech.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231024035215.29474-1-gouhao@uniontech.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:26:21 -05:00
Jeff Layton	64e6304169	nfsd: drop the nfsd_put helper It's not safe to call nfsd_put once nfsd_last_thread has been called, as that function will zero out the nn->nfsd_serv pointer. Drop the nfsd_put helper altogether and open-code the svc_put in its callers instead. That allows us to not be reliant on the value of that pointer when handling an error. Fixes: `2a501f55cd` ("nfsd: call nfsd_last_thread() before final nfsd_put()") Reported-by: Zhi Li <yieli@redhat.com> Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeffrey Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-01-04 22:52:27 -05:00
Jakub Kicinski	e63c1822ac	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c `e009b2efb7` ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()") `0f2b214779` ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL") https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-04 18:06:46 -08:00
Steven Rostedt (Google)	1de94b52d5	eventfs: Shortcut eventfs_iterate() by skipping entries already read As the ei->entries array is fixed for the duration of the eventfs_inode, it can be used to skip over already read entries in eventfs_iterate(). That is, if ctx->pos is greater than zero, there's no reason in doing the loop across the ei->entries array for the entries less than ctx->pos. Instead, start the lookup of the entries at the current ctx->pos. Link: https://lore.kernel.org/all/CAHk-=wiKwDUDv3+jCsv-uacDcHDVTYsXtBR9=6sGM5mqX+DhOg@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.494956957@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-01-04 17:11:58 -05:00
Steven Rostedt (Google)	704f960dbe	eventfs: Read ei->entries before ei->children in eventfs_iterate() In order to apply a shortcut to skip over the current ctx->pos immediately, by using the ei->entries array, the reading of that array should be first. Moving the array reading before the linked list reading will make the shortcut change diff nicer to read. Link: https://lore.kernel.org/all/CAHk-=wiKwDUDv3+jCsv-uacDcHDVTYsXtBR9=6sGM5mqX+DhOg@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.333115095@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-01-04 17:11:58 -05:00
Steven Rostedt (Google)	1e4624eb5a	eventfs: Do ctx->pos update for all iterations in eventfs_iterate() The ctx->pos was only updated when it added an entry, but the "skip to current pos" check (c--) happened for every loop regardless of if the entry was added or not. This inconsistency caused readdir to be incorrect. It was due to: for (i = 0; i < ei->nr_entries; i++) { if (c > 0) { c--; continue; } mutex_lock(&eventfs_mutex); /* If ei->is_freed then just bail here, nothing more to do / if (ei->is_freed) { mutex_unlock(&eventfs_mutex); goto out; } r = entry->callback(name, &mode, &cdata, &fops); mutex_unlock(&eventfs_mutex); [..] ctx->pos++; } But this can cause the iterator to return a file that was already read. That's because of the way the callback() works. Some events may not have all files, and the callback can return 0 to tell eventfs to skip the file for this directory. for instance, we have: # ls /sys/kernel/tracing/events/ftrace/function format hist hist_debug id inject and # ls /sys/kernel/tracing/events/sched/sched_switch/ enable filter format hist hist_debug id inject trigger Where the function directory is missing "enable", "filter" and "trigger". That's because the callback() for events has: static int event_callback(const char name, umode_t mode, void data, const struct file_operations fops) { struct trace_event_file file = data; struct trace_event_call call = file->event_call; [..] /* * Only event directories that can be enabled should have * triggers or filters, with the exception of the "print" * event that can have a "trigger" file. / if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) { if (call->class->reg && strcmp(name, "enable") == 0) { mode = TRACE_MODE_WRITE; fops = &ftrace_enable_fops; return 1; } if (strcmp(name, "filter") == 0) { mode = TRACE_MODE_WRITE; fops = &ftrace_event_filter_fops; return 1; } } if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE) \|\| strcmp(trace_event_name(call), "print") == 0) { if (strcmp(name, "trigger") == 0) { mode = TRACE_MODE_WRITE; *fops = &event_trigger_fops; return 1; } } [..] return 0; } Where the function event has the TRACE_EVENT_FL_IGNORE_ENABLE set. This means that the entries array elements for "enable", "filter" and "trigger" when called on the function event will have the callback return 0 and not 1, to tell eventfs to skip these files for it. Because the "skip to current ctx->pos" check happened for all entries, but the ctx->pos++ only happened to entries that exist, it would confuse the reading of a directory. Which would cause: # ls /sys/kernel/tracing/events/ftrace/function/ format hist hist hist_debug hist_debug id inject inject The missing "enable", "filter" and "trigger" caused ls to show "hist", "hist_debug" and "inject" twice. Update the ctx->pos for every iteration to keep its update and the "skip" update consistent. This also means that on error, the ctx->pos needs to be decremented if it was incremented without adding something. Link: https://lore.kernel.org/all/20240104150500.38b15a62@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.172295263@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: `493ec81a8f` ("eventfs: Stop using dcache_readdir() for getdents()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-01-04 17:11:58 -05:00
Steven Rostedt (Google)	e109deadb7	eventfs: Have eventfs_iterate() stop immediately if ei->is_freed is set If ei->is_freed is set in eventfs_iterate(), it means that the directory that is being iterated on is in the process of being freed. Just exit the loop immediately when that is ever detected, and separate out the return of the entry->callback() from ei->is_freed. Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.016261289@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-01-04 17:11:57 -05:00
Benjamin Coddington	57331a59ac	NFSv4.1: Use the nfs_client's rpc timeouts for backchannel For backchannel requests that lookup the appropriate nfs_client, use the state-management rpc_clnt's rpc_timeout parameters for the backchannel's response. When the nfs_client cannot be found, fall back to using the xprt's default timeout parameters. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 17:01:01 -05:00
Andrea Righi	c312828c37	kernfs: convert kernfs_idr_lock to an irq safe raw spinlock bpf_cgroup_from_id() is basically a wrapper to cgroup_get_from_id(), that is relying on kernfs to determine the right cgroup associated to the target id. As a kfunc, it has the potential to be attached to any function through BPF, particularly in contexts where certain locks are held. However, kernfs is not using an irq safe spinlock for kernfs_idr_lock, that means any kernfs function that is acquiring this lock can be interrupted and potentially hit bpf_cgroup_from_id() in the process, triggering a deadlock. For example, it is really easy to trigger a lockdep splat between kernfs_idr_lock and rq->_lock, attaching a small BPF program to __set_cpus_allowed_ptr_locked() that just calls bpf_cgroup_from_id(): ===================================================== WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected 6.7.0-rc7-virtme #5 Not tainted ----------------------------------------------------- repro/131 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: ffffffffb2dc4578 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1d/0x80 and this task is already holding: ffff911cbecaf218 (&rq->__lock){-.-.}-{2:2}, at: task_rq_lock+0x50/0xc0 which would create a new lock dependency: (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2} but this new dependency connects a HARDIRQ-irq-safe lock: (&rq->__lock){-.-.}-{2:2} ... which became HARDIRQ-irq-safe at: lock_acquire+0xbf/0x2b0 _raw_spin_lock_nested+0x2e/0x40 scheduler_tick+0x5d/0x170 update_process_times+0x9c/0xb0 tick_periodic+0x27/0xe0 tick_handle_periodic+0x24/0x70 __sysvec_apic_timer_interrupt+0x64/0x1a0 sysvec_apic_timer_interrupt+0x6f/0x80 asm_sysvec_apic_timer_interrupt+0x1a/0x20 memcpy+0xc/0x20 arch_dup_task_struct+0x15/0x30 copy_process+0x1ce/0x1eb0 kernel_clone+0xac/0x390 kernel_thread+0x6f/0xa0 kthreadd+0x199/0x230 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1b/0x30 to a HARDIRQ-irq-unsafe lock: (kernfs_idr_lock){+.+.}-{2:2} ... which became HARDIRQ-irq-unsafe at: ... lock_acquire+0xbf/0x2b0 _raw_spin_lock+0x30/0x40 __kernfs_new_node.isra.0+0x83/0x280 kernfs_create_root+0xf6/0x1d0 sysfs_init+0x1b/0x70 mnt_init+0xd9/0x2a0 vfs_caches_init+0xcf/0xe0 start_kernel+0x58a/0x6a0 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xc5/0xe0 secondary_startup_64_no_verify+0x178/0x17b other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kernfs_idr_lock); local_irq_disable(); lock(&rq->__lock); lock(kernfs_idr_lock); <Interrupt> lock(&rq->__lock); * DEADLOCK * Prevent this deadlock condition converting kernfs_idr_lock to a raw irq safe spinlock. The performance impact of this change should be negligible and it also helps to prevent similar deadlock conditions with any other subsystems that may depend on kernfs. Fixes: `332ea1f697` ("bpf: Add bpf_cgroup_from_id() kfunc") Cc: stable <stable@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20231229074916.53547-1-andrea.righi@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-01-04 17:13:15 +01:00
Markus Elfring	597a421798	rpc_pipefs: Replace one label in bl_resolve_deviceid() The kfree() function was called in one case by the bl_resolve_deviceid() function during error handling even if the passed data structure member contained a null pointer. This issue was detected by using the Coccinelle software. Thus use an other label. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Matthew Wilcox (Oracle)	12fc0a9631	nfs: Remove writepage NFS already has writepages and migrate_folio, so it does not need to implement writepage. The writepage operation is deprecated as it leads to worse performance under high memory pressure due to folios being written out in LRU order rather than sequentially within a file. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Benjamin Coddington	1fd5394e6a	NFS: drop unused nfs_direct_req bytes_left Now that we're calculating how large a remaining IO should be based on the current request's offset, we no longer need to track bytes_left on each struct nfs_direct_req. Drop the field, and clean up the direct request tracepoints. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Trond Myklebust	8a6291bf3b	pNFS: Fix the pnfs block driver's calculation of layoutget size Instead of relying on the value of the 'bytes_left' field, we should calculate the layout size based on the offset of the request that is being written out. Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: `954998b60c` ("NFS: Fix error handling for O_DIRECT write scheduling") Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Jeff Layton	f6e70c59ed	nfs: print fileid in lookup tracepoints With this we can see the dentry -> inode linkage that's being revalidated. A fileid of 0 means "negative dentry". Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Jeff Layton	310b1f89ea	nfs: rename the nfs_async_rename_done tracepoint We do async renames in other cases besides sillyrenames now. This tracepoint name is now misleading. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Jeff Layton	283064fca3	nfs: add new tracepoint at nfs4 revalidate entry point Add a call to the v4 d_revalidate entrypoint, just like the v3 one. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Trond Myklebust	037e56a22f	NFSv4.1/pnfs: Ensure we handle the error NFS4ERR_RETURNCONFLICT Once the client has processed the CB_LAYOUTRECALL, but has not yet successfully returned the layout, the server is supposed to switch to returning NFS4ERR_RETURNCONFLICT. This patch ensures that we handle that return value correctly. Fixes: `183d9e7b11` ("pnfs: rework LAYOUTGET retry handling") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Trond Myklebust	dce72920c8	NFSv4.1: if referring calls are complete, trust the stateid argument If the server is recalling a layout, and sends us a list of referring calls that we can see are complete, then we should just trust that the stateid argument is correct, even if the sequence id doesn't match the one we hold. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Trond Myklebust	e3fd54e7dc	NFSv4: Track the number of referring calls in struct cb_process_state When the server gives us a set of referring calls, to tell us that the NFSv4.1 callback needs to be ordered with respect to those calls, then we may want to make that information available to the operations. In certain cases, it may allow them to optimise their behaviour due to the extra knowledge. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Scott Mayhew	a10a923307	NFS: Use parent's objective cred in nfs_access_login_time() The subjective cred (task->cred) can potentially be overridden and subsquently freed in non-RCU context, which could lead to a panic if we try to use it in cred_fscmp(). Use __task_cred(), which returns the objective cred (task->real_cred) instead. Fixes: `0eb43812c0` ("NFS: Clear the file access cache upon login") Fixes: `5e9a7b9c2e` ("NFS: Fix up a sparse warning") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Benjamin Coddington	b4d4fd60f8	NFSv4: Always ask for type with READDIR Again we have claimed regressions for walking a directory tree, this time with the "find" utility which always tries to optimize away asking for any attributes until it has a complete list of entries. This behavior makes the readdir plus heuristic do the wrong thing, which causes a storm of GETATTRs to determine each entry's type in order to continue the walk. For v4 add the type attribute to each READDIR request to include it no matter the heuristic. This allows a simple `find` command to proceed quickly through a directory tree. Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Benjamin Coddington	d76c769c8d	pnfs/blocklayout: Don't add zero-length pnfs_block_dev We noticed a SCSI device that refused to allow READ CAPACITY when the device had a PR with exclusive access, registrants only. The result of this situation is that the blocklayout driver adds a pnfs_block_dev of zero length which always fails the offset_in_map tests. Instead of continuously trying to do pNFS for this case, just mark the device as unavailable which will allow the client to fallback to the MDS for the duration of PNFS_DEVICE_RETRY_TIMEOUT. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
Benjamin Coddington	1530827b90	blocklayoutdriver: Fix reference leak of pnfs_device_node The error path for blocklayout's device lookup is missing a reference drop for the case where a lookup finds the device, but the device is marked with NFS_DEVICEID_UNAVAILABLE. Fixes: `b3dce6a2f0` ("pnfs/blocklayout: handle transient devices") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-01-04 10:47:56 -05:00
David Howells	43833f2ba5	netfs: Fix proc/fs/fscache symlink to point to "netfs" not "../netfs" Fix the proc/fs/fscache symlink to point to "netfs" not "../netfs". Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Christian Brauner <christian@brauner.io> cc: linux-fsdevel@vger.kernel.org cc: linux-cachefs@redhat.com	2024-01-04 13:15:32 +00:00
David Howells	252cf7b2ea	9p: Use length of data written to the server in preference to error In v9fs_upload_to_server(), we pass the error to netfslib to terminate the subreq rather than the amount of data written - even if we did actually write something. Further, we assume that the write is always entirely done if successful - but it might have been partially complete - as returned by p9_client_write(), but we ignore that. Fix this by indicating the amount written by preference and only returning the error if we didn't write anything. (We might want to return both in future if both are available as this might be useful as to whether we retry or not.) Suggested-by: Dominique Martinet <asmadeus@codewreck.org> Link: https://lore.kernel.org/r/ZZULNQAZ0n0WQv7p@codewreck.org/ Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Dominique Martinet <asmadeus@codewreck.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: v9fs@lists.linux.dev cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org	2024-01-04 13:15:31 +00:00
David Howells	6c2c1e0009	9p: Do a couple of cleanups Do a couple of cleanups to 9p: (1) Remove a couple of unused variables. (2) Turn a BUG_ON() into a warning, consolidate with another warning and make the warning message include the inode number rather than whatever's in i_private (which will get hashed anyway). Suggested-by: Dominique Martinet <asmadeus@codewreck.org> Link: https://lore.kernel.org/r/ZZULNQAZ0n0WQv7p@codewreck.org/ Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Dominique Martinet <asmadeus@codewreck.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: v9fs@lists.linux.dev cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org	2024-01-04 13:14:13 +00:00
Rafael J. Wysocki	22349e79b9	Merge branches 'acpi-pm', 'acpi-video', 'acpi-apei' and 'acpi-extlog' Merge an ACPI power management change, ACPI backlight driver changes, APEI updates and ACPI extlog driver changes for 6.8-rc1: - Modify the ACPI LPIT table handling code to avoid u32 multiplication overflows in state residency computations (Nikita Kiryushin). - Drop an unused helper function from the ACPI backlight (video) driver and add a clarifying comment to it (Hans de Goede). - Update the ACPI backlight driver to avoid using uninitialized memory in some cases (Nikita Kiryushin). - Add ACPI backlight quirk for the Colorful X15 AT 23 laptop (Yuluo Qiu). - Add support for vendor-defined error types to the ACPI APEI error injection code (Avadhut Naik). - Adjust APEI to properly set MF_ACTION_REQUIRED on synchronous memory failure events, so they are handled differently from the asynchronous ones (Shuai Xue). - Fix NULL pointer dereference check in the ACPI extlog driver (Prarit Bhargava). - Adjust the ACPI extlog driver to clear the Extended Error Log status when RAS_CEC handled the error (Tony Luck). * acpi-pm: ACPI: LPIT: Avoid u32 multiplication overflow * acpi-video: ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop ACPI: video: check for error while searching for backlight device parent ACPI: video: Drop should_check_lcd_flag() ACPI: video: Add comment about acpi_video_backlight_use_native() usage * acpi-apei: ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events ACPI: APEI: EINJ: Add support for vendor defined error types platform/chrome: cros_ec_debugfs: Fix permissions for panicinfo fs: debugfs: Add write functionality to debugfs blobs ACPI: APEI: EINJ: Refactor available_error_type_show() * acpi-extlog: ACPI: extlog: Clear Extended Error Log status when RAS_CEC handled the error ACPI: extlog: fix NULL pointer dereference check	2024-01-04 13:19:40 +01:00
Steven Rostedt (Google)	8186fff7ab	tracefs/eventfs: Use root and instance inodes as default ownership Instead of walking the dentries on mount/remount to update the gid values of all the dentries if a gid option is specified on mount, just update the root inode. Add .getattr, .setattr, and .permissions on the tracefs inode operations to update the permissions of the files and directories. For all files and directories in the top level instance: /sys/kernel/tracing/* It will use the root inode as the default permissions. The inode that represents: /sys/kernel/tracing (or wherever it is mounted). When an instance is created: mkdir /sys/kernel/tracing/instance/foo The directory "foo" and all its files and directories underneath will use the default of what foo is when it was created. A remount of tracefs will not affect it. If a user were to modify the permissions of any file or directory in tracefs, it will also no longer be modified by a change in ownership of a remount. The events directory, if it is in the top level instance, will use the tracefs root inode as the default ownership for itself and all the files and directories below it. For the events directory in an instance ("foo"), it will keep the ownership of what it was when it was created, and that will be used as the default ownership for the files and directories beneath it. Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wjVdGkjDXBbvLn2wbZnqP4UsH46E3gqJ9m7UG6DpX2+WA@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240103215016.1e0c9811@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-01-03 21:53:55 -05:00

... 3 4 5 6 7 ...

89195 Commits