linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-11 04:18:39 +08:00

Author	SHA1	Message	Date
Suren Baghdasaryan	766c163c20	alloc_tag: mark pages reserved during CMA activation as not tagged During CMA activation, pages in CMA area are prepared and then freed without being allocated. This triggers warnings when memory allocation debug config (CONFIG_MEM_ALLOC_PROFILING_DEBUG) is enabled. Fix this by marking these pages not tagged before freeing them. Link: https://lkml.kernel.org/r/20240813150758.855881-2-surenb@google.com Fixes: `d224eb0287` ("codetag: debug: mark codetags for reserved pages as empty") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [6.10] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:16 -07:00
Suren Baghdasaryan	a8fc28dad6	alloc_tag: introduce clear_page_tag_ref() helper function In several cases we are freeing pages which were not allocated using common page allocators. For such cases, in order to keep allocation accounting correct, we should clear the page tag to indicate that the page being freed is expected to not have a valid allocation tag. Introduce clear_page_tag_ref() helper function to be used for this. Link: https://lkml.kernel.org/r/20240813150758.855881-1-surenb@google.com Fixes: `d224eb0287` ("codetag: debug: mark codetags for reserved pages as empty") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [6.10] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:16 -07:00
Jinjie Ruan	edb907a613	crash: fix riscv64 crash memory reserve dead loop On RISCV64 Qemu machine with 512MB memory, cmdline "crashkernel=500M,high" will cause system stall as below: Zone ranges: DMA32 [mem 0x0000000080000000-0x000000009fffffff] Normal empty Movable zone start for each node Early memory node ranges node 0: [mem 0x0000000080000000-0x000000008005ffff] node 0: [mem 0x0000000080060000-0x000000009fffffff] Initmem setup node 0 [mem 0x0000000080000000-0x000000009fffffff] (stall here) commit `5d99cadf15` ("crash: fix x86_32 crash memory reserve dead loop bug") fix this on 32-bit architecture. However, the problem is not completely solved. If `CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX` on 64-bit architecture, for example, when system memory is equal to CRASH_ADDR_LOW_MAX on RISCV64, the following infinite loop will also occur: -> reserve_crashkernel_generic() and high is true -> alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] fail -> alloc at [0, CRASH_ADDR_LOW_MAX] fail and repeatedly (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX). As Catalin suggested, do not remove the ",high" reservation fallback to ",low" logic which will change arm64's kdump behavior, but fix it by skipping the above situation similar to commit `d2f32f2319` ("crash: fix x86_32 crash memory reserve dead loop"). After this patch, it print: cannot allocate crashkernel (size:0x1f400000) Link: https://lkml.kernel.org/r/20240812062017.2674441-1-ruanjinjie@huawei.com Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Dave Young <dyoung@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:16 -07:00
Muhammad Usama Anjum	7c5e8d212d	selftests: memfd_secret: don't build memfd_secret test on unsupported arches [1] mentions that memfd_secret is only supported on arm64, riscv, x86 and x86_64 for now. It doesn't support other architectures. I found the build error on arm and decided to send the fix as it was creating noise on KernelCI: memfd_secret.c: In function 'memfd_secret': memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function); did you mean 'memfd_secret'? 42 \| return syscall(__NR_memfd_secret, flags); \| ^~~~~~~~~~~~~~~~~ \| memfd_secret Hence I'm adding condition that memfd_secret should only be compiled on supported architectures. Also check in run_vmtests script if memfd_secret binary is present before executing it. Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1] Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com Fixes: `76fe17ef58` ("secretmem: test: add basic selftest for memfd_secret(2)") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:15 -07:00
Kirill A. Shutemov	807174a93d	mm: fix endless reclaim on machines with unaccepted memory Unaccepted memory is considered unusable free memory, which is not counted as free on the zone watermark check. This causes get_page_from_freelist() to accept more memory to hit the high watermark, but it creates problems in the reclaim path. The reclaim path encounters a failed zone watermark check and attempts to reclaim memory. This is usually successful, but if there is little or no reclaimable memory, it can result in endless reclaim with little to no progress. This can occur early in the boot process, just after start of the init process when the only reclaimable memory is the page cache of the init executable and its libraries. Make unaccepted memory free from watermark check point of view. This way unaccepted memory will never be the trigger of memory reclaim. Accept more memory in the get_page_from_freelist() if needed. Link: https://lkml.kernel.org/r/20240809114854.3745464-2-kirill.shutemov@linux.intel.com Fixes: `dcdfdd40fa` ("mm: Add support for unaccepted memory") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Jianxiong Gao <jxgao@google.com> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Jianxiong Gao <jxgao@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [6.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:15 -07:00
Dan Carpenter	af3b7d09a9	selftests/mm: compaction_test: fix off by one in check_compaction() The "initial_nr_hugepages" variable is unsigned long so it takes up to 20 characters to print, plus 1 more character for the NUL terminator. Unfortunately, this buffer is not quite large enough for the terminator to fit. Also use snprintf() for a belt and suspenders approach. Link: https://lkml.kernel.org/r/87470c06-b45a-4e83-92ff-aac2e7b9c6ba@stanley.mountain Fixes: `fb9293b6b0` ("selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:15 -07:00
Zi Yan	fd8c35a929	mm/numa: no task_numa_fault() call if PMD is changed When handling a numa page fault, task_numa_fault() should be called by a process that restores the page table of the faulted folio to avoid duplicated stats counting. Commit `c5b5a3dd2c` ("mm: thp: refactor NUMA fault handling") restructured do_huge_pmd_numa_page() and did not avoid task_numa_fault() call in the second page table check after a numa migration failure. Fix it by making all !pmd_same() return immediately. This issue can cause task_numa_fault() being called more than necessary and lead to unexpected numa balancing results (It is hard to tell whether the issue will cause positive or negative performance impact due to duplicated numa fault counting). Link: https://lkml.kernel.org/r/20240809145906.1513458-3-ziy@nvidia.com Fixes: `c5b5a3dd2c` ("mm: thp: refactor NUMA fault handling") Reported-by: "Huang, Ying" <ying.huang@intel.com> Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel.com/ Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:15 -07:00
Zi Yan	40b760cfd4	mm/numa: no task_numa_fault() call if PTE is changed When handling a numa page fault, task_numa_fault() should be called by a process that restores the page table of the faulted folio to avoid duplicated stats counting. Commit `b99a342d4f` ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault") restructured do_numa_page() and did not avoid task_numa_fault() call in the second page table check after a numa migration failure. Fix it by making all !pte_same() return immediately. This issue can cause task_numa_fault() being called more than necessary and lead to unexpected numa balancing results (It is hard to tell whether the issue will cause positive or negative performance impact due to duplicated numa fault counting). Link: https://lkml.kernel.org/r/20240809145906.1513458-2-ziy@nvidia.com Fixes: `b99a342d4f` ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: "Huang, Ying" <ying.huang@intel.com> Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel.com/ Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:15 -07:00
Hailong Liu	61ebe5a747	mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0 The __vmap_pages_range_noflush() assumes its argument pages contains pages with the same page shift. However, since commit `e9c3cda4d8` ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation failed for high order, the pages may contain two different page shifts (high order and order-0). This could lead __vmap_pages_range_noflush() to perform incorrect mappings, potentially resulting in memory corruption. Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE): kvmalloc(2M, __GFP_NOFAIL\|GFP_X) __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP) vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0 vmap_pages_range() vmap_pages_range_noflush() __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens We can remove the fallback code because if a high-order allocation fails, __vmalloc_node_range_noprof() will retry with order-0. Therefore, it is unnecessary to fallback to order-0 here. Therefore, fix this by removing the fallback code. Link: https://lkml.kernel.org/r/20240808122019.3361-1-hailong.liu@oppo.com Fixes: `e9c3cda4d8` ("mm, vmalloc: fix high order __GFP_NOFAIL allocations") Signed-off-by: Hailong Liu <hailong.liu@oppo.com> Reported-by: Tangquan Zheng <zhengtangquan@oppo.com> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Barry Song <baohua@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:14 -07:00
Waiman Long	d75abd0d0b	mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu The memory_failure_cpu structure is a per-cpu structure. Access to its content requires the use of get_cpu_var() to lock in the current CPU and disable preemption. The use of a regular spinlock_t for locking purpose is fine for a non-RT kernel. Since the integration of RT spinlock support into the v5.15 kernel, a spinlock_t in a RT kernel becomes a sleeping lock and taking a sleeping lock in a preemption disabled context is illegal resulting in the following kind of warning. [12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 [12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0 [12135.732252] preempt_count: 1, expected: 0 [12135.732255] RCU nest depth: 2, expected: 2 : [12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021 [12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred [12135.732433] Call Trace: [12135.732436] <TASK> [12135.732450] dump_stack_lvl+0x57/0x81 [12135.732461] __might_resched.cold+0xf4/0x12f [12135.732479] rt_spin_lock+0x4c/0x100 [12135.732491] memory_failure_queue+0x40/0xe0 [12135.732503] ghes_do_memory_failure+0x53/0x390 [12135.732516] ghes_do_proc.constprop.0+0x229/0x3e0 [12135.732575] ghes_proc+0xf9/0x1a0 [12135.732591] ghes_notify_hed+0x6a/0x150 [12135.732602] notifier_call_chain+0x43/0xb0 [12135.732626] blocking_notifier_call_chain+0x43/0x60 [12135.732637] acpi_ev_notify_dispatch+0x47/0x70 [12135.732648] acpi_os_execute_deferred+0x13/0x20 [12135.732654] process_one_work+0x41f/0x500 [12135.732695] worker_thread+0x192/0x360 [12135.732715] kthread+0x111/0x140 [12135.732733] ret_from_fork+0x29/0x50 [12135.732779] </TASK> Fix it by using a raw_spinlock_t for locking instead. Also move the pr_err() out of the lock critical section and after put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep with this call. [longman@redhat.com: don't hold percpu ref across pr_err(), per Miaohe] Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com Fixes: `0f383b6dc9` ("locking/spinlock: Provide RT variant") Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:14 -07:00
Pasha Tatashin	9d85731110	mm: don't account memmap per-node Fix invalid access to pgdat during hot-remove operation: ndctl users reported a GPF when trying to destroy a namespace: $ ndctl destroy-namespace all -r all -f Segmentation fault dmesg: Oops: general protection fault, probably for non-canonical address 0xdffffc0000005650: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: probably user-memory-access in range [0x000000000002b280-0x000000000002b287] CPU: 26 UID: 0 PID: 1868 Comm: ndctl Not tainted 6.11.0-rc1 #1 Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.20.1 09/13/2023 RIP: 0010:mod_node_page_state+0x2a/0x110 cxl-test users report a GPF when trying to unload the test module: $ modrpobe -r cxl-test dmesg BUG: unable to handle page fault for address: 0000000000004200 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 UID: 0 PID: 1076 Comm: modprobe Tainted: G O N 6.11.0-rc1 #197 Tainted: [O]=OOT_MODULE, [N]=TEST Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/15 RIP: 0010:mod_node_page_state+0x6/0x90 Currently, when memory is hot-plugged or hot-removed the accounting is done based on the assumption that memmap is allocated from the same node as the hot-plugged/hot-removed memory, which is not always the case. In addition, there are challenges with keeping the node id of the memory that is being remove to the time when memmap accounting is actually performed: since this is done after remove_pfn_range_from_zone(), and also after remove_memory_block_devices(). Meaning that we cannot use pgdat nor walking though memblocks to get the nid. Given all of that, account the memmap overhead system wide instead. For this we are going to be using global atomic counters, but given that memmap size is rarely modified, and normally is only modified either during early boot when there is only one CPU, or under a hotplug global mutex lock, therefore there is no need for per-cpu optimizations. Also, while we are here rename nr_memmap to nr_memmap_pages, and nr_memmap_boot to nr_memmap_boot_pages to be self explanatory that the units are in page count. [pasha.tatashin@soleen.com: address a few nits from David Hildenbrand] Link: https://lkml.kernel.org/r/20240809191020.1142142-4-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20240809191020.1142142-4-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20240808213437.682006-4-pasha.tatashin@soleen.com Fixes: `15995a3524` ("mm: report per-page metadata information") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/linux-cxl/CAHj4cs9Ax1=CoJkgBGP_+sNu6-6=6v=_L-ZBZY0bVLD3wUWZQg@mail.gmail.com Reported-by: Alison Schofield <alison.schofield@intel.com> Closes: https://lore.kernel.org/linux-mm/Zq0tPd2h6alFz8XF@aschofie-mobl2/#t Tested-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Fan Ni <fan.ni@samsung.com> Cc: Joel Granados <j.granados@samsung.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zhijian <lizhijian@fujitsu.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:14 -07:00
Pasha Tatashin	f4cb78af91	mm: add system wide stats items category /proc/vmstat contains events and stats, events can only grow, but stats can grow and shrink. vmstat has the following: ------------------------- NR_VM_ZONE_STAT_ITEMS: per-zone stats NR_VM_NUMA_EVENT_ITEMS: per-numa events NR_VM_NODE_STAT_ITEMS: per-numa stats NR_VM_WRITEBACK_STAT_ITEMS: system-wide background-writeback and dirty-throttling tresholds. NR_VM_EVENT_ITEMS: system-wide events ------------------------- Rename NR_VM_WRITEBACK_STAT_ITEMS to NR_VM_STAT_ITEMS, to track the system-wide stats, we are going to add per-page metadata stats to this category in the next patch. Also delete unused writeback_stat_name(). Link: https://lkml.kernel.org/r/20240809191020.1142142-2-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20240808213437.682006-3-pasha.tatashin@soleen.com Fixes: `15995a3524` ("mm: report per-page metadata information") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Suggested-by: Yosry Ahmed <yosryahmed@google.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Joel Granados <j.granados@samsung.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zhijian <lizhijian@fujitsu.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yi Zhang <yi.zhang@redhat.com> Cc: Fan Ni <fan.ni@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:14 -07:00
Pasha Tatashin	ace0741a55	mm: don't account memmap on failure Patch series "Fixes for memmap accounting", v4. Memmap accounting provides us with observability of how much memory is used for per-page metadata: i.e. "struct page"'s and "struct page_ext". It also provides with information of how much was allocated using boot allocator (i.e. not part of MemTotal), and how much was allocated using buddy allocated (i.e. part of MemTotal). This small series fixes a few problems that were discovered with the original patch. This patch (of 3): When we fail to allocate the mmemmap in alloc_vmemmap_page_list(), do not account any already-allocated pages: we're going to free all them before we return from the function. Link: https://lkml.kernel.org/r/20240809191020.1142142-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20240808213437.682006-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20240808213437.682006-2-pasha.tatashin@soleen.com Fixes: `15995a3524` ("mm: report per-page metadata information") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Acked-by: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Joel Granados <j.granados@samsung.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zhijian <lizhijian@fujitsu.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:13 -07:00
David Hildenbrand	5f75cfbd6b	mm/hugetlb: fix hugetlb vs. core-mm PT locking We recently made GUP's common page table walking code to also walk hugetlb VMAs without most hugetlb special-casing, preparing for the future of having less hugetlb-specific page table walking code in the codebase. Turns out that we missed one page table locking detail: page table locking for hugetlb folios that are not mapped using a single PMD/PUD. Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the page tables, will perform a pte_offset_map_lock() to grab the PTE table lock. However, hugetlb that concurrently modifies these page tables would actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the locks would differ. Something similar can happen right now with hugetlb folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS. This issue can be reproduced [1], for example triggering: [ 3105.936100] ------------[ cut here ]------------ [ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188 [ 3105.944634] Modules linked in: [...] [ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1 [ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024 [ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3105.991108] pc : try_grab_folio+0x11c/0x188 [ 3105.994013] lr : follow_page_pte+0xd8/0x430 [ 3105.996986] sp : ffff80008eafb8f0 [ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43 [ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48 [ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978 [ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001 [ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000 [ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000 [ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0 [ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080 [ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000 [ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000 [ 3106.047957] Call trace: [ 3106.049522] try_grab_folio+0x11c/0x188 [ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0 [ 3106.055527] follow_page_mask+0x1a0/0x2b8 [ 3106.058118] __get_user_pages+0xf0/0x348 [ 3106.060647] faultin_page_range+0xb0/0x360 [ 3106.063651] do_madvise+0x340/0x598 Let's make huge_pte_lockptr() effectively use the same PT locks as any core-mm page table walker would. Add ptep_lockptr() to obtain the PTE page table lock using a pte pointer -- unfortunately we cannot convert pte_lockptr() because virt_to_page() doesn't work with kmap'ed page tables we can have with CONFIG_HIGHPTE. Handle CONFIG_PGTABLE_LEVELS correctly by checking in reverse order, such that when e.g., CONFIG_PGTABLE_LEVELS==2 with PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE will work as expected. Document why that works. There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb folio being mapped using two PTE page tables. While hugetlb wants to take the PMD table lock, core-mm would grab the PTE table lock of one of both PTE page tables. In such corner cases, we have to make sure that both locks match, which is (fortunately!) currently guaranteed for 8xx as it does not support SMP and consequently doesn't use split PT locks. [1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/ Link: https://lkml.kernel.org/r/20240801204748.99107-1-david@redhat.com Fixes: `9cb28da546` ("mm/gup: handle hugetlb in the generic follow_page_mask code") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Peter Xu <peterx@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:13 -07:00
Pedro Falcato	e46bc2e7eb	mseal: fix is_madv_discard() is_madv_discard did its check wrong. MADV_ flags are not bitwise, they're normal sequential numbers. So, for instance: behavior & (/* ... */ \| MADV_REMOVE) tagged both MADV_REMOVE and MADV_RANDOM (bit 0 set) as discard operations. As a result the kernel could erroneously block certain madvises (e.g MADV_RANDOM or MADV_HUGEPAGE) on sealed VMAs due to them sharing bits with blocked MADV operations (e.g REMOVE or WIPEONFORK). This is obviously incorrect, so use a switch statement instead. Link: https://lkml.kernel.org/r/20240807173336.2523757-1-pedro.falcato@gmail.com Link: https://lkml.kernel.org/r/20240807173336.2523757-2-pedro.falcato@gmail.com Fixes: `8be7258aad` ("mseal: add mseal syscall") Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com> Tested-by: Jeff Xu <jeffxu@chromium.org> Reviewed-by: Jeff Xu <jeffxu@chromium.org> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-08-15 22:16:13 -07:00
Dave Airlie	fee9d135e2	Mediatek DRM Fixes - 20240805 1. Set sensible cursor width/height values to fix crash -----BEGIN PGP SIGNATURE----- iQJMBAABCgA2FiEEACwLKSDmq+9RDv5P4cpzo8lZTiQFAmaw0E8YHGNodW5rdWFu Zy5odUBrZXJuZWwub3JnAAoJEOHKc6PJWU4keYUQAI/L7nnsDmy5HNm3P4t8YoL3 mzlPyxdMGZ4JwvtfHHRjqVF3sYNcIbfyjdlt84jAEfkZtN1AjYCA80MA6GCUFAUr wnWTWdVYZfGERfyS8K5QGyeud6Q796JrG9EvbWiVfcNJJbhboK6hCiWJ3qs1nmVe KDaMcRkY7goa+bDNXguDDT5+PEdKyC4Nz4YxkgCeCBqgc+FLWDJ2JiGD9UiQ94Fb GkI37ztndEXmmr8HLnvTZipk2RqpIgU2q9lDXVrhgPaMk3sFLpSlqe0AtU5kjsP5 IjhejtNUD2Z2bdIt8cjJwk+0hkEMiETILLQ66XA0wyEYrNIfDJlqEY/2ZjAU3p/B QqUUaAwHj2DdelXRl2QO1mq6K/IdCOJF5XDbWjVQj3Ys9fk8se//syzbGgJ/QeO7 sIoQIDmj333KdNAumWqNAPEOoqXwn8byJ1GJqeGRY2p7/+xYJbNHe5KRXeOP2Z5x GT3Qh3Gx5nOO/rYqluPsviHCDVzUtnjHV0fGW4B/Yn2h6CLV3HRsCP50nLui/NM6 ZRvZLMkKRyha4dkUlT2hhRsVU+XUasT6kJLMxb3tUU7VBl/FGgL7ROtmQSntTl7v 1B05hn22cFeKsGTmhRwn7qKK+lK20H3RshzdKbNEBj8BL+cJqpuMehS6P1PJoA2x MkpewadFWZTlmgh1u+Pl =g3L1 -----END PGP SIGNATURE----- Merge tag 'mediatek-drm-fixes-20240805' of https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes Mediatek DRM Fixes - 20240805 1. Set sensible cursor width/height values to fix crash Signed-off-by: Dave Airlie <airlied@redhat.com> From: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240810084605.3435-1-chunkuang.hu@kernel.org	2024-08-16 13:16:47 +10:00
Dave Airlie	f8e170a3dd	- Validate user fence during creation (Brost) - Fix use after free when client stats are captured (Umesh) - SRIOV fixes (Michal) - Runtime PM fixes (Brost) -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAma+CH8ACgkQ+mJfZA7r E8r/Gwf+Kvzdu/yM+S6O1be0ToPQPtj2qXi1sjTUT4FetYLoPRdrC1lBqt95oIQh OwYXxJQTg2YuAUBenfg/pWt5bAazSlwP962JU96gg1z+5p8i1nUfSUvp+ZGU8JSu 8aU/Nla4X6Fg/0yhmdlU3MmwxXUmqlgQvXqqr3P0R1jcpi2stNDc3VDN+0TqY07F lMR/cbVwvQnpxBvrHh/MyIR+g6iG67xYiX8tVbFb1rmmg1F6RNrTYX5mcx5FoosI 1ST2PsLt23NBQDgnOrWvV/Y38GjbqAMrrma/5EFXDKF3C237ewTCyxA0UePPAyfe JKncrc7nqmNx7igoT3AMtZFVG0LHWQ== =KJ+l -----END PGP SIGNATURE----- Merge tag 'drm-xe-fixes-2024-08-15' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes - Validate user fence during creation (Brost) - Fix use after free when client stats are captured (Umesh) - SRIOV fixes (Michal) - Runtime PM fixes (Brost) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zr4KWF5nM1YvnT8H@intel.com	2024-08-16 12:06:13 +10:00
Dave Airlie	75eac7e8bb	Short summary of fixes pull: panel: - dt-bindings style fixes panel-orientation: - add quirk for Any Loki Max - add quirk for Any Loki Zero rockchip: - inno-hdmi: fix infoframe upload v3d: - fix OOB access in v3d_csd_job_run() -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAma9/7AACgkQaA3BHVML eiN4sggAnc4phUhIiLEgqtinh0E8HnqlOujhvlrONdJPJWGoZz+OKM+l/Uy4mMme /RF9D3CgEumlCN//3wZ+9J/2gutvUytcUzwr3uT+x4kkl9hjHEUiHyKnKXWDJqWK Jt8Mtn99KojZNRkVbMYp6CVylvhXu4i2jGJhuq1F8sKA0Le6fCF09fDYwS/kUWxd V2urqvSEWgu1rngcvIplPfhMyJH16VneYLa3CX7nNDy9q0mT0SAhFw8/47GozqeQ 1xPfsaqrxWnLUnV+/g6katbRgbdeXnzeYd47gbtruaLkWnhpfNrlofYpiRTq1Buy SQ7iTo3lDulhm+pyIauLBhssytEKig== =kUH6 -----END PGP SIGNATURE----- Merge tag 'drm-misc-fixes-2024-08-15' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: panel: - dt-bindings style fixes panel-orientation: - add quirk for Any Loki Max - add quirk for Any Loki Zero rockchip: - inno-hdmi: fix infoframe upload v3d: - fix OOB access in v3d_csd_job_run() Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240815131751.GA151031@linux.fritz.box	2024-08-16 11:33:37 +10:00
Li Lingfeng	b313a8c835	block: Fix lockdep warning in blk_mq_mark_tag_wait Lockdep reported a warning in Linux version 6.6: [ 414.344659] ================================ [ 414.345155] WARNING: inconsistent lock state [ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted [ 414.346221] -------------------------------- [ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes: [ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.351204] {IN-SOFTIRQ-W} state was registered at: [ 414.351751] lock_acquire+0x18d/0x460 [ 414.352218] _raw_spin_lock_irqsave+0x39/0x60 [ 414.352769] __wake_up_common_lock+0x22/0x60 [ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0 [ 414.353829] sbitmap_queue_clear+0xdd/0x270 [ 414.354338] blk_mq_put_tag+0xdf/0x170 [ 414.354807] __blk_mq_free_request+0x381/0x4d0 [ 414.355335] blk_mq_free_request+0x28b/0x3e0 [ 414.355847] __blk_mq_end_request+0x242/0xc30 [ 414.356367] scsi_end_request+0x2c1/0x830 [ 414.345155] WARNING: inconsistent lock state [ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted [ 414.346221] -------------------------------- [ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes: [ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.351204] {IN-SOFTIRQ-W} state was registered at: [ 414.351751] lock_acquire+0x18d/0x460 [ 414.352218] _raw_spin_lock_irqsave+0x39/0x60 [ 414.352769] __wake_up_common_lock+0x22/0x60 [ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0 [ 414.353829] sbitmap_queue_clear+0xdd/0x270 [ 414.354338] blk_mq_put_tag+0xdf/0x170 [ 414.354807] __blk_mq_free_request+0x381/0x4d0 [ 414.355335] blk_mq_free_request+0x28b/0x3e0 [ 414.355847] __blk_mq_end_request+0x242/0xc30 [ 414.356367] scsi_end_request+0x2c1/0x830 [ 414.356863] scsi_io_completion+0x177/0x1610 [ 414.357379] scsi_complete+0x12f/0x260 [ 414.357856] blk_complete_reqs+0xba/0xf0 [ 414.358338] __do_softirq+0x1b0/0x7a2 [ 414.358796] irq_exit_rcu+0x14b/0x1a0 [ 414.359262] sysvec_call_function_single+0xaf/0xc0 [ 414.359828] asm_sysvec_call_function_single+0x1a/0x20 [ 414.360426] default_idle+0x1e/0x30 [ 414.360873] default_idle_call+0x9b/0x1f0 [ 414.361390] do_idle+0x2d2/0x3e0 [ 414.361819] cpu_startup_entry+0x55/0x60 [ 414.362314] start_secondary+0x235/0x2b0 [ 414.362809] secondary_startup_64_no_verify+0x18f/0x19b [ 414.363413] irq event stamp: 428794 [ 414.363825] hardirqs last enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200 [ 414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50 [ 414.365629] softirqs last enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2 [ 414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0 [ 414.367425] other info that might help us debug this: [ 414.368194] Possible unsafe locking scenario: [ 414.368900] CPU0 [ 414.369225] ---- [ 414.369548] lock(&sbq->ws[i].wait); [ 414.370000] <Interrupt> [ 414.370342] lock(&sbq->ws[i].wait); [ 414.370802] * DEADLOCK * [ 414.371569] 5 locks held by kworker/u10:3/1152: [ 414.372088] #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0 [ 414.373180] #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0 [ 414.374384] #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00 [ 414.375342] #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.376377] #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0 [ 414.378607] stack backtrace: [ 414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6 [ 414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 414.381177] Workqueue: writeback wb_workfn (flush-253:0) [ 414.381805] Call Trace: [ 414.382136] <TASK> [ 414.382429] dump_stack_lvl+0x91/0xf0 [ 414.382884] mark_lock_irq+0xb3b/0x1260 [ 414.383367] ? __pfx_mark_lock_irq+0x10/0x10 [ 414.383889] ? stack_trace_save+0x8e/0xc0 [ 414.384373] ? __pfx_stack_trace_save+0x10/0x10 [ 414.384903] ? graph_lock+0xcf/0x410 [ 414.385350] ? save_trace+0x3d/0xc70 [ 414.385808] mark_lock.part.20+0x56d/0xa90 [ 414.386317] mark_held_locks+0xb0/0x110 [ 414.386791] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 414.387320] lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.387901] ? _raw_spin_unlock_irq+0x28/0x50 [ 414.388422] trace_hardirqs_on+0x58/0x100 [ 414.388917] _raw_spin_unlock_irq+0x28/0x50 [ 414.389422] __blk_mq_tag_busy+0x1d6/0x2a0 [ 414.389920] __blk_mq_get_driver_tag+0x761/0x9f0 [ 414.390899] blk_mq_dispatch_rq_list+0x1780/0x1ee0 [ 414.391473] ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10 [ 414.392070] ? sbitmap_get+0x2b8/0x450 [ 414.392533] ? __blk_mq_get_driver_tag+0x210/0x9f0 [ 414.393095] __blk_mq_sched_dispatch_requests+0xd99/0x1690 [ 414.393730] ? elv_attempt_insert_merge+0x1b1/0x420 [ 414.394302] ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10 [ 414.394970] ? lock_acquire+0x18d/0x460 [ 414.395456] ? blk_mq_run_hw_queue+0x637/0xa00 [ 414.395986] ? __pfx_lock_acquire+0x10/0x10 [ 414.396499] blk_mq_sched_dispatch_requests+0x109/0x190 [ 414.397100] blk_mq_run_hw_queue+0x66e/0xa00 [ 414.397616] blk_mq_flush_plug_list.part.17+0x614/0x2030 [ 414.398244] ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10 [ 414.398897] ? writeback_sb_inodes+0x241/0xcc0 [ 414.399429] blk_mq_flush_plug_list+0x65/0x80 [ 414.399957] __blk_flush_plug+0x2f1/0x530 [ 414.400458] ? __pfx___blk_flush_plug+0x10/0x10 [ 414.400999] blk_finish_plug+0x59/0xa0 [ 414.401467] wb_writeback+0x7cc/0x920 [ 414.401935] ? __pfx_wb_writeback+0x10/0x10 [ 414.402442] ? mark_held_locks+0xb0/0x110 [ 414.402931] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 414.403462] ? lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.404062] wb_workfn+0x2b3/0xcf0 [ 414.404500] ? __pfx_wb_workfn+0x10/0x10 [ 414.404989] process_scheduled_works+0x432/0x13f0 [ 414.405546] ? __pfx_process_scheduled_works+0x10/0x10 [ 414.406139] ? do_raw_spin_lock+0x101/0x2a0 [ 414.406641] ? assign_work+0x19b/0x240 [ 414.407106] ? lock_is_held_type+0x9d/0x110 [ 414.407604] worker_thread+0x6f2/0x1160 [ 414.408075] ? __kthread_parkme+0x62/0x210 [ 414.408572] ? lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.409168] ? __kthread_parkme+0x13c/0x210 [ 414.409678] ? __pfx_worker_thread+0x10/0x10 [ 414.410191] kthread+0x33c/0x440 [ 414.410602] ? __pfx_kthread+0x10/0x10 [ 414.411068] ret_from_fork+0x4d/0x80 [ 414.411526] ? __pfx_kthread+0x10/0x10 [ 414.411993] ret_from_fork_asm+0x1b/0x30 [ 414.412489] </TASK> When interrupt is turned on while a lock holding by spin_lock_irq it throws a warning because of potential deadlock. blk_mq_prep_dispatch_rq blk_mq_get_driver_tag __blk_mq_get_driver_tag __blk_mq_alloc_driver_tag blk_mq_tag_busy -> tag is already busy // failed to get driver tag blk_mq_mark_tag_wait spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait) __add_wait_queue(wq, wait) -> wait queue active blk_mq_get_driver_tag __blk_mq_tag_busy -> 1) tag must be idle, which means there can't be inflight IO spin_lock_irq(&tags->lock) -> lock B (hctx->tags) spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally -> 2) context must be preempt by IO interrupt to trigger deadlock. As shown above, the deadlock is not possible in theory, but the warning still need to be fixed. Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq. Fixes: `4f1731df60` ("blk-mq: fix potential io hang by wrong 'wake_batch'") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240815024736.2040971-1-lilingfeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-08-15 19:25:03 -06:00
Dave Airlie	9370e43071	amd-drm-fixes-6.11-2024-08-14: amdgpu: - Fix MES ring buffer overflow - DCN 3.5 fix - DCN 3.2.1 fix - DP MST fix - Cursor fixes - JPEG fixes - Context ops validation - MES 12 fixes - VCN 5.0 fix - HDP fix -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZr0iSgAKCRC93/aFa7yZ 2FTmAPoDyjTUeuCzsR9m4eaknueHTz97AqP+oJ7waPfGwSGpqQEAlLAfBiiENCsx 6Wbu8yUeEwim0cmZNGkwDA+FRf020wY= =9ysE -----END PGP SIGNATURE----- Merge tag 'amd-drm-fixes-6.11-2024-08-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.11-2024-08-14: amdgpu: - Fix MES ring buffer overflow - DCN 3.5 fix - DCN 3.2.1 fix - DP MST fix - Cursor fixes - JPEG fixes - Context ops validation - MES 12 fixes - VCN 5.0 fix - HDP fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240814213846.1331827-1-alexander.deucher@amd.com	2024-08-16 10:57:10 +10:00
Linus Torvalds	d7a5aa4b3c	perf tools fixes for v6.11 The usual header file sync-ups and one more build fix. * Add README file to explain why we copy the headers * Sync UAPI and other header files with kernel source * Fix build on MIPS 32-bit Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZr6IVAAKCRCMstVUGiXM gz92AQC+WaAY7Xwr5o9My2u8Mjb1u9Njeqy8Vw9ZCoXtP46gAQEA1bGFe0mOo9Yo kdD6Mf214iYZ4nWNlc4aQ94vB6XUzQ0= =sOlo -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.11-2024-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "The usual header file sync-ups and one more build fix: - Add README file to explain why we copy the headers - Sync UAPI and other header files with kernel source - Fix build on MIPS 32-bit" * tag 'perf-tools-fixes-for-v6.11-2024-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf daemon: Fix the build on 32-bit architectures tools/include: Sync arm64 headers with the kernel sources tools/include: Sync x86 headers with the kernel sources tools/include: Sync filesystem headers with the kernel sources tools/include: Sync network socket headers with the kernel sources tools/include: Sync uapi/asm-generic/unistd.h with the kernel sources tools/include: Sync uapi/sound/asound.h with the kernel sources tools/include: Sync uapi/linux/perf.h with the kernel sources tools/include: Sync uapi/linux/kvm.h with the kernel sources tools/include: Sync uapi/drm/i915_drm.h with the kernel sources perf tools: Add tools/include/uapi/README	2024-08-15 16:08:59 -07:00
Gustavo A. R. Silva	5b4f3af39b	smb: smb2pdu.h: Use static_assert() to check struct sizes Commit `9f9bef9bc5` ("smb: smb2pdu.h: Avoid -Wflex-array-member-not-at-end warnings") introduced tagged `struct create_context_hdr`. We want to ensure that when new members need to be added to the flexible structure, they are always included within this tagged struct. So, we use `static_assert()` to ensure that the memory layout for both the flexible structure and the tagged struct is the same after any changes. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-15 16:06:55 -05:00
Steve French	836bb3268d	smb3: fix lock breakage for cached writes Mandatory locking is enforced for cached writes, which violates default posix semantics, and also it is enforced inconsistently. This apparently breaks recent versions of libreoffice, but can also be demonstrated by opening a file twice from the same client, locking it from handle one and writing to it from handle two (which fails, returning EACCES). Since there was already a mount option "forcemandatorylock" (which defaults to off), with this change only when the user intentionally specifies "forcemandatorylock" on mount will we break posix semantics on write to a locked range (ie we will only fail the write in this case, if the user mounts with "forcemandatorylock"). Fixes: `85160e03a7` ("CIFS: Implement caching mechanism for mandatory brlocks") Cc: stable@vger.kernel.org Cc: Pavel Shilovsky <piastryyy@gmail.com> Reported-by: abartlet@samba.org Reported-by: Kevin Ottens <kevin.ottens@enioka.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-15 16:04:47 -05:00
Jens Axboe	50faba7771	Merge tag 'md-6.11-20240815' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.11 Pull MD fix from Song: "This patch fixes a potential data corruption in degraded raid0 array with slow (WriteMostly) drives. This issue was introduced in upstream 6.9 kernel." * tag 'md-6.11-20240815' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md/raid1: Fix data corruption for degraded array with slow disk	2024-08-15 14:53:08 -06:00
Yu Kuai	c916ca3530	md/raid1: Fix data corruption for degraded array with slow disk read_balance() will avoid reading from slow disks as much as possible, however, if valid data only lands in slow disks, and a new normal disk is still in recovery, unrecovered data can be read: raid1_read_request read_balance raid1_should_read_first -> return false choose_best_rdev -> normal disk is not recovered, return -1 choose_bb_rdev -> missing the checking of recovery, return the normal disk -> read unrecovered data Root cause is that the checking of recovery is missing in choose_bb_rdev(). Hence add such checking to fix the problem. Also fix similar problem in choose_slow_rdev(). Cc: stable@vger.kernel.org Fixes: `9f3ced7922` ("md/raid1: factor out choose_bb_rdev() from read_balance()") Fixes: `dfa8ecd167` ("md/raid1: factor out choose_slow_rdev() from read_balance()") Reported-and-tested-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Closes: https://lore.kernel.org/all/9952f532-2554-44bf-b906-4880b2e88e3a@o2.pl/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240803091137.3197008-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>	2024-08-15 13:38:17 -07:00
Su Hui	74c2ab6d65	smb/client: avoid possible NULL dereference in cifs_free_subrequest() Clang static checker (scan-build) warning: cifsglob.h:line 890, column 3 Access to field 'ops' results in a dereference of a null pointer. Commit `519be98971` ("cifs: Add a tracepoint to track credits involved in R/W requests") adds a check for 'rdata->server', and let clang throw this warning about NULL dereference. When 'rdata->credits.value != 0 && rdata->server == NULL' happens, add_credits_and_wake_if() will call rdata->server->ops->add_credits(). This will cause NULL dereference problem. Add a check for 'rdata->server' to avoid NULL dereference. Cc: stable@vger.kernel.org Fixes: `69c3c023af` ("cifs: Implement netfslib hooks") Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Su Hui <suhui@nfschina.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-08-15 15:32:30 -05:00
Palmer Dabbelt	32d5f7add0	Merge patch series "RISC-V: hwprobe: Misaligned scalar perf fix and rename" Evan Green <evan@rivosinc.com> says: The CPUPERF0 hwprobe key was documented and identified in code as a bitmask value, but its contents were an enum. This produced incorrect behavior in conjunction with the WHICH_CPUS hwprobe flag. The first patch in this series fixes the bitmask/enum problem by creating a new hwprobe key that returns the same data, but is properly described as a value instead of a bitmask. The second patch renames the value definitions in preparation for adding vector misaligned access info. As of this version, the old defines are kept in place to maintain source compatibility with older userspace programs. * b4-shazam-merge: RISC-V: hwprobe: Add SCALAR to misaligned perf defines RISC-V: hwprobe: Add MISALIGNED_PERF key Link: https://lore.kernel.org/r/20240809214444.3257596-1-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-08-15 13:12:21 -07:00
Alexandre Ghiti	e01d48c699	riscv: Fix out-of-bounds when accessing Andes per hart vendor extension array The out-of-bounds access is reported by UBSAN: [ 0.000000] UBSAN: array-index-out-of-bounds in ../arch/riscv/kernel/vendor_extensions.c:41:66 [ 0.000000] index -1 is out of range for type 'riscv_isavendorinfo [32]' [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.11.0-rc2ubuntu-defconfig #2 [ 0.000000] Hardware name: riscv-virtio,qemu (DT) [ 0.000000] Call Trace: [ 0.000000] [<ffffffff94e078ba>] dump_backtrace+0x32/0x40 [ 0.000000] [<ffffffff95c83c1a>] show_stack+0x38/0x44 [ 0.000000] [<ffffffff95c94614>] dump_stack_lvl+0x70/0x9c [ 0.000000] [<ffffffff95c94658>] dump_stack+0x18/0x20 [ 0.000000] [<ffffffff95c8bbb2>] ubsan_epilogue+0x10/0x46 [ 0.000000] [<ffffffff95485a82>] __ubsan_handle_out_of_bounds+0x94/0x9c [ 0.000000] [<ffffffff94e09442>] __riscv_isa_vendor_extension_available+0x90/0x92 [ 0.000000] [<ffffffff94e043b6>] riscv_cpufeature_patch_func+0xc4/0x148 [ 0.000000] [<ffffffff94e035f8>] _apply_alternatives+0x42/0x50 [ 0.000000] [<ffffffff95e04196>] apply_boot_alternatives+0x3c/0x100 [ 0.000000] [<ffffffff95e05b52>] setup_arch+0x85a/0x8bc [ 0.000000] [<ffffffff95e00ca0>] start_kernel+0xa4/0xfb6 The dereferencing using cpu should actually not happen, so remove it. Fixes: `23c996fc2b` ("riscv: Extend cpufeature.c to detect vendor extensions") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240814192619.276794-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-08-15 13:12:16 -07:00
David Gstir	0e28bf61a5	KEYS: trusted: dcp: fix leak of blob encryption key Trusted keys unseal the key blob on load, but keep the sealed payload in the blob field so that every subsequent read (export) will simply convert this field to hex and send it to userspace. With DCP-based trusted keys, we decrypt the blob encryption key (BEK) in the Kernel due hardware limitations and then decrypt the blob payload. BEK decryption is done in-place which means that the trusted key blob field is modified and it consequently holds the BEK in plain text. Every subsequent read of that key thus send the plain text BEK instead of the encrypted BEK to userspace. This issue only occurs when importing a trusted DCP-based key and then exporting it again. This should rarely happen as the common use cases are to either create a new trusted key and export it, or import a key blob and then just use it without exporting it again. Fix this by performing BEK decryption and encryption in a dedicated buffer. Further always wipe the plain text BEK buffer to prevent leaking the key via uninitialized memory. Cc: stable@vger.kernel.org # v6.10+ Fixes: `2e8a0f40a3` ("KEYS: trusted: Introduce NXP DCP-backed trusted keys") Signed-off-by: David Gstir <david@sigma-star.at> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>	2024-08-15 22:01:14 +03:00
David Gstir	6486cad00a	KEYS: trusted: fix DCP blob payload length assignment The DCP trusted key type uses the wrong helper function to store the blob's payload length which can lead to the wrong byte order being used in case this would ever run on big endian architectures. Fix by using correct helper function. Cc: stable@vger.kernel.org # v6.10+ Fixes: `2e8a0f40a3` ("KEYS: trusted: Introduce NXP DCP-backed trusted keys") Suggested-by: Richard Weinberger <richard@nod.at> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405240610.fj53EK0q-lkp@intel.com/ Signed-off-by: David Gstir <david@sigma-star.at> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>	2024-08-15 22:01:14 +03:00
Linus Torvalds	e724918b37	hardening fixes for v6.11-rc4 - gcc-plugins: randstruct: Remove GCC 4.7 or newer requirement (Thorsten Blum) - kallsyms: Clean up interaction with LTO suffixes (Song Liu) - refcount: Report UAF for refcount_sub_and_test(0) when counter==0 (Petr Pavlu) - kunit/overflow: Avoid misallocation of driver name (Ivan Orlov) -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZr5D6wAKCRA2KwveOeQk u5dXAQC9ddd3iHqDAWfbCLY41/5K3KByFspVqf8hw2sFK3Uq9wD/eWU0hWFIk1gq 1hUSb7vExo+oiahYPKIUMx5Zf69hHAk= =dmVd -----END PGP SIGNATURE----- Merge tag 'hardening-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - gcc-plugins: randstruct: Remove GCC 4.7 or newer requirement (Thorsten Blum) - kallsyms: Clean up interaction with LTO suffixes (Song Liu) - refcount: Report UAF for refcount_sub_and_test(0) when counter==0 (Petr Pavlu) - kunit/overflow: Avoid misallocation of driver name (Ivan Orlov) * tag 'hardening-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kallsyms: Match symbols exactly with CONFIG_LTO_CLANG kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols kunit/overflow: Fix UB in overflow_allocation_test gcc-plugins: randstruct: Remove GCC 4.7 or newer requirement refcount: Report UAF for refcount_sub_and_test(0) when counter==0	2024-08-15 11:50:07 -07:00
Naohiro Aota	e30729d4bd	btrfs: zoned: properly take lock to read/update block group's zoned variables __btrfs_add_free_space_zoned() references and modifies bg's alloc_offset, ro, and zone_unusable, but without taking the lock. It is mostly safe because they monotonically increase (at least for now) and this function is mostly called by a transaction commit, which is serialized by itself. Still, taking the lock is a safer and correct option and I'm going to add a change to reset zone_unusable while a block group is still alive. So, add locking around the operations. Fixes: `169e0da91a` ("btrfs: zoned: track unusable bytes for zones") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-08-15 20:35:56 +02:00
Qu Wenruo	008e2512dc	btrfs: tree-checker: add dev extent item checks [REPORT] There is a corruption report that btrfs refused to mount a fs that has overlapping dev extents: BTRFS error (device sdc): dev extent devid 4 physical offset 14263979671552 overlap with previous dev extent end 14263980982272 BTRFS error (device sdc): failed to verify dev extents against chunks: -117 BTRFS error (device sdc): open_ctree failed [CAUSE] The direct cause is very obvious, there is a bad dev extent item with incorrect length. With btrfs check reporting two overlapping extents, the second one shows some clue on the cause: ERROR: dev extent devid 4 offset 14263979671552 len 6488064 overlap with previous dev extent end 14263980982272 ERROR: dev extent devid 13 offset 2257707008000 len 6488064 overlap with previous dev extent end 2257707270144 ERROR: errors found in extent allocation tree or chunk allocation The second one looks like a bitflip happened during new chunk allocation: hex(2257707008000) = 0x20da9d30000 hex(2257707270144) = 0x20da9d70000 diff = 0x00000040000 So it looks like a bitflip happened during new dev extent allocation, resulting the second overlap. Currently we only do the dev-extent verification at mount time, but if the corruption is caused by memory bitflip, we really want to catch it before writing the corruption to the storage. Furthermore the dev extent items has the following key definition: (<device id> DEV_EXTENT <physical offset>) Thus we can not just rely on the generic key order check to make sure there is no overlapping. [ENHANCEMENT] Introduce dedicated dev extent checks, including: - Fixed member checks * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3) * chunk_objectid should always be BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256) - Alignment checks * chunk_offset should be aligned to sectorsize * length should be aligned to sectorsize * key.offset should be aligned to sectorsize - Overlap checks If the previous key is also a dev-extent item, with the same device id, make sure we do not overlap with the previous dev extent. Reported: Stefan N <stefannnau@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/ CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-08-15 20:35:52 +02:00
Jeff Layton	3bc2ac2f8f	btrfs: update target inode's ctime on unlink Unlink changes the link count on the target inode. POSIX mandates that the ctime must also change when this occurs. According to https://pubs.opengroup.org/onlinepubs/9699919799/functions/unlink.html: "Upon successful completion, unlink() shall mark for update the last data modification and last file status change timestamps of the parent directory. Also, if the file's link count is not 0, the last file status change timestamp of the file shall be marked for update." Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> [ add link to the opengroup docs ] Signed-off-by: David Sterba <dsterba@suse.com>	2024-08-15 20:35:44 +02:00
Thorsten Blum	c0247d289e	btrfs: send: annotate struct name_cache_entry with __counted_by() Add the __counted_by compiler attribute to the flexible array member name to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-08-15 20:35:32 +02:00
Linus Torvalds	a4a35f6cbe	Including fixes from wireless and netfilter Current release - regressions: - udp: fall back to software USO if IPv6 extension headers are present - wifi: iwlwifi: correctly lookup DMA address in SG table Current release - new code bugs: - eth: mlx5e: fix queue stats access to non-existing channels splat Previous releases - regressions: - eth: mlx5e: take state lock during tx timeout reporter - eth: mlxbf_gige: disable RX filters until RX path initialized - eth: igc: fix reset adapter logics when tx mode change Previous releases - always broken: - tcp: update window clamping condition - netfilter: - nf_queue: drop packets with cloned unconfirmed conntracks - nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests - vsock: fix recursive ->recvmsg calls - dsa: vsc73xx: fix MDIO bus access and PHY opera - eth: gtp: pull network headers in gtp_dev_xmit() - eth: igc: fix packet still tx after gate close by reducing i226 MAC retry buffer - eth: mana: fix RX buf alloc_size alignment and atomic op panic - eth: hns3: fix a deadlock problem when config TC during resetting Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAma+CzgSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOk46wP/0hNLi1qNd+zdv6E5nqYJ6ckgKJbaIwR mM3VGmZtLQXTNApzihFsmfEqT1EiIQuiVW4rqu0eJ28oMezDWyrEKHORx+BL4Omj 6qnygnxQw1fDrhvTfZKXyOJw6mpJL3AMbygtw9DG1se4S5kbmo8cdTI9i9Q4Qcon ms6CExsHLR1Mtf2XIs8K45XQC07CMy76dvd30VxOKus/bHHt+KBnNJ9B12kBYpbD 4Bko63KeJwhZ4n5soIC8MeqXcU1GyF+AgzQhGvuks8EvUVa4XfW7unxLZwuUsf0J ZPEKCTBinb1adZnHUx7CYRVHhzi+ptQfFW3bACAkK5cWSy8u0KLOb9Aoe68+HDev Qor2Hg3SckoFfXBEoZE0GbU+SosXMXIrs6qXOaMNo1gz062N7ZT8DoT6fNBamB31 N8QsiNTOyYDZ6icoTir1PCEvuDyx+QVIdTYAKx8wc3Q5FbpHBDTeStNFZgskTW+/ vEcOy23nXT0WImWP6wnK0REYur9UPb/pHwuBeglgBg/0zwuqioHpIjFUnphvQzBt kabkX/G4Un44w9E97/ERB7vmR1iKHPTtuU9xIsoO7dMDWxKi8v2TV6f/IBugAEFD Bx3frQFNayrhEnjm/dNnnwLpI0TZbw1YekVWBCk6pB1m7U+bpJHZfyipYloe8/yB TfoX+7zCQJtA =o4nr -----END PGP SIGNATURE----- Merge tag 'net-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from wireless and netfilter Current release - regressions: - udp: fall back to software USO if IPv6 extension headers are present - wifi: iwlwifi: correctly lookup DMA address in SG table Current release - new code bugs: - eth: mlx5e: fix queue stats access to non-existing channels splat Previous releases - regressions: - eth: mlx5e: take state lock during tx timeout reporter - eth: mlxbf_gige: disable RX filters until RX path initialized - eth: igc: fix reset adapter logics when tx mode change Previous releases - always broken: - tcp: update window clamping condition - netfilter: - nf_queue: drop packets with cloned unconfirmed conntracks - nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests - vsock: fix recursive ->recvmsg calls - dsa: vsc73xx: fix MDIO bus access and PHY opera - eth: gtp: pull network headers in gtp_dev_xmit() - eth: igc: fix packet still tx after gate close by reducing i226 MAC retry buffer - eth: mana: fix RX buf alloc_size alignment and atomic op panic - eth: hns3: fix a deadlock problem when config TC during resetting" * tag 'net-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits) net: hns3: use correct release function during uninitialization net: hns3: void array out of bound when loop tnl_num net: hns3: fix a deadlock problem when config TC during resetting net: hns3: use the user's cfg after reset net: hns3: fix wrong use of semaphore up selftests: net: lib: kill PIDs before del netns pse-core: Conditionally set current limit during PI regulator registration net: thunder_bgx: Fix netdev structure allocation net: ethtool: Allow write mechanism of LPL and both LPL and EPL vsock: fix recursive ->recvmsg calls selftest: af_unix: Fix kselftest compilation warnings netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests netfilter: nf_tables: Introduce nf_tables_getobj_single netfilter: nf_tables: Audit log dump reset after the fact selftests: netfilter: add test for br_netfilter+conntrack+queue combination netfilter: nf_queue: drop packets with cloned unconfirmed conntracks netfilter: flowtable: initialise extack before use netfilter: nfnetlink: Initialise extack before use in ACKs netfilter: allow ipv6 fragments to arrive on different devices tcp: Update window clamping condition ...	2024-08-15 10:35:20 -07:00
Linus Torvalds	20573d8e1c	media fixes for v6.11-rc4 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAma+F6UACgkQCF8+vY7k 4RU3sw/+P+YKaJqk2apUTNO2FXGXmqkgEtAo3uVnf3bQL30hks0wt3sdnrW42WGl DktqvAWamy68olTjrBo+cubtgPb1GRAPU/E+SXH4SgtkfxuIKs4Yco+opdTHbSQ/ 5iAX5V9sGR3ulTKVRYbC/WXgPOSYJDSpxpowZ9rAWywe/beG//lwoQRGEj0vv0Mm rV3Ym8/Hfdz8J01wtxTE3IYbg0g3/pvDG1ESrPFheYBWRMqtBIS5zLKvn3yFrv+x AwJc9NQGdFDGIUTwSzTjSkz3DZYJckhAfQLvyyzfTsAywu5uJGCIsy0ugumOSx9v ah0bM4k+8J80SuvItLAOOYyakdl6KuBqxqt060CwhuOirSOx/7JmaE0PnhoyPFXY D6aXn8Lcfjut+PwoPljC3OrgcOTwbbBy4/sTGWhGye1e7XlN5ZcV5PUIZPHr3BQz Rh0tBBUIYMo4nX1U1CJPxE+BEiv7FLahaksRl8HNbqR05a+Ij7fQk1GRClNYRTLQ Ai0g73HVeYAKlWbiVe09xVXvle5M1pOI+CQzgNITC2SA7A7AhKVHafPNYjuUxvUK bIVM7FseaGqgoip5k3DcURI1TMkTX0PO9VZjiqlx4phD4LV0hSIZ3/OX9Cqs2y36 SoQJ8OX0SubyUW34lLZ8ZsLpnAO5wk8xWA3mU1s/lUPqpXTBuvM= =3yt4 -----END PGP SIGNATURE----- Merge tag 'media/v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "Two regression fixes: - fix atomisp support for ISP2400 - fix dvb-usb regression for TeVii s480 dual DVB-S2 S660 board" * tag 'media/v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"	2024-08-15 10:23:19 -07:00
Linus Torvalds	6e80a1fd99	ata fixes for 6.11-rc4 - Revert a recent change to sense data generation. Sense data can be in either fixed format or descriptor format. The D_SENSE bit in the Control mode page controls which format to generate. All places but one respected the D_SENSE bit. The recent change fixed the one place that didn't respect the D_SENSE bit. However, it turns out that hdparm, hddtemp and udisks (incorrectly) assumes sense data in descriptor format. Therefore, even while the change was technically correct, revert it, since even if these user space programs are fixed to (correctly) look at the format type before parsing the data, older versions of these tools will be around roughly forever. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRN+ES/c4tHlMch3DzJZDGjmcZNcgUCZr321QAKCRDJZDGjmcZN cgbZAQDwRS2GHNdHyWqHZLvnX8Y72pNbVDdm1F6xfCDHl9TrBwEA9r/lWO9tHiYA eRP+4FeqLpLa3GVAcJoWWlLgIOGIgAU= =KTUJ -----END PGP SIGNATURE----- Merge tag 'ata-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata fix from Niklas Cassel: - Revert a recent change to sense data generation. Sense data can be in either fixed format or descriptor format. The D_SENSE bit in the Control mode page controls which format to generate. All places but one respected the D_SENSE bit. The recent change fixed the one place that didn't respect the D_SENSE bit. However, it turns out that hdparm, hddtemp and udisks (incorrectly) assumes sense data in descriptor format. Therefore, even while the change was technically correct, revert it, since even if these user space programs are fixed to (correctly) look at the format type before parsing the data, older versions of these tools will be around roughly forever. * tag 'ata-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"	2024-08-15 10:10:59 -07:00
Song Liu	fb6a421fb6	kallsyms: Match symbols exactly with CONFIG_LTO_CLANG With CONFIG_LTO_CLANG=y, the compiler may add .llvm.<hash> suffix to function names to avoid duplication. APIs like kallsyms_lookup_name() and kallsyms_on_each_match_symbol() tries to match these symbol names without the .llvm.<hash> suffix, e.g., match "c_stop" with symbol c_stop.llvm.17132674095431275852. This turned out to be problematic for use cases that require exact match, for example, livepatch. Fix this by making the APIs to match symbols exactly. Also cleanup kallsyms_selftests accordingly. Signed-off-by: Song Liu <song@kernel.org> Fixes: `8cc32a9bbf` ("kallsyms: strip LTO-only suffixes from promoted global functions") Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20240807220513.3100483-3-song@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>	2024-08-15 09:33:35 -07:00
Song Liu	020925ce92	kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols Cleaning up the symbols causes various issues afterwards. Let's sort the list based on original name. Signed-off-by: Song Liu <song@kernel.org> Fixes: `8cc32a9bbf` ("kallsyms: strip LTO-only suffixes from promoted global functions") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20240807220513.3100483-2-song@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>	2024-08-15 09:33:34 -07:00
Ivan Orlov	92e9bac181	kunit/overflow: Fix UB in overflow_allocation_test The 'device_name' array doesn't exist out of the 'overflow_allocation_test' function scope. However, it is being used as a driver name when calling 'kunit_driver_create' from 'kunit_device_register'. It produces the kernel panic with KASAN enabled. Since this variable is used in one place only, remove it and pass the device name into kunit_device_register directly as an ascii string. Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Reviewed-by: David Gow <davidgow@google.com> Link: https://lore.kernel.org/r/20240815000431.401869-1-ivan.orlov0322@gmail.com Signed-off-by: Kees Cook <kees@kernel.org>	2024-08-15 09:24:55 -07:00
Griffin Kroah-Hartman	9bb5e74b2b	Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD" This reverts commit `bab2f5e8fd`. Joel reported that this commit breaks userspace and stops sensors in SDM845 from working. Also breaks other qcom SoC devices running postmarketOS. Cc: stable <stable@kernel.org> Cc: Ekansh Gupta <quic_ekangupt@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reported-by: Joel Selvaraj <joelselvaraj.oss@gmail.com> Link: https://lore.kernel.org/r/9a9f5646-a554-4b65-8122-d212bb665c81@umsystem.edu Signed-off-by: Griffin Kroah-Hartman <griffin@kroah.com> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Fixes: `bab2f5e8fd` ("misc: fastrpc: Restrict untrusted app to attach to privileged PD") Link: https://lore.kernel.org/r/20240815094920.8242-1-griffin@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-08-15 16:59:14 +02:00
Mathias Nyman	af8e119f52	xhci: Fix Panther point NULL pointer deref at full-speed re-enumeration re-enumerating full-speed devices after a failed address device command can trigger a NULL pointer dereference. Full-speed devices may need to reconfigure the endpoint 0 Max Packet Size value during enumeration. Usb core calls usb_ep0_reinit() in this case, which ends up calling xhci_configure_endpoint(). On Panther point xHC the xhci_configure_endpoint() function will additionally check and reserve bandwidth in software. Other hosts do this in hardware If xHC address device command fails then a new xhci_virt_device structure is allocated as part of re-enabling the slot, but the bandwidth table pointers are not set up properly here. This triggers the NULL pointer dereference the next time usb_ep0_reinit() is called and xhci_configure_endpoint() tries to check and reserve bandwidth [46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd [46710.713699] usb 3-1: Device not responding to setup address. [46710.917684] usb 3-1: Device not responding to setup address. [46711.125536] usb 3-1: device not accepting address 5, error -71 [46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008 [46711.125600] #PF: supervisor read access in kernel mode [46711.125603] #PF: error_code(0x0000) - not-present page [46711.125606] PGD 0 P4D 0 [46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI [46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.3_2 #1 [46711.125620] Hardware name: Gigabyte Technology Co., Ltd. [46711.125623] Workqueue: usb_hub_wq hub_event [usbcore] [46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c Fix this by making sure bandwidth table pointers are set up correctly after a failed address device command, and additionally by avoiding checking for bandwidth in cases like this where no actual endpoints are added or removed, i.e. only context for default control endpoint 0 is evaluated. Reported-by: Karel Balej <balejk@matfyz.cz> Closes: https://lore.kernel.org/linux-usb/D3CKQQAETH47.1MUO22RTCH2O3@matfyz.cz/ Cc: stable@vger.kernel.org Fixes: `651aaf36a7` ("usb: xhci: Handle USB transaction error on address command") Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240815141117.2702314-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-08-15 16:29:46 +02:00
Matthew Brost	f002702290	drm/xe: Hold a PM ref when GT TLB invalidations are inflight Avoid GT TLB invalidation timeouts by holding a PM ref when invalidations are inflight. v2: - Drop PM ref before signaling fence (CI) v3: - Move invalidation_fence_signal helper in tlb timeout to previous patch (Matthew Auld) Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-4-matthew.brost@intel.com (cherry picked from commit `0a382f9bc5`) Requires: 46209ce5287b ("drm/xe: Add xe_gt_tlb_invalidation_fence_init helper") Requires: 0e414ab036e0 ("drm/xe: Drop xe_gt_tlb_invalidation_wait") Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-15 09:44:22 -04:00
Matthew Brost	58bfe66744	drm/xe: Drop xe_gt_tlb_invalidation_wait Having two methods to wait on GT TLB invalidations is not ideal. Remove xe_gt_tlb_invalidation_wait and only use GT TLB invalidation fences. In addition to two methods being less than ideal, once GT TLB invalidations are coalesced the seqno cannot be assigned during xe_gt_tlb_invalidation_ggtt/range. Thus xe_gt_tlb_invalidation_wait would not have a seqno to wait one. A fence however can be armed and later signaled. v3: - Add explaination about coalescing to commit message v4: - Don't put dma fence if defined on stack (CI) v5: - Initialize ret to zero (CI) v6: - Use invalidation_fence_signal helper in tlb timeout (Matthew Auld) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-3-matthew.brost@intel.com (cherry picked from commit `61ac035361`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-15 09:44:22 -04:00
Matthew Brost	90be4cc6f7	drm/xe: Add xe_gt_tlb_invalidation_fence_init helper Other layers should not be touching struct xe_gt_tlb_invalidation_fence directly, add helper for initialization. v2: - Add dma_fence_get and list init to xe_gt_tlb_invalidation_fence_init Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-2-matthew.brost@intel.com (cherry picked from commit `a522b285c6`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-15 09:44:22 -04:00
Michal Wajdeczko	4f7652dcd3	drm/xe/pf: Fix VF config validation on multi-GT platforms When validating VF config on the media GT, we may wrongly report that VF is already partially configured on it, as we consider GGTT and LMEM provisioning done on the primary GT (since both GGTT and LMEM are tile-level resources, not a GT-level). This will cause skipping a VF auto-provisioning on the media-GT and in result will block a VF from successfully initialize that GT. Fix that by considering GGTT and LMEM configurations only when checking if a VF provisioning is complete, and omit GGTT and LMEM when reporting empty/partial provisioning. Fixes: `234670cea9` ("drm/xe/pf: Skip fair VFs provisioning if already provisioned") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240806180516.618-1-michal.wajdeczko@intel.com (cherry picked from commit `5bdacb0907`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-15 09:44:22 -04:00
Matthew Brost	55ea73aacf	drm/xe: Build PM into GuC CT layer Take PM ref when any G2H are outstanding, drop when none are outstanding. To safely ensure we have PM ref when in the GuC CT layer, a PM ref needs to be held when scheduler messages are pending too. v2: - Add outer PM protections to xe_file_close (CI) v3: - Only take PM ref 0->1 and drop on 1->0 (Matthew Auld) v4: - Add assert to G2H increment function v5: - Rebase v6: - Declare xe as local variable in xe_file_close (CI) Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-5-matthew.brost@intel.com (cherry picked from commit `d930c19fdf`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-15 09:44:22 -04:00
Michal Wajdeczko	64da63cd3f	drm/xe/vf: Fix register value lookup We should use the number of actual entries stored in the runtime register buffer, not the maximum number of entries that this buffer can hold, otherwise bsearch() may fail and we may miss the data and wrongly report unexpected access to some registers. Fixes: `4edadc41a3` ("drm/xe/vf: Use register values obtained from the PF") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240718203155.486-1-michal.wajdeczko@intel.com (cherry picked from commit `ad16682db1`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-15 09:44:22 -04:00
Umesh Nerlige Ramappa	817c70e2ba	drm/xe: Fix use after free when client stats are captured xe_file_close triggers an asynchronous queue cleanup and then frees up the xef object. Since queue cleanup flushes all pending jobs and the KMD stores client usage stats into the xef object after jobs are flushed, we see a use-after-free for the xef object. Resolve this by taking a reference to xef from xe_exec_queue. While at it, revert an earlier change that contained a partial work around for this issue. v2: - Take a ref to xef even for the VM bind queue (Matt) - Squash patches relevant to that fix and work around (Lucas) v3: Fix typo (Lucas) Fixes: `ce62827bc2` ("drm/xe: Do not access xe file when updating exec queue run_ticks") Fixes: `6109f24f87` ("drm/xe: Add helper to accumulate exec queue runtime") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1908 Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240718210548.3580382-5-umesh.nerlige.ramappa@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `2149ded630`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-15 09:44:22 -04:00

1 2 3 4 5 ...

1295505 Commits