mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2024-11-13 14:24:11 +08:00
8f4f68e788
42707 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Lu Jialin
|
8f4f68e788 |
crypto: pcrypt - Fix hungtask for PADATA_RESET
We found a hungtask bug in test_aead_vec_cfg as follows: INFO: task cryptomgr_test:391009 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Call trace: __switch_to+0x98/0xe0 __schedule+0x6c4/0xf40 schedule+0xd8/0x1b4 schedule_timeout+0x474/0x560 wait_for_common+0x368/0x4e0 wait_for_completion+0x20/0x30 wait_for_completion+0x20/0x30 test_aead_vec_cfg+0xab4/0xd50 test_aead+0x144/0x1f0 alg_test_aead+0xd8/0x1e0 alg_test+0x634/0x890 cryptomgr_test+0x40/0x70 kthread+0x1e0/0x220 ret_from_fork+0x10/0x18 Kernel panic - not syncing: hung_task: blocked tasks For padata_do_parallel, when the return err is 0 or -EBUSY, it will call wait_for_completion(&wait->completion) in test_aead_vec_cfg. In normal case, aead_request_complete() will be called in pcrypt_aead_serial and the return err is 0 for padata_do_parallel. But, when pinst->flags is PADATA_RESET, the return err is -EBUSY for padata_do_parallel, and it won't call aead_request_complete(). Therefore, test_aead_vec_cfg will hung at wait_for_completion(&wait->completion), which will cause hungtask. The problem comes as following: (padata_do_parallel) | rcu_read_lock_bh(); | err = -EINVAL; | (padata_replace) | pinst->flags |= PADATA_RESET; err = -EBUSY | if (pinst->flags & PADATA_RESET) | rcu_read_unlock_bh() | return err In order to resolve the problem, we replace the return err -EBUSY with -EAGAIN, which means parallel_data is changing, and the caller should call it again. v3: remove retry and just change the return err. v2: introduce padata_try_do_parallel() in pcrypt_aead_encrypt and pcrypt_aead_decrypt to solve the hungtask. Signed-off-by: Lu Jialin <lujialin4@huawei.com> Signed-off-by: Guo Zihua <guozihua@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> |
||
Linus Torvalds
|
1b37a0a2d4 |
RISC-V Patches for the 6.6 Merge Window, Part 2 (try 2)
* The kernel now dynamically probes for misaligned access speed, as opposed to relying on a table of known implementations. * Support for non-coherent devices on systems using the Andes AX45MP core, including the RZ/Five SoCs. * Support for the V extension in ptrace(), again. * Support for KASLR. * Support for the BPF prog pack allocator in RISC-V. * A handful of bug fixes and cleanups. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmT8eV0THHBhbG1lckBk YWJiZWx0LmNvbQAKCRAuExnzX7sYiQYTD/9V6asKMDdWUV+gti/gRvJsiYUjIrrK h4MB8hL3fHfCLBpTD4rU6K1Gx6hzPjGsxIuQyAq/hf752KB/9XUiIVziRBv2ZEBb GuTFCXfg0QXBUlxBZzFw5SKUuKXgRaMAQ14qjy3tfLk31YMQmBtAlEPdDM8mZOCQ zNI3bbdn6zASeaSMh7hwBoOJWP2ACoOEW7RcD44EDT8jb3YW5rEF86x0XtYLgJb6 xhaR4ieIdaOLxz2RbjXj0GcPIBfhTxZbwN3fLlD8PxuGqCKn5kN03bPPwP9tMTAc z02EgVcSDvJWpYikuuTkPMxpSi18OZPJ6eriwOv5ccP5NXQScO09iGo7IZEM7OzO j1IrIXyncU4BhxlpWombU454Va+ezUlfh9uh+MrJ+Bnve3T3S9ax7AV4S8vkJZlT bnmJVS/g7L/7nxTQdJ3zoAo2WzFQXL0C8SR5tGo/3aRk0uYoliHy/W419f55F9GZ rFcc+LMqai8N4bLN3whaK0NnuodNWHoNlpcd/5ncJwecswuDkah3LWcd4rwBrWhu 8RIkIfpdr/vTQjUVXVLeMHdKB+lST3iF1feMqJj0PfTyvTZi5yfSppjAfkAdVq+9 lHqAjsaGdiCrOtLxb0oBR2PTDQPAm2gN2meuSMommDQR6Vul8K5WcQml9Zx9QEWA eDXWYDZISKYJbA== =s89m -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - The kernel now dynamically probes for misaligned access speed, as opposed to relying on a table of known implementations. - Support for non-coherent devices on systems using the Andes AX45MP core, including the RZ/Five SoCs. - Support for the V extension in ptrace(), again. - Support for KASLR. - Support for the BPF prog pack allocator in RISC-V. - A handful of bug fixes and cleanups. * tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (25 commits) soc: renesas: Kconfig: For ARCH_R9A07G043 select the required configs if dependencies are met riscv: Kconfig.errata: Add dependency for RISCV_SBI in ERRATA_ANDES config riscv: Kconfig.errata: Drop dependency for MMU in ERRATA_ANDES_CMO config riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabled bpf, riscv: use prog pack allocator in the BPF JIT riscv: implement a memset like function for text riscv: extend patch_text_nosync() for multiple pages bpf: make bpf_prog_pack allocator portable riscv: libstub: Implement KASLR by using generic functions libstub: Fix compilation warning for rv32 arm64: libstub: Move KASLR handling functions to kaslr.c riscv: Dump out kernel offset information on panic riscv: Introduce virtual kernel mapping KASLR RISC-V: Add ptrace support for vectors soc: renesas: Kconfig: Select the required configs for RZ/Five SoC cache: Add L2 cache management for Andes AX45MP RISC-V core dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller riscv: mm: dma-noncoherent: nonstandard cache operations support riscv: errata: Add Andes alternative ports riscv: asm: vendorid_list: Add Andes Technology to the vendors list ... |
||
Linus Torvalds
|
474197a4f7 |
dma-mapping fixes for Linux 6.6
- move a dma-debug call that prints a message out from a lock that's causing problems with the lock order in serial drivers (Sergey Senozhatsky) - fix the CONFIG_DMA_NUMA_CMA Kconfig entry to have the right dependency on not default to y (Christoph Hellwig) - move an ifdef a bit to remove a __maybe_unused that seems to trip up some sensitivities (Christoph Hellwig) - revert a bogus check in the CMA allocator (Zhenhua Huang) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmT8MJ8LHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYOOyg/7BaZt4lih24cSFhXsU10wB49cKQuOngsT+4KFX7Eg 8LqdCK/L61M1VWm/1tcw6/XKK7oBDADIWnlutFTv1BHIaC0fjbFk0IpOQyWoXsYE vlCzm6Sh1rrryYK/dviLwC3+ccF33C5dzsZsKzeJ89v0cP//rENdOfXhwAIT2Wc7 FG/h0W7wWeQG4jHCz+DGhRp7X6r5urwW72KNap4BlBaDpZdAV9E5w1OeaplnENXt E7CK8rnHZz/AgH98LIMa929fgNhJ7Bec5mV9pItpBzYWAwL2iWk6k11FbAkxDiFQ MQ7gMnH5KRkCVpH/QILX1NU5ImsjdjyCUYn+2q8OJ5Y2C42K7V5ClspIgBCFk7XH AbCAGveoRUoQ3iCnQYZ0YPJOtyNSaUNB+q4NzwR/WFipiXGJBPflggI8gPzRiN8e 8SnKhduODklzLhOZZr7+nUEJpmwgR8aCkmhERZN/bw8iidBs/chiy9t1PfLmiOH9 R545BWEfsgpRikpEgb0HnuGD/zg26LcugLUilNbSZq0fzH4tkD5iCKKUitAITYLP ED0wDj9AdyQ98L6aSgAxc+9ip3eATFntM6hlg/16Ve4zdDTbFgucZpJhdSuVF8BI rq1VV3wztUm++zhkTFptK9eN2T+AQtpC+3XI6QyXpByADbX+m8GbQJm+r8V3hu5A X64= =AvYr -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.6-2023-09-09' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - move a dma-debug call that prints a message out from a lock that's causing problems with the lock order in serial drivers (Sergey Senozhatsky) - fix the CONFIG_DMA_NUMA_CMA Kconfig entry to have the right dependency and not default to y (Christoph Hellwig) - move an ifdef a bit to remove a __maybe_unused that seems to trip up some sensitivities (Christoph Hellwig) - revert a bogus check in the CMA allocator (Zhenhua Huang) * tag 'dma-mapping-6.6-2023-09-09' of git://git.infradead.org/users/hch/dma-mapping: Revert "dma-contiguous: check for memory region overlap" dma-pool: remove a __maybe_unused label in atomic_pool_expand dma-contiguous: fix the Kconfig entry for CONFIG_DMA_NUMA_CMA dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock |
||
Linus Torvalds
|
01a46efcd8 |
printk fixup for 6.6
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmT6xsgACgkQUqAMR0iA lPIpeA/+P8Jq25Hptd+Zh5XxkNO0lHKt8o9qZxgLwyA0Mh1YWLCTrmliqeK6jTWy CrKXkeyZmOQxzTFrflEN9OVQ/NnIBZBdiT6TWl64pI87pc+ElS+awC1BxS9gKDRC vhhwiMFWHt4DWlVhjKA+ARShYI/uI5+b/Ewlmg4ZeksWxWcFfnoZb0BCkQKGZcN+ 9G1C1mPtxV4lT1FJAglkgx3hF3+BcpX9EEVYdjdpQbD9J0jmlTP08/w+yyMvTzjs BVRPtcPeq0eb5iFp06SJKqa4377j6N9KMKtZG5IjzbkU/N6u7C4hXYxiGM/nOmyC 022/ZuFP6uwoeOiWBfuPJK9cadsomMbqeSJxC8wh/eLRqgTKU7N6wt8ybSSNynrC oMzdEI+ovjYIVrb13ZFDE+YFsXCzNhw1xNMmxdJMGQeeFNVkjK5MV1yaGdisffgj ps3eJbdaklFgU1m2GUkoVKglLeiYsihyvnSDZuxgfe+12GPReE0eTjvyj8gRUOd3 DLd1GPI7gmUv0c3k9VHyLu/ATrlBB9BvZ4cT2+anrC4Trf8Al2E8xgkzhYceqMOj 6XZFoGUGW+nhxU0vGy1psYCyw1k4L71vcT/WJD9ul+6RoHvwAnbDhmfu00W4LJ7C YwMheIc+00v4ofu/oXt32DhVFuAy06VdGiC4LYwWKkaPJELatPA= =bhor -----END PGP SIGNATURE----- Merge tag 'printk-for-6.6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fix from Petr Mladek: - Revert exporting symbols needed for dumping the raw printk buffer in panic(). I pushed the export prematurely before the user was ready for merging into the mainline. * tag 'printk-for-6.6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: Revert "printk: export symbols for debug modules" |
||
Palmer Dabbelt
|
77eea559ba
|
Merge patch series "bpf, riscv: use BPF prog pack allocator in BPF JIT"
Puranjay Mohan <puranjay12@gmail.com> says: Here is some data to prove the V2 fixes the problem: Without this series: root@rv-selftester:~/src/kselftest/bpf# time ./test_tag test_tag: OK (40945 tests) real 7m47.562s user 0m24.145s sys 6m37.064s With this series applied: root@rv-selftester:~/src/selftest/bpf# time ./test_tag test_tag: OK (40945 tests) real 7m29.472s user 0m25.865s sys 6m18.401s BPF programs currently consume a page each on RISCV. For systems with many BPF programs, this adds significant pressure to instruction TLB. High iTLB pressure usually causes slow down for the whole system. Song Liu introduced the BPF prog pack allocator[1] to mitigate the above issue. It packs multiple BPF programs into a single huge page. It is currently only enabled for the x86_64 BPF JIT. I enabled this allocator on the ARM64 BPF JIT[2]. It is being reviewed now. This patch series enables the BPF prog pack allocator for the RISCV BPF JIT. ====================================================== Performance Analysis of prog pack allocator on RISCV64 ====================================================== Test setup: =========== Host machine: Debian GNU/Linux 11 (bullseye) Qemu Version: QEMU emulator version 8.0.3 (Debian 1:8.0.3+dfsg-1) u-boot-qemu Version: 2023.07+dfsg-1 opensbi Version: 1.3-1 To test the performance of the BPF prog pack allocator on RV, a stresser tool[4] linked below was built. This tool loads 8 BPF programs on the system and triggers 5 of them in an infinite loop by doing system calls. The runner script starts 20 instances of the above which loads 8*20=160 BPF programs on the system, 5*20=100 of which are being constantly triggered. The script is passed a command which would be run in the above environment. The script was run with following perf command: ./run.sh "perf stat -a \ -e iTLB-load-misses \ -e dTLB-load-misses \ -e dTLB-store-misses \ -e instructions \ --timeout 60000" The output of the above command is discussed below before and after enabling the BPF prog pack allocator. The tests were run on qemu-system-riscv64 with 8 cpus, 16G memory. The rootfs was created using Bjorn's riscv-cross-builder[5] docker container linked below. Results ======= Before enabling prog pack allocator: ------------------------------------ Performance counter stats for 'system wide': 4939048 iTLB-load-misses 5468689 dTLB-load-misses 465234 dTLB-store-misses 1441082097998 instructions 60.045791200 seconds time elapsed After enabling prog pack allocator: ----------------------------------- Performance counter stats for 'system wide': 3430035 iTLB-load-misses 5008745 dTLB-load-misses 409944 dTLB-store-misses 1441535637988 instructions 60.046296600 seconds time elapsed Improvements in metrics ======================= It was expected that the iTLB-load-misses would decrease as now a single huge page is used to keep all the BPF programs compared to a single page for each program earlier. -------------------------------------------- The improvement in iTLB-load-misses: -30.5 % -------------------------------------------- I repeated this expriment more than 100 times in different setups and the improvement was always greater than 30%. This patch series is boot tested on the Starfive VisionFive 2 board[6]. The performance analysis was not done on the board because it doesn't expose iTLB-load-misses, etc. The stresser program was run on the board to test the loading and unloading of BPF programs [1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/ [2] https://lore.kernel.org/all/20230626085811.3192402-1-puranjay12@gmail.com/ [3] https://lore.kernel.org/all/20230626085811.3192402-2-puranjay12@gmail.com/ [4] https://github.com/puranjaymohan/BPF-Allocator-Bench [5] https://github.com/bjoto/riscv-cross-builder [6] https://www.starfivetech.com/en/site/boards * b4-shazam-merge: bpf, riscv: use prog pack allocator in the BPF JIT riscv: implement a memset like function for text riscv: extend patch_text_nosync() for multiple pages bpf: make bpf_prog_pack allocator portable Link: https://lore.kernel.org/r/20230831131229.497941-1-puranjay12@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Zhenhua Huang
|
f875db4f20 |
Revert "dma-contiguous: check for memory region overlap"
This reverts commit
|
||
Linus Torvalds
|
73be7fb14e |
Including fixes from netfilter and bpf.
Current release - regressions: - eth: stmmac: fix failure to probe without MAC interface specified Current release - new code bugs: - docs: netlink: fix missing classic_netlink doc reference Previous releases - regressions: - deal with integer overflows in kmalloc_reserve() - use sk_forward_alloc_get() in sk_get_meminfo() - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc - fib: avoid warn splat in flow dissector after packet mangling - skb_segment: call zero copy functions before using skbuff frags - eth: sfc: check for zero length in EF10 RX prefix Previous releases - always broken: - af_unix: fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR() - netfilter: - nft_exthdr: fix non-linear header modification - xt_u32, xt_sctp: validate user space input - nftables: exthdr: fix 4-byte stack OOB write - nfnetlink_osf: avoid OOB read - one more fix for the garbage collection work from last release - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t - handshake: fix null-deref in handshake_nl_done_doit() - ip: ignore dst hint for multipath routes to ensure packets are hashed across the nexthops - phy: micrel: - correct bit assignments for cable test errata - disable EEE according to the KSZ9477 errata Misc: - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations - Revert "net: macsec: preserve ingress frame ordering", it appears to have been developed against an older kernel, problem doesn't exist upstream Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmT6R6wACgkQMUZtbf5S IrsmTg//TgmRjxSZ0lrPQtJwZR/eN3ZR2oQG3rwnssCx+YgHEGGxQsfT4KHEMacR ZgGDZVTpthUJkkACBPi8ZMoy++RdjEmlCcanfeDkGHoYGtiX1lhkofhLMn1KUHbI rIbP9EdNKxQT0SsBlw/U28pD5jKyqOgL23QobEwmcjLTdMpamb+qIsD6/xNv9tEj Tu4BdCIkhjxnBD622hsE3pFTG7oSn2WM6rf5NT1E43mJ3W8RrMcydSB27J7Oryo9 l3nYMAhz0vQINS2WQ9eCT1/7GI6gg1nDtxFtrnV7ASvxayRBPIUr4kg1vT+Tixsz CZMnwVamEBIYl9agmj7vSji7d5nOUgXPhtWhwWUM2tRoGdeGw3vSi1pgDvRiUCHE PJ4UHv7goa2AgnOlOQCFtRybAu+9nmSGm7V+GkeGLnH7xbFsEa5smQ/+FSPJs8Dn Yf4q5QAhdN8tdnofRlrN/nCssoDF3cfmBsTJ7wo5h71gW+BWhsP58eDCJlXd/r8k +Qnvoe2kw27ktFR1tjsUDZ0AcSmeVARNwmXCOBYZsG4tEek8pLyj008mDvJvdfyn PGPn7Eo5DyaERlHVmPuebHXSyniDEPe2GLTmlHcGiRpGspoUHbB+HRiDAuRLMB9g pkL8RHpNfppnuUXeUoNy3rgEkYwlpTjZX0QHC6N8NQ76ccB6CNM= =YpmE -----END PGP SIGNATURE----- Merge tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking updates from Jakub Kicinski: "Including fixes from netfilter and bpf. Current release - regressions: - eth: stmmac: fix failure to probe without MAC interface specified Current release - new code bugs: - docs: netlink: fix missing classic_netlink doc reference Previous releases - regressions: - deal with integer overflows in kmalloc_reserve() - use sk_forward_alloc_get() in sk_get_meminfo() - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc - fib: avoid warn splat in flow dissector after packet mangling - skb_segment: call zero copy functions before using skbuff frags - eth: sfc: check for zero length in EF10 RX prefix Previous releases - always broken: - af_unix: fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR() - netfilter: - nft_exthdr: fix non-linear header modification - xt_u32, xt_sctp: validate user space input - nftables: exthdr: fix 4-byte stack OOB write - nfnetlink_osf: avoid OOB read - one more fix for the garbage collection work from last release - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t - handshake: fix null-deref in handshake_nl_done_doit() - ip: ignore dst hint for multipath routes to ensure packets are hashed across the nexthops - phy: micrel: - correct bit assignments for cable test errata - disable EEE according to the KSZ9477 errata Misc: - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations - Revert "net: macsec: preserve ingress frame ordering", it appears to have been developed against an older kernel, problem doesn't exist upstream" * tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs() Revert "net: team: do not use dynamic lockdep key" net: hns3: remove GSO partial feature bit net: hns3: fix the port information display when sfp is absent net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue net: hns3: fix debugfs concurrency issue between kfree buffer and read net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read() net: hns3: Support query tx timeout threshold by debugfs net: hns3: fix tx timeout issue net: phy: Provide Module 4 KSZ9477 errata (DS80000754C) netfilter: nf_tables: Unbreak audit log reset netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID netfilter: nfnetlink_osf: avoid OOB read netfilter: nftables: exthdr: fix 4-byte stack OOB write selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc bpf: bpf_sk_storage: Fix invalid wait context lockdep report s390/bpf: Pass through tail call counter in trampolines ... |
||
Christoph Hellwig
|
4952801fc6 |
Revert "printk: export symbols for debug modules"
This reverts commit
|
||
Puranjay Mohan
|
20e490adea
|
bpf: make bpf_prog_pack allocator portable
The bpf_prog_pack allocator currently uses module_alloc() and module_memfree() to allocate and free memory. This is not portable because different architectures use different methods for allocating memory for BPF programs. Like ARM64 and riscv use vmalloc()/vfree(). Use bpf_jit_alloc_exec() and bpf_jit_free_exec() for memory management in bpf_prog_pack allocator. Other architectures can override these with their implementation and will be able to use bpf_prog_pack directly. On architectures that don't override bpf_jit_alloc/free_exec() this is basically a NOP. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Song Liu <song@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20230831131229.497941-2-puranjay12@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Martin KaFai Lau
|
55d49f750b |
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
The commit |
||
Martin KaFai Lau
|
a96a44aba5 |
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
'./test_progs -t test_local_storage' reported a splat:
[ 27.137569] =============================
[ 27.138122] [ BUG: Invalid wait context ]
[ 27.138650] 6.5.0-03980-gd11ae1b16b0a #247 Tainted: G O
[ 27.139542] -----------------------------
[ 27.140106] test_progs/1729 is trying to lock:
[ 27.140713] ffff8883ef047b88 (stock_lock){-.-.}-{3:3}, at: local_lock_acquire+0x9/0x130
[ 27.141834] other info that might help us debug this:
[ 27.142437] context-{5:5}
[ 27.142856] 2 locks held by test_progs/1729:
[ 27.143352] #0: ffffffff84bcd9c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x40
[ 27.144492] #1: ffff888107deb2c0 (&storage->lock){..-.}-{2:2}, at: bpf_local_storage_update+0x39e/0x8e0
[ 27.145855] stack backtrace:
[ 27.146274] CPU: 0 PID: 1729 Comm: test_progs Tainted: G O 6.5.0-03980-gd11ae1b16b0a #247
[ 27.147550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 27.149127] Call Trace:
[ 27.149490] <TASK>
[ 27.149867] dump_stack_lvl+0x130/0x1d0
[ 27.152609] dump_stack+0x14/0x20
[ 27.153131] __lock_acquire+0x1657/0x2220
[ 27.153677] lock_acquire+0x1b8/0x510
[ 27.157908] local_lock_acquire+0x29/0x130
[ 27.159048] obj_cgroup_charge+0xf4/0x3c0
[ 27.160794] slab_pre_alloc_hook+0x28e/0x2b0
[ 27.161931] __kmem_cache_alloc_node+0x51/0x210
[ 27.163557] __kmalloc+0xaa/0x210
[ 27.164593] bpf_map_kzalloc+0xbc/0x170
[ 27.165147] bpf_selem_alloc+0x130/0x510
[ 27.166295] bpf_local_storage_update+0x5aa/0x8e0
[ 27.167042] bpf_fd_sk_storage_update_elem+0xdb/0x1a0
[ 27.169199] bpf_map_update_value+0x415/0x4f0
[ 27.169871] map_update_elem+0x413/0x550
[ 27.170330] __sys_bpf+0x5e9/0x640
[ 27.174065] __x64_sys_bpf+0x80/0x90
[ 27.174568] do_syscall_64+0x48/0xa0
[ 27.175201] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 27.175932] RIP: 0033:0x7effb40e41ad
[ 27.176357] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d8
[ 27.179028] RSP: 002b:00007ffe64c21fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 27.180088] RAX: ffffffffffffffda RBX: 00007ffe64c22768 RCX: 00007effb40e41ad
[ 27.181082] RDX: 0000000000000020 RSI: 00007ffe64c22008 RDI: 0000000000000002
[ 27.182030] RBP: 00007ffe64c21ff0 R08: 0000000000000000 R09: 00007ffe64c22788
[ 27.183038] R10: 0000000000000064 R11: 0000000000000202 R12: 0000000000000000
[ 27.184006] R13: 00007ffe64c22788 R14: 00007effb42a1000 R15: 0000000000000000
[ 27.184958] </TASK>
It complains about acquiring a local_lock while holding a raw_spin_lock.
It means it should not allocate memory while holding a raw_spin_lock
since it is not safe for RT.
raw_spin_lock is needed because bpf_local_storage supports tracing
context. In particular for task local storage, it is easy to
get a "current" task PTR_TO_BTF_ID in tracing bpf prog.
However, task (and cgroup) local storage has already been moved to
bpf mem allocator which can be used after raw_spin_lock.
The splat is for the sk storage. For sk (and inode) storage,
it has not been moved to bpf mem allocator. Using raw_spin_lock or not,
kzalloc(GFP_ATOMIC) could theoretically be unsafe in tracing context.
However, the local storage helper requires a verifier accepted
sk pointer (PTR_TO_BTF_ID), it is hypothetical if that (mean running
a bpf prog in a kzalloc unsafe context and also able to hold a verifier
accepted sk pointer) could happen.
This patch avoids kzalloc after raw_spin_lock to silent the splat.
There is an existing kzalloc before the raw_spin_lock. At that point,
a kzalloc is very likely required because a lookup has just been done
before. Thus, this patch always does the kzalloc before acquiring
the raw_spin_lock and remove the later kzalloc usage after the
raw_spin_lock. After this change, it will have a charge and then
uncharge during the syscall bpf_map_update_elem() code path.
This patch opts for simplicity and not continue the old
optimization to save one charge and uncharge.
This issue is dated back to the very first commit of bpf_sk_storage
which had been refactored multiple times to create task, inode, and
cgroup storage. This patch uses a Fixes tag with a more recent
commit that should be easier to do backport.
Fixes:
|
||
Sebastian Andrzej Siewior
|
6764e767f4 |
bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check.
__bpf_prog_enter_recur() assigns bpf_tramp_run_ctx::saved_run_ctx before
performing the recursion check which means in case of a recursion
__bpf_prog_exit_recur() uses the previously set bpf_tramp_run_ctx::saved_run_ctx
value.
__bpf_prog_enter_sleepable_recur() assigns bpf_tramp_run_ctx::saved_run_ctx
after the recursion check which means in case of a recursion
__bpf_prog_exit_sleepable_recur() uses an uninitialized value. This does not
look right. If I read the entry trampoline code right, then bpf_tramp_run_ctx
isn't initialized upfront.
Align __bpf_prog_enter_sleepable_recur() with __bpf_prog_enter_recur() and
set bpf_tramp_run_ctx::saved_run_ctx before the recursion check is made.
Remove the assignment of saved_run_ctx in kern_sys_bpf() since it happens
a few cycles later.
Fixes:
|
||
Sebastian Andrzej Siewior
|
7645629f7d |
bpf: Invoke __bpf_prog_exit_sleepable_recur() on recursion in kern_sys_bpf().
If __bpf_prog_enter_sleepable_recur() detects recursion then it returns
0 without undoing rcu_read_lock_trace(), migrate_disable() or
decrementing the recursion counter. This is fine in the JIT case because
the JIT code will jump in the 0 case to the end and invoke the matching
exit trampoline (__bpf_prog_exit_sleepable_recur()).
This is not the case in kern_sys_bpf() which returns directly to the
caller with an error code.
Add __bpf_prog_exit_sleepable_recur() as clean up in the recursion case.
Fixes:
|
||
Linus Torvalds
|
61401a8724 |
Kbuild updates for v6.6
- Enable -Wenum-conversion warning option - Refactor the rpm-pkg target - Fix scripts/setlocalversion to consider annotated tags for rt-kernel - Add a jump key feature for the search menu of 'make nconfig' - Support Qt6 for 'make xconfig' - Enable -Wformat-overflow, -Wformat-truncation, -Wstringop-overflow, and -Wrestrict warnings for W=1 builds - Replace <asm/export.h> with <linux/export.h> for alpha, ia64, and sparc - Support DEB_BUILD_OPTIONS=parallel=N for the debian source package - Refactor scripts/Makefile.modinst and fix some modules_sign issues - Add a new Kconfig env variable to warn symbols that are not defined anywhere - Show help messages of config fragments in 'make help' -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmT3X/oVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsG58oQAIXDrka3r53Flky/uJjSl8ab620o XL3u4PF/ekv6qsZoLlU24WQP8BzcJO6gPHFz88mE9/J1+wHpNKZLZehjpgj1cCY3 LatbEAa3DCZPC/c7P/nz+FT4mjTZpKOeQmvZVfA+xonBHmTyVUKgws0uDB/xuTjE GARyOX7ymD0AAZv84SUUCiaBe5Y2Bkrki67HfteS4bxW8GHg0rZWzrFUUkEkoG54 elNOYR0WYROwyo8Iokd2MedVdK2SPZxvY8i67hXl2K+Qve6tLNk8dbRIENnYI0pk 7oQVmIfC20eu9CteywHlyjt8jpTOeIrRc2yhJKR0YrjjIzKhulRGMh+pFAAwoySd Se60uWCS2AydcXWTrtb+iwFUyM2zRK4SaMlxleqnoE/bWYp6jhg9qbV9xpztWSYI j39k9aX7B19stN1drzJeyXdILRVtaAQCcax3RR+mGgm4Z5fuTDntPepvIv8J3lBg QZ4MCdOdtFw33eQaKa7O3LddD3q1X355xeaIITivEe3rAk5iIJYu3Ty1VY+/XTcH ktSVl83zQ5Ge3tvx8D6kCR9J8jAQyTLIKHxvr/j969HgZKguS2i37eChnPyKcu23 ZMKJcmCJ1O7naQXVrb/TeiaMR0UEo/PSdrUjpEO3LlMpRthNXLVSLfgJGv8WLO7/ pb/HFXHgKaSORiRV =lfUi -----END PGP SIGNATURE----- Merge tag 'kbuild-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Enable -Wenum-conversion warning option - Refactor the rpm-pkg target - Fix scripts/setlocalversion to consider annotated tags for rt-kernel - Add a jump key feature for the search menu of 'make nconfig' - Support Qt6 for 'make xconfig' - Enable -Wformat-overflow, -Wformat-truncation, -Wstringop-overflow, and -Wrestrict warnings for W=1 builds - Replace <asm/export.h> with <linux/export.h> for alpha, ia64, and sparc - Support DEB_BUILD_OPTIONS=parallel=N for the debian source package - Refactor scripts/Makefile.modinst and fix some modules_sign issues - Add a new Kconfig env variable to warn symbols that are not defined anywhere - Show help messages of config fragments in 'make help' * tag 'kbuild-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (62 commits) kconfig: fix possible buffer overflow kbuild: Show marked Kconfig fragments in "help" kconfig: add warn-unknown-symbols sanity check kbuild: dummy-tools: make MPROFILE_KERNEL checks work on BE Documentation/llvm: refresh docs modpost: Skip .llvm.call-graph-profile section check kbuild: support modules_sign for external modules as well kbuild: support 'make modules_sign' with CONFIG_MODULE_SIG_ALL=n kbuild: move more module installation code to scripts/Makefile.modinst kbuild: reduce the number of mkdir calls during modules_install kbuild: remove $(MODLIB)/source symlink kbuild: move depmod rule to scripts/Makefile.modinst kbuild: add modules_sign to no-{compiler,sync-config}-targets kbuild: do not run depmod for 'make modules_sign' kbuild: deb-pkg: support DEB_BUILD_OPTIONS=parallel=N in debian/rules alpha: remove <asm/export.h> alpha: replace #include <asm/export.h> with #include <linux/export.h> ia64: remove <asm/export.h> ia64: replace #include <asm/export.h> with #include <linux/export.h> sparc: remove <asm/export.h> ... |
||
Linus Torvalds
|
3c31041e37 |
printk changes for 6.6
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmT1pbAACgkQUqAMR0iA lPLoxBAAl18gKo6C8zIBNBoYNl7FxvChrJORjK7RQONs5RYKt8drHjSrJGazhjiV TIdbZt9juqs+UT/f6DnkJznrqQ0W70fQsgUpw+q7n7+cnkIoXAiAs+plexdQXigB 6nx67b2oub41jTwzn/uV7R/eTwq2VnoZqudS/o9iAI/Ia9JzkqmGx08hQedvOoqX 2SWs140iY/zXsFUyEfe8UTXwJUnC/n9pwtpe5sLpmtyupGc/KumUimTQ+LFJbV9o n8QhcQn10CE93M5fH/R2JXjZO7wuSmCHt/V8oSnoOwdCBBe7Tc6aBx5wUwc4XCuC 450h5hlYBKq97lA1PnWGC01uAkeDTRw8963LVRRqWvohoFuHXR0cisF9FTM9LXfs bg90XjzYAK7Ns9fJ0dZHOSbUtRaa06hiExUnO3ctyv2G6h8qKfv86LCuP0CMFmQU rflfk1dPiMW20HT3ZJNtMCy3Vu6kVQSdSaGKTnwzTcUWop5tCQxhmWYBXH6q/1LH aD7xT1xJnBGqLUqq5C8twoOea+w47x/vtjTLi7mJarP5Wfh8xv6axdkwE8N4NrYp cc2RR83a1BZ7At3YkAjfjHmhaZ97gSSe6+Yqk9UzvUEQY/WILEGnb0DKO1jCSB34 D2NPh1MHF5MFQjazdt/dSyMJVxDlTeY/S5wqfLLhNZts48LJ8n0= =D7ZU -----END PGP SIGNATURE----- Merge tag 'printk-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Do not try to get the console lock when it is not need or useful in panic() - Replace the global console_suspended state by a per-console flag - Export symbols needed for dumping the raw printk buffer in panic() - Fix documentation of printf formats for integer types - Moved Sergey Senozhatsky to the reviewer role - Misc cleanups * tag 'printk-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: export symbols for debug modules lib: test_scanf: Add explicit type cast to result initialization in test_number_prefix() printk: ringbuffer: Fix truncating buffer size min_t cast printk: Rename abandon_console_lock_in_panic() to other_cpu_in_panic() printk: Add per-console suspended state printk: Consolidate console deferred printing printk: Do not take console lock for console_flush_on_panic() printk: Keep non-panic-CPUs out of console lock printk: Reduce console_unblank() usage in unsafe scenarios kdb: Do not assume write() callback available docs: printk-formats: Treat char as always unsigned docs: printk-formats: Fix hex printing of signed values MAINTAINERS: adjust printk/vsprintf entries |
||
Petr Mladek
|
f0f6923953 | Merge branch 'rework/misc-cleanups' into for-linus | ||
Kees Cook
|
feec5e1f74 |
kbuild: Show marked Kconfig fragments in "help"
Currently the Kconfig fragments in kernel/configs and arch/*/configs that aren't used internally aren't discoverable through "make help", which consists of hard-coded lists of config fragments. Instead, list all the fragment targets that have a "# Help: " comment prefix so the targets can be generated dynamically. Add logic to the Makefile to search for and display the fragment and comment. Add comments to fragments that are intended to be direct targets. Signed-off-by: Kees Cook <keescook@chromium.org> Co-developed-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
Linus Torvalds
|
b70100f2e6 |
Probes updates for v6.6:
- kprobes: use struct_size() for variable size kretprobe_instance data structure. - eprobe: Simplify trace_eprobe list iteration. - probe events: Data structure field access support on BTF argument. . Update BTF argument support on the functions in the kernel loadable modules (only loaded modules are supported). . Move generic BTF access function (search function prototype and get function parameters) to a separated file. . Add a function to search a member of data structure in BTF. . Support accessing BTF data structure member from probe args by C-like arrow('->') and dot('.') operators. e.g. 't sched_switch next=next->pid vruntime=next->se.vruntime' . Support accessing BTF data structure member from $retval. e.g. 'f getname_flags%return +0($retval->name):string' . Add string type checking if BTF type info is available. This will reject if user specify ":string" type for non "char pointer" type. . Automatically assume the fprobe event as a function return event if $retval is used. - selftests/ftrace: Add BTF data field access test cases. - Documentation: Update fprobe event example with BTF data field. -----BEGIN PGP SIGNATURE----- iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmTycQkbHG1hc2FtaS5o aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bqS8H/jeR1JhOzIXOvTw7XCFm MrSY/SKi8tQfV6lau2UmoYdbYvYjpqL34XLOQPNf2/lrcL2M9aNYXk9fbhlW8enx vkMyKQ0E5anixkF4vsTbEl9DaprxbpsPVACmZ/7VjQk2JuXIdyaNk8hno9LgIcEq udztb0o2HmDFqAXfRi0LvlSTAIwvXZ+usmEvYpaq1g2WwrCe7NHEYl42vMpj+h4H 9l4t5rA9JyPPX4yQUjtKGW5eRVTwDTm/Gn6DRzYfYzkkiBZv27qfovzBOt672LgG hyot+u7XeKvZx3jjnF7+mRWoH/m0dqyhyi/nPhpIE09VhgwclrbGAcDuR1x6sp01 PHY= =hBDN -----END PGP SIGNATURE----- Merge tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: - kprobes: use struct_size() for variable size kretprobe_instance data structure. - eprobe: Simplify trace_eprobe list iteration. - probe events: Data structure field access support on BTF argument. - Update BTF argument support on the functions in the kernel loadable modules (only loaded modules are supported). - Move generic BTF access function (search function prototype and get function parameters) to a separated file. - Add a function to search a member of data structure in BTF. - Support accessing BTF data structure member from probe args by C-like arrow('->') and dot('.') operators. e.g. 't sched_switch next=next->pid vruntime=next->se.vruntime' - Support accessing BTF data structure member from $retval. e.g. 'f getname_flags%return +0($retval->name):string' - Add string type checking if BTF type info is available. This will reject if user specify ":string" type for non "char pointer" type. - Automatically assume the fprobe event as a function return event if $retval is used. - selftests/ftrace: Add BTF data field access test cases. - Documentation: Update fprobe event example with BTF data field. * tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: Documentation: tracing: Update fprobe event example with BTF field selftests/ftrace: Add BTF fields access testcases tracing/fprobe-event: Assume fprobe is a return event by $retval tracing/probes: Add string type check with BTF tracing/probes: Support BTF field access from $retval tracing/probes: Support BTF based data structure field access tracing/probes: Add a function to search a member of a struct/union tracing/probes: Move finding func-proto API and getting func-param API to trace_btf tracing/probes: Support BTF argument on module functions tracing/eprobe: Iterate trace_eprobe directly kernel: kprobes: Use struct_size() |
||
Linus Torvalds
|
e021c5f1f6 |
Tracing fixes and clean ups for 6.6:
- Replace strlcpy() with strscpy() - Initialize the pipe cpumask to zero on allocation - Use within_module() instead of open coding it - Remove extra space in hwlat_detectory/mode output - Use LIST_HEAD() instead of open coding it - A bunch of clean ups and fixes for the cpumask filter - Set local da_mon_##name to static - Fix race in snapshot buffer between cpu write and swap -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZPMsBhQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qiToAP49yXVK6seGUwU18QSp4mCNa0QNSH0v bl2UYVSCPv8aNQEAquDOvGInbMcT2z69lK359TVlGPrtVjhqFDloSLMYgAo= =DTGo -----END PGP SIGNATURE----- Merge tag 'trace-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull more tracing updates from Steven Rostedt: "Tracing fixes and clean ups: - Replace strlcpy() with strscpy() - Initialize the pipe cpumask to zero on allocation - Use within_module() instead of open coding it - Remove extra space in hwlat_detectory/mode output - Use LIST_HEAD() instead of open coding it - A bunch of clean ups and fixes for the cpumask filter - Set local da_mon_##name to static - Fix race in snapshot buffer between cpu write and swap" * tag 'trace-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/filters: Fix coding style issues tracing/filters: Change parse_pred() cpulist ternary into an if block tracing/filters: Fix double-free of struct filter_pred.mask tracing/filters: Fix error-handling of cpulist parsing buffer tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY ftrace: Use LIST_HEAD to initialize clear_hash ftrace: Use within_module to check rec->ip within specified module. tracing: Replace strlcpy with strscpy in trace/events/task.h tracing: Fix race issue between cpu buffer write and swap tracing: Remove extra space at the end of hwlat_detector/mode rv: Set variable 'da_mon_##name' to static |
||
Linus Torvalds
|
a6216978de |
Fix false positive "softirq work is pending" messages on -rt
kernels, caused by a buggy factoring-out of existing code. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTzC5oRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hG1Q/+ICGbpxdQOrVg7QTLzgsxttxIyi4Un6lb vPX8NO9/4HIxObR6bd+ji2499TIO6nIhRqGOzEYUe9jzEN27eM/bMo6kCcRkbWra 4V/GZd3j+XdJwIQR442cBdUcByk4X7FlE7KqizJIbvYYyLBXzboBcpOdH012e2O9 UzFjtU+pk5Lhit18jL6/AvjsMhneKb6YUH20Wbb6zjZ1FL28YGKpeOHrh6GSXlKE GVS07pWSAB8TMXdO+8YaKoE7VIOdMaYS/mJJ6u/M8Wo+Kl0wWwmJtjmSYzvD2Uod PGcCiGXr1QpWK66wZNnXjs3rb6bX5umCo8rc5L6rqvWTYvB8Owpl5V94+87yGEov 29lYvWdVJ7dPqP8fSQfYxBKbgfINwOO1STYnIX1Q5mDD9fK2SgOpD9+JFagYnJoI 5n6KoVArVHQXSB4odTn+Qyt0yu0iDubUFRxBTrWijq5ooHOExaxByl0ViyCfp1aS csTcGQSJsvHKhZPejDggjp74IU/ge5lUN4uSFlPVo3jYFwUIIgBG+43QtFiVrplg 3ifpI2qNISQl65PRerZjB5jBmItUGnUl71tnEg/Cli7zvvw/nMeKh98vChtE9S3A 2eQ66rrV9eJAeYaNCV4Uz1UmocD4i2Vec9tZOUUoIga/bDIOVr+bxUr7nvcOneak 98h2ylU4W8o= =zpfn -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix false positive 'softirq work is pending' messages on -rt kernels, caused by a buggy factoring-out of existing code" * tag 'timers-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/rcu: Fix false positive "softirq work is pending" messages |
||
Linus Torvalds
|
23dfeae882 |
Fix a CPU hotplug related deadlock between the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTzCnoRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1h9gQ//fOQT0OrAUwhAdW7IZcQGETdSykxYqRXT OciNPVUirJOXJM7tG08OEUUAiRrbIALHBRkfNk/ycOWTfa2qsur3jyGgyi8cnKo9 NmdNltRMZ2UbKlJxzoeu1wIqWkmoLaYloVp3YWXgPClclNbBROCvXvEHnEr1iRtA trfEjNxEYgKeDkJROg0Av3RQTzLgZ3TqZ67mzJVZbCbz9i/IxicJa4PNuzrkw3c3 q42Btx+Ru1ikl/Jww0asX4iESFxuUk3Aw7DBX7slaLMrLcPMKsbO2D3npSxLFTCP TUdMKoIanVjl5+a2//kT8TkV+M1OKvczy6AYH0pV/yZLkAQqJmLphVsEI6rMIdp2 ep26hrjaLlhp3dTr8jNQ86BlxT6zqP1/+OpC4BbKFK2HLJj7sGKcb5W5WMdhB/Qh tA+CgVZXJDHkH2m2zD6o+SDm5JvbbHOLywfBBUSggHDDq3oOrxdjS2g8tgFwtnJ2 ZxjvJ4Ot3M26b44qkQbJeG42Q7ciLDrfaOZhlZ6bt30agU4EP3bg4dZAL24EoPLY zdom++puL+nUBr6EvzbboVxisuf0cvDbujmuFRQdntRRy8oHgiQVhb+b4EWh0oOc CKN06nyA9z5MzhAek3/GuxMYKEWM9/Dy6rDyqvaxfcbc9PIaxGfRxjgpKxrdRPOu rjGsQHZbTlo= =wM0O -----END PGP SIGNATURE----- Merge tag 'smp-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug fix from Ingo Molnar: "Fix a CPU hotplug related deadlock between the task which initiates and controls a CPU hot-unplug operation vs. the CFS bandwidth timer" * tag 'smp-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Prevent self deadlock on CPU hot-unplug |
||
Linus Torvalds
|
c39cbc5b60 |
Miscellaneous scheduler fixes: a reporting fix, a static symbol fix,
and a kernel-doc fix. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTzCWcRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1ifZQ//SvKEhKT1lolh4bmMZAaRHWJBq8omH1V+ 36k5Jd3AOJcIEVJD0h+6yfJH2mlS6ZGW3te33VhW5z4c2dMBms90qMLv6xdr/E7j Pseud3bc6o9SHPA8v9oNKy9GTcnD/kKXxr7f8tabJxxewzUY7EkHa4lJ1AgOIzDP njWIVqqVFqoO1QjjKCN1ERuMU6ifX+6bcSik89f9F3Gg8KhUMbmv2+O6Jd22wwWC mI/atl2EdkJg0VlFNIZtVk6n+hwbBaPfkd76ihQ/82MaLo1M7PilO5mtpgUNUCMh XLlekYwFewUJP+xGkTg1FG8A2B937EXpPdO/8F4vFU/PhDeev8fIG99MIOo3h6A4 nlaKU/Lh9NFT/64wfP5/b8ud/UEf/7YhD1SH2SdtWwT2yXTrYUl2kdKYpgE8TX3C c7Ap0vKQIcRrycoOaoxsKw915jeA5zCyykd75RLfzmK2phW22QtZgdIOuiflDeds LAuelYaY6C7ZRPnGn2iWceoWS3IBhXTo4nsfh6sPX3A057iHo7CFjX7u1DeMqcuh XIoKOgjZR/vnJQaFdWTSKKbzwTweAc1BBDUYy4CxWbUMD13GIE2trCS+GBWTZcoF KaASIdXL4nUHP35rX9hlww5GUhF6NNOTZ9mkN7NHYfoVy0WXt/rLCywqo3D6Bne+ jeTHwFKjJYI= =jDS4 -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Miscellaneous scheduler fixes: a reporting fix, a static symbol fix, and a kernel-doc fix" * tag 'sched-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Report correct state for TASK_IDLE | TASK_FREEZABLE sched/fair: Make update_entity_lag() static sched/core: Add kernel-doc for set_cpus_allowed_ptr() |
||
Linus Torvalds
|
76be05d4fd |
cgroup: fix build when CGROUP_SCHED is not enabled
Sudip Mukherjee reports that the mips sb1250_swarm_defconfig build fails
with the current kernel. It isn't actually MIPS-specific, it's just
that that defconfig does not have CGROUP_SCHED enabled like most configs
do, and as such shows this error:
kernel/cgroup/cgroup.c: In function 'cgroup_local_stat_show':
kernel/cgroup/cgroup.c:3699:15: error: implicit declaration of function 'cgroup_tryget_css'; did you mean 'cgroup_tryget'? [-Werror=implicit-function-declaration]
3699 | css = cgroup_tryget_css(cgrp, ss);
| ^~~~~~~~~~~~~~~~~
| cgroup_tryget
kernel/cgroup/cgroup.c:3699:13: warning: assignment to 'struct cgroup_subsys_state *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
3699 | css = cgroup_tryget_css(cgrp, ss);
| ^
because cgroup_tryget_css() only exists when CGROUP_SCHED is enabled,
and the cgroup_local_stat_show() function should similarly be guarded by
that config option.
Move things around a bit to fix this all.
Fixes:
|
||
Valentin Schneider
|
cbb557ba92 |
tracing/filters: Fix coding style issues
Recent commits have introduced some coding style issues, fix those up. Link: https://lkml.kernel.org/r/20230901151039.125186-5-vschneid@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Valentin Schneider
|
2900bcbee3 |
tracing/filters: Change parse_pred() cpulist ternary into an if block
Review comments noted that an if block would be clearer than a ternary, so swap it out. No change in behaviour intended Link: https://lkml.kernel.org/r/20230901151039.125186-4-vschneid@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Valentin Schneider
|
1caf7adb9e |
tracing/filters: Fix double-free of struct filter_pred.mask
When a cpulist filter is found to contain a single CPU, that CPU is saved as a scalar and the backing cpumask storage is freed. Also NULL the mask to avoid a double-free once we get down to free_predicate(). Link: https://lkml.kernel.org/r/20230901151039.125186-3-vschneid@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Valentin Schneider
|
9af4058493 |
tracing/filters: Fix error-handling of cpulist parsing buffer
parse_pred() allocates a string buffer to parse the user-provided cpulist, but doesn't check the allocation result nor does it free the buffer once it is no longer needed. Add an allocation check, and free the buffer as soon as it is no longer needed. Link: https://lkml.kernel.org/r/20230901151039.125186-2-vschneid@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Brian Foster
|
3d07fa1dd1 |
tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY
The pipe cpumask used to serialize opens between the main and percpu
trace pipes is not zeroed or initialized. This can result in
spurious -EBUSY returns if underlying memory is not fully zeroed.
This has been observed by immediate failure to read the main
trace_pipe file on an otherwise newly booted and idle system:
# cat /sys/kernel/debug/tracing/trace_pipe
cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy
Zero the allocation of pipe_cpumask to avoid the problem.
Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com
Cc: stable@vger.kernel.org
Fixes:
|
||
Ruan Jinjie
|
2a30dbcbef |
ftrace: Use LIST_HEAD to initialize clear_hash
Use LIST_HEAD() to initialize clear_hash instead of open-coding it. Link: https://lore.kernel.org/linux-trace-kernel/20230809071551.913041-1-ruanjinjie@huawei.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Levi Yun
|
1351148904 |
ftrace: Use within_module to check rec->ip within specified module.
within_module_core && within_module_init condition is same to within module but it's more readable. Use within_module instead of former condition to check rec->ip within specified module area or not. Link: https://lore.kernel.org/linux-trace-kernel/20230803205236.32201-1-ppbuk5246@gmail.com Signed-off-by: Levi Yun <ppbuk5246@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Zheng Yejian
|
3163f635b2 |
tracing: Fix race issue between cpu buffer write and swap
Warning happened in rb_end_commit() at code:
if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))
WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
rb_commit+0x402/0x4a0
Call Trace:
ring_buffer_unlock_commit+0x42/0x250
trace_buffer_unlock_commit_regs+0x3b/0x250
trace_event_buffer_commit+0xe5/0x440
trace_event_buffer_reserve+0x11c/0x150
trace_event_raw_event_sched_switch+0x23c/0x2c0
__traceiter_sched_switch+0x59/0x80
__schedule+0x72b/0x1580
schedule+0x92/0x120
worker_thread+0xa0/0x6f0
It is because the race between writing event into cpu buffer and swapping
cpu buffer through file per_cpu/cpu0/snapshot:
Write on CPU 0 Swap buffer by per_cpu/cpu0/snapshot on CPU 1
-------- --------
tracing_snapshot_write()
[...]
ring_buffer_lock_reserve()
cpu_buffer = buffer->buffers[cpu]; // 1. Suppose find 'cpu_buffer_a';
[...]
rb_reserve_next_event()
[...]
ring_buffer_swap_cpu()
if (local_read(&cpu_buffer_a->committing))
goto out_dec;
if (local_read(&cpu_buffer_b->committing))
goto out_dec;
buffer_a->buffers[cpu] = cpu_buffer_b;
buffer_b->buffers[cpu] = cpu_buffer_a;
// 2. cpu_buffer has swapped here.
rb_start_commit(cpu_buffer);
if (unlikely(READ_ONCE(cpu_buffer->buffer)
!= buffer)) { // 3. This check passed due to 'cpu_buffer->buffer'
[...] // has not changed here.
return NULL;
}
cpu_buffer_b->buffer = buffer_a;
cpu_buffer_a->buffer = buffer_b;
[...]
// 4. Reserve event from 'cpu_buffer_a'.
ring_buffer_unlock_commit()
[...]
cpu_buffer = buffer->buffers[cpu]; // 5. Now find 'cpu_buffer_b' !!!
rb_commit(cpu_buffer)
rb_end_commit() // 6. WARN for the wrong 'committing' state !!!
Based on above analysis, we can easily reproduce by following testcase:
``` bash
#!/bin/bash
dmesg -n 7
sysctl -w kernel.panic_on_warn=1
TR=/sys/kernel/tracing
echo 7 > ${TR}/buffer_size_kb
echo "sched:sched_switch" > ${TR}/set_event
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
```
To fix it, IIUC, we can use smp_call_function_single() to do the swap on
the target cpu where the buffer is located, so that above race would be
avoided.
Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Fixes:
|
||
Mikhail Kobuk
|
2cf0dee989 |
tracing: Remove extra space at the end of hwlat_detector/mode
Space is printed after each mode value including the last one:
$ echo \"$(sudo cat /sys/kernel/tracing/hwlat_detector/mode)\"
"none [round-robin] per-cpu "
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Link: https://lore.kernel.org/linux-trace-kernel/20230825103432.7750-1-m.kobuk@ispras.ru
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes:
|
||
Linus Torvalds
|
34232fcfe9 |
Tracing updates for 6.6:
User visible changes: - Added a way to easier filter with cpumasks: # echo 'cpumask & CPUS{17-42}' > /sys/kernel/tracing/events/ipi_send_cpumask/filter - Show actual size of ring buffer after modifying the ring buffer size via buffer_size_kb. Currently it just returns what was written, but the actual size rounds up to the sub buffer size. Show that real size instead. Major changes: - Added "eventfs". This is the code that handles the inodes and dentries of tracefs/events directory. As there are thousands of events, and each event has several inodes and dentries that currently exist even when tracing is never used, they take up precious memory. Instead, eventfs will allocate the inodes and dentries in a JIT way (similar to what procfs does). There is now metadata that handles the events and subdirectories, and will create the inodes and dentries when they are used. Note, I also have patches that remove the subdirectory meta data, but will wait till the next merge window before applying them. It's a little more complex, and I want to make sure the dynamic code works properly before adding more complexity, making it easier to revert if need be. Minor changes: - Optimization to user event list traversal. - Remove intermediate permission of tracefs files (note the intermediate permission removes all access to the files so it is not a security concern, but just a clean up.) - Add the complex fix to FORTIFY_SOURCE to the kernel stack event logic. - Other minor clean ups. -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEXtmkj8VMCiLR0IBM68Js21pW3nMFAmTwtAsUHHJvc3RlZHRA Z29vZG1pcy5vcmcACgkQ68Js21pW3nNOXRAAsslQT6alY4OeplC4x47+V6+6NiIA oDtOmWAqf7TsH9bukzRFD36rUly42O20RJDx9z0Q3iRc3vGxEawId8z6P0HmBwRb VSl5BryWvL5Wc5w94xS8EeCuC1MRfhVDyfbtVFmWigzfvd/f+hp71ViMPHUvrRJX KhzzNSBc4ir5E1lzfwa7meYTXzDwrQlZbYfdf5aH94IWAkqDj85PUZDJ7UmLZhXG CIglSpNFXZ0j19Wo/U6KZlHR1XfunBKungCzJ5Dbznc9YLWZTQXOIZF4YPKfPIJL ulRG9chwXY0nQWhG3xM1UHZLsAMSWw5i13a4ZN4d8FCNOgv8ttcJnfDk7ZYUS0Oz RmY1dGcSRKAZTUTjm8ZBtmyiUCc9kZAIk0fyEfIHtoDYXmhnvni3wuTnbRSdXaSi q4YkxPaLfX8Fn3QloCqqddt8iONu7BnbpZOhUCl2AtBib52gnTTF7+rQ6/0D3rjo SSuvEHhnjJhzk+3jM2odxjmTAztNT+yu6FbKXZUKPt1Kj9YHv1J9cEQw9/Etw+GV 8jQBe979D8hFJmDOJOT/O/TdPqE9mQoMNBt6Y8QnE4nbJWM+i/MBrThFpUSQhRCr 0Ya/HgR2QyRH7RmZW5o2H9mNtN+V9c7RxZW8erYzRbUs0YofK2OpGi9SrPzxWCke w6j0VVZHaxdPguM= =/s+e -----END PGP SIGNATURE----- Merge tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: "User visible changes: - Added a way to easier filter with cpumasks: # echo 'cpumask & CPUS{17-42}' > /sys/kernel/tracing/events/ipi_send_cpumask/filter - Show actual size of ring buffer after modifying the ring buffer size via buffer_size_kb. Currently it just returns what was written, but the actual size rounds up to the sub buffer size. Show that real size instead. Major changes: - Added "eventfs". This is the code that handles the inodes and dentries of tracefs/events directory. As there are thousands of events, and each event has several inodes and dentries that currently exist even when tracing is never used, they take up precious memory. Instead, eventfs will allocate the inodes and dentries in a JIT way (similar to what procfs does). There is now metadata that handles the events and subdirectories, and will create the inodes and dentries when they are used. Note, I also have patches that remove the subdirectory meta data, but will wait till the next merge window before applying them. It's a little more complex, and I want to make sure the dynamic code works properly before adding more complexity, making it easier to revert if need be. Minor changes: - Optimization to user event list traversal - Remove intermediate permission of tracefs files (note the intermediate permission removes all access to the files so it is not a security concern, but just a clean up) - Add the complex fix to FORTIFY_SOURCE to the kernel stack event logic - Other minor cleanups" * tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (29 commits) tracefs: Remove kerneldoc from struct eventfs_file tracefs: Avoid changing i_mode to a temp value tracing/user_events: Optimize safe list traversals ftrace: Remove empty declaration ftrace_enable_daemon() and ftrace_disable_daemon() tracing: Remove unused function declarations tracing/filters: Document cpumask filtering tracing/filters: Further optimise scalar vs cpumask comparison tracing/filters: Optimise CPU vs cpumask filtering when the user mask is a single CPU tracing/filters: Optimise scalar vs cpumask filtering when the user mask is a single CPU tracing/filters: Optimise cpumask vs cpumask filtering when user mask is a single CPU tracing/filters: Enable filtering the CPU common field by a cpumask tracing/filters: Enable filtering a scalar field by a cpumask tracing/filters: Enable filtering a cpumask field by another cpumask tracing/filters: Dynamically allocate filter_pred.regex test: ftrace: Fix kprobe test for eventfs eventfs: Move tracing/events to eventfs eventfs: Implement removal of meta data from eventfs eventfs: Implement functions to create files and dirs when accessed eventfs: Implement eventfs lookup, read, open functions eventfs: Implement eventfs file add functions ... |
||
Linus Torvalds
|
bd30fe6a7d |
workqueue: Changes for v6.6
* Unbound workqueues now support more flexible affinity scopes. The default behavior is to soft-affine according to last level cache boundaries. A work item queued from a given LLC is executed by a worker running on the same LLC but the worker may be moved across cache boundaries as the scheduler sees fit. On machines which multiple L3 caches, which are becoming more popular along with chiplet designs, this improves cache locality while not harming work conservation too much. Unbound workqueues are now also a lot more flexible in terms of execution affinity. Differeing levels of affinity scopes are supported and both the default and per-workqueue affinity settings can be modified dynamically. This should help working around amny of sub-optimal behaviors observed recently with asymmetric ARM CPUs. This involved signficant restructuring of workqueue code. Nothing was reported yet but there's some risk of subtle regressions. Should keep an eye out. * Rescuer workers now has more identifiable comms. * workqueue.unbound_cpus added so that CPUs which can be used by workqueue can be constrained early during boot. * Now that all the in-tree users have been flushed out, trigger warning if system-wide workqueues are flushed. * One pull commit from for-6.5-fixes to avoid cascading conflicts in the affinity scope patchset. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZPERlQ4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGVqQAPwIOy9tWY5jFAmMuIyH6wV50hbmfxCc2n5xhQNr 5HoyGgEA8lw1W7afDCIPiQVA7AYsu8dhwuNSOcRCJxhrrn4XsA0= =g/Uu -----END PGP SIGNATURE----- Merge tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: - Unbound workqueues now support more flexible affinity scopes. The default behavior is to soft-affine according to last level cache boundaries. A work item queued from a given LLC is executed by a worker running on the same LLC but the worker may be moved across cache boundaries as the scheduler sees fit. On machines which multiple L3 caches, which are becoming more popular along with chiplet designs, this improves cache locality while not harming work conservation too much. Unbound workqueues are now also a lot more flexible in terms of execution affinity. Differeing levels of affinity scopes are supported and both the default and per-workqueue affinity settings can be modified dynamically. This should help working around amny of sub-optimal behaviors observed recently with asymmetric ARM CPUs. This involved signficant restructuring of workqueue code. Nothing was reported yet but there's some risk of subtle regressions. Should keep an eye out. - Rescuer workers now has more identifiable comms. - workqueue.unbound_cpus added so that CPUs which can be used by workqueue can be constrained early during boot. - Now that all the in-tree users have been flushed out, trigger warning if system-wide workqueues are flushed. * tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (31 commits) workqueue: fix data race with the pwq->stats[] increment workqueue: Rename rescuer kworker workqueue: Make default affinity_scope dynamically updatable workqueue: Add "Affinity Scopes and Performance" section to documentation workqueue: Implement non-strict affinity scope for unbound workqueues workqueue: Add workqueue_attrs->__pod_cpumask workqueue: Factor out need_more_worker() check and worker wake-up workqueue: Factor out work to worker assignment and collision handling workqueue: Add multiple affinity scopes and interface to select them workqueue: Modularize wq_pod_type initialization workqueue: Add tools/workqueue/wq_dump.py which prints out workqueue configuration workqueue: Generalize unbound CPU pods workqueue: Factor out clearing of workqueue-only attrs fields workqueue: Factor out actual cpumask calculation to reduce subtlety in wq_update_pod() workqueue: Initialize unbound CPU pods later in the boot workqueue: Move wq_pod_init() below workqueue_init() workqueue: Rename NUMA related names to use pod instead workqueue: Rename workqueue_attrs->no_numa to ->ordered workqueue: Make unbound workqueues to use per-cpu pool_workqueues workqueue: Call wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug ... |
||
Linus Torvalds
|
7716f383a5 |
cgroup: Changes for v6.6
* Per-cpu cpu usage stats are now tracked. This currently isn't printed out
in the cgroupfs interface and can only be accessed through e.g. BPF.
Should decide on a not-too-ugly way to show per-cpu stats in cgroupfs.
* cpuset received some cleanups and prepatory patches for the pending
cpus.exclusive patchset which will allow cpuset partitions to be created
below non-partition parents, which should ease the management of partition
cpusets.
* A lot of code and documentation cleanup patches.
* tools/testing/selftests/cgroup/test_cpuset.c is added. This causes trivial
conflicts in .gitignore and Makefile under the directory against
|
||
Linus Torvalds
|
e987af4546 |
percpu: changes for v6.6
percpu * A couple cleanups by Baoquan He and Bibo Mao. The only behavior change is to start printing messages if we're under the warn limit for failed atomic allocations. percpu_counter * Shakeel introduced percpu counters into mm_struct which caused percpu allocations be on the hot path [1]. Originally I spent some time trying to improve the percpu allocator, but instead preferred what Mateusz Guzik proposed grouping at the allocation site, percpu_counter_init_many(). This allows a single percpu allocation to be shared by the counters. I like this approach because it creates a shared lifetime by the allocations. Additionally, I believe many inits have higher level synchronization requirements, like percpu_counter does against HOTPLUG_CPU. Therefore we can group these optimizations together. [1] https://lore.kernel.org/linux-mm/20221024052841.3291983-1-shakeelb@google.com/ -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE3hZPHJdcVwe+yTTtiDc0yuoFPR0FAmTv2IUACgkQiDc0yuoF PR0+gg//U430Y9jRSKQtbh3dEPaAeWGcTfSTnVHbQGfBj3A4ePJyWl/Tgzri31AC rzr8SRs0yX8b82TbECWsV67i/GrntLJyz4yQ52S/RRqVwnQqSn/wicEdCY00lJBt Tye8zApOnYBouaYqIOxm/M7ofvKzJ3gWOVeF/zBwM6hwvNaXXtY5r86fSDxoEbhY HOFnCDmg5Spf0U50j1G7nV5KfAb7BNA3/HFyzfzH+w+OWi4IGbThsfrg1qvjyFot KlEK/kF8Af2xj2A2se4XFsLc2D/Tj+29juYVQqIPBJzVPrZ2uerKSszK5Zcr+Use kMiG7tRWKE+2vkOM1RQ5Y5NCVEBhlXlienz1gf/C7247SEGs6OIyqvyDAgPTRx6p oR2/vx9hMtaSMf4aHWd+fYS5gNZ05iMvOIbRZnI1wZkQglQVkJvXhzuLaJ+dIGSP ypv6XOepik7vDjZ3p3xJXd0TAn4NSkn3jWRetrymdtMFanF99qw1VqjmkLecSil0 Gr0UhRL1oiMde6niVJrOpdOGLwt/M4N99Y5rksw6NCnktRJ99coFGj7LglZGMsu+ YkOyjD8MVJXTkBtBNGeqHTKe6nyVkHFq9ad5EmWjPkefP5JziH8i18k7JlF1dLA5 c8peq3ES659D5f0mU2jilD9PsCsBfSn6Of4ruMZa2Zr1XDD8snI= =vcA1 -----END PGP SIGNATURE----- Merge tag 'percpu-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu Pull percpu updates from Dennis Zhou: "One bigger change to percpu_counter's api allowing for init and destroy of multiple counters via percpu_counter_init_many() and percpu_counter_destroy_many(). This is used to help begin remediating a performance regression with percpu rss stats. Additionally, it seems larger core count machines are feeling the burden of the single threaded allocation of percpu. Mateusz is thinking about it and I will spend some time on it too. percpu: - A couple cleanups by Baoquan He and Bibo Mao. The only behavior change is to start printing messages if we're under the warn limit for failed atomic allocations. percpu_counter: - Shakeel introduced percpu counters into mm_struct which caused percpu allocations be on the hot path [1]. Originally I spent some time trying to improve the percpu allocator, but instead preferred what Mateusz Guzik proposed grouping at the allocation site, percpu_counter_init_many(). This allows a single percpu allocation to be shared by the counters. I like this approach because it creates a shared lifetime by the allocations. Additionally, I believe many inits have higher level synchronization requirements, like percpu_counter does against HOTPLUG_CPU. Therefore we can group these optimizations together" Link: https://lore.kernel.org/linux-mm/20221024052841.3291983-1-shakeelb@google.com/ [1] * tag 'percpu-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: kernel/fork: group allocation/free of per-cpu counters for mm struct pcpcntr: add group allocation/free mm/percpu.c: print error message too if atomic alloc failed mm/percpu.c: optimize the code in pcpu_setup_first_chunk() a little bit mm/percpu.c: remove redundant check mm/percpu: Remove some local variables in pcpu_populate_pte |
||
Linus Torvalds
|
8e1e49550d |
TTY/Serial driver changes for 6.6-rc1
Here is the big set of tty and serial driver changes for 6.6-rc1. Lots of cleanups in here this cycle, and some driver updates. Short summary is: - Jiri's continued work to make the tty code and apis be a bit more sane with regards to modern kernel coding style and types - cpm_uart driver updates - n_gsm updates and fixes - meson driver updates - sc16is7xx driver updates - 8250 driver updates for different hardware types - qcom-geni driver fixes - tegra serial driver change - stm32 driver updates - synclink_gt driver cleanups - tty structure size reduction All of these have been in linux-next this week with no reported issues. The last bit of cleanups from Jiri and the tty structure size reduction came in last week, a bit late but as they were just style changes and size reductions, I figured they should get into this merge cycle so that others can work on top of them with no merge conflicts. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPH+jA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykKyACgldt6QeenTN+6dXIHS/eQHtTKZwMAn3arSeXI QrUUnLFjOWyoX87tbMBQ =LVw0 -----END PGP SIGNATURE----- Merge tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here is the big set of tty and serial driver changes for 6.6-rc1. Lots of cleanups in here this cycle, and some driver updates. Short summary is: - Jiri's continued work to make the tty code and apis be a bit more sane with regards to modern kernel coding style and types - cpm_uart driver updates - n_gsm updates and fixes - meson driver updates - sc16is7xx driver updates - 8250 driver updates for different hardware types - qcom-geni driver fixes - tegra serial driver change - stm32 driver updates - synclink_gt driver cleanups - tty structure size reduction All of these have been in linux-next this week with no reported issues. The last bit of cleanups from Jiri and the tty structure size reduction came in last week, a bit late but as they were just style changes and size reductions, I figured they should get into this merge cycle so that others can work on top of them with no merge conflicts" * tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits) tty: shrink the size of struct tty_struct by 40 bytes tty: n_tty: deduplicate copy code in n_tty_receive_buf_real_raw() tty: n_tty: extract ECHO_OP processing to a separate function tty: n_tty: unify counts to size_t tty: n_tty: use u8 for chars and flags tty: n_tty: simplify chars_in_buffer() tty: n_tty: remove unsigned char casts from character constants tty: n_tty: move newline handling to a separate function tty: n_tty: move canon handling to a separate function tty: n_tty: use MASK() for masking out size bits tty: n_tty: make n_tty_data::num_overrun unsigned tty: n_tty: use time_is_before_jiffies() in n_tty_receive_overrun() tty: n_tty: use 'num' for writes' counts tty: n_tty: use output character directly tty: n_tty: make flow of n_tty_receive_buf_common() a bool Revert "tty: serial: meson: Add a earlycon for the T7 SoC" Documentation: devices.txt: Fix minors for ttyCPM* Documentation: devices.txt: Remove ttySIOC* Documentation: devices.txt: Remove ttyIOC* serial: 8250_bcm7271: improve bcm7271 8250 port ... |
||
Linus Torvalds
|
4ad0a4c234 |
powerpc updates for 6.6
- Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the configured SMT state when hotplugging CPUs into the system. - Combine final TLB flush and lazy TLB mm shootdown IPIs when using the Radix MMU to avoid a broadcast TLBIE flush on exit. - Drop the exclusion between ptrace/perf watchpoints, and drop the now unused associated arch hooks. - Add support for the "nohlt" command line option to disable CPU idle. - Add support for -fpatchable-function-entry for ftrace, with GCC >= 13.1. - Rework memory block size determination, and support 256MB size on systems with GPUs that have hotpluggable memory. - Various other small features and fixes. Thanks to: Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar, Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain, Xiongfeng Wang, Yuan Tan, Zhang Rui, Zheng Zengkai. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmTwgbwTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFmpD/432vipeoqvkAYsyK0xi/Y3GcY0wcyd WJApLXXadEbtKQrgXQ6sowWqalg5thYnQCRarg/tXKK/po3KfgwkPjGDpOL+cIdr 12QVN2XJm9VmJ1wYJxzk+yXx4F43AdmMdr94qWAGufbTHezwb4UpzVR1NxtFrOE/ X5TNsC2+2mdZY/ZaNHS5vsTIFv3EhQfqgjZPlIAdLn6CGc8xWT514Q/uHA8+ytM/ HL7Hqs33DoPSvgTa5TT/2E0d0k5nO3P5KObzAjpYlireTPaBi51mpKGewcrtm0o2 v3cBlbfx3C7pe9ZhKBK9BH8cjynfiqsVZ9/lCw/7eBNdm9tHuzG0jeS7Db9tCZXS fM7G2R7SoIusPTqxlBmkU5DpYslwrHiVgCyy3ijxkoA/fakVwh/GgTcMsRt73IY6 n6DsUvWwuYHCIeIiHmHQJqCqCRtV+aMzU3AbbBHOjtdIanhlW16M686dEsgCirh7 akRVRD5VqKaqXs34PpkRL89Xv3wZRjl6XZ3hZFfCjSYXfpXDXhgSToIskpHYhKL8 gpY7WtG9YQP05Xz5HRCx6EluaZVeKe0lZi6fezX7Mi9AygJQO8FfXqP1mHBlEq40 ThWtvL9D89RV6lADqqFN20XepgvKNOyAXcE4szvsnIZYUSPmZQZSPxx+DHtROaLP jX3ifxtxJp92pQ== =5g7K -----END PGP SIGNATURE----- Merge tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the configured SMT state when hotplugging CPUs into the system - Combine final TLB flush and lazy TLB mm shootdown IPIs when using the Radix MMU to avoid a broadcast TLBIE flush on exit - Drop the exclusion between ptrace/perf watchpoints, and drop the now unused associated arch hooks - Add support for the "nohlt" command line option to disable CPU idle - Add support for -fpatchable-function-entry for ftrace, with GCC >= 13.1 - Rework memory block size determination, and support 256MB size on systems with GPUs that have hotpluggable memory - Various other small features and fixes Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar, Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain, Xiongfeng Wang, Yuan Tan, Zhang Rui, and Zheng Zengkai. * tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (135 commits) macintosh/ams: linux/platform_device.h is needed powerpc/xmon: Reapply "Relax frame size for clang" powerpc/mm/book3s64: Use 256M as the upper limit with coherent device memory attached powerpc/mm/book3s64: Fix build error with SPARSEMEM disabled powerpc/iommu: Fix notifiers being shared by PCI and VIO buses powerpc/mpc5xxx: Add missing fwnode_handle_put() powerpc/config: Disable SLAB_DEBUG_ON in skiroot powerpc/pseries: Remove unused hcall tracing instruction powerpc/pseries: Fix hcall tracepoints with JUMP_LABEL=n powerpc: dts: add missing space before { powerpc/eeh: Use pci_dev_id() to simplify the code powerpc/64s: Move CPU -mtune options into Kconfig powerpc/powermac: Fix unused function warning powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT powerpc: Don't include lppaca.h in paca.h powerpc/pseries: Move hcall_vphn() prototype into vphn.h powerpc/pseries: Move VPHN constants into vphn.h cxl: Drop unused detach_spa() powerpc: Drop zalloc_maybe_bootmem() powerpc/powernv: Use struct opal_prd_msg in more places ... |
||
Linus Torvalds
|
df57721f9a |
Add x86 shadow stack support
Convert IBT selftest to asm to fix objtool warning -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTv1QQACgkQaDWVMHDJ krAUwhAAn6TOwHJK8BSkHeiQhON1nrlP3c5cv0AyZ2NP8RYDrZrSZvhpYBJ6wgKC Cx5CGq5nn9twYsYS3KsktLKDfR3lRdsQ7K9qtyFtYiaeaVKo+7gEKl/K+klwai8/ gninQWHk0zmSCja8Vi77q52WOMkQKapT8+vaON9EVDO8dVEi+CvhAIfPwMafuiwO Rk4X86SzoZu9FP79LcCg9XyGC/XbM2OG9eNUTSCKT40qTTKm5y4gix687NvAlaHR ko5MTsdl0Wfp6Qk0ohT74LnoA2c1g/FluvZIM33ci/2rFpkf9Hw7ip3lUXqn6CPx rKiZ+pVRc0xikVWkraMfIGMJfUd2rhelp8OyoozD7DB7UZw40Q4RW4N5tgq9Fhe9 MQs3p1v9N8xHdRKl365UcOczUxNAmv4u0nV5gY/4FMC6VjldCl2V9fmqYXyzFS4/ Ogg4FSd7c2JyGFKPs+5uXyi+RY2qOX4+nzHOoKD7SY616IYqtgKoz5usxETLwZ6s VtJOmJL0h//z0A7tBliB0zd+SQ5UQQBDC2XouQH2fNX2isJMn0UDmWJGjaHgK6Hh 8jVp6LNqf+CEQS387UxckOyj7fu438hDky1Ggaw4YqowEOhQeqLVO4++x+HITrbp AupXfbJw9h9cMN63Yc0gVxXQ9IMZ+M7UxLtZ3Cd8/PVztNy/clA= =3UUm -----END PGP SIGNATURE----- Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 shadow stack support from Dave Hansen: "This is the long awaited x86 shadow stack support, part of Intel's Control-flow Enforcement Technology (CET). CET consists of two related security features: shadow stacks and indirect branch tracking. This series implements just the shadow stack part of this feature, and just for userspace. The main use case for shadow stack is providing protection against return oriented programming attacks. It works by maintaining a secondary (shadow) stack using a special memory type that has protections against modification. When executing a CALL instruction, the processor pushes the return address to both the normal stack and to the special permission shadow stack. Upon RET, the processor pops the shadow stack copy and compares it to the normal stack copy. For more information, refer to the links below for the earlier versions of this patch set" Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/ Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/ * tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) x86/shstk: Change order of __user in type x86/ibt: Convert IBT selftest to asm x86/shstk: Don't retry vm_munmap() on -EINTR x86/kbuild: Fix Documentation/ reference x86/shstk: Move arch detail comment out of core mm x86/shstk: Add ARCH_SHSTK_STATUS x86/shstk: Add ARCH_SHSTK_UNLOCK x86: Add PTRACE interface for shadow stack selftests/x86: Add shadow stack test x86/cpufeatures: Enable CET CR4 bit for shadow stack x86/shstk: Wire in shadow stack interface x86: Expose thread features in /proc/$PID/status x86/shstk: Support WRSS for userspace x86/shstk: Introduce map_shadow_stack syscall x86/shstk: Check that signal frame is shadow stack mem x86/shstk: Check that SSP is aligned on sigreturn x86/shstk: Handle signals for shadow stack x86/shstk: Introduce routines modifying shstk x86/shstk: Handle thread shadow stack x86/shstk: Add user-mode shadow stack support ... |
||
Christoph Hellwig
|
765aa6b3a4 |
dma-pool: remove a __maybe_unused label in atomic_pool_expand
Move the #endif a line so that free_page label is only seen by the compile pass when actually used. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chunhui He <hchunhui@mail.ustc.edu.cn> Reviewed-by: Robin Murphy <roin.murphy@arm.com> |
||
Linus Torvalds
|
cd99b9eb4b |
Documentation work keeps chugging along; stuff for 6.6 includes:
- Work from Carlos Bilbao to integrate rustdoc output into the generated HTML documentation. This took some work to figure out how to do it without slowing the docs build and without creating people who don't have Rust installed, but Carlos got there. - Move the loongarch and mips architecture documentation under Documentation/arch/. - Some more maintainer documentation from Jakub ...plus the usual assortment of updates, translations, and fixes. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmTvqNkPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YgIgH/3drfLtlFtzLqDOzrzDXS8yGnE3pPdxw796b /ZFzAK16wYKaKevYoIz8bVGGKaE1sEUW0mhlq4KGdfZuxLG8YnWS8URyCW4FDU2E 6qNL+8oJ8LZfID46f9Q8ZgfEz7yF/mhCqPk7MEswYtwbscs2ZTGCTGYB/5BHlBuT LR+M89uLmHgr8S1o24v30OgiX+VvQFyu0xoxIhbiqUZvBd/XdfX2pgYd9BGzMj5q C2ZP+V14g36c5pV0EO9TwhCXOF/WVrp7DbjbfWAsqBSLxvpXPydH2q1DUzGeQtP1 exujrBD1O8q3pPdaNA5R+h6cWlHmUZug9mE4BRLp9ErGrozwJsQ= =C3Uv -----END PGP SIGNATURE----- Merge tag 'docs-6.6' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "Documentation work keeps chugging along; this includes: - Work from Carlos Bilbao to integrate rustdoc output into the generated HTML documentation. This took some work to figure out how to do it without slowing the docs build and without creating people who don't have Rust installed, but Carlos got there - Move the loongarch and mips architecture documentation under Documentation/arch/ - Some more maintainer documentation from Jakub ... plus the usual assortment of updates, translations, and fixes" * tag 'docs-6.6' of git://git.lwn.net/linux: (56 commits) Docu: genericirq.rst: fix irq-example input: docs: pxrc: remove reference to phoenix-sim Documentation: serial-console: Fix literal block marker docs/mm: remove references to hmm_mirror ops and clean typos docs/zh_CN: correct regi_chg(),regi_add() to region_chg(),region_add() Documentation: Fix typos Documentation/ABI: Fix typos scripts: kernel-doc: fix macro handling in enums scripts: kernel-doc: parse DEFINE_DMA_UNMAP_[ADDR|LEN] Documentation: riscv: Update boot image header since EFI stub is supported Documentation: riscv: Add early boot document Documentation: arm: Add bootargs to the table of added DT parameters docs: kernel-parameters: Refer to the correct bitmap function doc: update params of memhp_default_state= docs: Add book to process/kernel-docs.rst docs: sparse: fix invalid link addresses docs: vfs: clean up after the iterate() removal docs: Add a section on surveys to the researcher guidelines docs: move mips under arch docs: move loongarch under arch ... |
||
Phil Sutter
|
ea078ae910 |
netfilter: nf_tables: Audit log rule reset
Resetting rules' stateful data happens outside of the transaction logic,
so 'get' and 'dump' handlers have to emit audit log entries themselves.
Fixes:
|
||
Phil Sutter
|
7e9be1124d |
netfilter: nf_tables: Audit log setelem reset
Since set element reset is not integrated into nf_tables' transaction
logic, an explicit log call is needed, similar to NFT_MSG_GETOBJ_RESET
handling.
For the sake of simplicity, catchall element reset will always generate
a dedicated log entry. This relieves nf_tables_dump_set() from having to
adjust the logged element count depending on whether a catchall element
was found or not.
Fixes:
|
||
Linus Torvalds
|
1a35914f73 |
integrity-v6.6
-----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCZO0WoxQcem9oYXJAbGlu dXguaWJtLmNvbQAKCRDLwZzRsCrn5alsAP0UZQIKI2zEjFdtucgClcSouflIOC5i Hvtgv3qVFXPZQwEA2H/SGjigtH5NruVXECDZdrIfaGGvBhyeY72lbswXfQ0= =Gu8i -----END PGP SIGNATURE----- Merge tag 'integrity-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity subsystem updates from Mimi Zohar: - With commit |
||
Linus Torvalds
|
1086eeac9c |
lsm/stable-6.6 PR 20230829
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmTuKLcUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXM/Eg//cwaOu/ASS08Cz/tfXeKpzg9UpzbW uHqGtgdE9ZEvS71z+3dorOJVPEwPr+/yviq3FXYjYHFqvVhLZCvYM9rw+eNo/k4T I95UTchGUsMWwkw61YBDLythfXm2UL5nabjckO81i9UPtxUYOwF6xQMQXYyMcLL8 6fm1vnCvK5FBEXi2HSUWy3Eb3wdviGdHrL6h19Aeew+q8u33asWSxn9vmBSSFEzZ 492//Pgy0t3FA6paWXQRvoR+GvLgBXNOvHB68cAx9vS8Lq6mAwJJSCRrQtKGh2Gd YInr49f+TXOosD5Tm6ueWO4sr8RzQZ7nPyM+BLue4Yn2ZzdYgjwfHdkHWS1KeH5X qVqa9s6/QONvkSCzqHs/ne2qio1Q0/0uGgwOkx6N7oVWQWjE7iTYlADwM0CDJnd2 UD7AHTOgpc88x1T1eW599MZttSCznBTSFXv4waaS5/5NT9n8Db1TpTtCTedOc1x2 n+c+F5BHLy69vhSGCanvum/8i2gNoKVyYaHyaMsQxr5LRcLnvN6oOjWIv7jMKxe7 GavUAxU7M5rxPUH44vrrrI+XztKJOdpCz4S0xp+7pSSSGAK5KkmVVLXjzrlGO1WS 55ixxQWYTGK0KlWHp4Ofi6brE9a4ATKcd1XscPN+AtBYX2ufNHLskCZulu/lyrMx lAy9RRDe1hHWTvg= =dnm4 -----END PGP SIGNATURE----- Merge tag 'lsm-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull LSM updates from Paul Moore: - Add proper multi-LSM support for xattrs in the security_inode_init_security() hook Historically the LSM layer has only allowed a single LSM to add an xattr to an inode, with IMA/EVM measuring that and adding its own as well. As we work towards promoting IMA/EVM to a "proper LSM" instead of the special case that it is now, we need to better support the case of multiple LSMs each adding xattrs to an inode and after several attempts we now appear to have something that is working well. It is worth noting that in the process of making this change we uncovered a problem with Smack's SMACK64TRANSMUTE xattr which is also fixed in this pull request. - Additional LSM hook constification Two patches to constify parameters to security_capget() and security_binder_transfer_file(). While I generally don't make a special note of who submitted these patches, these were the work of an Outreachy intern, Khadija Kamran, and that makes me happy; hopefully it does the same for all of you reading this. - LSM hook comment header fixes One patch to add a missing hook comment header, one to fix a minor typo. - Remove an old, unused credential function declaration It wasn't clear to me who should pick this up, but it was trivial, obviously correct, and arguably the LSM layer has a vested interest in credentials so I merged it. Sadly I'm now noticing that despite my subject line cleanup I didn't cleanup the "unsued" misspelling, sigh * tag 'lsm-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: constify the 'file' parameter in security_binder_transfer_file() lsm: constify the 'target' parameter in security_capget() lsm: add comment block for security_sk_classify_flow LSM hook security: Fix ret values doc for security_inode_init_security() cred: remove unsued extern declaration change_create_files_as() evm: Support multiple LSMs providing an xattr evm: Align evm_inode_init_security() definition with LSM infrastructure smack: Set the SMACK64TRANSMUTE xattr in smack_inode_init_security() security: Allow all LSMs to provide xattrs for inode_init_security hook lsm: fix typo in security_file_lock() comment header |
||
Linus Torvalds
|
3ea67c4f46 |
audit/stable-6.6 PR 20230829
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmTuKIQUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMSahAA4o+mfGxcadExo8wsEFfizsQd0JS1 6KpV8Gl9/uwPTCUmvjquFnTb5tbNFZ1X7jnj2g0+/ZHYPp9yJQqTKu7NX1Q9w+dE 11tiipc4CyrcJpWrjBinNH27txjulLSCN1imMnRYLZOpk1AbXTwjuLjFBy2iTDtm 8TAPj4vcKbi5MlcUodp/DGO6ysL75gTsLn5UUsHJhWbofz4ECay0heQoPeZ/MaW3 gBPMRgt/REg8ikdR/ntFMOD6ywBZZ0Vsf/S+hNWGwHUgGxQ5H7rJBEFI65HL4Ur1 c36UFRsypT1sFaIDbS/PrvpT3M48XwmqdmWNx5Z1dtJCCwNhuhsmEkXB+GEud2qM SOQQfMgfjKvnaLMPUmDePuAiSflSJj2AHo1HXlYxKFtybI1plJGiRoDX5jlsklCp JbwUJ2y7YlxNPIaZSBHYIUuniUDqET83cR2D3YJiU+2I9myg8Z5Amto8d4MFgf21 f4qfm0SDBMvXYHUuhUry0/kuk2A0R89H4HUNcrGky+cSsaelpm06uaxj43B/M9Dp v1nSwDQpDtYKSt+16GUDfqq5BywjwMe4J7wlE9+YdTDrvuc2qUxZMky5GzZ55Wnl mbe6BVEBc19FhDeC3muhgV0jWCUGKuq6q+W+CRmxafyOMzX9NIDFaZf1KxkaesxD S9I7AYmT7fCghFQ= =tZaJ -----END PGP SIGNATURE----- Merge tag 'audit-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Six audit patches, the highlights are: - Add an explicit cond_resched() call when generating PATH records Certain tracefs/debugfs operations can generate a *lot* of audit PATH entries and if one has an aggressive system configuration (not the default) this can cause a soft lockup in the audit code as it works to process all of these new entries. This is in sharp contrast to the common case where only one or two PATH entries are logged. In order to fix this corner case without excessively impacting the common case we're adding a single cond_rescued() call between two of the most intensive loops in the __audit_inode_child() function. - Various minor cleanups We removed a conditional header file as the included header already had the necessary logic in place, fixed a dummy function's return value, and the usual collection of checkpatch.pl noise (whitespace, brace, and trailing statement tweaks)" * tag 'audit-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: move trailing statements to next line audit: cleanup function braces and assignment-in-if-condition audit: add space before parenthesis and around '=', "==", and '<' audit: fix possible soft lockup in __audit_inode_child() audit: correct audit_filter_inodes() definition audit: include security.h unconditionally |
||
Christoph Hellwig
|
2dcdf8c18d |
dma-contiguous: fix the Kconfig entry for CONFIG_DMA_NUMA_CMA
It makes no sense to expose CONFIG_DMA_NUMA_CMA if CONFIG_NUMA is not enabled, and random config options shouldn't be default unless there is a good reason. Replace the default NUMA with a depends on to fix both issues. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <roin.murphy@arm.com> |
||
Thomas Gleixner
|
2b8272ff4a |
cpu/hotplug: Prevent self deadlock on CPU hot-unplug
Xiongfeng reported and debugged a self deadlock of the task which initiates and controls a CPU hot-unplug operation vs. the CFS bandwidth timer. CPU1 CPU2 T1 sets cfs_quota starts hrtimer cfs_bandwidth 'period_timer' T1 is migrated to CPU2 T1 initiates offlining of CPU1 Hotplug operation starts ... 'period_timer' expires and is re-enqueued on CPU1 ... take_cpu_down() CPU1 shuts down and does not handle timers anymore. They have to be migrated in the post dead hotplug steps by the control task. T1 runs the post dead offline operation T1 is scheduled out T1 waits for 'period_timer' to expire T1 waits there forever if it is scheduled out before it can execute the hrtimer offline callback hrtimers_dead_cpu(). Cure this by delegating the hotplug control operation to a worker thread on an online CPU. This takes the initiating user space task, which might be affected by the bandwidth timer, completely out of the picture. Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Yu Liao <liaoyu15@huawei.com> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx |
||
Paul Gortmaker
|
96c1fa04f0 |
tick/rcu: Fix false positive "softirq work is pending" messages
In commit |
||
Sergey Senozhatsky
|
fb5a431559 |
dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock
__dma_entry_alloc_check_leak() calls into printk -> serial console output (qcom geni) and grabs port->lock under free_entries_lock spin lock, which is a reverse locking dependency chain as qcom_geni IRQ handler can call into dma-debug code and grab free_entries_lock under port->lock. Move __dma_entry_alloc_check_leak() call out of free_entries_lock scope so that we don't acquire serial console's port->lock under it. Trimmed-down lockdep splat: The existing dependency chain (in reverse order) is: -> #2 (free_entries_lock){-.-.}-{2:2}: _raw_spin_lock_irqsave+0x60/0x80 dma_entry_alloc+0x38/0x110 debug_dma_map_page+0x60/0xf8 dma_map_page_attrs+0x1e0/0x230 dma_map_single_attrs.constprop.0+0x6c/0xc8 geni_se_rx_dma_prep+0x40/0xcc qcom_geni_serial_isr+0x310/0x510 __handle_irq_event_percpu+0x110/0x244 handle_irq_event_percpu+0x20/0x54 handle_irq_event+0x50/0x88 handle_fasteoi_irq+0xa4/0xcc handle_irq_desc+0x28/0x40 generic_handle_domain_irq+0x24/0x30 gic_handle_irq+0xc4/0x148 do_interrupt_handler+0xa4/0xb0 el1_interrupt+0x34/0x64 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x64/0x68 arch_local_irq_enable+0x4/0x8 ____do_softirq+0x18/0x24 ... -> #1 (&port_lock_key){-.-.}-{2:2}: _raw_spin_lock_irqsave+0x60/0x80 qcom_geni_serial_console_write+0x184/0x1dc console_flush_all+0x344/0x454 console_unlock+0x94/0xf0 vprintk_emit+0x238/0x24c vprintk_default+0x3c/0x48 vprintk+0xb4/0xbc _printk+0x68/0x90 register_console+0x230/0x38c uart_add_one_port+0x338/0x494 qcom_geni_serial_probe+0x390/0x424 platform_probe+0x70/0xc0 really_probe+0x148/0x280 __driver_probe_device+0xfc/0x114 driver_probe_device+0x44/0x100 __device_attach_driver+0x64/0xdc bus_for_each_drv+0xb0/0xd8 __device_attach+0xe4/0x140 device_initial_probe+0x1c/0x28 bus_probe_device+0x44/0xb0 device_add+0x538/0x668 of_device_add+0x44/0x50 of_platform_device_create_pdata+0x94/0xc8 of_platform_bus_create+0x270/0x304 of_platform_populate+0xac/0xc4 devm_of_platform_populate+0x60/0xac geni_se_probe+0x154/0x160 platform_probe+0x70/0xc0 ... -> #0 (console_owner){-...}-{0:0}: __lock_acquire+0xdf8/0x109c lock_acquire+0x234/0x284 console_flush_all+0x330/0x454 console_unlock+0x94/0xf0 vprintk_emit+0x238/0x24c vprintk_default+0x3c/0x48 vprintk+0xb4/0xbc _printk+0x68/0x90 dma_entry_alloc+0xb4/0x110 debug_dma_map_sg+0xdc/0x2f8 __dma_map_sg_attrs+0xac/0xe4 dma_map_sgtable+0x30/0x4c get_pages+0x1d4/0x1e4 [msm] msm_gem_pin_pages_locked+0x38/0xac [msm] msm_gem_pin_vma_locked+0x58/0x88 [msm] msm_ioctl_gem_submit+0xde4/0x13ac [msm] drm_ioctl_kernel+0xe0/0x15c drm_ioctl+0x2e8/0x3f4 vfs_ioctl+0x30/0x50 ... Chain exists of: console_owner --> &port_lock_key --> free_entries_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(free_entries_lock); lock(&port_lock_key); lock(free_entries_lock); lock(console_owner); *** DEADLOCK *** Call trace: dump_backtrace+0xb4/0xf0 show_stack+0x20/0x30 dump_stack_lvl+0x60/0x84 dump_stack+0x18/0x24 print_circular_bug+0x1cc/0x234 check_noncircular+0x78/0xac __lock_acquire+0xdf8/0x109c lock_acquire+0x234/0x284 console_flush_all+0x330/0x454 console_unlock+0x94/0xf0 vprintk_emit+0x238/0x24c vprintk_default+0x3c/0x48 vprintk+0xb4/0xbc _printk+0x68/0x90 dma_entry_alloc+0xb4/0x110 debug_dma_map_sg+0xdc/0x2f8 __dma_map_sg_attrs+0xac/0xe4 dma_map_sgtable+0x30/0x4c get_pages+0x1d4/0x1e4 [msm] msm_gem_pin_pages_locked+0x38/0xac [msm] msm_gem_pin_vma_locked+0x58/0x88 [msm] msm_ioctl_gem_submit+0xde4/0x13ac [msm] drm_ioctl_kernel+0xe0/0x15c drm_ioctl+0x2e8/0x3f4 vfs_ioctl+0x30/0x50 ... Reported-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |