mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2024-12-14 06:24:53 +08:00
b1dfc0f762
43300 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
3a87498869 |
Merge tag 'sched_urgent_for_v6.7_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
- Make sure tasks are thawed exactly and only once to avoid their state getting corrupted
* tag 'sched_urgent_for_v6.7_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: freezer,sched: Do not restore saved_state of a thawed task |
||
Linus Torvalds
|
537ccb5d28 |
Merge tag 'perf_urgent_for_v6.7_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Borislav Petkov:
- Make sure perf event size validation is done on every event in the group
* tag 'perf_urgent_for_v6.7_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix perf_event_validate_size() |
||
Linus Torvalds
|
17894c2a7a |
tracing fixes for v6.7-rc4:
Merge tag 'trace-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Snapshot buffer issues:
1. When instances started allowing latency tracers, they use a snapshot buffer (another buffer that is not written to but swapped with the main buffer that is). The snapshot buffer needs to be the same size as the main buffer. But when the snapshot buffers were added to instances, the code to make the snapshot equal to the main buffer was still only doing it for the main buffer and not the instances.
2. Need to stop the current tracer when resizing the buffers. Otherwise there can be a race if the tracer decides to make a snapshot between resizing the main buffer and the snapshot buffer.
3. When a tracer is "stopped", it disables both the main buffer and the snapshot buffer. This needs to be done for instances and not only the main buffer, now that instances also have a snapshot buffer.
- Buffered event for filtering issues:
When filtering is enabled, because events can be dropped often, it is quicker to copy the event into a temp buffer and write that into the main buffer if it is not filtered, or just drop the event if it is, than to write the event into the ring buffer and then try to discard it. This temp buffer is allocated and needs special synchronization to do so. But there were some issues with that:
1. When disabling the filter and freeing the buffer, a call to all CPUs is required to stop each per_cpu usage. But the code called smp_call_function_many(), which does not include the current CPU. If the task is migrated to another CPU when it enables the CPUs via smp_call_function_many(), it will not enable the one it is currently on and this causes issues later on. Use on_each_cpu_mask() instead, which includes the current CPU.
2. When the allocation of the buffered event fails, it can give a warning. But the buffered event is just an optimization (it's still OK to write to the ring buffer and free it). Do not WARN in this case.
3. The freeing of the buffer event requires synchronization. First a counter is decremented to zero so that no new uses of it will happen. Then it sets the buffered event to NULL, and finally it frees the buffered event. There's a synchronize_rcu() between the counter decrement and the setting of the variable to NULL, but only a smp_wmb() between that and the freeing of the buffer. It is theoretically possible that a user missed seeing the decrement, but will use the buffer after it is freed. Another synchronize_rcu() is needed in place of that smp_wmb().
- Ring buffer timestamps on 32 bit machines:
The ring buffer timestamp on 32 bit machines has to break the 64 bit number into multiple values, as cmpxchg is required on it, and a 64 bit cmpxchg on 32 bit architectures is very slow. The code used to just use two 32 bit values and make it a 60 bit timestamp where the other 4 bits were used as counters for synchronization. It later became known that the timestamp on 32 bit still needs all 64 bits in some cases. So 3 words were created to handle the 64 bits. But issues arose with this:
1. The synchronization logic still only compared the counter with the first two words, but not with the third, so the synchronization could fail unknowingly.
2. A check on discard of an event could race if an event happened between the discard and updating one of the counters. The counter needs to be updated (forcing an absolute timestamp and not to use a delta) before the actual discard happens.
* tag 'trace-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Test last update in 32bit version of __rb_time_read() ring-buffer: Force absolute timestamp on discard of event tracing: Fix a possible race when disabling buffered events tracing: Fix a warning when allocating buffered events fails tracing: Fix incomplete locking when disabling buffered events tracing: Disable snapshot buffer when stopping instance tracers tracing: Stop current tracer when resizing buffer tracing: Always update snapshot buffer size |
||
Linus Torvalds
|
8e819a7623 |
Merge tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton: "31 hotfixes. Ten of these address pre-6.6 issues and are marked cc:stable. The remainder address post-6.6 issues or aren't considered serious enough to justify backporting"
* tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits) mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range() nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage() mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI scripts/gdb: fix lx-device-list-bus and lx-device-list-class MAINTAINERS: drop Antti Palosaari highmem: fix a memory copy problem in memcpy_from_folio nilfs2: fix missing error check for sb_set_blocksize call kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP units: add missing header drivers/base/cpu: crash data showing should depends on KEXEC_CORE mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions scripts/gdb/tasks: fix lx-ps command error mm/Kconfig: make userfaultfd a menuconfig selftests/mm: prevent duplicate runs caused by TEST_GEN_PROGS mm/damon/core: copy nr_accesses when splitting region lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly checkstack: fix printed address mm/memory_hotplug: fix error handling in add_memory_resource() mm/memory_hotplug: add missing mem_hotplug_lock .mailmap: add a new address mapping for Chester Lin ... |
||
Linus Torvalds
|
5e3f5b81de |
Merge tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter.
Current release - regressions: - veth: fix packet segmentation in veth_convert_skb_to_xdp_buff
Current release - new code bugs: - tcp: assorted fixes to the new Auth Option support
Older releases - regressions: - tcp: fix mid stream window clamp - tls: fix incorrect splice handling - ipv4: ip_gre: handle skb_pull() failure in ipgre_xmit() - dsa: mv88e6xxx: restore USXGMII support for 6393X - arcnet: restore support for multiple Sohard Arcnet cards
Older releases - always broken: - tcp: do not accept ACK of bytes we never sent - require admin privileges to receive packet traces via netlink - packet: move reference count in packet_sock to atomic_long_t - bpf: - fix incorrect branch offset comparison with cpu=v4 - fix prog_array_map_poke_run map poke update - netfilter: - three fixes for crashes on bad admin commands - xt_owner: fix race accessing sk->sk_socket, TOCTOU null-deref - nf_tables: fix 'exist' matching on bigendian arches - leds: netdev: fix RTNL handling to prevent potential deadlock - eth: tg3: prevent races in error/reset handling - eth: r8169: fix rtl8125b PAUSE storm when suspended - eth: r8152: improve reset and surprise removal handling - eth: hns: fix race between changing features and sending - eth: nfp: fix sleep in atomic for bonding offload"
* tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits) vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning net/smc: fix missing byte order conversion in CLC handshake net: dsa: microchip: provide a list of valid protocols for xmit handler drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group psample: Require 'CAP_NET_ADMIN' when joining "packets" group bpf: sockmap, updating the sg structure should also update curr net: tls, update curr on splice as well nfp: flower: fix for take a mutex lock in soft irq context and rcu lock net: dsa: mv88e6xxx: Restore USXGMII support for 6393X tcp: do not accept ACK of bytes we never sent selftests/bpf: Add test for early update in prog_array_map_poke_run bpf: Fix prog_array_map_poke_run map poke update netfilter: xt_owner: Fix for unsafe access of sk->sk_socket netfilter: nf_tables: validate family when identifying table via handle netfilter: nf_tables: bail out on mismatching dynset and set expressions netfilter: nf_tables: fix 'exist' matching on bigendian arches netfilter: nft_set_pipapo: skip inactive elements during set walk netfilter: bpf: fix bad registration on nf_defrag leds: trigger: netdev: fix RTNL handling to prevent potential deadlock octeontx2-af: Update Tx link register range ... |
||
Linus Torvalds
|
9ace34a8e4 |
cgroup: Fixes for v6.7-rc4
Just one patch. |
||
Linus Torvalds
|
e0348c1f68 |
workqueue: Fixes for v6.7-rc4
Merge tag 'wq-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo: "Just one patch to fix a bug which can crash the kernel if the housekeeping and wq_unbound_cpu cpumask configuration combination leaves the latter empty"
* tag 'wq-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Make sure that wq_unbound_cpumask is never empty |
||
Baoquan He
|
dccf78d39f |
kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
Ignat Korchagin complained that a potential config regression was introduced by commit |
||
Jiri Olsa
|
4b7de80160 |
bpf: Fix prog_array_map_poke_run map poke update
Lee pointed out an issue found by syzkaller [0] hitting a BUG in the prog array
map poke update in the prog_array_map_poke_run function, due to an error value
returned from the bpf_arch_text_poke function.
There's a race window where bpf_arch_text_poke can fail due to missing
bpf program kallsym symbols, which is accounted for with the check for
-EINVAL in that BUG_ON call.
The problem is that in such a case we won't update the tail call jump,
causing an imbalance for the next tail call update check, which will
fail with -EBUSY in bpf_arch_text_poke.
I'm hitting the following race during program load:
CPU 0 CPU 1
bpf_prog_load
bpf_check
do_misc_fixups
prog_array_map_poke_track
map_update_elem
bpf_fd_array_map_update_elem
prog_array_map_poke_run
bpf_arch_text_poke returns -EINVAL
bpf_prog_kallsyms_add
After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next
poke update fails on expected jump instruction check in bpf_arch_text_poke
with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run.
A similar race exists on program unload.
Fix this by moving the update to the bpf_arch_poke_desc_update function, which
makes sure we call __bpf_arch_text_poke, which skips the bpf address check.
Each architecture has a slightly different approach wrt looking up the bpf address
in bpf_arch_text_poke, so instead of splitting the function or adding a new
'checkip' argument as in the previous version, it seems best to move the whole
map_poke_run update into arch-specific code.
[0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810
Fixes:
|
||
Steven Rostedt (Google)
|
f458a14534 |
ring-buffer: Test last update in 32bit version of __rb_time_read()
Since 64 bit cmpxchg() is very expensive on 32bit architectures, the
timestamp used by the ring buffer does some interesting tricks to be able
to still have an atomic 64 bit number. It originally just used 60 bits and
broke it up into two 32 bit words where the extra 2 bits were used for
synchronization. But this was not enough for all use cases, and all 64
bits were required.
The 32bit version of the ring buffer timestamp was then broken up into 3
32bit words using the same counter trick. But one update was not done. The
check to see if the read operation was done without interruption only
checked the first two words and not the last one (like it had before this
update). Fix it by making sure all three updates happen without
interruption by comparing the initial counter with the last updated
counter.
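To make the counter trick concrete, here is a rough userspace sketch (hypothetical code, not the kernel's ring buffer implementation) of a 64 bit value split across three 32 bit words, where a read is only accepted if all three embedded counters match:
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#define CNT_SHIFT 30
#define VAL_MASK ((1u << CNT_SHIFT) - 1)      /* low 30 bits carry payload */
struct split64 {                              /* each word: [counter:2 | payload:30] */
	_Atomic uint32_t top, msb, bottom;
};
static void split_write(struct split64 *s, uint64_t val, uint32_t cnt)
{
	cnt &= 3;
	atomic_store(&s->top,    (cnt << CNT_SHIFT) | (uint32_t)((val >> 40) & VAL_MASK));
	atomic_store(&s->msb,    (cnt << CNT_SHIFT) | (uint32_t)((val >> 20) & 0xFFFFF));
	atomic_store(&s->bottom, (cnt << CNT_SHIFT) | (uint32_t)(val & 0xFFFFF));
}
static bool split_read(struct split64 *s, uint64_t *val)
{
	uint32_t top = atomic_load(&s->top);
	uint32_t msb = atomic_load(&s->msb);
	uint32_t bottom = atomic_load(&s->bottom);
	uint32_t cnt = top >> CNT_SHIFT;
	/* The bug was comparing only two of the three counters. */
	if ((msb >> CNT_SHIFT) != cnt || (bottom >> CNT_SHIFT) != cnt)
		return false;                 /* a writer interrupted us, retry */
	*val = ((uint64_t)(top & VAL_MASK) << 40) |
	       ((uint64_t)(msb & 0xFFFFF) << 20) |
	       (uint64_t)(bottom & 0xFFFFF);
	return true;
}
int main(void)
{
	struct split64 s;
	uint64_t v;
	split_write(&s, 0x123456789abcULL, 1);
	if (split_read(&s, &v))
		printf("read back 0x%llx\n", (unsigned long long)v);
	return 0;
}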
Link: https://lore.kernel.org/linux-trace-kernel/20231206100050.3100b7bb@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes:
|
||
Steven Rostedt (Google)
|
b2dd797543 |
ring-buffer: Force absolute timestamp on discard of event
There's a race where, if an event is discarded from the ring buffer and an
interrupt happens at that time and inserts an event, the time stamp from
the discarded event is still used as an offset. This can screw up the
timings.
If the event is going to be discarded, set the "before_stamp" to zero.
When a new event comes in, it compares the "before_stamp" with the
"write_stamp" and if they are not equal, it will insert an absolute
timestamp. This will prevent the timings from getting out of sync due to
the discarded event.
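A toy model of that rule, with names only mirroring the description above (not the ring buffer code):
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
static uint64_t before_stamp, write_stamp;
/* Discard: zero before_stamp so the next writer cannot trust a delta. */
static void discard_event(void)
{
	before_stamp = 0;
}
/* Write: if the two stamps disagree, fall back to an absolute timestamp. */
static void write_event(uint64_t now)
{
	bool absolute = (before_stamp != write_stamp);
	printf("event at %llu uses %s timestamp\n",
	       (unsigned long long)now, absolute ? "absolute" : "delta");
	before_stamp = write_stamp = now;
}
int main(void)
{
	write_event(100);   /* stamps in sync, a delta is fine */
	write_event(110);
	discard_event();    /* next event must not use a delta */
	write_event(120);   /* before_stamp != write_stamp, so absolute */
	return 0;
}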
Link: https://lore.kernel.org/linux-trace-kernel/20231206100244.5130f9b3@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes:
|
||
Petr Pavlu
|
c0591b1ccc |
tracing: Fix a possible race when disabling buffered events
Function trace_buffered_event_disable() is responsible for freeing pages
backing buffered events and this process can run concurrently with
trace_event_buffer_lock_reserve().
The following race is currently possible:
* Function trace_buffered_event_disable() is called on CPU 0. It
increments trace_buffered_event_cnt on each CPU and waits via
synchronize_rcu() for each user of trace_buffered_event to complete.
* After synchronize_rcu() is finished, function
trace_buffered_event_disable() has the exclusive access to
trace_buffered_event. All counters trace_buffered_event_cnt are at 1
and all pointers trace_buffered_event are still valid.
* At this point, on a different CPU 1, the execution reaches
trace_event_buffer_lock_reserve(). The function calls
preempt_disable_notrace() and only now enters an RCU read-side
critical section. The function proceeds and reads a still valid
pointer from trace_buffered_event[CPU1] into the local variable
"entry". However, it doesn't yet read trace_buffered_event_cnt[CPU1]
which happens later.
* Function trace_buffered_event_disable() continues. It frees
trace_buffered_event[CPU1] and decrements
trace_buffered_event_cnt[CPU1] back to 0.
* Function trace_event_buffer_lock_reserve() continues. It reads and
increments trace_buffered_event_cnt[CPU1] from 0 to 1. This makes it
believe that it can use the "entry" that it already obtained but the
pointer is now invalid and any access results in a use-after-free.
Fix the problem by making a second synchronize_rcu() call after all
trace_buffered_event values are set to NULL. This waits on all potential
users in trace_event_buffer_lock_reserve() that still read a previous
pointer from trace_buffered_event.
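The fixed teardown ordering can be sketched in plain C; synchronize_rcu() below is only a stub standing in for a real RCU grace period, and the variables merely mirror the names used above:
#include <stdio.h>
#include <stdlib.h>
#define NR_CPUS 4
/* These only mirror the per-CPU variables discussed above. */
static int   trace_buffered_event_cnt[NR_CPUS];
static void *trace_buffered_event[NR_CPUS];
/* Stub: in the kernel this waits for a full RCU grace period. */
static void synchronize_rcu(void)
{
	puts("synchronize_rcu(): wait for in-flight readers");
}
static void buffered_event_teardown(void)
{
	void *old[NR_CPUS];
	int cpu;
	/* 1) Block new users of the buffered events. */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		trace_buffered_event_cnt[cpu]++;
	/* 2) Wait for readers that started before step 1. */
	synchronize_rcu();
	/* 3) Unpublish the pointers, keeping them locally for the free below. */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		old[cpu] = trace_buffered_event[cpu];
		trace_buffered_event[cpu] = NULL;
	}
	/* 4) The fix: a second synchronize_rcu(), so anyone who read an old
	 *    pointer before step 3 finishes with it before it is freed.
	 *    The buggy code only had an smp_wmb() here. */
	synchronize_rcu();
	/* 5) Only now is it safe to free the buffers and drop the counters. */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		free(old[cpu]);
		trace_buffered_event_cnt[cpu]--;
	}
}
int main(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		trace_buffered_event[cpu] = malloc(64);
	buffered_event_teardown();
	return 0;
}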
Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/
Link: https://lkml.kernel.org/r/20231205161736.19663-4-petr.pavlu@suse.com
Cc: stable@vger.kernel.org
Fixes:
|
||
Petr Pavlu
|
34209fe83e |
tracing: Fix a warning when allocating buffered events fails
Function trace_buffered_event_disable() produces an unexpected warning
when the previous call to trace_buffered_event_enable() fails to
allocate pages for buffered events.
The situation can occur as follows:
* The counter trace_buffered_event_ref is at 0.
* The soft mode gets enabled for some event and
trace_buffered_event_enable() is called. The function increments
trace_buffered_event_ref to 1 and starts allocating event pages.
* The allocation fails for some page and trace_buffered_event_disable()
is called for cleanup.
* Function trace_buffered_event_disable() decrements
trace_buffered_event_ref back to 0, recognizes that it was the last
use of buffered events and frees all allocated pages.
* The control goes back to trace_buffered_event_enable() which returns.
The caller of trace_buffered_event_enable() has no information that
the function actually failed.
* Some time later, the soft mode is disabled for the same event.
Function trace_buffered_event_disable() is called. It warns on
"WARN_ON_ONCE(!trace_buffered_event_ref)" and returns.
Buffered events are just an optimization and can handle failures. Make
trace_buffered_event_enable() exit on the first failure and leave any
cleanup for later, when trace_buffered_event_disable() is called.
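A hedged sketch of that behaviour with made-up helper names (not the tracing code): enable bails out on the first failed allocation, and the eventual disable cleans up whatever was allocated:
#include <stdio.h>
#include <stdlib.h>
#define NR_CPUS 4
static int   buffered_event_ref;           /* loosely mirrors trace_buffered_event_ref */
static void *buffered_event[NR_CPUS];      /* loosely mirrors per-CPU trace_buffered_event */
static int buffered_event_enable(void)
{
	int cpu;
	if (buffered_event_ref++)
		return 0;                  /* already set up */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		buffered_event[cpu] = malloc(4096);
		if (!buffered_event[cpu])
			return -1;         /* bail out; cleanup is left to disable() */
	}
	return 0;
}
static void buffered_event_disable(void)
{
	int cpu;
	if (--buffered_event_ref)          /* still used by someone else */
		return;
	/* Free whatever enable() managed to allocate; free(NULL) is a no-op. */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		free(buffered_event[cpu]);
		buffered_event[cpu] = NULL;
	}
}
int main(void)
{
	if (buffered_event_enable())
		puts("allocation failed; buffered events stay unused until disable()");
	buffered_event_disable();
	return 0;
}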
Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/
Link: https://lkml.kernel.org/r/20231205161736.19663-3-petr.pavlu@suse.com
Fixes:
|
||
Petr Pavlu
|
7fed14f7ac |
tracing: Fix incomplete locking when disabling buffered events
The following warning appears when using buffered events: [ 203.556451] WARNING: CPU: 53 PID: 10220 at kernel/trace/ring_buffer.c:3912 ring_buffer_discard_commit+0x2eb/0x420 [...] [ 203.670690] CPU: 53 PID: 10220 Comm: stress-ng-sysin Tainted: G E 6.7.0-rc2-default #4 56e6d0fcf5581e6e51eaaecbdaec2a2338c80f3a [ 203.670704] Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017 [ 203.670709] RIP: 0010:ring_buffer_discard_commit+0x2eb/0x420 [ 203.735721] Code: 4c 8b 4a 50 48 8b 42 48 49 39 c1 0f 84 b3 00 00 00 49 83 e8 01 75 b1 48 8b 42 10 f0 ff 40 08 0f 0b e9 fc fe ff ff f0 ff 47 08 <0f> 0b e9 77 fd ff ff 48 8b 42 10 f0 ff 40 08 0f 0b e9 f5 fe ff ff [ 203.735734] RSP: 0018:ffffb4ae4f7b7d80 EFLAGS: 00010202 [ 203.735745] RAX: 0000000000000000 RBX: ffffb4ae4f7b7de0 RCX: ffff8ac10662c000 [ 203.735754] RDX: ffff8ac0c750be00 RSI: ffff8ac10662c000 RDI: ffff8ac0c004d400 [ 203.781832] RBP: ffff8ac0c039cea0 R08: 0000000000000000 R09: 0000000000000000 [ 203.781839] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 203.781842] R13: ffff8ac10662c000 R14: ffff8ac0c004d400 R15: ffff8ac10662c008 [ 203.781846] FS: 00007f4cd8a67740(0000) GS:ffff8ad798880000(0000) knlGS:0000000000000000 [ 203.781851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 203.781855] CR2: 0000559766a74028 CR3: 00000001804c4000 CR4: 00000000001506f0 [ 203.781862] Call Trace: [ 203.781870] <TASK> [ 203.851949] trace_event_buffer_commit+0x1ea/0x250 [ 203.851967] trace_event_raw_event_sys_enter+0x83/0xe0 [ 203.851983] syscall_trace_enter.isra.0+0x182/0x1a0 [ 203.851990] do_syscall_64+0x3a/0xe0 [ 203.852075] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 203.852090] RIP: 0033:0x7f4cd870fa77 [ 203.982920] Code: 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 43 0e 00 f7 d8 64 89 01 48 [ 203.982932] RSP: 002b:00007fff99717dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089 [ 203.982942] RAX: ffffffffffffffda RBX: 0000558ea1d7b6f0 RCX: 00007f4cd870fa77 [ 203.982948] RDX: 0000000000000000 RSI: 00007fff99717de0 RDI: 0000558ea1d7b6f0 [ 203.982957] RBP: 00007fff99717de0 R08: 00007fff997180e0 R09: 00007fff997180e0 [ 203.982962] R10: 00007fff997180e0 R11: 0000000000000246 R12: 00007fff99717f40 [ 204.049239] R13: 00007fff99718590 R14: 0000558e9f2127a8 R15: 00007fff997180b0 [ 204.049256] </TASK> For instance, it can be triggered by running these two commands in parallel: $ while true; do echo hist:key=id.syscall:val=hitcount > \ /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger; done $ stress-ng --sysinfo $(nproc) The warning indicates that the current ring_buffer_per_cpu is not in the committing state. It happens because the active ring_buffer_event doesn't actually come from the ring_buffer_per_cpu but is allocated from trace_buffered_event. The bug is in function trace_buffered_event_disable() where the following normally happens: * The code invokes disable_trace_buffered_event() via smp_call_function_many() and follows it by synchronize_rcu(). This increments the per-CPU variable trace_buffered_event_cnt on each target CPU and grants trace_buffered_event_disable() the exclusive access to the per-CPU variable trace_buffered_event. * Maintenance is performed on trace_buffered_event, all per-CPU event buffers get freed. * The code invokes enable_trace_buffered_event() via smp_call_function_many(). 
This decrements trace_buffered_event_cnt and releases the access to trace_buffered_event. A problem is that smp_call_function_many() runs a given function on all target CPUs except on the current one. The following can then occur:
* Task X executing trace_buffered_event_disable() runs on CPU 0.
* The control reaches synchronize_rcu() and the task gets rescheduled on another CPU 1.
* The RCU synchronization finishes. At this point, trace_buffered_event_disable() has the exclusive access to all trace_buffered_event variables except trace_buffered_event[CPU0] because trace_buffered_event_cnt[CPU0] is never incremented and, if the buffer is currently unused, remains set to 0.
* A different task Y is scheduled on CPU 0 and hits a trace event. The code in trace_event_buffer_lock_reserve() sees that trace_buffered_event_cnt[CPU0] is set to 0 and decides to use the buffer provided by trace_buffered_event[CPU0].
* Task X continues its execution in trace_buffered_event_disable(). The code incorrectly frees the event buffer pointed to by trace_buffered_event[CPU0] and resets the variable to NULL.
* Task Y writes event data to the now freed buffer and later detects the created inconsistency.
The issue is observable since commit |
||
Steven Rostedt (Google)
|
b538bf7d0e |
tracing: Disable snapshot buffer when stopping instance tracers
It used to be that only the top level instance had a snapshot buffer (for
latency tracers like wakeup and irqsoff). Stopping a tracer in an
instance would not disable the snapshot buffer. This could have some
unintended consequences if the irqsoff tracer is enabled.
Consolidate the tracing_start/stop() with tracing_start/stop_tr() so that
all instances behave the same. The tracing_start/stop() functions will
just call their respective tracing_start/stop_tr() with the global_array
passed in.
Link: https://lkml.kernel.org/r/20231205220011.041220035@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes:
|
||
Steven Rostedt (Google)
|
d78ab79270 |
tracing: Stop current tracer when resizing buffer
When the ring buffer is being resized, it can cause side effects to the
running tracer. For instance, there's a race with irqsoff tracer that
swaps individual per cpu buffers between the main buffer and the snapshot
buffer. The resize operation modifies the main buffer and then the
snapshot buffer. If a swap happens in between those two operations it will
break the tracer.
Simply stop the running tracer before resizing the buffers and enable it
again when finished.
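Reduced to a schematic with invented helper names (not the actual tracing API), the fix is simply:
#include <stdio.h>
static void stop_tracer(void)  { puts("tracer stopped"); }
static void start_tracer(void) { puts("tracer restarted"); }
static void resize_buffer(const char *name, long kb)
{
	printf("resize %s to %ld kB\n", name, kb);
}
/* Resize both buffers only while the tracer cannot trigger a snapshot swap. */
static void resize_all(long kb)
{
	stop_tracer();
	resize_buffer("main buffer", kb);
	resize_buffer("snapshot buffer", kb);   /* must stay the same size as main */
	start_tracer();
}
int main(void)
{
	resize_all(1000);
	return 0;
}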
Link: https://lkml.kernel.org/r/20231205220010.748996423@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes:
|
||
Steven Rostedt (Google)
|
7be76461f3 |
tracing: Always update snapshot buffer size
It used to be that only the top level instance had a snapshot buffer (for
latency tracers like wakeup and irqsoff). The update of the ring buffer
size would check if the instance was the top level and if so, it would
also update the snapshot buffer as it needs to be the same as the main
buffer.
Now that lower level instances also have a snapshot buffer, they too need
to update their snapshot buffer sizes when the main buffer is changed,
otherwise the following can be triggered:
# cd /sys/kernel/tracing
# echo 1500 > buffer_size_kb
# mkdir instances/foo
# echo irqsoff > instances/foo/current_tracer
# echo 1000 > instances/foo/buffer_size_kb
Produces:
WARNING: CPU: 2 PID: 856 at kernel/trace/trace.c:1938 update_max_tr_single.part.0+0x27d/0x320
Which is:
ret = ring_buffer_swap_cpu(tr->max_buffer.buffer, tr->array_buffer.buffer, cpu);
if (ret == -EBUSY) {
[..]
}
WARN_ON_ONCE(ret && ret != -EAGAIN && ret != -EBUSY); <== here
That's because ring_buffer_swap_cpu() has:
int ret = -EINVAL;
[..]
/* At least make sure the two buffers are somewhat the same */
if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
goto out;
[..]
out:
return ret;
}
Instead, update all instances' snapshot buffer sizes when their main
buffer size is updated.
Link: https://lkml.kernel.org/r/20231205220010.454662151@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes:
|
||
Linus Torvalds
|
669fc83452 |
Probes fixes for v6.7-rc3:
Merge tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- objpool: Fix the objpool overrun case on memory/cache access delay, especially on big.LITTLE SoCs. The objpool uses a local copy of the object slot index in its internal loop, but the slot index can be changed on another processor in parallel. In that case, the difference between the 'head' local copy and the 'slot->last' index will be bigger than the local slot size, and we need to re-read slot::head to update it.
- kretprobe: Fix to use the appropriate rcu API for the kretprobe holder. Since kretprobe_holder::rp is RCU managed, it should use rcu_assign_pointer() and rcu_dereference_check() correctly. Also add the __rcu tag for finding wrong usage by sparse.
- rethook: Fix to use the appropriate rcu API for rethook::handler. The same as kretprobe, rethook::handler is RCU managed and should use rcu_assign_pointer() and rcu_dereference_check(). This also adds the __rcu tag for finding wrong usage by sparse.
* tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rethook: Use __rcu pointer for rethook::handler kprobes: consistent rcu api usage for kretprobe holder lib: objpool: fix head overrun on RK3588 SBC |
||
Yonghong Song
|
dfce9cb314 |
bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4
Bpf cpu=v4 support is introduced in [1] and Commit |
||
Masami Hiramatsu (Google)
|
a1461f1fd6 |
rethook: Use __rcu pointer for rethook::handler
Since rethook::handler is an RCU-managed pointer, used so that readers can
notice whether the rethook is stopped (unregistered) or not, it should
be an __rcu pointer and be accessed with the appropriate functions, which
use the appropriate memory barriers when accessing it. OTOH,
rethook::data is never changed, so we don't need to check it in
get_kretprobe().
NOTE: To avoid sparse warning, rethook::handler is defined by a raw
function pointer type with __rcu instead of rethook_handler_t.
Link: https://lore.kernel.org/all/170126066201.398836.837498688669005979.stgit@devnote2/
Fixes:
|
||
JP Kobryn
|
d839a656d0 |
kprobes: consistent rcu api usage for kretprobe holder
It seems that the pointer-to-kretprobe "rp" within the kretprobe_holder is
RCU-managed, based on the (non-rethook) implementation of get_kretprobe().
The thought behind this patch is to make use of the RCU API where possible
when accessing this pointer so that the needed barriers are always in place
and to self-document the code.
The __rcu annotation to "rp" allows for sparse RCU checking. Plain writes
done to the "rp" pointer are changed to make use of the RCU macro for
assignment. For the single read, the implementation of get_kretprobe()
is simplified by making use of an RCU macro which accomplishes the same,
but note that the log warning text will be more generic.
I did find that there is a difference in assembly generated between the
usage of the RCU macros vs without. For example, on arm64, when using
rcu_assign_pointer(), the corresponding store instruction is a
store-release (STLR) which has an implicit barrier. When normal assignment
is done, a regular store (STR) is found. In the macro case, this seems to
be a result of rcu_assign_pointer() using smp_store_release() when the
value to write is not NULL.
Link: https://lore.kernel.org/all/20231122132058.3359-1-inwardvessel@gmail.com/
Fixes:
|
||
Linus Torvalds
|
6172a5180f |
Including fixes from bpf and wifi.
Current release - regressions: - neighbour: fix __randomize_layout crash in struct neighbour - r8169: fix deadlock on RTL8125 in jumbo mtu mode Previous releases - regressions: - wifi: - mac80211: fix warning at station removal time - cfg80211: fix CQM for non-range use - tools: ynl-gen: fix unexpected response handling - octeontx2-af: fix possible buffer overflow - dpaa2: recycle the RX buffer only after all processing done - rswitch: fix missing dev_kfree_skb_any() in error path Previous releases - always broken: - ipv4: fix uaf issue when receiving igmp query packet - wifi: mac80211: fix debugfs deadlock at device removal time - bpf: - sockmap: af_unix stream sockets need to hold ref for pair sock - netdevsim: don't accept device bound programs - selftests: fix a char signedness issue - dsa: mv88e6xxx: fix marvell 6350 probe crash - octeontx2-pf: restore TC ingress police rules when interface is up - wangxun: fix memory leak on msix entry - ravb: keep reverse order of operations in ravb_remove() Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmVobzISHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOk4rwP/2qaUstOJVpkO8cG+bRYi3idH9uO/8Yu dYgFI4LM826YgbVNVzuiu9Sh7t78dbep/fWQ2quDuZUinWtPmv6RV3UKbDyNWLRr iV7sZvXElGsUefixxGANYDUPuCrlr3O230Y8zCN0R65BMurppljs9Pp8FwIqaD+v pVs2alb/PeX7g+hPACKPr4Knu8QeZYmzdHoyYeLoMG3PqIgJVU3/8OHHfmnYCdxT VSss2LB5FKFCOgetEPGy83KQP7QVaK22GDphZJ4xh7aSewRVP92ORfauiI8To4vQ 0VnLNcQ+1pXnYzgGdv8oF02e4EP5b0jvrTpqCw1U0QU2s2PARJarzajCXBkwa308 gXELRpVRpY4+7WEBSX4RGUigurwGGEh/IP/puVtPDr9KU3lFgaTI8wM624Y3Ob/e /LVI7a5kUSJysq9/H/QrHjoiuTtV7nCmzBgDqEFSN5hQinSHYKyD0XsUPcLlMJmn p6CyQDGHv2ibbg+8TStig0xfmC83N8KfDfcCekSrYxquDMTRtfa2VXofzQiQKDnr XNyIURmZAAUVPR6enxlg5Iqzc0mQGumYif7wzsO1bzVzmVZgIDCVxU95hkoRrutU qnWXuUGUdieUvXA9HltntTzy2BgJVtg7L/p8YEbd97dxtgK80sbdnjfDswFvEeE4 nTvE+IDKdCmb =QiQp -----END PGP SIGNATURE----- Merge tag 'net-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and wifi. 
Current release - regressions: - neighbour: fix __randomize_layout crash in struct neighbour - r8169: fix deadlock on RTL8125 in jumbo mtu mode Previous releases - regressions: - wifi: - mac80211: fix warning at station removal time - cfg80211: fix CQM for non-range use - tools: ynl-gen: fix unexpected response handling - octeontx2-af: fix possible buffer overflow - dpaa2: recycle the RX buffer only after all processing done - rswitch: fix missing dev_kfree_skb_any() in error path Previous releases - always broken: - ipv4: fix uaf issue when receiving igmp query packet - wifi: mac80211: fix debugfs deadlock at device removal time - bpf: - sockmap: af_unix stream sockets need to hold ref for pair sock - netdevsim: don't accept device bound programs - selftests: fix a char signedness issue - dsa: mv88e6xxx: fix marvell 6350 probe crash - octeontx2-pf: restore TC ingress police rules when interface is up - wangxun: fix memory leak on msix entry - ravb: keep reverse order of operations in ravb_remove()" * tag 'net-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits) net: ravb: Keep reverse order of operations in ravb_remove() net: ravb: Stop DMA in case of failures on ravb_open() net: ravb: Start TX queues after HW initialization succeeded net: ravb: Make write access to CXR35 first before accessing other EMAC registers net: ravb: Use pm_runtime_resume_and_get() net: ravb: Check return value of reset_control_deassert() net: libwx: fix memory leak on msix entry ice: Fix VF Reset paths when interface in a failed over aggregate bpf, sockmap: Add af_unix test with both sockets in map bpf, sockmap: af_unix stream sockets need to hold ref for pair sock tools: ynl-gen: always construct struct ynl_req_state ethtool: don't propagate EOPNOTSUPP from dumps ravb: Fix races between ravb_tx_timeout_work() and net related ops r8169: prevent potential deadlock in rtl8169_close r8169: fix deadlock on RTL8125 in jumbo mtu mode neighbour: Fix __randomize_layout crash in struct neighbour octeontx2-pf: Restore TC ingress police rules when interface is up octeontx2-pf: Fix adding mbox work queue entry when num_vfs > 64 net: stmmac: xgmac: Disable FPE MMC interrupts octeontx2-af: Fix possible buffer overflow ... |
||
Peter Zijlstra
|
382c27f4ed |
perf: Fix perf_event_validate_size()
Budimir noted that perf_event_validate_size() only checks the size of
the newly added event, even though the sizes of all existing events
can also change due to not all events having the same read_format.
When we attach the new event, in perf_group_attach(), we do re-compute
the size for all events.
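A simplified standalone model of that point (hypothetical struct and limit, not the perf core code): when a per-member read format is used, every member's read size depends on the group size, so all of them have to be re-checked when the group grows:
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#define MAX_READ_SIZE 64            /* toy limit; the kernel uses a larger one */
struct event {
	bool group_format;          /* loosely models PERF_FORMAT_GROUP */
};
/* An event's read size depends on its own format and on the group size. */
static size_t event_read_size(const struct event *e, int nr_members)
{
	size_t sz = sizeof(unsigned long long);
	if (e->group_format)
		sz += (size_t)nr_members * sizeof(unsigned long long);
	return sz;
}
/* Validate every member: adding one event changes the size of the others too. */
static bool group_validate_size(const struct event *events, int nr_members)
{
	for (int i = 0; i < nr_members; i++)
		if (event_read_size(&events[i], nr_members) > MAX_READ_SIZE)
			return false;
	return true;
}
int main(void)
{
	struct event group[8];
	for (int i = 0; i < 8; i++)
		group[i].group_format = true;
	printf("group of 6 ok: %s\n", group_validate_size(group, 6) ? "yes" : "no");
	printf("group of 8 ok: %s\n", group_validate_size(group, 8) ? "yes" : "no");
	return 0;
}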
Fixes:
|
||
Elliot Berman
|
23ab79e8e4 |
freezer,sched: Do not restore saved_state of a thawed task
It is possible for a task to be thawed multiple times when mixing the
*legacy* cgroup freezer and system-wide freezer. To do this, freeze the
cgroup, do system-wide freeze/thaw, then thaw the cgroup. When this
happens, a stale saved_state can be written to the task's state
and cause the task to hang indefinitely. Fix this by only trying to thaw
tasks that are actually frozen.
This change also has the marginal benefit of avoiding an unnecessary
wake_up_state(p, TASK_FROZEN) if we know the task is already thawed.
There is no possibility of a time-of-compare/time-of-use race when we skip
the wake_up_state because entering/exiting TASK_FROZEN is guarded by
freezer_lock.
Fixes:
|
||
Tim Van Patten
|
cff5f49d43 |
cgroup_freezer: cgroup_freezing: Check if not frozen
__thaw_task() was recently updated to warn if the task being thawed was
part of a freezer cgroup that is still currently freezing:
void __thaw_task(struct task_struct *p)
{
...
if (WARN_ON_ONCE(freezing(p)))
goto unlock;
This has exposed a bug in cgroup1 freezing where when CGROUP_FROZEN is
asserted, the CGROUP_FREEZING bits are not also cleared at the same
time. Meaning, when a cgroup is marked FROZEN it continues to be marked
FREEZING as well. This causes the WARNING to trigger, because
cgroup_freezing() thinks the cgroup is still freezing.
There are two ways to fix this:
1. Whenever FROZEN is set, clear FREEZING for the cgroup and all
children cgroups.
2. Update cgroup_freezing() to also verify that FROZEN is not set.
This patch implements option (2), since it's smaller and more
straightforward.
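In flag terms, option (2) amounts to something like the following sketch (illustrative flag values, not the cgroup_freezer code):
#include <stdbool.h>
#include <stdio.h>
#define CGROUP_FREEZING_SELF   0x1
#define CGROUP_FREEZING_PARENT 0x2
#define CGROUP_FROZEN          0x4
#define CGROUP_FREEZING (CGROUP_FREEZING_SELF | CGROUP_FREEZING_PARENT)
/* Option 2: a cgroup that already reached FROZEN no longer counts as freezing. */
static bool cgroup_freezing(unsigned int state)
{
	return (state & CGROUP_FREEZING) && !(state & CGROUP_FROZEN);
}
int main(void)
{
	printf("freezing while in transition: %d\n",
	       cgroup_freezing(CGROUP_FREEZING_SELF));                 /* 1 */
	printf("freezing once fully frozen:   %d\n",
	       cgroup_freezing(CGROUP_FREEZING_SELF | CGROUP_FROZEN)); /* 0 */
	return 0;
}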
Signed-off-by: Tim Van Patten <timvp@google.com>
Tested-by: Mark Hasemeyer <markhas@chromium.org>
Fixes:
|
||
Hou Tao
|
75a442581d |
bpf: Add missed allocation hint for bpf_mem_cache_alloc_flags()
bpf_mem_cache_alloc_flags() may call __alloc() directly when there is no
free object in free list, but it doesn't initialize the allocation hint
for the returned pointer. It may lead to bad memory dereference when
freeing the pointer, so fix it by initializing the allocation hint.
Fixes:
|
||
Linus Torvalds
|
1d0dbc3d16 |
Merge tag 'locking-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar: "Fix lockdep block chain corruption resulting in KASAN warnings"
* tag 'locking-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Fix block chain corruption |
||
Peter Zijlstra
|
bca4104b00 |
lockdep: Fix block chain corruption
Kent reported an occasional KASAN splat in lockdep. Mark then noted:
> I suspect the dodgy access is to chain_block_buckets[-1], which hits the last 4
> bytes of the redzone and gets (incorrectly/misleadingly) attributed to
> nr_large_chain_blocks.
That would mean @size == 0, at which point size_to_bucket() returns -1
and the above happens.
alloc_chain_hlocks() has 'size - req', for the first with the
precondition 'size >= req', which allows the 0.
This code is trying to split a block, del_chain_block() takes what we
need, and add_chain_block() puts back the remainder, except in the
above case the remainder is 0 sized and things go sideways.
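The arithmetic that goes wrong can be reproduced in a small standalone sketch (made-up constants, not the lockdep code):
#include <stdio.h>
#define MAX_CHAIN_BUCKETS 16
/* Mirrors the described behaviour: a 0-sized block maps to bucket -1. */
static int size_to_bucket(int size)
{
	if (size > MAX_CHAIN_BUCKETS)
		return 0;
	return size - 1;
}
int main(void)
{
	int size = 4, req = 4;
	/* Splitting a block: take 'req' entries, put back 'size - req'. */
	int remainder = size - req;                    /* 0 when size == req */
	printf("remainder %d goes to bucket %d\n",
	       remainder, size_to_bucket(remainder));  /* bucket -1: out of bounds */
	return 0;
}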
Fixes:
|
||
Linus Torvalds
|
d3fa86b1a7 |
Merge tag 'net-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski: "Including fixes from bpf.
Current release - regressions: - Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E" - kselftest: rtnetlink: fix ip route command typo
Current release - new code bugs: - s390/ism: make sure ism driver implies smc protocol in kconfig - two build fixes for tools/net
Previous releases - regressions: - rxrpc: couple of ACK/PING/RTT handling fixes
Previous releases - always broken: - bpf: verify bpf_loop() callbacks as if they are called unknown number of times - improve stability of auto-bonding with Hyper-V - account BPF-neigh-redirected traffic in interface statistics
Misc: - net: fill in some more MODULE_DESCRIPTION()s"
* tag 'net-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits) tools: ynl: fix duplicate op name in devlink tools: ynl: fix header path for nfsd net: ipa: fix one GSI register field width tls: fix NULL deref on tls_sw_splice_eof() with empty record net: axienet: Fix check for partial TX checksum vsock/test: fix SEQPACKET message bounds test i40e: Fix adding unsupported cloud filters ice: restore timestamp configuration after device reset ice: unify logic for programming PFINT_TSYN_MSK ice: remove ptp_tx ring parameter flag amd-xgbe: propagate the correct speed and duplex status amd-xgbe: handle the corner-case during tx completion amd-xgbe: handle corner-case during sfp hotplug net: veth: fix ethtool stats reporting octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF net: usb: qmi_wwan: claim interface 4 for ZTE MF290 Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E" net/smc: avoid data corruption caused by decline nfc: virtual_ncidev: Add variable to check if ndev is running dpll: Fix potential msg memleak when genlmsg_put_reply failed ... |
||
Tejun Heo
|
4a6c5607d4 |
workqueue: Make sure that wq_unbound_cpumask is never empty
During boot, depending on how the housekeeping and workqueue.unbound_cpus masks are set, wq_unbound_cpumask can end up empty. Since |
||
Eduard Zingerman
|
bb124da69c |
bpf: keep track of max number of bpf_loop callback iterations
In some cases the verifier can't infer convergence of the bpf_loop() iteration. E.g. for the following program:
static int cb(__u32 idx, struct num_context* ctx) { ctx->i++; return 0; }
SEC("?raw_tp") int prog(void *_) { struct num_context ctx = { .i = 0 }; __u8 choice_arr[2] = { 0, 1 }; bpf_loop(2, cb, &ctx, 0); return choice_arr[ctx.i]; }
Each 'cb' simulation would eventually return to 'prog' and reach the 'return choice_arr[ctx.i]' statement. At which point ctx.i would be marked precise, thus forcing the verifier to track a multitude of separate states with {.i=0}, {.i=1}, ... at bpf_loop() callback entry.
This commit allows "brute force" handling for such cases by limiting the number of callback body simulations using the 'umax' value of the first bpf_loop() parameter. For this, extend bpf_func_state with a 'callback_depth' field. Increment this field when a callback visiting state is pushed to the states traversal stack. For frame #N, its 'callback_depth' field counts how many times a callback with frame depth N+1 has been executed. Use bpf_func_state specifically to allow independent tracking of callback depths when multiple nested bpf_loop() calls are present.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231121020701.26440-11-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Eduard Zingerman
|
cafe2c2150 |
bpf: widening for callback iterators
Callbacks are similar to open coded iterators, so add imprecise widening logic for callback body processing. This makes callback based loops behave identically to open coded iterators, e.g. allowing to verify programs like below: struct ctx { u32 i; }; int cb(u32 idx, struct ctx* ctx) { ++ctx->i; return 0; } ... struct ctx ctx = { .i = 0 }; bpf_loop(100, cb, &ctx, 0); ... Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231121020701.26440-9-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Eduard Zingerman
|
ab5cfac139 |
bpf: verify callbacks as if they are called unknown number of times
Prior to this patch callbacks were handled as regular function calls, execution of callback body was modeled exactly once. This patch updates callbacks handling logic as follows: - introduces a function push_callback_call() that schedules callback body verification in env->head stack; - updates prepare_func_exit() to reschedule callback body verification upon BPF_EXIT; - as calls to bpf_*_iter_next(), calls to callback invoking functions are marked as checkpoints; - is_state_visited() is updated to stop callback based iteration when some identical parent state is found. Paths with callback function invoked zero times are now verified first, which leads to necessity to modify some selftests: - the following negative tests required adding release/unlock/drop calls to avoid previously masked unrelated error reports: - cb_refs.c:underflow_prog - exceptions_fail.c:reject_rbtree_add_throw - exceptions_fail.c:reject_with_cp_reference - the following precision tracking selftests needed change in expected log trace: - verifier_subprog_precision.c:callback_result_precise (note: r0 precision is no longer propagated inside callback and I think this is a correct behavior) - verifier_subprog_precision.c:parent_callee_saved_reg_precise_with_callback - verifier_subprog_precision.c:parent_stack_slot_precise_with_callback Reported-by: Andrew Werner <awerner32@gmail.com> Closes: https://lore.kernel.org/bpf/CA+vRuzPChFNXmouzGG+wsy=6eMcfr1mFG0F3g7rbg-sedGKW3w@mail.gmail.com/ Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231121020701.26440-7-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Eduard Zingerman
|
58124a98cb |
bpf: extract setup_func_entry() utility function
Move code for simulated stack frame creation to a separate utility function. This function would be used in the follow-up change for callbacks handling. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231121020701.26440-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Eduard Zingerman
|
683b96f960 |
bpf: extract __check_reg_arg() utility function
Split check_reg_arg() into two utility functions: - check_reg_arg() operating on registers from current verifier state; - __check_reg_arg() operating on a specific set of registers passed as a parameter; The __check_reg_arg() function would be used by a follow-up change for callbacks handling. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231121020701.26440-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Linus Torvalds
|
b0014556a2 |
Merge tag 'timers_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Borislav Petkov:
- Do the push of pending hrtimers away from a CPU which is being offlined earlier in the offlining process in order to prevent a deadlock
* tag 'timers_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimers: Push pending hrtimers away from outgoing CPU earlier |
||
Linus Torvalds
|
2a0adc4954 |
- Fix virtual runtime calculation when recomputing a sched entity's
weights - Fix wrongly rejected unprivileged poll requests to the cgroup psi pressure files - Make sure the load balancing is done by only one CPU -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmVaBe0ACgkQEsHwGGHe VUqX4hAAmrlp7bcloMRto6j4yC+pjDIQlFym7opa7kaEPeY3icOydfpSGEDnEwbv HxOOmveb2sC8DBE+Rkum4bHb2I46SD/5LlM/MZHvSguEGNgAJEYCcPfGZJDgGlW1 MgALG78ThA/mVKr5i3/Q1U6U71+vuNHJOpCY1s4o+jgF/sG3AYIdK1sqaVI++ygz q0WK31jGo+YelPpNDKnXpVGIuOcUlh9v/Hu2zGBBJD9pf4kfTelseiV7rc+rk0yI YHSKpw2jCnuJaGS748Q4aIG+8kLRBz+HqUKDWQPlq3pRWjJWTBbH+i8TZef7keZQ gAk/uJpdQ9z4Z7suwY3vcEBVRo4e6AoD99XDG1eUX07C+f1d9p54EVDkgFIZMIle pT2yd5GT/zl0UfcZ8B96y2lJHoa6pHnv83uZKtRZhBMiN5F4iN88lhQFVpZDoCBg xZ+NPfpXcZxm4HpKFjfsGyxQJxIkC6NDdf6Rfhtc3sV1rx4AT1Qii4fDnBHOkaBs iFgpFOCeb+K9UUXB0ONJ5PWZVnc8OGPtm/22TwtZ9rBzVqrmtVJb+VDg2YWpwFwU xhy0hMWxwZFsn0VjjsBbgfm1/+WGjCKjbPa1SvS3oH3+H9EbWiBjxe1zwkS46PUf HjC0RCMPxfnYG4+h9JHEaFioGvUqQQ6Ub3K8epd8MPUtD9DCnro= =hJzS -----END PGP SIGNATURE----- Merge tag 'sched_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Fix virtual runtime calculation when recomputing a sched entity's weights - Fix wrongly rejected unprivileged poll requests to the cgroup psi pressure files - Make sure the load balancing is done by only one CPU * tag 'sched_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix the decision for load balance sched: psi: fix unprivileged polling against cgroups sched/eevdf: Fix vruntime adjustment on reweight |
||
Linus Torvalds
|
2f84f8232e |
- Fix a hardcoded futex flags case which led to one robust futex test
failure -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmVaA2sACgkQEsHwGGHe VUoOdBAAmvDdbMNVi0p33kqLhSQLwzxsqkrGyNkAfSbuuaGNsH8mQ87VA0dMQYpe bXzJzvoccHYxJYnFyExv0d7PtN3xquh2q32D1pL6gzaA974oEmQiyQag9++gkJGh +/NYQwo0Y2ucEsvgeMN+knE0q0OdelUAiKNPF9nE9LG0d9TLFC45jwLH+9pa5jAF jtLBxrexeU49UBBDnoPR2CNrDi9TlNYRas2V5xlQnUXl5kZlVNcQLMo1Rcd7+dTF 6I414ZVXiS6u02Vs7wcrKC50BdBIa4h2WaOX+Nb2j9ibJ5uY14B1nwewAztmaQY7 szpaI2EtSMk0+Ap0QHTaxZvi/UREWed5n0AykqTy97f0vsvkK9zCiPk3LMJsoupu vfEApclAIMzDi6qnB/zGhHkHLMBHsiXrANGCe6SbjphD9ic0ClKwAyqJ9kpB43JE pnqdrTcrYLuTCV+fE9r/WfXt5Z09xmlF+usmOS4T7y35gzrl4+BPVzu2V80SlZSj CtDSvMG7z7LLK5o8XsvQk1VlAYCXEPfOldkoRaisD82yKw0r38YqXf+cigE4noyq 55ChMwNmlqtetvPNK/6SsPtj8F/502Lqo/xAJjSRo/vO1KYpNa3sfXUZpZ5J+xuc zVGXzcBGsNgteVin2I0jhdOvRd7apA7rKiXd0duTtiSj2N++b5U= =T4AK -----END PGP SIGNATURE----- Merge tag 'locking_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Fix a hardcoded futex flags case which lead to one robust futex test failure * tag 'locking_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Fix hardcoded flags |
||
Linus Torvalds
|
c8b3443cbd |
- Make sure the context refcount is transferred too when migrating perf
events -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmVaAW4ACgkQEsHwGGHe VUq4bQ/5ATpnsIYxn/rjshZa2hfDrOfS9FRegfMCZyz7w/HDmRlwjIbXbQE7ttS7 vU55QgftybwS38N//LRzx9etGIhbcmeNgNibskslxsuVFCxBJLUVT3UJ9T9AtWrM v9i0W+D8SuJsIKoKo77K+OkTvKHZgpvxyABQivyHMjeD2zmPHUslJ+OTWYNooM7g kPmTu/M6sJme/q/PgtezOVKhV3tdaHB2oBKUEv9y4nVBNNVRnX/Fo+Zd/FEgs45F JhxnvwqP5cU7mSmn9itcNRKY8nW5lv2OdYNtgyqWbpYY39JPZxy/EKF4GUPv1Jqb d/eQov0B8K844j982bBa0lgecmiGYq7DNevVHlUB/JowQpNsfxVJAVphVnWZr5iy hzZQU5EfvmkRE/ja/g1wUn15mdxvAQ3QRmOGBJwCQi2QLC9zCiki4esnGc4U9Zp7 ZVQ22PVliuBnLRloOSHsp1G65i3K+VPlKyBOkp0YmEVz9EHahOceBPa39WkGCHOh 1EkhPIzemTYUzAE5KuzbLp7C6KzMvaP8Zc9fOLy9SEesb2HjwoIQjkRKmJ38rbSu KEY5SbdxIkVCsks4kajOVSysS89rafbH7ykd/Di3cXb1rOJtRSyWghnaap1trghd uf0EAVb9y8er2xvdOHHwHIdNILK3Y1F2TcOg+jAyywudrfmnIls= =lbBn -----END PGP SIGNATURE----- Merge tag 'perf_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Make sure the context refcount is transferred too when migrating perf events * tag 'perf_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix cpuctx refcounting |
||
Linus Torvalds
|
2254005ef1 |
parisc architecture fixes for kernel v6.7-rc2:
- Fix power soft-off on qemu - Disable prctl(PR_SET_MDWE) since parisc sometimes still needs writeable stacks - Use strscpy instead of strlcpy in show_cpuinfo() -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZVkHjgAKCRD3ErUQojoP X196AP9I9w/4Go3HfvFNgEGUpVSbQq8679um13mlMdlFC6z3NAD+J32vmvU1keL1 0f4C7IltOr2ntU4QIXJUCLAPWO7NWgQ= =r7N6 -----END PGP SIGNATURE----- Merge tag 'parisc-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "On parisc we still sometimes need writeable stacks, e.g. if programs aren't compiled with gcc-14. To avoid issues with the upcoming systemd-254 we therefore have to disable prctl(PR_SET_MDWE) for now (for parisc only). The other two patches are minor: a bugfix for the soft power-off on qemu with 64-bit kernel and prefer strscpy() over strlcpy(): - Fix power soft-off on qemu - Disable prctl(PR_SET_MDWE) since parisc sometimes still needs writeable stacks - Use strscpy instead of strlcpy in show_cpuinfo()" * tag 'parisc-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: prctl: Disable prctl(PR_SET_MDWE) on parisc parisc/power: Fix power soft-off when running on qemu parisc: Replace strlcpy() with strscpy() |
||
Helge Deller
|
793838138c |
prctl: Disable prctl(PR_SET_MDWE) on parisc
systemd-254 tries to use prctl(PR_SET_MDWE) for its MemoryDenyWriteExecute functionality, but fails on parisc which still needs executable stacks in certain combinations of gcc/glibc/kernel. Disable prctl(PR_SET_MDWE) by returning -EINVAL for now on parisc, until userspace has caught up. Signed-off-by: Helge Deller <deller@gmx.de> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Sam James <sam@gentoo.org> Closes: https://github.com/systemd/systemd/issues/29775 Tested-by: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/all/875y2jro9a.fsf@gentoo.org/ Cc: <stable@vger.kernel.org> # v6.3+ |
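A minimal userspace sketch (not from the kernel tree; the fallback behavior shown is an assumption about how a caller might react) of probing PR_SET_MDWE and coping with the -EINVAL now returned on parisc:

#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_MDWE			/* older uapi headers may lack these */
#define PR_SET_MDWE 65
#define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#endif

int main(void)
{
	if (prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0, 0, 0) == 0) {
		puts("MDWE enabled");
	} else if (errno == EINVAL) {
		/* kernel or architecture (e.g. parisc) refuses MDWE */
		puts("MDWE unavailable, continuing without it");
	} else {
		perror("prctl(PR_SET_MDWE)");
	}
	return 0;
}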
||
Linus Torvalds
|
bf786e2a78 |
audit/stable-6.7 PR 20231116
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmVWX8cUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPl8hAA2D4DAmbnM4wLGk5FX1ruFpACmabx 7iNPonV7loDiGZInvlTgvQxTQ6hafvs6aFqu69ZplLuCaBLiSn6U3J/bOXneQxzn nRjLQEfJLcSmTd39M82QxpaihCtVltDRT4jPfq4AGN+6nV0TB4KyFjrIvOw7udfX fJF096Lt9rqxbYyKk2Lgy8LZZdVqFN9pbstpH7Vas8LOi4bnvogRljhFA3vipn45 0tzMrFR9b/myOPFm1ktvAUSUdWIzNGmxsYkrxHkQ2TemhuFEiNl3n86juWzeXCzN wjaGPLIUqJQW+C+kXRmEZo/SytiqKS5Wo97mMVDPKpYFwp6IbgjSg01LPNdmLoVY 2i1jxOFTDnANLZgXa31kjzTO2Ceu61GFVqLZGuOh2lB7rjj3+JAkL0U/YLQWWBMO RG8MbmQnHOGZlHdqiPRKJKo/qHPW7vBkgSPJ/K0tRNXMFoZtGAfcHjxJJQNystPU BoRd2Tdw0jMrrS5cLNXfkxhHKwNHGFny4TRyqOJo9G7/jK56JWU+3ZXNoWH9OKFJ Ln2wH7NT16CLMnb/kZ2CSh8UQXIJpkBL1OuG6IrOQuBoNun7AzGnmXW7vqywV5bo dqOgxtkBYhrfjUEXmRzEii2oOoc/esr1vZYnmj5K4RpWksIXTJ4BZjC9kH/fHEpg 2ZO03UwyZ7ZiE2U= =00KW -----END PGP SIGNATURE----- Merge tag 'audit-pr-20231116' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "One small audit patch to convert a WARN_ON_ONCE() into a normal conditional to avoid scary looking console warnings when eBPF code generates audit records from unexpected places" * tag 'audit-pr-20231116' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() |
||
Linus Torvalds
|
7475e51b87 |
Including fixes from BPF and netfilter.
Current release - regressions: - core: fix undefined behavior in netdev name allocation - bpf: do not allocate percpu memory at init stage - netfilter: nf_tables: split async and sync catchall in two functions - mptcp: fix possible NULL pointer dereference on close Current release - new code bugs: - eth: ice: dpll: fix initial lock status of dpll Previous releases - regressions: - bpf: fix precision backtracking instruction iteration - af_unix: fix use-after-free in unix_stream_read_actor() - tipc: fix kernel-infoleak due to uninitialized TLV value - eth: bonding: stop the device in bond_setup_by_slave() - eth: mlx5: - fix double free of encap_header - avoid referencing skb after free-ing in drop path - eth: hns3: fix VF reset - eth: mvneta: fix calls to page_pool_get_stats Previous releases - always broken: - core: set SOCK_RCU_FREE before inserting socket into hashtable - bpf: fix control-flow graph checking in privileged mode - eth: ppp: limit MRU to 64K - eth: stmmac: avoid rx queue overrun - eth: icssg-prueth: fix error cleanup on failing initialization - eth: hns3: fix out-of-bounds access may occur when coalesce info is read via debugfs - eth: cortina: handle large frames Misc: - selftests: gso: support CONFIG_MAX_SKB_FRAGS up to 45 Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmVV9akSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkICMP/1+QHUaD4JG1mW9oYc2zINPfQl3dqQt3 2CGSE2yrtbQvyQl39BDa0WFzV5X6So6/U50twhTNM+UAJsCaOvxCUDvUP9eY9Dcm z2H4oITZimyP4CEb3l7JpL2PImvfImL7D/fCPPMUZVzNY6dkEFznaQrnawbJz4gg mZXDnjwIXq7OchoJy3dHzyOn4ZQj2Df5VcfBzkVMdMcwV55Sd5JezbhwJ6NOmnKA uoXlq4pFYj3ahAhEQfLWUwXmF3e6esHs/WUCMe5FR9YkanJlu4oHUmY3RLzfcdQA PPIPDRxOzthcXyymqvqs7gnZ3ruMUll4B7tGTVFpJch8ts+DwGdUyBIIoDd/1BUT gmjipP5HPia3Qdtk3Jc4vMkcf5AwoGo0hXku7YYJ1K7+4+t8ep3/hDbQc0PLWX6J afiQgqpnNXHSTqBO5zl91vSwhGr/AAtAkDlPnsQL/RDAxY4teIwxHuoMvwPWaHZJ sMo5ZcHXvNnBbGhpozFtmrnbf1nduUrQmW5LkJViCLf25Sj6pDYbo8WnhMuOKSnZ 7an2YqniCgBtrX4MEVn2jsWgavI+SxndVIQR04u0uwqmP+dn8s9LUfjKKDtPWHsK +zMFtk+Op03TW5ur9w3+dgrGH0cLogPO3BJkho7xXKBfZ6/tN/pOef3/nV9xY6g8 JjnBUdpZRTWI =VjWw -----END PGP SIGNATURE----- Merge tag 'net-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from BPF and netfilter. 
Current release - regressions: - core: fix undefined behavior in netdev name allocation - bpf: do not allocate percpu memory at init stage - netfilter: nf_tables: split async and sync catchall in two functions - mptcp: fix possible NULL pointer dereference on close Current release - new code bugs: - eth: ice: dpll: fix initial lock status of dpll Previous releases - regressions: - bpf: fix precision backtracking instruction iteration - af_unix: fix use-after-free in unix_stream_read_actor() - tipc: fix kernel-infoleak due to uninitialized TLV value - eth: bonding: stop the device in bond_setup_by_slave() - eth: mlx5: - fix double free of encap_header - avoid referencing skb after free-ing in drop path - eth: hns3: fix VF reset - eth: mvneta: fix calls to page_pool_get_stats Previous releases - always broken: - core: set SOCK_RCU_FREE before inserting socket into hashtable - bpf: fix control-flow graph checking in privileged mode - eth: ppp: limit MRU to 64K - eth: stmmac: avoid rx queue overrun - eth: icssg-prueth: fix error cleanup on failing initialization - eth: hns3: fix out-of-bounds access may occur when coalesce info is read via debugfs - eth: cortina: handle large frames Misc: - selftests: gso: support CONFIG_MAX_SKB_FRAGS up to 45" * tag 'net-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (78 commits) macvlan: Don't propagate promisc change to lower dev in passthru net: sched: do not offload flows with a helper in act_ct net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors net/mlx5e: Check return value of snprintf writing to fw_version buffer net/mlx5e: Reduce the size of icosq_str net/mlx5: Increase size of irq name buffer net/mlx5e: Update doorbell for port timestamping CQ before the software counter net/mlx5e: Track xmit submission to PTP WQ after populating metadata map net/mlx5e: Avoid referencing skb after free-ing in drop path of mlx5e_sq_xmit_wqe net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offload net/mlx5e: Fix pedit endianness net/mlx5e: fix double free of encap_header in update funcs net/mlx5e: fix double free of encap_header net/mlx5: Decouple PHC .adjtime and .adjphase implementations net/mlx5: DR, Allow old devices to use multi destination FTE net/mlx5: Free used cpus mask when an IRQ is released Revert "net/mlx5: DR, Supporting inline WQE when possible" bpf: Do not allocate percpu memory at init stage net: Fix undefined behavior in netdev name allocation dt-bindings: net: ethernet-controller: Fix formatting error ... |
||
Yonghong Song
|
1fda5bb66a |
bpf: Do not allocate percpu memory at init stage
Kirill Shutemov reported a significant increase in percpu memory consumption after booting in a 288-cpu VM ([1]) due to commit |
||
Peter Zijlstra
|
889c58b315 |
perf/core: Fix cpuctx refcounting
Audit of the refcounting turned up that perf_pmu_migrate_context()
fails to migrate the ctx refcount.
Fixes:
|
||
Peter Zijlstra
|
c9bd1568d5 |
futex: Fix hardcoded flags
Xi reported that commit |
||
Paul Moore
|
969d90ec21 |
audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare()
eBPF can end up calling into the audit code from some odd places, and
some of these places don't have @current set properly so we end up
tripping the `WARN_ON_ONCE(!current->mm)` near the top of
`audit_exe_compare()`. While the basic `!current->mm` check is good,
the `WARN_ON_ONCE()` results in some scary console messages so let's
drop that and just do the regular `!current->mm` check to avoid
problems.
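Schematically, the change described above amounts to the following fragment (simplified; not the exact body of audit_exe_compare()):

	/* before: bails out, but WARN_ON_ONCE() spams the console when
	 * eBPF reaches here without a proper current->mm */
	if (WARN_ON_ONCE(!current->mm))
		return 0;

	/* after: same bail-out, no console warning */
	if (!current->mm)
		return 0;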
Cc: <stable@vger.kernel.org>
Fixes:
|
||
Keisuke Nishimura
|
6d7e4782bc |
sched/fair: Fix the decision for load balance
should_we_balance() is called to decide whether to do load-balancing.
When sched ticks invoke this function, only one CPU should return
true. However, in the current code, two CPUs can return true. The
following situation, where b means busy and i means idle, is an
example in which both CPU 0 and CPU 2 return true.
[0, 1] [2, 3]
b b i b
This fix checks if there exists an idle CPU with busy sibling(s)
after looking for a CPU on an idle core. If some idle CPUs with busy
siblings are found, only the first one should do load-balancing.
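A toy, self-contained model of that decision (hypothetical; the real should_we_balance() answers "should this CPU balance" rather than picking a CPU, but the selection order is the same idea):

#include <stdbool.h>

/* Prefer a CPU on a fully idle core; failing that, the first idle CPU whose
 * SMT sibling is busy gets to balance, so only one CPU does the work. */
static int pick_balancer(const bool *busy, const int *sibling, int ncpus)
{
	int first_idle_smt = -1;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (busy[cpu])
			continue;
		if (!busy[sibling[cpu]])
			return cpu;		/* fully idle core */
		if (first_idle_smt < 0)
			first_idle_smt = cpu;	/* first idle CPU, busy sibling */
	}
	return first_idle_smt;			/* -1 if every CPU is busy */
}

With the layout from the example above (busy = {1, 1, 0, 1}, siblings 0<->1 and 2<->3), only CPU 2 is selected.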
Fixes:
|
||
Johannes Weiner
|
8b39d20ece |
sched: psi: fix unprivileged polling against cgroups
|
||
Abel Wu
|
eab03c23c2 |
sched/eevdf: Fix vruntime adjustment on reweight
The vruntime of an (on_rq && !0-lag) entity needs to be adjusted when
it gets re-weighted, and the calculations can be simplified based
on the fact that a re-weight won't change the weighted average of all
the entities. Please check the proofs in the comments.
But adjusting vruntime can also cause a position change in the RB-tree,
hence requiring a re-queue to fix up, which might be costly. This could
be avoided by deferring the adjustment to the time the entity actually
leaves the tree (dequeue/pick), but that would negatively affect task
selection and is probably not good enough either.
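A simplified sketch of the adjustment (not the kernel's reweight_entity(); it assumes the entity's lag, weight * (V - v), is preserved across the re-weight so that the weighted average V of all entities stays put):

/* With w_old * (V - v) == w_new * (V - v'), the new vruntime is
 * v' = V - (V - v) * w_old / w_new. */
static long long reweighted_vruntime(long long V, long long v,
				     long long w_old, long long w_new)
{
	long long vlag = V - v;		/* distance to the weighted average */

	vlag = vlag * w_old / w_new;	/* scale by the old/new weight ratio */
	return V - vlag;		/* new vruntime v' */
}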
Fixes:
|