linux/kernel
Haifeng Xu 5bbf6ad532 perf/core: Fix missing wakeup when waiting for context reference
[ Upstream commit 74751ef5c1 ]

In our production environment, we found many hung tasks that had been
blocked for more than 18 hours. Their call traces look like this:

[346278.191038] __schedule+0x2d8/0x890
[346278.191046] schedule+0x4e/0xb0
[346278.191049] perf_event_free_task+0x220/0x270
[346278.191056] ? init_wait_var_entry+0x50/0x50
[346278.191060] copy_process+0x663/0x18d0
[346278.191068] kernel_clone+0x9d/0x3d0
[346278.191072] __do_sys_clone+0x5d/0x80
[346278.191076] __x64_sys_clone+0x25/0x30
[346278.191079] do_syscall_64+0x5c/0xc0
[346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
[346278.191086] ? do_syscall_64+0x69/0xc0
[346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
[346278.191092] ? irqentry_exit+0x19/0x30
[346278.191095] ? exc_page_fault+0x89/0x160
[346278.191097] ? asm_exc_page_fault+0x8/0x30
[346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae

The task was waiting for the refcount to become 1, but the vmcore showed
that the refcount was already 1. It seems that the task was not woken up
by perf_event_release_kernel() and got stuck forever. The scenario below
may cause the problem.

Thread A					Thread B
...						...
perf_event_free_task				perf_event_release_kernel
						   ...
						   acquire event->child_mutex
						   ...
						   get_ctx
   ...						   release event->child_mutex
   acquire ctx->mutex
   ...
   perf_free_event (acquire/release event->child_mutex)
   ...
   release ctx->mutex
   wait_var_event
						   acquire ctx->mutex
						   acquire event->child_mutex
						   # move existing events to free_list
						   release event->child_mutex
						   release ctx->mutex
						   put_ctx
...						...

In this case, all events of the ctx have already been freed, so Thread B
moves nothing for this ctx to free_list and Thread A misses the wakeup.
It is thus necessary to add a wakeup after dropping the reference.
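
The idea of the fix can be sketched in userspace. The model below is not
the kernel patch: it uses a pthread mutex and condition variable in place
of wait_var_event() and its wakeup counterpart, and every name in it
(ctx_model, get_ctx_model, put_ctx_model_and_wake, run_model) is
hypothetical. It only illustrates that whoever drops the extra reference
must also notify the waiter.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for a context with a refcount. */
struct ctx_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int refcount;
};

static void ctx_model_init(struct ctx_model *c)
{
	int rc = pthread_mutex_init(&c->lock, NULL);
	assert(rc == 0);
	rc = pthread_cond_init(&c->cond, NULL);
	assert(rc == 0);
	(void)rc;
	c->refcount = 1;	/* the reference owned by the waiting task */
}

/* Models Thread B taking a temporary reference (get_ctx in the diagram). */
static void get_ctx_model(struct ctx_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->refcount++;
	pthread_mutex_unlock(&c->lock);
}

/*
 * Models put_ctx followed by an explicit wakeup. Without the broadcast,
 * a waiter that went to sleep while refcount was 2 would never run again,
 * which is the hang described above.
 */
static void put_ctx_model_and_wake(struct ctx_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->refcount--;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Models Thread A: sleep until only its own reference remains. */
static void *waiter(void *arg)
{
	struct ctx_model *c = arg;

	pthread_mutex_lock(&c->lock);
	while (c->refcount != 1)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* Models Thread B dropping its temporary reference. */
static void *releaser(void *arg)
{
	put_ctx_model_and_wake(arg);
	return NULL;
}

/* Runs both threads once and returns the final refcount. */
int run_model(void)
{
	struct ctx_model c;
	pthread_t a, b;
	int rc, refs;

	ctx_model_init(&c);
	get_ctx_model(&c);	/* refcount is now 2 */

	rc = pthread_create(&a, NULL, waiter, &c);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, releaser, &c);
	assert(rc == 0);
	(void)rc;

	pthread_join(b, NULL);
	pthread_join(a, NULL);	/* returns only because the waiter was woken */

	refs = c.refcount;
	pthread_cond_destroy(&c.cond);
	pthread_mutex_destroy(&c.lock);
	return refs;
}
```

Calling run_model() from a program linked with -pthread should return
once both threads finish. Removing the pthread_cond_broadcast() call
makes the model able to hang whenever the waiter sleeps before the
reference is dropped.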

Fixes: 1cf8dfe8a6 ("perf/core: Fix race between close() and fork()")
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240513103948.33570-1-haifeng.xu@shopee.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05 09:08:25 +02:00
..
bpf bpf: report RCU QS in cpumap kthread 2024-03-26 18:22:25 -04:00
cgroup sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level 2024-06-16 13:28:41 +02:00
configs
debug kdb: Use format-specifiers rather than memset() for padding in kdb_read() 2024-06-16 13:28:52 +02:00
dma dma-mapping: clear dev->dma_mem to NULL after freeing it 2024-01-25 14:34:25 -08:00
events perf/core: Fix missing wakeup when waiting for context reference 2024-07-05 09:08:25 +02:00
gcov gcov: add support for GCC 14 2024-07-05 09:08:24 +02:00
irq genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline 2024-06-16 13:28:48 +02:00
livepatch livepatch: fix race between fork and KLP transition 2022-10-26 13:22:18 +02:00
locking locking/ww_mutex/test: Fix potential workqueue corruption 2023-11-28 16:50:13 +00:00
power PM: suspend: Set mem_sleep_current during kernel command line setup 2024-04-13 12:51:24 +02:00
printk printk: Update @console_may_schedule in console_trylock_spinning() 2024-04-13 12:51:29 +02:00
rcu rcutorture: Fix rcu_torture_one_read() pipe_count overflow comment 2024-07-05 09:08:20 +02:00
sched sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level 2024-06-16 13:28:41 +02:00
time tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device() 2024-07-05 09:08:19 +02:00
trace tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test 2024-07-05 09:08:25 +02:00
.gitignore kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:04:16 +02:00
acct.c acct: fix potential integer overflow in encode_comp_t() 2023-01-18 11:41:34 +01:00
async.c treewide: Remove uninitialized_var() usage 2023-06-09 10:29:01 +02:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-09-05 10:27:38 +02:00
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-09-03 10:08:16 +02:00
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-28 16:50:18 +00:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-23 08:24:54 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-15 14:18:04 +02:00
auditfilter.c audit: fix a net reference leak in audit_list_rules_send() 2020-06-22 09:30:59 +02:00
auditsc.c audit: fix possible soft lockup in __audit_inode_child() 2023-09-23 10:59:46 +02:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2023-04-20 12:07:32 +02:00
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-05-02 16:18:37 +02:00
capability.c
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-05 11:16:42 +02:00
configs.c kernel/configs: Replace GPL boilerplate code with SPDX identifier 2019-07-30 18:34:15 +02:00
context_tracking.c
cpu_pm.c kernel/cpu_pm: Fix uninitted local in cpu_pm 2020-06-22 09:31:22 +02:00
cpu.c hrtimers: Push pending hrtimers away from outgoing CPU earlier 2023-12-13 18:18:09 +01:00
crash_core.c
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 15:41:18 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c treewide: Remove uninitialized_var() usage 2023-06-09 10:29:01 +02:00
extable.c kernel/extable.c: use address-of operator on section symbols 2023-06-09 10:29:01 +02:00
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 16:44:15 +01:00
fork.c kernel/fork: beware of __put_task_struct() calling context 2023-09-23 11:00:03 +02:00
freezer.c Revert "libata, freezer: avoid block device removal while system is frozen" 2019-10-06 09:11:37 -06:00
futex.c treewide: Remove uninitialized_var() usage 2023-06-09 10:29:01 +02:00
gen_kheaders.sh kheaders: explicitly define file modes for archived headers 2024-07-05 09:08:25 +02:00
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c jump_label: Don't warn on __exit jump entries 2019-08-29 15:10:10 +01:00
kallsyms.c kallsyms: Refactor kallsyms_show_value() to take cred 2020-07-16 08:16:44 +02:00
kcmp.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:44:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y 2019-07-22 18:05:11 +02:00
kcov.c
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-27 08:37:10 +02:00
kexec_elf.c kexec_elf: support 32 bit ELF files 2019-09-06 23:58:44 +02:00
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 15:44:10 +02:00
kexec_internal.h
kexec.c kexec_load: Disable at runtime if the kernel is locked down 2019-08-19 21:54:15 -07:00
kheaders.c kheaders: Use array declaration instead of char 2023-05-17 11:35:33 +02:00
kmod.c kmod: make request_module() return an error when autoloading is disabled 2020-04-17 10:50:22 +02:00
kprobes.c kprobes: Fix possible use-after-free issue on kprobe registration 2024-05-02 16:18:30 +02:00
ksysfs.c
kthread.c kthread: Fix PF_KTHREAD vs to_kthread() race 2021-09-12 08:56:39 +02:00
latencytop.c
Makefile kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:04:16 +02:00
module_signature.c module: harden ELF info handling 2021-04-07 14:47:38 +02:00
module_signing.c module: harden ELF info handling 2021-04-07 14:47:38 +02:00
module-internal.h
module.c modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules 2023-09-23 10:59:36 +02:00
notifier.c kernel/notifier.c: intercept duplicate registrations to avoid infinite loops 2020-10-01 13:17:23 +02:00
nsproxy.c
padata.c crypto: pcrypt - Fix hungtask for PADATA_RESET 2023-11-28 16:50:14 +00:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 12:51:37 +02:00
params.c params: lift param_set_uint_minmax to common code 2024-06-16 13:28:45 +02:00
pid_namespace.c memcg: enable accounting for pids in nested pid namespaces 2021-09-22 12:26:37 +02:00
pid.c
profile.c profiling: fix shift too large makes kernel panic 2022-08-25 11:18:02 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-14 18:11:24 +02:00
range.c
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 16:50:19 +00:00
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-17 11:35:58 +02:00
resource.c /dev/mem: Revoke mappings when a driver claims the region 2020-06-24 17:50:35 +02:00
rseq.c
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:52:53 +01:00
signal.c signal handling: don't use BUG_ON() for debugging 2022-07-21 20:59:27 +02:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:19:39 +02:00
smpboot.c kthread: Extract KTHREAD_IS_PER_CPU 2021-02-07 15:35:49 +01:00
smpboot.h
softirq.c
stackleak.c
stacktrace.c stacktrace: Don't skip first entry on noncurrent tasks 2019-11-04 21:19:25 +01:00
stop_machine.c stop_machine: Avoid potential race behaviour 2019-10-17 12:47:12 +02:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-09-05 10:27:38 +02:00
sys.c getrusage: use sig->stats_lock rather than lock_task_sighand() 2024-03-15 10:48:19 -04:00
sysctl_binary.c
sysctl-test.c kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec() 2020-10-01 13:17:10 +02:00
sysctl.c sched/rt: Disallow writing invalid values to sched_rt_period_us 2024-03-01 13:13:33 +01:00
task_work.c
taskstats.c taskstats: fix data-race 2020-01-09 10:19:54 +01:00
test_kprobes.c
torture.c torture: Remove exporting of internal functions 2019-08-01 14:30:22 -07:00
tracepoint.c tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing 2021-07-14 16:53:08 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-02-23 11:59:57 +01:00
ucount.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-14 10:32:58 +02:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:44:33 +02:00
user_namespace.c
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
watchdog_hld.c watchdog/perf: more properly prevent false positives with turbo modes 2023-07-27 08:37:10 +02:00
watchdog.c watchdog: export lockup_detector_reconfigure 2022-08-25 11:18:37 +02:00
workqueue_internal.h
workqueue.c workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask() 2023-10-25 11:53:18 +02:00