linux/arch/x86
Jiri Olsa 6f55967ad9 perf/x86/intel: Fix race in intel_pmu_disable_event()
New race in x86_pmu_stop() was introduced by replacing the
atomic __test_and_clear_bit() of cpuc->active_mask by separate
test_bit() and __clear_bit() calls in the following commit:

  3966c3feca ("x86/perf/amd: Remove need to check "running" bit in NMI handler")

The race causes panic for PEBS events with enabled callchains:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  ...
  RIP: 0010:perf_prepare_sample+0x8c/0x530
  Call Trace:
   <NMI>
   perf_event_output_forward+0x2a/0x80
   __perf_event_overflow+0x51/0xe0
   handle_pmi_common+0x19e/0x240
   intel_pmu_handle_irq+0xad/0x170
   perf_event_nmi_handler+0x2e/0x50
   nmi_handle+0x69/0x110
   default_do_nmi+0x3e/0x100
   do_nmi+0x11a/0x180
   end_repeat_nmi+0x16/0x1a
  RIP: 0010:native_write_msr+0x6/0x20
  ...
   </NMI>
   intel_pmu_disable_event+0x98/0xf0
   x86_pmu_stop+0x6e/0xb0
   x86_pmu_del+0x46/0x140
   event_sched_out.isra.97+0x7e/0x160
  ...

The event is configured to make samples from PEBS drain code,
but when it's disabled, we'll go through NMI path instead,
where data->callchain will not get allocated and we'll crash:

          x86_pmu_stop
            test_bit(hwc->idx, cpuc->active_mask)
            intel_pmu_disable_event(event)
            {
              ...
              intel_pmu_pebs_disable(event);
              ...

EVENT OVERFLOW ->  <NMI>
                     intel_pmu_handle_irq
                       handle_pmi_common
   TEST PASSES ->        test_bit(bit, cpuc->active_mask))
                           perf_event_overflow
                             perf_prepare_sample
                             {
                               ...
                               if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
                                     data->callchain = perf_callchain(event, regs);

         CRASH ->              size += data->callchain->nr;
                             }
                   </NMI>
              ...
              x86_pmu_disable_event(event)
            }

            __clear_bit(hwc->idx, cpuc->active_mask);

Fixing this by disabling the event itself before setting
off the PEBS bit.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Arcari <darcari@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 3966c3feca ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-05 13:00:48 +02:00
..
boot x86/boot: Disable RSDP parsing temporarily 2019-04-22 11:36:43 +02:00
configs Merge branch 'akpm' (patches from Andrew) 2019-03-07 19:25:37 -08:00
crypto crypto: x86/poly1305 - fix overflow during partial reduction 2019-04-08 14:43:06 +08:00
entry gcc-9: properly declare the {pv,hv}clock_page storage 2019-05-01 11:20:53 -07:00
events perf/x86/intel: Fix race in intel_pmu_disable_event() 2019-05-05 13:00:48 +02:00
hyperv x86/hyperv: Prevent potential NULL pointer dereference 2019-03-21 12:24:39 +01:00
ia32 a.out: remove core dumping support 2019-03-05 10:00:35 -08:00
include x86: make ZERO_PAGE() at least parse its argument 2019-04-29 09:51:29 -07:00
kernel Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-04-20 10:05:02 -07:00
kvm KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing 2019-04-16 15:38:08 +02:00
lib x86/lib: Fix indentation issue, remove extra tab 2019-03-21 12:24:38 +01:00
math-emu Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
mm x86/mm: Fix a crash with kmemleak_scan() 2019-04-24 11:32:34 +02:00
net x32: bpf: implement jitting of JMP32 2019-01-26 13:33:02 -08:00
oprofile
pci x86/PCI: Fixup RTIT_BAR of Intel Denverton Trace Hub 2019-02-07 08:43:58 -06:00
platform x86/realmode: Make set_real_mode_mem() static inline 2019-03-29 10:16:27 +01:00
power mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
purgatory
ras
realmode x86/realmode: Make set_real_mode_mem() static inline 2019-03-29 10:16:27 +01:00
tools x86: Clean up 'sizeof x' => 'sizeof(x)' 2018-10-29 07:13:28 +01:00
um Merge branch 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-03-05 14:08:26 -08:00
video
xen Merge branch 'akpm' (patches from Andrew) 2019-03-12 10:39:53 -07:00
.gitignore
Kbuild KVM: x86: Allow Qemu/KVM to use PVH entry point 2018-12-13 13:41:49 -05:00
Kconfig Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-04-20 10:01:11 -07:00
Kconfig.cpu x86/cpu: Create Hygon Dhyana architecture support file 2018-09-27 16:14:05 +02:00
Kconfig.debug efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation 2019-02-04 08:27:30 +01:00
Makefile x86/retpolines: Disable switch jump tables when retpolines are enabled 2019-03-28 13:39:48 +01:00
Makefile_32.cpu
Makefile.um x86, powerpc: Remove -funit-at-a-time compiler option entirely 2018-12-09 11:55:32 +01:00