Commit Graph

25081 Commits

Author SHA1 Message Date
Andrew Banman
6d78059bbc x86/platform/uv/BAU: Fix payload queue setup on UV4 hardware
The BAU on UV4 does not need to maintain the payload queue tail pointer. Do
not initialize the tail pointer MMR on UV4.

Note that write_payload_tail is not an abstracted BAU function since it is
an operation specific to pre-UV4 versions. Then we must switch on the UV
version to control its usage, for which we use uvhub_version rather than
is_uv*_hub because it is quicker/more concise.

Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-10-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 11:16:15 +02:00
Andrew Banman
e879c1124a x86/platform/uv/BAU: Disable software timeout on UV4 hardware
Software timeouts are not currently supported on BAU for UV4. Instead, the
BAU will rely on hardware-level fairness protocols to determine broadcast
timeouts.

Do not call enable_timeouts or calculate_destination_timeout on UV4. These
functions write to pre-UV4 MMRs so they generate error messages on UV4.

Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-9-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 11:16:14 +02:00
Andrew Banman
58d4ab46f2 x86/platform/uv/BAU: Populate ->uvhub_version with UV4 version information
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-8-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 11:16:14 +02:00
Andrew Banman
21e3f12fc0 x86/platform/uv/BAU: Use generic function pointers
Convert the use of UV version-specific functions to their abstracted
counterparts.

Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-7-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 11:16:14 +02:00
Andrew Banman
5e4f96fe2a x86/platform/uv/BAU: Add generic function pointers
Many BAU functions have different implementations depending on the UV
version. Rather than switching on the uvhub_version throughout the driver,
we can define a set of operations for each version. This is especially
beneficial for UV4, which will require many new MMR read/write functions.

Currently, the set of abstracted functions are the same for UV1, UV2, and
UV3. The functions were chosen because each one will have a different
implementation for UV4. Other functions will be added as needed to handle
new implementations or to cleanup the existing differences between UV1,
UV2, and UV3, i.e. read_status and wait_completion.

Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-6-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 11:16:13 +02:00
Andrew Banman
60e1c842c7 x86/platform/uv/BAU: Convert uv_physnodeaddr() use to uv_gpa_to_offset()
The BAU driver should use the functions provided by uv_hub.h rather than
its own implementations. uv_physnodeaddr converts vaddrs to paddrs for
BAU MMR fields, but this is done better by uv_gpa_to_offset.

Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-5-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 11:16:13 +02:00
Andrew Banman
d2a57afa53 x86/platform/uv/BAU: Clean up pq_init()
The payload queue first MMR requires the physical memory address and hub
GNODE of where the payload queue resides in memory, but the associated
variables are named as if the PNODE were used. Rename gnode-related
variables and clarify the definitions of the payload queue head, last, and
tail pointers.

Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-4-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 11:16:13 +02:00
Andrew Banman
efa59ab3e7 x86/platform/uv/BAU: Clean up and update printks
Replace all uses of printk with the appropriate pr_*() function.

Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-3-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 11:16:12 +02:00
Andrew Banman
67492c86b3 x86/platform/uv/BAU: Clean up vertical alignment
Fix whitespace on blocks of code to be vertically aligned.

Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-2-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 11:16:12 +02:00
Ingo Molnar
baad92e344 Merge branch 'linus' into x86/platform, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 11:15:38 +02:00
Linus Torvalds
3286be9480 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A couple of small fixes to x86 perf drivers:

   - Measure L2 for HW_CACHE* events on AMD

   - Fix the address filter handling in the intel/pt driver

   - Handle the BTS disabling at the proper place"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  perf/x86/intel/pt: Do validate the size of a kernel address filter
  perf/x86/intel/pt: Fix kernel address filter's offset validation
  perf/x86/intel/pt: Fix an off-by-one in address filter configuration
  perf/x86/intel: Don't disable "intel_bts" around "intel" event batching
2016-09-18 11:50:48 -07:00
Matt Fleming
080fe0b790 perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
While the Intel PMU monitors the LLC when perf enables the
HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
L1 instruction cache fetches (0x0080) and instruction cache misses
(0x0081) on the AMD PMU.

This is extremely confusing when monitoring the same workload across
Intel and AMD machines, since parameters like,

  $ perf stat -e cache-references,cache-misses

measure completely different things.

Instead, make the AMD PMU measure instruction/data cache and TLB fill
requests to the L2 and instruction/data cache and TLB misses in the L2
when HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled,
respectively. That way the events measure unified caches on both
platforms.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1472044328-21302-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 16:19:49 +02:00
Alexander Shishkin
1155bafcb7 perf/x86/intel/pt: Do validate the size of a kernel address filter
Right now, the kernel address filters in PT are prone to integer overflow
that may happen in adding filter's size to its offset to obtain the end
of the range. Such an overflow would also throw a #GP in the PT event
configuration path.

Fix this by explicitly validating the result of this calculation.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 11:14:16 +02:00
Alexander Shishkin
ddfdad991e perf/x86/intel/pt: Fix kernel address filter's offset validation
The kernel_ip() filter is used mostly by the DS/LBR code to look at the
branch addresses, but Intel PT also uses it to validate the address
filter offsets for kernel addresses, for which it is not sufficient:
supplying something in bits 64:48 that's not a sign extension of the lower
address bits (like 0xf00d000000000000) throws a #GP.

This patch adds address validation for the user supplied kernel filters.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 11:14:16 +02:00
Alexander Shishkin
95f60084ac perf/x86/intel/pt: Fix an off-by-one in address filter configuration
PT address filter configuration requires that a range is specified by
its first and last address, but at the moment we're obtaining the end
of the range by adding user specified size to its start, which is off
by one from what it actually needs to be.

Fix this and make sure that zero-sized filters don't pass the filter
validation.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 11:14:16 +02:00
Linus Torvalds
024c7e3756 One fix for an x86 regression in VM migration, mostly visible with
Windows because it uses RTC periodic interrupts.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJX2xpHAAoJEL/70l94x66D9NsIAIw+9oRA86qjehVnguV3fRKA
 ITZ4OGFDiXPWuxqDaw8mHHXr0RYx8KcMTzfFNbV+YL5U0cq9xYzdaNhchKPpyF+3
 7H5wL8Ku9wkYZ930kdCf5Q+LNCfg8d/wKlibPEbX0MDx4jL99kkcxLzEkmIRqFlq
 bpXaQe/KR1xCWR6gI/a6aRJWLfGuFMV82YSnk/dCSjwotbAwjJUSt+IPhLwhx28o
 7ddcxW3CxQqelJorcu2lvRiGnCvEzDhIdOvHJqibCjo3uzqbcI4PA2gs3rozbs9s
 VMEzqZpNgK0XsyKyccSw75npViIHYPkjMzxoyHMDhgiP3eIwp/tJquxAfLjK4WE=
 =h4P4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fix from Paolo Bonzini:
 "One fix for an x86 regression in VM migration, mostly visible with
  Windows because it uses RTC periodic interrupts"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: correctly reset dest_map->vector when restoring LAPIC state
2016-09-15 15:15:41 -07:00
Al Viro
1c109fabbd fix minor infoleak in get_user_ex()
get_user_ex(x, ptr) should zero x on failure.  It's not a lot of a leak
(at most we are leaking uninitialized 64bit value off the kernel stack,
and in a fairly constrained situation, at that), but the fix is trivial,
so...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ This sat in different branch from the uaccess fixes since mid-August ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-15 12:54:04 -07:00
Paolo Bonzini
b0eaf4506f kvm: x86: correctly reset dest_map->vector when restoring LAPIC state
When userspace sends KVM_SET_LAPIC, KVM schedules a check between
the vCPU's IRR and ISR and the IOAPIC redirection table, in order
to re-establish the IOAPIC's dest_map (the list of CPUs servicing
the real-time clock interrupt with the corresponding vectors).

However, __rtc_irq_eoi_tracking_restore_one was forgetting to
set dest_map->vectors.  Because of this, the IOAPIC did not process
the real-time clock interrupt EOI, ioapic->rtc_status.pending_eoi
got stuck at a non-zero value, and further RTC interrupts were
reported to userspace as coalesced.

Fixes: 9e4aabe2bb
Fixes: 4d99ba898d
Cc: stable@vger.kernel.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: David Gilbert <dgilbert@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-15 18:00:32 +02:00
Alexander Shishkin
cecf62352a perf/x86/intel: Don't disable "intel_bts" around "intel" event batching
At the moment, intel_bts events get disabled from intel PMU's disable
callback, which includes event scheduling transactions of said PMU,
which have nothing to do with intel_bts events.

We do want to keep intel_bts events off inside the PMI handler to
avoid filling up their buffer too soon.

This patch moves intel_bts enabling/disabling directly to the PMI
handler.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915082233.11065-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15 11:25:26 +02:00
Linus Torvalds
4cea877657 PCI updates for v4.8:
Enumeration
     Mark Haswell Power Control Unit as having non-compliant BARs (Bjorn Helgaas)
 
   Power management
     Fix bridge_d3 update on device removal (Lukas Wunner)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX2az0AAoJEFmIoMA60/r8/EoP/28mGRiKi8mlqNAR3MYN3F0n
 VSIm7WyxWNawH1gRJXKQBzNqgMJnj4qRGXSIvP3AIYyBDcJs/X7j91/eKOARNfQr
 55A+gfSz4jUKlw+0WgPY8/U2/xQ4yoom1zhbsAYcIVeljZo/3JUg+wHpPjhIMkH0
 2slTerHRDExrS43jxQi225toiEaO6lcY8EVmHCDo+jYlQz3sCEwIXg9hn1rwTbvG
 sJI0zyUwHF+oWowgJqlwYxsbPPnelPAN5YAx7KrHuVmBdL0Bgo3oIRtbb3JZZ9Up
 L9bQ6NpRjSARvijaZ2TAhueqIIDv2HGgvwNB01l4Yggw7Sm1dFCuUS6vj/e5tpZA
 xntE3F6s2Z+I4I1D7pAX3jMYCdYx/QltiTCeGRp8pJv+f4ewW3jcel3FAksY3BEg
 0NCjDrGFqGYai4hGRROpt/aXlW/Pn53eQLlu4Xg2qgkj0NMh0ODMrTjMnABB39ae
 eGqIXab7WeVBxt10eU19J1u1RTqpUO2LJW+cMnvYdCfKAYby/gj8SD8vIsn3oDjZ
 hQS/4fSHurc7LZwDmwOfaiHlGnvcQWV9EKwgScS0v8AxPQnC8pNUgYpzZcXY8Q6I
 YXtyK7suFriSZPS0Qs4FrdfJrmTBaBQ55aZu9aftb5v4YqacN5qZo+HaXiONBy3v
 RzsFK6xIbnIgb35g8vKy
 =PQml
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "Here are two changes for v4.8.  The first fixes a "[Firmware Bug]: reg
  0x10: invalid BAR (can't size)" warning on Haswell, and the second
  fixes a problem in some new runtime suspend functionality we merged
  for v4.8.  Summary:

  Enumeration:
    Mark Haswell Power Control Unit as having non-compliant BARs (Bjorn Helgaas)

  Power management:
    Fix bridge_d3 update on device removal (Lukas Wunner)"

* tag 'pci-v4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Fix bridge_d3 update on device removal
  PCI: Mark Haswell Power Control Unit as having non-compliant BARs
2016-09-14 14:06:30 -07:00
Linus Torvalds
5924bbecd0 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Three fixes:

   - AMD microcode loading fix with randomization

   - an lguest tooling fix

   - and an APIC enumeration boundary condition fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Fix num_processors value in case of failure
  tools/lguest: Don't bork the terminal in case of wrong args
  x86/microcode/AMD: Fix load of builtin microcode with randomized memory
2016-09-13 12:52:45 -07:00
Linus Torvalds
ee319d5834 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This contains:

   - a set of fixes found by directed-random perf fuzzing efforts by
     Vince Weaver, Alexander Shishkin and Peter Zijlstra

   - a cqm driver crash fix

   - an AMD uncore driver use after free fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix PEBSv3 record drain
  perf/x86/intel/bts: Kill a silly warning
  perf/x86/intel/bts: Fix BTS PMI detection
  perf/x86/intel/bts: Fix confused ordering of PMU callbacks
  perf/core: Fix aux_mmap_count vs aux_refcount order
  perf/core: Fix a race between mmap_close() and set_output() of AUX events
  perf/x86/amd/uncore: Prevent use after free
  perf/x86/intel/cqm: Check cqm/mbm enabled state in event init
  perf/core: Remove WARN from perf_event_read()
2016-09-13 12:47:29 -07:00
Linus Torvalds
7c2c114416 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "This contains a Xen fix, an arm64 fix and a race condition /
  robustization set of fixes related to ExitBootServices() usage and
  boundary conditions"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Use efi_exit_boot_services()
  efi/libstub: Use efi_exit_boot_services() in FDT
  efi/libstub: Introduce ExitBootServices helper
  efi/libstub: Allocate headspace in efi_get_memory_map()
  efi: Fix handling error value in fdt_find_uefi_params
  efi: Make for_each_efi_memory_desc_in_map() cope with running on Xen
2016-09-13 12:02:00 -07:00
Linus Torvalds
ac059c4fa7 * s390: nested virt fixes (new 4.8 feature)
* x86: fixes for 4.8 regressions
 * ARM: two small bugfixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJX1xaBAAoJEL/70l94x66Db/8H/AuKWs6QbvUNnA+tWITdmFbi
 ah7R2BoRVak4h6UGYC4OsY9BTuWVeaCx2+8yLIqSdfWoYUJzwiGFPwkuUK/hVSkK
 /9gZO8SsREAm/1gVck2X2vzATYbfCxKcRsSq0/ocmIQEpRYWKGoJWS6zeiwLZ9kq
 H/KsAvsGc83VdlPXrhLVQa7js/Tl/M5JlBEbm6i8nIsvhoqZkES4rgwHKBtkxTyU
 8oepfNQnYTb2hZhJE0aW+9z3V0O+NKbGW75bKOzrbJBoEUGeHuPpLnP0mZIyEGPU
 grQkfuWMmfEaWrGahNA9ARpsNcy4Zp0BF5mgJ03OAmgfY+KmJroVk95IvcP9jF4=
 =+uuA
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 - s390: nested virt fixes (new 4.8 feature)
 - x86: fixes for 4.8 regressions
 - ARM: two small bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm-arm: Unmap shadow pagetables properly
  x86, clock: Fix kvm guest tsc initialization
  arm: KVM: Fix idmap overlap detection when the kernel is idmap'ed
  KVM: lapic: adjust preemption timer correctly when goes TSC backward
  KVM: s390: vsie: fix riccbd
  KVM: s390: don't use current->thread.fpu.* when accessing registers
2016-09-12 14:30:14 -07:00
Linus Torvalds
98ac9a608d Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "nvdimm fixes for v4.8, two of them are tagged for -stable:

   - Fix devm_memremap_pages() to use track_pfn_insert().  Otherwise,
     DAX pmd mappings end up with an uncached pgprot, and unusable
     performance for the device-dax interface.  The device-dax interface
     appeared in 4.7 so this is tagged for -stable.

   - Fix a couple VM_BUG_ON() checks in the show_smaps() path to
     understand DAX pmd entries.  This fix is tagged for -stable.

   - Fix a mis-merge of the nfit machine-check handler to flip the
     polarity of an if() to match the final version of the patch that
     Vishal sent for 4.8-rc1.  Without this the nfit machine check
     handler never detects / inserts new 'badblocks' entries which
     applications use to identify lost portions of files.

   - For test purposes, fix the nvdimm_clear_poison() path to operate on
     legacy / simulated nvdimm memory ranges.  Without this fix a test
     can set badblocks, but never clear them on these ranges.

   - Fix the range checking done by dax_dev_pmd_fault().  This is not
     tagged for -stable since this problem is mitigated by specifying
     aligned resources at device-dax setup time.

  These patches have appeared in a next release over the past week.  The
  recent rebase you can see in the timestamps was to drop an invalid fix
  as identified by the updated device-dax unit tests [1].  The -mm
  touches have an ack from Andrew"

[1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
   https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm: allow legacy (e820) pmem region to clear bad blocks
  nfit, mce: Fix SPA matching logic in MCE handler
  mm: fix cache mode of dax pmd mappings
  mm: fix show_smap() for zone_device-pmd ranges
  dax: fix mapping size check
2016-09-10 09:58:52 -07:00
Peter Zijlstra
8ef9b8455a perf/x86/intel: Fix PEBSv3 record drain
Alexander hit the WARN_ON_ONCE(!event) on his Skylake while running
the perf fuzzer.

This means the PEBSv3 record included a status bit for an inactive
event, something that _should_ not happen.

Move the code that filters the status bits against our known PEBS
events up a spot to guarantee we only deal with events we know about.

Further add "continue" statements to the WARN_ON_ONCE()s such that
we'll not die nor generate silly events in case we ever do hit them
again.

Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: stable@vger.kernel.org
Fixes: a3d86542de ("perf/x86/intel/pebs: Add PEBSv3 decoding")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:39 +02:00
Alexander Shishkin
ef9ef3befa perf/x86/intel/bts: Kill a silly warning
At the moment, intel_bts will WARN() out if there is more than one
event writing to the same ring buffer, via SET_OUTPUT, and will only
send data from one event to a buffer.

There is no reason to have this warning in, so kill it.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-6-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:38 +02:00
Alexander Shishkin
4d4c474124 perf/x86/intel/bts: Fix BTS PMI detection
Since BTS doesn't have a dedicated PMI status bit, the driver needs to
take extra care to check for the condition that triggers it to avoid
spurious NMI warnings.

Regardless of the local BTS context state, the only way of knowing that
the NMI is ours is to compare the write pointer against the interrupt
threshold.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-5-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:38 +02:00
Alexander Shishkin
a9a94401c2 perf/x86/intel/bts: Fix confused ordering of PMU callbacks
The intel_bts driver is using a CPU-local 'started' variable to order
callbacks and PMIs and make sure that AUX transactions don't get messed
up. However, the ordering rules in regard to this variable is a complete
mess, which recently resulted in perf_fuzzer-triggered warnings and
panics.

The general ordering rule that is patch is enforcing is that this
cpu-local variable be set only when the cpu-local AUX transaction is
active; consequently, this variable is to be checked before the AUX
related bits can be touched.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:37 +02:00
Dan Williams
9049771f7d mm: fix cache mode of dax pmd mappings
track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
uncacheable rendering them impractical for application usage.  DAX-pte
mappings are cached and the goal of establishing DAX-pmd mappings is to
attain more performance, not dramatically less (3 orders of magnitude).

track_pfn_insert() relies on a previous call to reserve_memtype() to
establish the expected page_cache_mode for the range.  While memremap()
arranges for reserve_memtype() to be called, devm_memremap_pages() does
not.  So, teach track_pfn_insert() and untrack_pfn() how to handle
tracking without a vma, and arrange for devm_memremap_pages() to
establish the write-back-cache reservation in the memtype tree.

Cc: <stable@vger.kernel.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-09 17:34:46 -07:00
Sebastian Andrzej Siewior
7d762e49c2 perf/x86/amd/uncore: Prevent use after free
The resent conversion of the cpu hotplug support in the uncore driver
introduced a regression due to the way the callbacks are invoked at
initialization time.

The old code called the prepare/starting/online function on each online cpu
as a block. The new code registers the hotplug callbacks in the core for
each state. The core invokes the callbacks at each registration on all
online cpus.

The code implicitely relied on the prepare/starting/online callbacks being
called as combo on a particular cpu, which was not obvious and completely
undocumented.

The resulting subtle wreckage happens due to the way how the uncore code
manages shared data structures for cpus which share an uncore resource in
hardware. The sharing is determined in the cpu starting callback, but the
prepare callback allocates per cpu data for the upcoming cpu because
potential sharing is unknown at this point. If the starting callback finds
a online cpu which shares the hardware resource it takes a refcount on the
percpu data of that cpu and puts the own data structure into a
'free_at_online' pointer of that shared data structure. The online callback
frees that.

With the old model this worked because in a starting callback only one non
unused structure (the one of the starting cpu) was available. The new code
allocates the data structures for all cpus when the prepare callback is
registered.

Now the starting function iterates through all online cpus and looks for a
data structure (skipping its own) which has a matching hardware id. The id
member of the data structure is initialized to 0, but the hardware id can
be 0 as well. The resulting wreckage is:

  CPU0 finds a matching id on CPU1, takes a refcount on CPU1 data and puts
  its own data structure into CPU1s data structure to be freed.

  CPU1 skips CPU0 because the data structure is its allegedly unsued own.
  It finds a matching id on CPU2, takes a refcount on CPU1 data and puts
  its own data structure into CPU2s data structure to be freed.

  ....

Now the online callbacks are invoked.

  CPU0 has a pointer to CPU1s data and frees the original CPU0 data. So
  far so good.

  CPU1 has a pointer to CPU2s data and frees the original CPU1 data, which
  is still referenced by CPU0 ---> Booom

So there are two issues to be solved here:

1) The id field must be initialized at allocation time to a value which
   cannot be a valid hardware id, i.e. -1

   This prevents the above scenario, but now CPU1 and CPU2 both stick their
   own data structure into the free_at_online pointer of CPU0. So we leak
   CPU1s data structure.

2) Fix the memory leak described in #1

   Instead of having a single pointer, use a hlist to enqueue the
   superflous data structures which are then freed by the first cpu
   invoking the online callback.

Ideally we should know the sharing _before_ invoking the prepare callback,
but that's way beyond the scope of this bug fix.

[ tglx: Rewrote changelog ]

Fixes: 96b2bd3866 ("perf/x86/amd/uncore: Convert to hotplug state machine")
Reported-and-tested-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20160909160822.lowgmkdwms2dheyv@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-10 00:00:06 +02:00
Prarit Bhargava
a4497a86fb x86, clock: Fix kvm guest tsc initialization
When booting a kvm guest on AMD with the latest kernel the following
messages are displayed in the boot log:

 tsc: Unable to calibrate against PIT
 tsc: HPET/PMTIMER calibration failed

aa297292d7 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID")
introduced a change to account for a difference in cpu and tsc frequencies for
Intel SKL processors. Before this change the native tsc set
x86_platform.calibrate_tsc to native_calibrate_tsc() which is a hardware
calibration of the tsc, and in tsc_init() executed

	tsc_khz = x86_platform.calibrate_tsc();
	cpu_khz = tsc_khz;

The kvm code changed x86_platform.calibrate_tsc to kvm_get_tsc_khz() and
executed the same tsc_init() function.  This meant that KVM guests did not
execute the native hardware calibration function.

After aa297292d7, there are separate native calibrations for cpu_khz and
tsc_khz.  The code sets x86_platform.calibrate_tsc to native_calibrate_tsc()
which is now an Intel specific calibration function, and
x86_platform.calibrate_cpu to native_calibrate_cpu() which is the "old"
native_calibrate_tsc() function (ie, the native hardware calibration
function).

tsc_init() now does

	cpu_khz = x86_platform.calibrate_cpu();
	tsc_khz = x86_platform.calibrate_tsc();
	if (tsc_khz == 0)
		tsc_khz = cpu_khz;
	else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
		cpu_khz = tsc_khz;

The kvm code should not call the hardware initialization in
native_calibrate_cpu(), as it isn't applicable for kvm and it didn't do that
prior to aa297292d7.

This patch resolves this issue by setting x86_platform.calibrate_cpu to
kvm_get_tsc_khz().

v2: I had originally set x86_platform.calibrate_cpu to
cpu_khz_from_cpuid(), however, pbonzini pointed out that the CPUID leaf
in that function is not available in KVM.  I have changed the function
pointer to kvm_get_tsc_khz().

Fixes: aa297292d7 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Len Brown <len.brown@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-08 16:41:55 +02:00
Andy Shevchenko
f43ea76cf3 x86/platform/intel-mid: Keep SRAM powered on at boot
On Penwell SRAM has to be powered on, otherwise it prevents booting.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ca22312dc8 ("x86/platform/intel-mid: Extend PWRMU to support Penwell")
Link: http://lkml.kernel.org/r/20160908103232.137587-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 14:07:54 +02:00
Andy Shevchenko
8e522e1d32 x86/platform/intel-mid: Add Intel Penwell to ID table
Commit:

  ca22312dc8 ("x86/platform/intel-mid: Extend PWRMU to support Penwell")

... enabled the PWRMU driver on platforms based on Intel Penwell, but
unfortunately this is not enough.

Add Intel Penwell ID to pci-mid.c driver as well. To avoid confusion in the
future add a comment to both drivers.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ca22312dc8 ("x86/platform/intel-mid: Extend PWRMU to support Penwell")
Link: http://lkml.kernel.org/r/20160908103232.137587-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 14:07:53 +02:00
Andy Shevchenko
f5fbf84830 x86/cpu: Rename Merrifield2 to Moorefield
Merrifield2 is actually Moorefield.

Rename it accordingly and drop tail digit from Merrifield1.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160906184254.94440-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:13:08 +02:00
Dou Liyang
c291b01515 x86/apic: Fix num_processors value in case of failure
If the topology package map check of the APIC ID and the CPU is a failure,
we don't generate the processor info for that APIC ID yet we increase
disabled_cpus by one - which is buggy.

Only increase num_processors once we are sure we don't fail.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1473214893-16481-1-git-send-email-douly.fnst@cn.fujitsu.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:11:03 +02:00
Andy Shevchenko
bda7b072de x86/platform/intel-mid: Implement power off sequence
Tell SCU that we are about powering off the device.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160907123955.21228-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:03:58 +02:00
Linus Torvalds
ab29b33a84 - Fix UM seccomp vs ptrace, after reordering landed.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJX0D/IAAoJEIly9N/cbcAmORYQAIzhZ+tFuXkOrEJx0efQx8ik
 8ix0wBrh0KTDQkMWQxUnDyIr+7vNcH0oOkH3kwgYaShAZif4oq6TGkdvgJJQr+ca
 ZsBcEdasYvglcr/7hEF4cH28h3YqN7HwW+Mx8IqrBQqqjly7V2jamBBd/YXOcDK9
 gqXooH1He9QVtVb0uC/AVFVY+st37iuqrreLFKWrX52FaUQ+7f0svN4LWC9b1UEq
 UQyi+HMMRbovum9+2WDKPpl8oEZvXE7CiWexwl2G9qBCpm/AhragArl9BhrA31eE
 0PPuVBTtypPNvKVieboXUhTg5eet+nFXtlea+/CLWfbqxpecsNiqMgSIg+aWOnz4
 Y6Te5P7oSwhto0WzXvilxjvShvjHgTo98PY9Qj0bBb7ECCbI09GfoDf3SPYayTwC
 9zPC04mJihr+6h8QdmYgfuHzTVjMxb6YOmAorz805cJ0vHtdPIP4Gkei4R1tNUkG
 TNQNXKef/UJbImL17pFu70Ru6/J58OgDuMuAAuswsFt7KELw4DZhmJL+KnYdPG1G
 gHirQd1HtQlsBoVvQAwmQjBp+TqFoY3oiv4Y2oq7IEk/ZCs3XJaEYKm90CoZlm4a
 sMtA0j+9rWqNirzuyWUeJfkd48yjcbOzSimzSCTsNVVsbyeA1yyapAdHSTWWm554
 zEubN4LCJoGiH/FWnX6U
 =UgsA
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fixes from Kees Cook:
 "Fix UM seccomp vs ptrace, after reordering landed"

* tag 'seccomp-v4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: Remove 2-phase API documentation
  um/ptrace: Fix the syscall number update after a ptrace
  um/ptrace: Fix the syscall_trace_leave call
2016-09-07 10:46:06 -07:00
Mickaël Salaün
ce29856a5e um/ptrace: Fix the syscall number update after a ptrace
Update the syscall number after each PTRACE_SETREGS on ORIG_*AX.

This is needed to get the potentially altered syscall number in the
seccomp filters after RET_TRACE.

This fix four seccomp_bpf tests:
> [ RUN      ] TRACE_syscall.skip_after_RET_TRACE
> seccomp_bpf.c:1560:TRACE_syscall.skip_after_RET_TRACE:Expected -1 (18446744073709551615) == syscall(39) (26)
> seccomp_bpf.c:1561:TRACE_syscall.skip_after_RET_TRACE:Expected 1 (1) == (*__errno_location ()) (22)
> [     FAIL ] TRACE_syscall.skip_after_RET_TRACE
> [ RUN      ] TRACE_syscall.kill_after_RET_TRACE
> TRACE_syscall.kill_after_RET_TRACE: Test exited normally instead of by signal (code: 1)
> [     FAIL ] TRACE_syscall.kill_after_RET_TRACE
> [ RUN      ] TRACE_syscall.skip_after_ptrace
> seccomp_bpf.c:1622:TRACE_syscall.skip_after_ptrace:Expected -1 (18446744073709551615) == syscall(39) (26)
> seccomp_bpf.c:1623:TRACE_syscall.skip_after_ptrace:Expected 1 (1) == (*__errno_location ()) (22)
> [     FAIL ] TRACE_syscall.skip_after_ptrace
> [ RUN      ] TRACE_syscall.kill_after_ptrace
> TRACE_syscall.kill_after_ptrace: Test exited normally instead of by signal (code: 1)
> [     FAIL ] TRACE_syscall.kill_after_ptrace

Fixes: 26703c636c ("um/ptrace: run seccomp after ptrace")

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: James Morris <jmorris@namei.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-09-07 09:25:04 -07:00
Kees Cook
e6971009a9 x86/uaccess: force copy_*_user() to be inlined
As already done with __copy_*_user(), mark copy_*_user() as __always_inline.
Without this, the checks for things like __builtin_const_p() won't work
consistently in either hardened usercopy nor the recent adjustments for
detecting usercopy overflows at compile time.

The change in kernel text size is detectable, but very small:

 text      data     bss     dec      hex     filename
12118735  5768608 14229504 32116847 1ea106f vmlinux.before
12120207  5768608 14229504 32118319 1ea162f vmlinux.after

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-09-06 12:16:42 -07:00
Jiri Olsa
79d102cbfd perf/x86/intel/cqm: Check cqm/mbm enabled state in event init
Yanqiu Zhang reported kernel panic when using mbm event
on system where CQM is detected but without mbm event
support, like with perf:

  # perf stat -e 'intel_cqm/event=3/' -a

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: [<ffffffff8100d64c>] update_sample+0xbc/0xe0
  ...
   <IRQ>
   [<ffffffff8100d688>] __intel_mbm_event_init+0x18/0x20
   [<ffffffff81113d6b>] flush_smp_call_function_queue+0x7b/0x160
   [<ffffffff81114853>] generic_smp_call_function_single_interrupt+0x13/0x60
   [<ffffffff81052017>] smp_call_function_interrupt+0x27/0x40
   [<ffffffff816fb06c>] call_function_interrupt+0x8c/0xa0
  ...

The reason is that we currently allow to init mbm event
even if mbm support is not detected.  Adding checks for
both cqm and mbm events and support into cqm's event_init.

Fixes: 33c3cc7acf ("perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init")
Reported-by: Yanqiu Zhang <yanqzhan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1473089407-21857-1-git-send-email-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06 10:42:12 +02:00
Wanpeng Li
e12c8f36f3 KVM: lapic: adjust preemption timer correctly when goes TSC backward
TSC_OFFSET will be adjusted if discovers TSC backward during vCPU load.
The preemption timer, which relies on the guest tsc to reprogram its
preemption timer value, is also reprogrammed if vCPU is scheded in to
a different pCPU. However, the current implementation reprogram preemption
timer before TSC_OFFSET is adjusted to the right value, resulting in the
preemption timer firing prematurely.

This patch fix it by adjusting TSC_OFFSET before reprogramming preemption
timer if TSC backward.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krċmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-05 16:14:39 +02:00
Jeffrey Hugo
d64934019f x86/efi: Use efi_exit_boot_services()
The eboot code directly calls ExitBootServices.  This is inadvisable as the
UEFI spec details a complex set of errors, race conditions, and API
interactions that the caller of ExitBootServices must get correct.  The
eboot code attempts allocations after calling ExitBootSerives which is
not permitted per the spec.  Call the efi_exit_boot_services() helper
intead, which handles the allocation scenario properly.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-05 12:40:16 +01:00
Jeffrey Hugo
dadb57abc3 efi/libstub: Allocate headspace in efi_get_memory_map()
efi_get_memory_map() allocates a buffer to store the memory map that it
retrieves.  This buffer may need to be reused by the client after
ExitBootServices() is called, at which point allocations are not longer
permitted.  To support this usecase, provide the allocated buffer size back
to the client, and allocate some additional headroom to account for any
reasonable growth in the map that is likely to happen between the call to
efi_get_memory_map() and the client reusing the buffer.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-05 12:18:17 +01:00
Borislav Petkov
cc2187a6e0 x86/microcode/AMD: Fix load of builtin microcode with randomized memory
We do not need to add the randomization offset when the microcode is
built in.

Reported-and-tested-by: Emanuel Czirai <icanrealizeum@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20160904093736.GA11939@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-05 10:38:56 +02:00
Linus Torvalds
9ca581b50d Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
 "A single fix for an AMD erratum so machines without a BIOS fix work"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/AMD: Apply erratum 665 on machines without a BIOS fix
2016-09-04 08:45:41 -07:00
Emanuel Czirai
d199299675 x86/AMD: Apply erratum 665 on machines without a BIOS fix
AMD F12h machines have an erratum which can cause DIV/IDIV to behave
unpredictably. The workaround is to set MSRC001_1029[31] but sometimes
there is no BIOS update containing that workaround so let's do it
ourselves unconditionally. It is simple enough.

[ Borislav: Wrote commit message. ]

Signed-off-by: Emanuel Czirai <icanrealizeum@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yaowu Xu <yaowu@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160902053550.18097-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-02 20:42:28 +02:00
Steven Rostedt
15301a5707 x86/paravirt: Do not trace _paravirt_ident_*() functions
Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up
after enabling function tracer. I asked him to bisect the functions within
available_filter_functions, which he did and it came down to three:

  _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()

It was found that this is only an issue when noreplace-paravirt is added
to the kernel command line.

This means that those functions are most likely called within critical
sections of the funtion tracer, and must not be traced.

In newer kenels _paravirt_nop() is defined within gcc asm(), and is no
longer an issue.  But both _paravirt_ident_{32,64}() causes the
following splat when they are traced:

 mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054)
 mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070)
 mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054)
 mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054)
 NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469]
 Modules linked in: e1000e
 CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000
 RIP: 0010:[<ffffffff81134148>]  [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0
 RSP: 0018:ffff8800d4aefb90  EFLAGS: 00000246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40
 RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030
 RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000
 R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8
 R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0
 FS:  00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0
 Call Trace:
   _raw_spin_lock+0x27/0x30
   handle_pte_fault+0x13db/0x16b0
   handle_mm_fault+0x312/0x670
   __do_page_fault+0x1b1/0x4e0
   do_page_fault+0x22/0x30
   page_fault+0x28/0x30
   __vfs_read+0x28/0xe0
   vfs_read+0x86/0x130
   SyS_read+0x46/0xa0
   entry_SYSCALL_64_fastpath+0x1e/0xa8
 Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b

Reported-by: Łukasz Daniluk <lukasz.daniluk@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-02 09:40:47 -07:00
Arnd Bergmann
236dec0510 kconfig: tinyconfig: provide whole choice blocks to avoid warnings
Using "make tinyconfig" produces a couple of annoying warnings that show
up for build test machines all the time:

    .config:966:warning: override: NOHIGHMEM changes choice state
    .config:965:warning: override: SLOB changes choice state
    .config:963:warning: override: KERNEL_XZ changes choice state
    .config:962:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
    .config:933:warning: override: SLOB changes choice state
    .config:930:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
    .config:870:warning: override: SLOB changes choice state
    .config:868:warning: override: KERNEL_XZ changes choice state
    .config:867:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state

I've made a previous attempt at fixing them and we discussed a number of
alternatives.

I tried changing the Makefile to use "merge_config.sh -n
$(fragment-list)" but couldn't get that to work properly.

This is yet another approach, based on the observation that we do want
to see a warning for conflicting 'choice' options, and that we can
simply make them non-conflicting by listing all other options as
disabled.  This is a trivial patch that we can apply independent of
plans for other changes.

Link: http://lkml.kernel.org/r/20160829214952.1334674-2-arnd@arndb.de
Link: https://storage.kernelci.org/mainline/v4.7-rc6/x86-tinyconfig/build.log
https://patchwork.kernel.org/patch/9212749/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-01 17:52:01 -07:00
Bjorn Helgaas
6af7e4f772 PCI: Mark Haswell Power Control Unit as having non-compliant BARs
The Haswell Power Control Unit has a non-PCI register (CONFIG_TDP_NOMINAL)
where BAR 0 is supposed to be.  This is erratum HSE43 in the spec update
referenced below:

  The PCIe* Base Specification indicates that Configuration Space Headers
  have a base address register at offset 0x10.  Due to this erratum, the
  Power Control Unit's CONFIG_TDP_NOMINAL CSR (Bus 1; Device 30; Function
  3; Offset 0x10) is located where a base register is expected.

Mark the PCU as having non-compliant BARs so we don't try to probe any of
them.  There are no other BARs on this device.

Rename the quirk so it's not Broadwell-specific.

Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v3-spec-update.html
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v3-datasheet-vol-2.html (section 5.4, Device 30 Function 3)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=153881
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
2016-09-01 08:52:29 -05:00