Commit Graph

13867 Commits

Author SHA1 Message Date
Prarit Bhargava
843c57916d tools/power turbostat: set max_num_cpus equal to the cpumask length
Future fixes will use sysfs files that contain cpumask output.  The code
needs to know the length of the cpumask in order to determine which cpus
are set in a cpumask.  Currently topo.max_cpu_num is the maximum cpu
number.  It can be increased the the maximum value of cpus represented in
cpumasks.

Set max_num_cpus to the length of a cpumask.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:46 -04:00
Chen Yu
023fe0ac97 tools/power turbostat: if --num_iterations, print for specific number of iterations
There's a use case during test to only print specific round of iterations
if --num_iterations is specified, for example, with this patch applied:

turbostat -i 5 -n 4
will capture 4 samples with 5 seconds interval.

[lenb: renamed to --num_iterations from --iterations]

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:45 -04:00
Srinivas Pandruvada
997e53950e tools/power turbostat: Add Cannon Lake support
All MSRs related to turbostat are same as Kabylake.
Even though SDM claims that core C3 residency can be read from MSR 0x662,
the read on this MSR fails on CNL platform. Hence disabled C3 MSR read
and display.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:45 -04:00
Len Brown
9d4eab02a7 tools/power turbostat: delete duplicate #defines
The SNB_C1_AUTO_UNDEMOTE definition should have been deleted once
it was copied into msr-index.h.  One copy of the truth is better --
particularly when Matt needs to fix it:-)

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:45 -04:00
Matt Turner
e0d34648b4 tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
According to the Intel Software Developers' Manual, Vol. 4, Order No.
335592, these macros have been reversed since they were added.

Fixes: 889facbee3 ("tools/power turbostat: v3.0: monitor Watts and Temperature")
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:44 -04:00
Len Brown
0748eaf0cf tools/power turbostat: add POLL and POLL% column
Like the "C1" and "C1%" column, the new POLL and POLL% columns
show invocations and residency% during the measurement interval.

While it didn't seem important to track in the past,
we've recently found some Linux cpuidle bugs related to POLL%.

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:44 -04:00
Len Brown
4bd1f8f21a tools/power turbostat: Fix --hide Pk%pc10
The column header for PC10 residency is "Pk%pc10"
This is missing the 'g' that others have, eg Pkg%pc6,
to allow tab-delimited columns to fit into 8-columns.

However, --hide Pk%pc10 did not work, it was still looking for the 'g'.
This was confusing, because --list shows the correct "Pk%pc10"

Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:44 -04:00
Len Brown
be0e54c4eb tools/power turbostat: Build-in "Low Power Idle" counters support
Linux 4.15 exports the ACPI Low Power Idle Table's
counters in /sys/devices/system/cpu/cpuidle/

low_power_idle_cpu_residency_us

	Show this in the "CPU%LPI" column.

	Today this reflects the "North Complex"
	residency in PC10, so expect it to
	closely follow "Pk%pc10".

low_power_idle_system_residency_us

	Show this in the "SYS%LPI" column.

	Today, this reflects the North is in PC10,
	plus the PCH is sufficiently quiescent
	to save additional power via the "S0ix"
	system state, as measured by the
	PCH SLP_S0 counter.

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:40 -04:00
Laura Abbott
e29dc460d6 tools/power turbostat: Don't make man pages executable
rpm-lint flagged these as being executable:

kernel-tools.x86_64: W: spurious-executable-perm /usr/share/man/man8/turbostat.8.gz
kernel-tools.x86_64: W: spurious-executable-perm /usr/share/man/man8/x86_energy_perf_policy.8.gz

Fix this

Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 17:15:09 -04:00
Len Brown
94d6ab4b11 tools/power turbostat: remove blank lines
When the user reuests to collect and show columns
that are not present on every row (eg. for every CPU)
turbostat still prints an (empty) line for every CPU.
Update so no blank lines are printed.

old:
	# turbostat --quiet --show Pkg%pc6
	Pkg%pc6
	9.12
	9.12

	Pkg%pc6
	9.12
	9.12

new:
	# turbostat --quiet --show Pkg%pc6
	Pkg%pc6
	9.12
	9.12
	Pkg%pc6
	9.12
	9.12

Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 17:15:09 -04:00
Artem Bityutskiy
3e8b62bf0c tools/power turbostat: a small C-states dump readability immprovement
Improve readability a little bit by changing this output:

 MSR_PKG_CST_CONFIG_CONTROL: 0x00008407 (locked: pkg-cstate-limit=7: unlimited, automatic-c-state-conversion=off)

with this output:

 MSR_PKG_CST_CONFIG_CONTROL: 0x00008407 (locked, pkg-cstate-limit=7 (unlimited), automatic-c-state-conversion=off)

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 17:15:08 -04:00
Artem Bityutskiy
ac980e1357 tools/power turbostat: dump BDX, SKX automatic C-state conversion bit
BDX and SKX have a bit that tells them to PROMOTE shallow
C-states requests to MWAIT(C6).  It is generally a BIOS bug
if this bit is set.  As we have encountered that BIOS bug,
let's print this bit in turbostat debug output.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 17:15:08 -04:00
Len Brown
733ef0f8e7 tools/power turbostat: do not hard-code 25MHz crystal on SKX
Some SKX use a 24 MHz crystal, so do not hard code 25 MHz.

Also, SKX crystal is not exact, because SKX uses an EMI reduction
circuit that costs a fraction of a percent.

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 17:15:08 -04:00
Len Brown
46c2797826 tools/power turbostat: fix possible sprintf buffer overflow
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 17:14:56 -04:00
Len Brown
fd3933ca7b tools/power turbostat: fix MSR_IA32_MISC_ENABLE MWAIT printout
MSR_IA32_MISC_ENABLE[18] is the MWAIT ENABLE bit, not DISABLE bit...

so

MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST No-MWAIT PREFETCH TURBO)

should print as:

MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT PREFETCH TURBO)

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 12:13:06 -04:00
Artem Bityutskiy
47936f944e tools/power turbostat: fix printing on input
The recent patch that implements table printing on a keypress introduced a
regression - turbostat prints the table almost continuously if it is run from a
daemon program.

The problem is also easy to reproduce like this:

echo | turbostat

The reason is that we cannot assume that stdin is always a TTY. It can be many
things.

This patch adds fixes the problem by limiting the new keypress functionality to
TTYs only. If stdin is not a TTY, we just sleep for the full interval time.

While on it, clean-up 'do_sleep()' to return no value, as callers do not expect
that anyway.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 12:13:05 -04:00
Len Brown
b9ad8ee0da tools/power turbostat: end current interval upon newline input
In turbostat interval mode, a newline typed on standard input
will now conclude the current interval.  Data will immediately
be collected and printed for that interval, and the next interval
will be started.

This is similar to the recently added SIGUSR1 feature.
But that is for use by programs, while this is for interactive use.

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 12:13:05 -04:00
Len Brown
072119606a tools/power turbostat: on SIGUSR1: sample, print and continue
Interval-mode turbostat now catches and discards SIGUSR1.

Thus, SIGUSR1 can be used to tell turbostat to cut short
the current measurement interval.  Turbostat will then start
the next measurement interval using the regular interval length.

This can be used to give turbostat variable intervals.
Invoke turbostat with --interval LARGE_NUMBER_SEC
and have a program that has permission to send it a SIGUSR1
always before LARGE_NUMBER_SEC expires.

It may also be useful to use "--enable Time_Of_Day_Seconds"
to observe the actual interval length.

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 12:13:04 -04:00
Len Brown
8aa2ed0b28 tools/power turbostat: on SIGINT: sample, print and exit
When running in interval-mode, catch interrupts
and print a final data record before exiting.

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 12:13:04 -04:00
Len Brown
3f44a5c62b tools/power turbostat: add --enable Time_Of_Day_Seconds
Add a Time_Of_Day_Seconds column showing when measurement
for each row was completed.  Units are [sec.subsec] since Epoch,
as reported by gettimeofday(2).

While useful to correlate turbostat output with other tools,
this built-in column is disabled, by default.

Add the "--enable" option to enable such disabled-by-default
built-in columns:

"--enable Time_Of_Day_Seconds"
"--enable usec"

"--enable all", will enable all disabled-by-defauilt built-in counters.

When "--debug" is used, all disabled-by-default columns are enabled,
unless explicitly skipped using "--hide"

Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 12:13:04 -04:00
Artem Bityutskiy
2085e12441 tools/power turbostat: fix Skylake Xeon package C-state display
Turbostat neglects to display all package C-states for some Skylake Xeon BIOS configurations.

This is due to a typo in the table decoding MSR_PKG_CST_CONFIG_CONTROL (0x000000e2)

Here we fix that typo, according to Intel SDM, vol 4, Table 2-41 -
"MSRs Supported by Intel® Xeon® Processor Scalable Family with DisplayFamily_DisplayModel 06_55H".

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 12:13:03 -04:00
Linus Torvalds
bc2dbc5420 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "16 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kasan: fix memory hotplug during boot
  kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
  checkpatch: fix macro argument precedence test
  init/main.c: include <linux/mem_encrypt.h>
  kernel/sys.c: fix potential Spectre v1 issue
  mm/memory_hotplug: fix leftover use of struct page during hotplug
  proc: fix smaps and meminfo alignment
  mm: do not warn on offline nodes unless the specific node is explicitly requested
  mm, memory_hotplug: make has_unmovable_pages more robust
  mm/kasan: don't vfree() nonexistent vm_area
  MAINTAINERS: change hugetlbfs maintainer and update files
  ipc/shm: fix shmat() nil address after round-down when remapping
  Revert "ipc/shm: Fix shmat mmap nil-page protection"
  idr: fix invalid ptr dereference on item delete
  ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio"
  mm: fix nr_rotate_swap leak in swapon() error case
2018-05-25 20:24:28 -07:00
Linus Torvalds
03250e1028 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Let's begin the holiday weekend with some networking fixes:

   1) Whoops need to restrict cfg80211 wiphy names even more to 64
      bytes. From Eric Biggers.

   2) Fix flags being ignored when using kernel_connect() with SCTP,
      from Xin Long.

   3) Use after free in DCCP, from Alexey Kodanev.

   4) Need to check rhltable_init() return value in ipmr code, from Eric
      Dumazet.

   5) XDP handling fixes in virtio_net from Jason Wang.

   6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.

   7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack
      Morgenstein.

   8) Prevent out-of-bounds speculation using indexes in BPF, from
      Daniel Borkmann.

   9) Fix regression added by AF_PACKET link layer cure, from Willem de
      Bruijn.

  10) Correct ENIC dma mask, from Govindarajulu Varadarajan.

  11) Missing config options for PMTU tests, from Stefano Brivio"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
  ibmvnic: Fix partial success login retries
  selftests/net: Add missing config options for PMTU tests
  mlx4_core: allocate ICM memory in page size chunks
  enic: set DMA mask to 47 bit
  ppp: remove the PPPIOCDETACH ioctl
  ipv4: remove warning in ip_recv_error
  net : sched: cls_api: deal with egdev path only if needed
  vhost: synchronize IOTLB message with dev cleanup
  packet: fix reserve calculation
  net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
  net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
  bpf: properly enforce index mask to prevent out-of-bounds speculation
  net/mlx4: Fix irq-unsafe spinlock usage
  net: phy: broadcom: Fix bcm_write_exp()
  net: phy: broadcom: Fix auxiliary control register reads
  net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
  net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
  ibmvnic: Only do H_EOI for mobility events
  tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
  virtio-net: fix leaking page for gso packet during mergeable XDP
  ...
2018-05-25 19:54:42 -07:00
Matthew Wilcox
7a4deea1aa idr: fix invalid ptr dereference on item delete
If the radix tree underlying the IDR happens to be full and we attempt
to remove an id which is larger than any id in the IDR, we will call
__radix_tree_delete() with an uninitialised 'slot' pointer, at which
point anything could happen.  This was easiest to hit with a single
entry at id 0 and attempting to remove a non-0 id, but it could have
happened with 64 entries and attempting to remove an id >= 64.

Roman said:

  The syzcaller test boils down to opening /dev/kvm, creating an
  eventfd, and calling a couple of KVM ioctls. None of this requires
  superuser. And the result is dereferencing an uninitialized pointer
  which is likely a crash. The specific path caught by syzbot is via
  KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are
  other user-triggerable paths, so cc:stable is probably justified.

Matthew added:

  We have around 250 calls to idr_remove() in the kernel today. Many of
  them pass an ID which is embedded in the object they're removing, so
  they're safe. Picking a few likely candidates:

  drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
  drivers/atm/nicstar.c could be taken down by a handcrafted packet

Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
Fixes: 0a835c4f09 ("Reimplement IDR and IDA using the radix tree")
Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
Debugged-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25 18:12:10 -07:00
David S. Miller
d2f30f5172 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-05-24

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a bug in the original fix to prevent out of bounds speculation when
   multiple tail call maps from different branches or calls end up at the
   same tail call helper invocation, from Daniel.

2) Two selftest fixes, one in reuseport_bpf_numa where test is skipped in
   case of missing numa support and another one to update kernel config to
   properly support xdp_meta.sh test, from Anders.

 ...

Would be great if you have a chance to merge net into net-next after that.

The verifier fix would be needed later as a dependency in bpf-next for
upcomig work there. When you do the merge there's a trivial conflict on
BPF side with 849fa50662 ("bpf/verifier: refine retval R0 state for
bpf_get_stack helper"): Resolution is to keep both functions, the
do_refine_retval_range() and record_func_map().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 15:37:41 -04:00
Stefano Brivio
24e4b075d8 selftests/net: Add missing config options for PMTU tests
PMTU tests in pmtu.sh need support for VTI, VTI6 and dummy
interfaces: add them to config file.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: d1f1b9cbf3 ("selftests: net: Introduce first PMTU test")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 15:11:21 -04:00
Anders Roxell
1a2b80ecc7 selftests: net: reuseport_bpf_numa: don't fail if no numa support
The reuseport_bpf_numa test case fails there's no numa support.  The
test shouldn't fail if there's no support it should be skipped.

Fixes: 3c2c3c16aa ("reuseport, bpf: add test case for bpf_get_numa_node_id")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23 12:21:02 +02:00
Linus Torvalds
3b78ce4a34 Merge branch 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Merge speculative store buffer bypass fixes from Thomas Gleixner:

 - rework of the SPEC_CTRL MSR management to accomodate the new fancy
   SSBD (Speculative Store Bypass Disable) bit handling.

 - the CPU bug and sysfs infrastructure for the exciting new Speculative
   Store Bypass 'feature'.

 - support for disabling SSB via LS_CFG MSR on AMD CPUs including
   Hyperthread synchronization on ZEN.

 - PRCTL support for dynamic runtime control of SSB

 - SECCOMP integration to automatically disable SSB for sandboxed
   processes with a filter flag for opt-out.

 - KVM integration to allow guests fiddling with SSBD including the new
   software MSR VIRT_SPEC_CTRL to handle the LS_CFG based oddities on
   AMD.

 - BPF protection against SSB

.. this is just the core and x86 side, other architecture support will
come separately.

* 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
  bpf: Prevent memory disambiguation attack
  x86/bugs: Rename SSBD_NO to SSB_NO
  KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
  x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
  x86/bugs: Rework spec_ctrl base and mask logic
  x86/bugs: Remove x86_spec_ctrl_set()
  x86/bugs: Expose x86_spec_ctrl_base directly
  x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
  x86/speculation: Rework speculative_store_bypass_update()
  x86/speculation: Add virtualized speculative store bypass disable support
  x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
  x86/speculation: Handle HT correctly on AMD
  x86/cpufeatures: Add FEATURE_ZEN
  x86/cpufeatures: Disentangle SSBD enumeration
  x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
  x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
  KVM: SVM: Move spec control call after restore of GS
  x86/cpu: Make alternative_msr_write work for 32-bit code
  x86/bugs: Fix the parameters alignment and missing void
  x86/bugs: Make cpu_show_common() static
  ...
2018-05-21 11:23:26 -07:00
Linus Torvalds
5aef268ace Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix refcounting bug for connections in on-packet scheduling mode of
    IPVS, from Julian Anastasov.

 2) Set network header properly in AF_PACKET's packet_snd, from Willem
    de Bruijn.

 3) Fix regressions in 3c59x by converting to generic DMA API. It was
    relying upon the hack that the PCI DMA interfaces would accept NULL
    for EISA devices. From Christoph Hellwig.

 4) Remove RDMA devices before unregistering netdev in QEDE driver, from
    Michal Kalderon.

 5) Use after free in TUN driver ptr_ring usage, from Jason Wang.

 6) Properly check for missing netlink attributes in SMC_PNETID
    requests, from Eric Biggers.

 7) Set DMA mask before performaing any DMA operations in vmxnet3
    driver, from Regis Duchesne.

 8) Fix mlx5 build with SMP=n, from Saeed Mahameed.

 9) Classifier fixes in bcm_sf2 driver from Florian Fainelli.

10) Tuntap use after free during release, from Jason Wang.

11) Don't use stack memory in scatterlists in tls code, from Matt
    Mullins.

12) Not fully initialized flow key object in ipv4 routing code, from
    David Ahern.

13) Various packet headroom bug fixes in ip6_gre driver, from Petr
    Machata.

14) Remove queues from XPS maps using correct index, from Amritha
    Nambiar.

15) Fix use after free in sock_diag, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
  net: ip6_gre: fix tunnel metadata device sharing.
  cxgb4: fix offset in collecting TX rate limit info
  net: sched: red: avoid hashing NULL child
  sock_diag: fix use-after-free read in __sk_free
  sh_eth: Change platform check to CONFIG_ARCH_RENESAS
  net: dsa: Do not register devlink for unused ports
  net: Fix a bug in removing queues from XPS map
  bpf: fix truncated jump targets on heavy expansions
  bpf: parse and verdict prog attach may race with bpf map update
  bpf: sockmap update rollback on error can incorrectly dec prog refcnt
  net: test tailroom before appending to linear skb
  net: ip6_gre: Fix ip6erspan hlen calculation
  net: ip6_gre: Split up ip6gre_changelink()
  net: ip6_gre: Split up ip6gre_newlink()
  net: ip6_gre: Split up ip6gre_tnl_change()
  net: ip6_gre: Split up ip6gre_tnl_link_config()
  net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit()
  net: ip6_gre: Request headroom in __gre6_xmit()
  selftests/bpf: check return value of fopen in test_verifier.c
  erspan: fix invalid erspan version.
  ...
2018-05-21 08:37:48 -07:00
Linus Torvalds
8a6bd2f40e Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "An unfortunately larger set of fixes, but a large portion is
  selftests:

   - Fix the missing clusterid initializaiton for x2apic cluster
     management which caused boot failures due to IPIs being sent to the
     wrong cluster

   - Drop TX_COMPAT when a 64bit executable is exec()'ed from a compat
     task

   - Wrap access to __supported_pte_mask in __startup_64() where clang
     compile fails due to a non PC relative access being generated.

   - Two fixes for 5 level paging fallout in the decompressor:

      - Handle GOT correctly for paging_prepare() and
        cleanup_trampoline()

      - Fix the page table handling in cleanup_trampoline() to avoid
        page table corruption.

   - Stop special casing protection key 0 as this is inconsistent with
     the manpage and also inconsistent with the allocation map handling.

   - Override the protection key wen moving away from PROT_EXEC to
     prevent inaccessible memory.

   - Fix and update the protection key selftests to address breakage and
     to cover the above issue

   - Add a MOV SS self test"

[ Part of the x86 fixes were in the earlier core pull due to dependencies ]

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
  x86/apic/x2apic: Initialize cluster ID properly
  x86/boot/compressed/64: Fix moving page table out of trampoline memory
  x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()
  x86/pkeys: Do not special case protection key 0
  x86/pkeys/selftests: Add a test for pkey 0
  x86/pkeys/selftests: Save off 'prot' for allocations
  x86/pkeys/selftests: Fix pointer math
  x86/pkeys: Override pkey when moving away from PROT_EXEC
  x86/pkeys/selftests: Fix pkey exhaustion test off-by-one
  x86/pkeys/selftests: Add PROT_EXEC test
  x86/pkeys/selftests: Factor out "instruction page"
  x86/pkeys/selftests: Allow faults on unknown keys
  x86/pkeys/selftests: Avoid printf-in-signal deadlocks
  x86/pkeys/selftests: Remove dead debugging code, fix dprint_in_signal
  x86/pkeys/selftests: Stop using assert()
  x86/pkeys/selftests: Give better unexpected fault error messages
  x86/selftests: Add mov_to_ss test
  x86/mpx/selftests: Adjust the self-test to fresh distros that export the MPX ABI
  x86/pkeys/selftests: Adjust the self-test to fresh distros that export the pkeys ABI
  ...
2018-05-20 11:28:32 -07:00
Linus Torvalds
95bcce4d42 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner:

 - fix segfault when processing unknown threads in cs-etm

 - fix "perf test inet_pton" on s390 failing due to missing inline

 - display all available events on 'perf annotate --stdio'

 - add missing newline when parsing an empty BPF program

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Add missing newline when parsing empty BPF proggie
  perf cs-etm: Remove redundant space
  perf cs-etm: Support unknown_thread in cs_etm_auxtrace
  perf annotate: Display all available events on --stdio
  perf test: "probe libc's inet_pton" fails on s390 due to missing inline
2018-05-20 11:18:42 -07:00
Linus Torvalds
583dbad340 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:

 - Unbreak the BPF compilation which got broken by the unconditional
   requirement of asm-goto, which is not supported by clang.

 - Prevent probing on exception masking instructions in uprobes and
   kprobes to avoid the issues of the delayed exceptions instead of
   having an ugly workaround.

 - Prevent a double free_page() in the error path of do_kexec_load()

 - A set of objtool updates addressing various issues mostly related to
   switch tables and the noreturn detection for recursive sibling calls

 - Header sync for tools.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Detect RIP-relative switch table references, part 2
  objtool: Detect RIP-relative switch table references
  objtool: Support GCC 8 switch tables
  objtool: Support GCC 8's cold subfunctions
  objtool: Fix "noreturn" detection for recursive sibling calls
  objtool, kprobes/x86: Sync the latest <asm/insn.h> header with tools/objtool/arch/x86/include/asm/insn.h
  x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation
  uprobes/x86: Prohibit probing on MOV SS instruction
  kprobes/x86: Prohibit probing on exception masking instructions
  x86/kexec: Avoid double free_page() upon do_kexec_load() failure
2018-05-20 10:01:38 -07:00
Josh Poimboeuf
7dec80ccbe objtool: Detect RIP-relative switch table references, part 2
With the following commit:

  fd35c88b74 ("objtool: Support GCC 8 switch tables")

I added a "can't find switch jump table" warning, to stop covering up
silent failures if add_switch_table() can't find anything.

That warning found yet another bug in the objtool switch table detection
logic.  For cases 1 and 2 (as described in the comments of
find_switch_table()), the find_symbol_containing() check doesn't adjust
the offset for RIP-relative switch jumps.

Incidentally, this bug was already fixed for case 3 with:

  6f5ec2993b ("objtool: Detect RIP-relative switch table references")

However, that commit missed the fix for cases 1 and 2.

The different cases are now starting to look more and more alike.  So
fix the bug by consolidating them into a single case, by checking the
original dynamic jump instruction in the case 3 loop.

This also simplifies the code and makes it more robust against future
switch table detection issues -- of which I'm sure there will be many...

Switch table detection has been the most fragile area of objtool, by
far.  I long for the day when we'll have a GCC plugin for annotating
switch tables.  Linus asked me to delay such a plugin due to the
flakiness of the plugin infrastructure in older versions of GCC, so this
rickety code is what we're stuck with for now.  At least the code is now
a little simpler than it was.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f400541613d45689086329432f3095119ffbc328.1526674218.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 08:10:04 +02:00
Ross Zwisler
fd8f58c40b radix tree test suite: multi-order iteration race
Add a test which shows a race in the multi-order iteration code.  This
test reliably hits the race in under a second on my machine, and is the
result of a real bug report against kernel a production v4.15 based
kernel (4.15.6-300.fc27.x86_64).  With a real kernel this issue is hit
when using order 9 PMD DAX radix tree entries.

The race has to do with how we tear down multi-order sibling entries
when we are removing an item from the tree.  Remember that an order 2
entry looks like this:

  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]

where 'entry' is in some slot in the struct radix_tree_node, and the
three slots following 'entry' contain sibling pointers which point back
to 'entry.'

When we delete 'entry' from the tree, we call :

  radix_tree_delete()
    radix_tree_delete_item()
      __radix_tree_delete()
        replace_slot()

replace_slot() first removes the siblings in order from the first to the
last, then at then replaces 'entry' with NULL.  This means that for a
brief period of time we end up with one or more of the siblings removed,
so:

  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]

This causes an issue if you have a reader iterating over the slots in
the tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
mm/filemap.c.

The issue is that when __radix_tree_next_slot() => skip_siblings() tries
to skip over the sibling entries in the slots, it currently does so with
an exact match on the slot directly preceding our current slot.
Normally this works:

                                      V preceding slot
  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                              ^ current slot

This lets you find the first sibling, and you skip them all in order.

But in the case where one of the siblings is NULL, that slot is skipped
and then our sibling detection is interrupted:

                                             V preceding slot
  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                    ^ current slot

This means that the sibling pointers aren't recognized since they point
all the way back to 'entry', so we think that they are normal internal
radix tree pointers.  This causes us to think we need to walk down to a
struct radix_tree_node starting at the address of 'entry'.

In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at
'entry'.

In the radix tree test suite this will be caught by the address
sanitizer:

  ==27063==ERROR: AddressSanitizer: heap-buffer-overflow on address
  0x60c0008ae400 at pc 0x00000040ce4f bp 0x7fa89b8fcad0 sp 0x7fa89b8fcac0
  READ of size 8 at 0x60c0008ae400 thread T3
      #0 0x40ce4e in __radix_tree_next_slot /home/rzwisler/project/linux/tools/testing/radix-tree/radix-tree.c:1660
      #1 0x4022cc in radix_tree_next_slot linux/../../../../include/linux/radix-tree.h:567
      #2 0x4022cc in iterator_func /home/rzwisler/project/linux/tools/testing/radix-tree/multiorder.c:655
      #3 0x7fa8a088d50a in start_thread (/lib64/libpthread.so.0+0x750a)
      #4 0x7fa8a03bd16e in clone (/lib64/libc.so.6+0xf516e)

Link: http://lkml.kernel.org/r/20180503192430.7582-5-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-18 17:17:12 -07:00
Ross Zwisler
3e252fa7d4 radix tree test suite: add item_delete_rcu()
Currently the lifetime of "struct item" entries in the radix tree are
not controlled by RCU, but are instead deleted inline as they are
removed from the tree.

In the following patches we add a test which has threads iterating over
items pulled from the tree and verifying them in an
rcu_read_lock()/rcu_read_unlock() section.  This means that though an
item has been removed from the tree it could still be being worked on by
other threads until the RCU grace period expires.  So, we need to
actually free the "struct item" structures at the end of the grace
period, just as we do with "struct radix_tree_node" items.

Link: http://lkml.kernel.org/r/20180503192430.7582-4-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-18 17:17:12 -07:00
Ross Zwisler
dcbbf25adb radix tree test suite: fix compilation issue
Pulled from a patch from Matthew Wilcox entitled "xarray: Add definition
of struct xarray":

> From: Matthew Wilcox <mawilcox@microsoft.com>
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

  https://patchwork.kernel.org/patch/10341249/

These defines fix this compilation error:

  In file included from ./linux/radix-tree.h:6:0,
                   from ./linux/../../../../include/linux/idr.h:15,
                   from ./linux/idr.h:1,
                   from idr.c:4:
  ./linux/../../../../include/linux/idr.h: In function `idr_init_base':
  ./linux/../../../../include/linux/radix-tree.h:129:2: warning: implicit declaration of function `spin_lock_init'; did you mean `spinlock_t'? [-Wimplicit-function-declaration]
    spin_lock_init(&(root)->xa_lock);    \
    ^
  ./linux/../../../../include/linux/idr.h:126:2: note: in expansion of macro `INIT_RADIX_TREE'
    INIT_RADIX_TREE(&idr->idr_rt, IDR_RT_MARKER);
    ^~~~~~~~~~~~~~~

by providing a spin_lock_init() wrapper for the v4.17-rc* version of the
radix tree test suite.

Link: http://lkml.kernel.org/r/20180503192430.7582-3-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-18 17:17:12 -07:00
Ross Zwisler
8d9fa88edd radix tree test suite: fix mapshift build target
Commit c6ce3e2fe3 ("radix tree test suite: Add config option for map
shift") introduced a phony makefile target called 'mapshift' that ends
up generating the file generated/map-shift.h.  This phony target was
then added as a dependency of the top level 'targets' build target,
which is what is run when you go to tools/testing/radix-tree and just
type 'make'.

Unfortunately, this phony target doesn't actually work as a dependency,
so you end up getting:

  $ make
  make: *** No rule to make target 'generated/map-shift.h', needed by 'main.o'.  Stop.
  make: *** Waiting for unfinished jobs....

Fix this by making the file generated/map-shift.h our real makefile
target, and add this a dependency of the top level build target.

Link: http://lkml.kernel.org/r/20180503192430.7582-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-18 17:17:12 -07:00
Anders Roxell
a6837d2667 selftests: bpf: config: enable NET_SCH_INGRESS for xdp_meta.sh
When running bpf's selftest test_xdp_meta.sh it fails:
./test_xdp_meta.sh
Error: Specified qdisc not found.
selftests: test_xdp_meta [FAILED]

Need to enable CONFIG_NET_SCH_INGRESS and CONFIG_NET_CLS_ACT to get the
test to pass.

Fixes: 22c8852624 ("bpf: improve selftests and add tests for meta pointer")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-18 21:39:39 +02:00
Jesper Dangaard Brouer
deea81228b selftests/bpf: check return value of fopen in test_verifier.c
Commit 0a67487403 ("selftests/bpf: Only run tests if !bpf_disabled")
forgot to check return value of fopen.

This caused some confusion, when running test_verifier (from
tools/testing/selftests/bpf/) on an older kernel (< v4.4) as it will
simply seqfault.

This fix avoids the segfault and prints an error, but allow program to
continue.  Given the sysctl was introduced in 1be7f75d16 ("bpf:
enable non-root eBPF programs"), we know that the running kernel
cannot support unpriv, thus continue with unpriv_disabled = true.

Fixes: 0a67487403 ("selftests/bpf: Only run tests if !bpf_disabled")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-17 22:18:46 +02:00
Linus Torvalds
58ddfe6c3a * ARM/ARM64 locking fixes
* x86 fixes: PCID, UMIP, locking
 * Improved support for recent Windows version that have a 2048 Hz
 APIC timer.
 * Rename KVM_HINTS_DEDICATED CPUID bit to KVM_HINTS_REALTIME
 * Better behaved selftests.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJa/bkTAAoJEL/70l94x66Dzf8IAJ1GqtXi0CNbq8MvU4QIqw0L
 HLIRoe/QgkTeTUa2fwirEuu5I+/wUyPvy5sAIsn/F5eiZM7nciLm+fYzw6F2uPIm
 lSCqKpVwmh8dPl1SBaqPnTcB1HPVwcCgc2SF9Ph7yZCUwFUtoeUuPj8v6Qy6y21g
 jfobHFZa3MrFgi7kPxOXSrC1qxuNJL9yLB5mwCvCK/K7jj2nrGJkLLDuzgReCqvz
 isOdpof3hz8whXDQG5cTtybBgE9veym4YqJY8R5ANXBKqbFlhaNF1T3xXrdPMISZ
 7bsGgkhYEOqeQsPrFwzAIiFxe2DogFwkn1BcvJ1B+duXrayt5CBnDPRB6Yxg00M=
 =H0d0
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

 - ARM/ARM64 locking fixes

 - x86 fixes: PCID, UMIP, locking

 - improved support for recent Windows version that have a 2048 Hz APIC
   timer

 - rename KVM_HINTS_DEDICATED CPUID bit to KVM_HINTS_REALTIME

 - better behaved selftests

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME
  KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls
  KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock
  KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity
  KVM: arm/arm64: Properly protect VGIC locks from IRQs
  KVM: X86: Lower the default timer frequency limit to 200us
  KVM: vmx: update sec exec controls for UMIP iff emulating UMIP
  kvm: x86: Suppress CR3_PCID_INVD bit only when PCIDs are enabled
  KVM: selftests: exit with 0 status code when tests cannot be run
  KVM: hyperv: idr_find needs RCU protection
  x86: Delay skip of emulated hypercall instruction
  KVM: Extend MAX_IRQ_ROUTES to 4096 for all archs
2018-05-17 10:23:36 -07:00
Ingo Molnar
f3903c9161 perf/urgent fixes:
- Fix segfault when processing unknown threads in cs-etm (Leo Yan)
 
 - Fix "perf test inet_pton" on s390 failing due to missing inline (Thomas Richter)
 
 - Display all available events on 'perf annotate --stdio' (Jin Yao)
 
 - Add missing newline when parsing empty BPF proggie (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlr5fx4ACgkQ1lAW81NS
 qkCmfQ//ZbpL9ezoBuAfX61jv2n44ksqUV9NK+MZMfXr7hMQwY0ny2i8C9r/yhv4
 W5pEDLcCyfEGRryEQgJSGw7w792WPFt2CT4NG/0dBqlKjzo+yaCXtbCrzfBqPgzs
 G8x95JhONDVe5bwlRHscI7k5dEp/Rics1x46gtCGNgeHrHaTm/RfGxXdR/Fp09id
 JaiO0czT2JxrtaLqim5v2Rm7k47Vc0i34BQwVQ/Tx0eRlGpyyUGMEM/6EWP355gO
 wG2lTtY24RMX9wEPBiEpxUz5mCZibZSOFSnn2AWYnhV1pDoyrG7j3A3PG4nIzZ4/
 UuOsKQWA6ojgpbnk5S7Z7RUIZLy+OpzYAvExsF4zCleoNZ+BdcxyxZHwKDB1wEW1
 9YJ3mWdeE4fJqWnJaRxzUpPChdW9VCOTcks6lIlFKfABjdB3+nfCjShbiSg7l/bU
 V8umQCFKxe5j+ZJomrZXGZldX6k1CslXuipYB+aAaGlR8wUl0Z50KW9cWWBHN5pJ
 3tVKLpedxm2KAsFG119m+aA/stpq3z2idtOMU+A6k/bjCHYplKn/NLnDWPbXti5N
 ac09fDrjBaPPerHhAcZ742H3Ttt8hJgWe7COR4Rv5q9PZoU+NT//38eFR/8NtAIY
 1ngk0KlrNM7bR6Muzgqr9zINX7aB5vi3fCeWTB2yKZXo9sZ9sqk=
 =FXHV
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-4.17-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Fix segfault when processing unknown threads in cs-etm (Leo Yan)

- Fix "perf test inet_pton" on s390 failing due to missing inline (Thomas Richter)

- Display all available events on 'perf annotate --stdio' (Jin Yao)

- Add missing newline when parsing empty BPF proggie (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-15 08:20:45 +02:00
Josh Poimboeuf
6f5ec2993b objtool: Detect RIP-relative switch table references
Typically a switch table can be found by detecting a .rodata access
followed an indirect jump:

    1969:	4a 8b 0c e5 00 00 00 	mov    0x0(,%r12,8),%rcx
    1970:	00
			196d: R_X86_64_32S	.rodata+0x438
    1971:	e9 00 00 00 00       	jmpq   1976 <dispc_runtime_suspend+0xb6a>
			1972: R_X86_64_PC32	__x86_indirect_thunk_rcx-0x4

Randy Dunlap reported a case (seen with GCC 4.8) where the .rodata
access uses RIP-relative addressing:

    19bd:	48 8b 3d 00 00 00 00 	mov    0x0(%rip),%rdi        # 19c4 <dispc_runtime_suspend+0xbb8>
			19c0: R_X86_64_PC32	.rodata+0x45c
    19c4:	e9 00 00 00 00       	jmpq   19c9 <dispc_runtime_suspend+0xbbd>
			19c5: R_X86_64_PC32	__x86_indirect_thunk_rdi-0x4

In this case the relocation addend needs to be adjusted accordingly in
order to find the location of the switch table.

The fix is for case 3 (as described in the comments), but also make the
existing case 1 & 2 checks more precise by only adjusting the addend for
R_X86_64_PC32 relocations.

This fixes the following warnings:

  drivers/video/fbdev/omap2/omapfb/dss/dispc.o: warning: objtool: dispc_runtime_suspend()+0xbb8: sibling call from callable instruction with modified stack frame
  drivers/video/fbdev/omap2/omapfb/dss/dispc.o: warning: objtool: dispc_runtime_resume()+0xcc5: sibling call from callable instruction with modified stack frame

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b6098294fd67afb69af8c47c9883d7a68bf0f8ea.1526305958.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-15 07:30:59 +02:00
Dave Hansen
3488a600d9 x86/pkeys/selftests: Add a test for pkey 0
Protection key 0 is the default key for all memory and will
not normally come back from pkey_alloc().  But, you might
still want pass it to mprotect_pkey().

This check ensures that you can use pkey 0.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellermen <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180509171356.9E40B254@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:14:45 +02:00
Dave Hansen
acb25d761d x86/pkeys/selftests: Save off 'prot' for allocations
This makes it possible to to tell what 'prot' a given allocation
is supposed to have.  That way, if we want to change just the
pkey, we know what 'prot' to pass to mprotect_pkey().

Also, keep a record of the most recent allocation so the tests
can easily find it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellermen <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180509171354.AA23E228@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:14:45 +02:00
Dave Hansen
3d64f4ed15 x86/pkeys/selftests: Fix pointer math
We dump out the entire area of the siginfo where the si_pkey_ptr is
supposed to be.  But, we do some math on the poitner, which is a u32.
We intended to do byte math, not u32 math on the pointer.

Cast it over to a u8* so it works.

Also, move this block of code to below th si_code check.  It doesn't
hurt anything, but the si_pkey field is gibberish for other signal
types.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellermen <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180509171352.9BE09819@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:14:45 +02:00
Dave Hansen
f50b487832 x86/pkeys/selftests: Fix pkey exhaustion test off-by-one
In our "exhaust all pkeys" test, we make sure that there
is the expected number available.  Turns out that the
test did not cover the execute-only key, but discussed
it anyway.  It did *not* discuss the test-allocated
key.

Now that we have a test for the mprotect(PROT_EXEC) case,
this off-by-one issue showed itself.  Correct the off-by-
one and add the explanation for the case we missed.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellermen <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180509171350.E1656B95@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:14:45 +02:00
Dave Hansen
6af17cf89e x86/pkeys/selftests: Add PROT_EXEC test
Under the covers, implement executable-only memory with
protection keys when userspace calls mprotect(PROT_EXEC).

But, we did not have a selftest for that.  Now we do.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellermen <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180509171348.9EEE4BEF@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:14:45 +02:00
Dave Hansen
3fcd2b2d92 x86/pkeys/selftests: Factor out "instruction page"
We currently have an execute-only test, but it is for
the explicit mprotect_pkey() interface.  We will soon
add a test for the implicit mprotect(PROT_EXEC)
enterface.  We need this code in both tests.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellermen <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180509171347.C64AB733@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:14:45 +02:00
Dave Hansen
7e7fd67ca3 x86/pkeys/selftests: Allow faults on unknown keys
The exec-only pkey is allocated inside the kernel and userspace
is not told what it is.  So, allow PK faults to occur that have
an unknown key.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellermen <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180509171345.7FC7DA00@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:14:45 +02:00
Dave Hansen
caf9eb6b4c x86/pkeys/selftests: Avoid printf-in-signal deadlocks
printf() and friends are unusable in signal handlers.  They deadlock.
The pkey selftest does not do any normal printing in signal handlers,
only extra debugging.  So, just print the format string so we get
*some* output when debugging.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellermen <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180509171344.C53FD2F3@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:14:45 +02:00