Highlights include:
- Support for direct mapped LPC on POWER9, giving Linux direct access to
devices that may be on there such as a UART.
- Memory hotplug support for the Power9 Radix MMU.
- Add new AUX vectors describing the processor's cache geometry, to be used by
glibc.
- The ability for a guest to ask the hypervisor to resize the guest's hash
table, and in addition support for doing so automatically when memory is
hotplugged into/out-of the guest. This allows the hash table to be sized
based on the current memory usage of the guest, rather than the maximum
possible memory usage.
- Implementation of optprobes (kprobe optimisation) for powerpc.
In addition there's the topic branch shared with the KVM tree, which includes
support for guests to use the Radix MMU on Power9.
Thanks to:
Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T, Anton Blanchard,
Benjamin Herrenschmidt, Chris Packham, Daniel Axtens, Daniel Borkmann, David
Gibson, Finn Thain, Gautham R. Shenoy, Gavin Shan, Greg Kurz, Joel Stanley,
John Allen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael
Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Ravi
Bangoria, Reza Arbab, Shailendra Singh, Vaibhav Jain, Wei Yongjun.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYrWj0AAoJEFHr6jzI4aWAsn4P/08Kz3TtOvDuuPGVNoO7fWOn
ag5/zVt8R4FuCALqWpAZbVqMuUU4wLxG0RuWlmBNYYhrjMC6JxHpOSjQXxM2D7YT
CdGTJxG414r6HMOeToL9i/z33o0m+KT07tscer+QMKlXVKCR2z0fEJch+zoPBHYA
5eVBFpLLTtbNiX9UnrcM/vYz61d56kT4YJey9/8qbAkTAc1rMPa8ucU5UiKYJ7yX
mF8cd7WE+7aqif9V8yN59G2rcbz+h3pbMw/gzImiYsYrUj4fLjU+VTKL5PPT+UFy
WWBXD3MAMm1dksMMZi3hgoo2BZhDn3RkymeYi6Jo4kDknNMPZzkMxGyvaJ8Eq5H/
bIYXdS1AbTtvaaEEuWqDFjpnChOEvj/8IeqitU0jjlql8BVjNKg/ESaaKucZr+pO
pk2Mfvw0Tb/lxJT5qj27yq4aRsxJwdFOoPYCN7MquPp/wV2Tg5M6h4nVQ4T6Wl0S
tMFQeCqXflhDWh0Xgr2UXpF66/YTj3Du5LasOTkgGeU30Z8TcNGFEmDWShKP3cEm
e0oQE+OHhIGN4WSBXBAto/Gw8/0v3pXlMs+VEfeHqenwPss2sWtSwXWe8khmiy9e
DJ48sTzj75/Zx1fiqRldw9YEnrL+NK0eOOpvzxeyKpfvdUytc+chFfEqUmO6kl3Z
DW2UmlZxmW+b0SfexCHL
=Icle
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights include:
- Support for direct mapped LPC on POWER9, giving Linux direct access
to devices that may be on there such as a UART.
- Memory hotplug support for the Power9 Radix MMU.
- Add new AUX vectors describing the processor's cache geometry, to
be used by glibc.
- The ability for a guest to ask the hypervisor to resize the guest's
hash table, and in addition support for doing so automatically when
memory is hotplugged into/out-of the guest. This allows the hash
table to be sized based on the current memory usage of the guest,
rather than the maximum possible memory usage.
- Implementation of optprobes (kprobe optimisation) for powerpc.
In addition there's the topic branch shared with the KVM tree, which
includes support for guests to use the Radix MMU on Power9.
Thanks to:
Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T, Anton
Blanchard, Benjamin Herrenschmidt, Chris Packham, Daniel Axtens,
Daniel Borkmann, David Gibson, Finn Thain, Gautham R. Shenoy, Gavin
Shan, Greg Kurz, Joel Stanley, John Allen, Madhavan Srinivasan,
Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Nathan Fontenot,
Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Ravi Bangoria, Reza
Arbab, Shailendra Singh, Vaibhav Jain, Wei Yongjun"
* tag 'powerpc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (129 commits)
powerpc/mm/radix: Skip ptesync in pte update helpers
powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm
powerpc/mm/radix: Update pte update sequence for pte clear case
powerpc/mm: Update PROTFAULT handling in the page fault path
powerpc/xmon: Fix data-breakpoint
powerpc/mm: Fix build break with BOOK3S_64=n and MEMORY_HOTPLUG=y
powerpc/mm: Fix build break when CMA=n && SPAPR_TCE_IOMMU=y
powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n
powerpc/pseries: Fix typo in parameter description
powerpc/kprobes: Remove kprobe_exceptions_notify()
kprobes: Introduce weak variant of kprobe_exceptions_notify()
powerpc/ftrace: Fix confusing help text for DISABLE_MPROFILE_KERNEL
powerpc/powernv: Fix opal_exit tracepoint opcode
powerpc: Add a prototype for mcount() so it can be versioned
powerpc: Drop GPL from of_node_to_nid() export to match other arches
powerpc/kprobes: Optimize kprobe in kretprobe_trampoline()
powerpc/kprobes: Implement Optprobes
powerpc/kprobes: Fixes for kprobe_lookup_name() on BE
powerpc: Add helper to check if offset is within relative branch range
powerpc/bpf: Introduce __PPC_SH64()
...
Pull networking updates from David Miller:
"Highlights:
1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
Varadhan.
2) Simplify classifier state on sk_buff in order to shrink it a bit.
From Willem de Bruijn.
3) Introduce SIPHASH and it's usage for secure sequence numbers and
syncookies. From Jason A. Donenfeld.
4) Reduce CPU usage for ICMP replies we are going to limit or
suppress, from Jesper Dangaard Brouer.
5) Introduce Shared Memory Communications socket layer, from Ursula
Braun.
6) Add RACK loss detection and allow it to actually trigger fast
recovery instead of just assisting after other algorithms have
triggered it. From Yuchung Cheng.
7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.
8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.
9) Export MPLS packet stats via netlink, from Robert Shearman.
10) Significantly improve inet port bind conflict handling, especially
when an application is restarted and changes it's setting of
reuseport. From Josef Bacik.
11) Implement TX batching in vhost_net, from Jason Wang.
12) Extend the dummy device so that VF (virtual function) features,
such as configuration, can be more easily tested. From Phil
Sutter.
13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
Dumazet.
14) Add new bpf MAP, implementing a longest prefix match trie. From
Daniel Mack.
15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.
16) Add new aquantia driver, from David VomLehn.
17) Add bpf tracepoints, from Daniel Borkmann.
18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
Florian Fainelli.
19) Remove custom busy polling in many drivers, it is done in the core
networking since 4.5 times. From Eric Dumazet.
20) Support XDP adjust_head in virtio_net, from John Fastabend.
21) Fix several major holes in neighbour entry confirmation, from
Julian Anastasov.
22) Add XDP support to bnxt_en driver, from Michael Chan.
23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.
24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.
25) Support GRO in IPSEC protocols, from Steffen Klassert"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
Revert "ath10k: Search SMBIOS for OEM board file extension"
net: socket: fix recvmmsg not returning error from sock_error
bnxt_en: use eth_hw_addr_random()
bpf: fix unlocking of jited image when module ronx not set
arch: add ARCH_HAS_SET_MEMORY config
net: napi_watchdog() can use napi_schedule_irqoff()
tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
net/hsr: use eth_hw_addr_random()
net: mvpp2: enable building on 64-bit platforms
net: mvpp2: switch to build_skb() in the RX path
net: mvpp2: simplify MVPP2_PRS_RI_* definitions
net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
net: mvpp2: remove unused register definitions
net: mvpp2: simplify mvpp2_bm_bufs_add()
net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
net: mvpp2: release reference to txq_cpu[] entry after unmapping
net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
...
- Operating Performance Points (OPP) framework fixes, cleanups and
switch over from RCU-based synchronization to reference counting
using krefs (Viresh Kumar, Wei Yongjun, Dave Gerlach).
- cpufreq core cleanups and documentation updates (Viresh Kumar,
Rafael Wysocki).
- New cpufreq driver for Broadcom BMIPS SoCs (Markus Mayer).
- New cpufreq-dt sub-driver for TI SoCs requiring special handling,
like in the AM335x, AM437x, DRA7x, and AM57x families, along with
new DT bindings for it (Dave Gerlach, Paul Gortmaker).
- ARM64 SoCs support for the qoriq cpufreq driver (Tang Yuantian).
- intel_pstate driver updates including a new sysfs knob to control
the driver's operation mode and fixes related to the no_turbo
sysfs knob and the hardware-managed P-states feature support
(Rafael Wysocki, Srinivas Pandruvada).
- New interface to export ultra-turbo frequencies for the powernv
cpufreq driver (Shilpasri Bhat).
- Assorted fixes for cpufreq drivers (Arnd Bergmann, Dan Carpenter,
Wei Yongjun).
- devfreq core fixes, mostly related to the sysfs interface exported
by it (Chanwoo Choi, Chris Diamand).
- Updates of the exynos-bus and exynos-ppmu devfreq drivers (Chanwoo
Choi).
- Device PM QoS extension to support CPUs and support for per-CPU
wakeup (device resume) latency constraints in the cpuidle menu
governor (Alex Shi).
- Wakeup IRQs framework fixes (Grygorii Strashko).
- Generic power domains framework update including a fix to make
it handle asynchronous invocations of *noirq suspend/resume
callbacks correctly (Ulf Hansson, Geert Uytterhoeven).
- Assorted fixes and cleanups in the core suspend/hibernate code,
PM QoS framework and x86 ACPI idle support code (Corentin Labbe,
Geert Uytterhoeven, Geliang Tang, John Keeping, Nick Desaulniers).
- Update of the analyze_suspend.py script is updated to version 4.5
offering multiple improvements (Todd Brandt).
- New tool for intel_pstate diagnostics using the pstate_sample
tracepoint (Doug Smythies).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYq3IjAAoJEILEb/54YlRx/lYP+gNXhfETSzjd4kWSHy3FVEDb
gc5rMiE2j0OYgVSXwBI7p4EqMPy56lSWBASvbF2o6v9CIxb880KLFEsMDCVHwn46
6xfEnIRxf1oeRqn7EG9ZPIcTgNsUyvK+gah7zgLXu/0KU7ceXxygvNk47qpeOZ8f
dKYgIk/TOSGPC8H2nsg8VBKlK/ZOj5hID4F3MmFw6yDuWVCYuh2EokYXS4Nx0JwY
UQGpWtz+FWWs71vhgVl33GbPXWvPqA7OMe0btZ3RCnhnz4tA/mH+jDWiaspCdS3J
vKGeZyZptjIMJcufm3X7s7ghYjELheqQusMODDXk4AaWQ5nz8V5/h7NThYfa9J1b
M93Tb0rMb2MqUhBpv/M6D3qQroZmhq55QKfQrul3QWSOiQUzTWJcbbpyeBQ7nkrI
F1qNqQfuCnBL/r9y7HpW8P2iFg9kCHkwTtXMdp/lzGXdKzSGtAUSkYg5ohnUzQTp
2WCPTEk+5DxLVPjW5rDoZOotr5p1kdcdWBk6r3MEWRokZK6PJo7rJBcnTtXSo2mO
lLRba006q+fTlI5wZtjAI0rOiS3JgtT6cRx7uPjZlze9TGjklJhdsCPJbM5gcOT+
YiOxvqD+9if5QRSxiEZNj3bQ43wYhXmpctfIanyxziq09BPIPxvgfRR/BkUzc34R
ps4CIvImim5v5xc8Zsbk
=57xJ
-----END PGP SIGNATURE-----
Merge tag 'pm-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"The majority of changes go into the Operating Performance Points (OPP)
framework and cpufreq this time, followed by devfreq and some
scattered updates all over.
The OPP changes are mostly related to switching over from RCU-based
synchronization, that turned out to be overly complicated and
problematic, to reference counting using krefs.
In the cpufreq land there are core cleanups, documentation updates, a
new driver for Broadcom BMIPS SoCs, a new cpufreq-dt sub-driver for TI
SoCs that require special handling, ARM64 SoCs support for the qoriq
driver, intel_pstate updates, powernv driver update and assorted
fixes.
The devfreq changes are mostly fixes related to the sysfs interface
and some Exynos drivers updates.
Apart from that, the cpuidle menu governor will support per-CPU PM QoS
constraints for the wakeup latency now, some bugs in the wakeup IRQs
framework are fixed, the generic power domains framework should handle
asynchronous invocations of *noirq suspend/resume callbacks from now
on, the analyze_suspend.py script is updated and there is a new tool
for intel_pstate diagnostics.
Specifics:
- Operating Performance Points (OPP) framework fixes, cleanups and
switch over from RCU-based synchronization to reference counting
using krefs (Viresh Kumar, Wei Yongjun, Dave Gerlach)
- cpufreq core cleanups and documentation updates (Viresh Kumar,
Rafael Wysocki)
- New cpufreq driver for Broadcom BMIPS SoCs (Markus Mayer)
- New cpufreq-dt sub-driver for TI SoCs requiring special handling,
like in the AM335x, AM437x, DRA7x, and AM57x families, along with
new DT bindings for it (Dave Gerlach, Paul Gortmaker)
- ARM64 SoCs support for the qoriq cpufreq driver (Tang Yuantian)
- intel_pstate driver updates including a new sysfs knob to control
the driver's operation mode and fixes related to the no_turbo sysfs
knob and the hardware-managed P-states feature support (Rafael
Wysocki, Srinivas Pandruvada)
- New interface to export ultra-turbo frequencies for the powernv
cpufreq driver (Shilpasri Bhat)
- Assorted fixes for cpufreq drivers (Arnd Bergmann, Dan Carpenter,
Wei Yongjun)
- devfreq core fixes, mostly related to the sysfs interface exported
by it (Chanwoo Choi, Chris Diamand)
- Updates of the exynos-bus and exynos-ppmu devfreq drivers (Chanwoo
Choi)
- Device PM QoS extension to support CPUs and support for per-CPU
wakeup (device resume) latency constraints in the cpuidle menu
governor (Alex Shi)
- Wakeup IRQs framework fixes (Grygorii Strashko)
- Generic power domains framework update including a fix to make it
handle asynchronous invocations of *noirq suspend/resume callbacks
correctly (Ulf Hansson, Geert Uytterhoeven)
- Assorted fixes and cleanups in the core suspend/hibernate code, PM
QoS framework and x86 ACPI idle support code (Corentin Labbe, Geert
Uytterhoeven, Geliang Tang, John Keeping, Nick Desaulniers)
- Update of the analyze_suspend.py script is updated to version 4.5
offering multiple improvements (Todd Brandt)
- New tool for intel_pstate diagnostics using the pstate_sample
tracepoint (Doug Smythies)"
* tag 'pm-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (85 commits)
MAINTAINERS: cpufreq: add bmips-cpufreq.c
PM / QoS: Fix memory leak on resume_latency.notifiers
PM / Documentation: Spelling s/wrtie/write/
PM / sleep: Fix test_suspend after sleep state rework
cpufreq: CPPC: add ACPI_PROCESSOR dependency
cpufreq: make ti-cpufreq explicitly non-modular
cpufreq: Do not clear real_cpus mask on policy init
tools/power/x86: Debug utility for intel_pstate driver
AnalyzeSuspend: fix drag and zoom bug in javascript
PM / wakeirq: report a wakeup_event on dedicated wekup irq
PM / wakeirq: Fix spurious wake-up events for dedicated wakeirqs
PM / wakeirq: Enable dedicated wakeirq for suspend
cpufreq: dt: Don't use generic platdev driver for ti-cpufreq platforms
cpufreq: ti: Add cpufreq driver to determine available OPPs at runtime
Documentation: dt: add bindings for ti-cpufreq
PM / OPP: Expose _of_get_opp_desc_node as dev_pm_opp API
cpufreq: qoriq: Don't look at clock implementation details
cpufreq: qoriq: add ARM64 SoCs support
PM / Domains: Provide dummy governors if CONFIG_PM_GENERIC_DOMAINS=n
cpufreq: brcmstb-avs-cpufreq: remove unnecessary platform_set_drvdata()
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this (fairly busy) cycle were:
- There was a class of scheduler bugs related to forgetting to update
the rq-clock timestamp which can cause weird and hard to debug
problems, so there's a new debug facility for this: which uncovered
a whole lot of bugs which convinced us that we want to keep the
debug facility.
(Peter Zijlstra, Matt Fleming)
- Various cputime related updates: eliminate cputime and use u64
nanoseconds directly, simplify and improve the arch interfaces,
implement delayed accounting more widely, etc. - (Frederic
Weisbecker)
- Move code around for better structure plus cleanups (Ingo Molnar)
- Move IO schedule accounting deeper into the scheduler plus related
changes to improve the situation (Tejun Heo)
- ... plus a round of sched/rt and sched/deadline fixes, plus other
fixes, updats and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits)
sched/core: Remove unlikely() annotation from sched_move_task()
sched/autogroup: Rename auto_group.[ch] to autogroup.[ch]
sched/topology: Split out scheduler topology code from core.c into topology.c
sched/core: Remove unnecessary #include headers
sched/rq_clock: Consolidate the ordering of the rq_clock methods
delayacct: Include <uapi/linux/taskstats.h>
sched/core: Clean up comments
sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds
sched/clock: Add dummy clear_sched_clock_stable() stub function
sched/cputime: Remove generic asm headers
sched/cputime: Remove unused nsec_to_cputime()
s390, sched/cputime: Remove unused cputime definitions
powerpc, sched/cputime: Remove unused cputime definitions
s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs
ia64, sched/cputime: Remove unused cputime definitions
ia64: Convert vtime to use nsec units directly
ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it
sched/cputime: Remove jiffies based cputime
sched/cputime, vtime: Return nsecs instead of cputime_t to account
sched/cputime: Complete nsec conversion of tick based accounting
...
* Expose per-DIMM error counts in sysfs (Aaron Miller)
* Add T2080 l2-cache support to mpc85xx (Chris Packham)
* Random other small improvements/cleanups/fixlets
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAliqyD0ACgkQEsHwGGHe
VUoGMhAAhiSyD3O0wL+e+8Tg0Kbn5nynwjuF1GdPPSsC9xYerRqw6mkcZPZKidBt
QoMstORdwoZtZ5BgVIUO4Yn0iqoNAmxOCxEMiDE5hW+ptViSKogGocQIvxojSl65
K9qvUX1I/EpIblHJcMlv9BqvTUTOv5sBhgMSxLjBIArOud/MeaTGEM/CuxwL90mI
OeJ++HLNNz657WkKzOH2fm9nvwIgYsg7VhdCxAJBPOuiyZDVBEb5+exFJzcy/kQ/
LLGptFSZ4zv+K2ebZKSBt4YawYUHxLq2ufvFYYEdf9PZ8rGniSdCX7cTiun8ZUCl
Q/74BhfIkkNVvcx49ugF883uzO7J2yatnrLp7P2j7RZ5jM6RdqKVteIK3f/iK0kE
pjRXglLKXofZvD3UQsS3NeZtZ1Rb6R1ejwqo4jIbqSUaXn3X67Mo/pEEtbvCdeGS
duQnE80H+8vqXdJMiYfhHeRvkLKWo2l3o6IsPeNyPO0+3naxvLijsv2hBdx83Y8V
Psk8rl3QR4Gqe9y/9SzjtyxHxuSXGNFijrBy1i918LphS5RtE/bWgl5+8SLoe3hg
4g7WjXW0Vpvvu+JVNuXhYXtEUZF2RqvXZMGEv2Iv5okZpBGo3IRFLDudezO1YQ33
xEckv0qHM8LNbMmYjhI4JcfcH4Hos+KBQMPeT4uXB6ItdCf4FRE=
=7YI0
-----END PGP SIGNATURE-----
Merge tag 'edac_for_4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
- Make amd64_edac still load on a machine with unpopulated nodes +
cleanups (Yazen Ghannam)
- Expose per-DIMM error counts in sysfs (Aaron Miller)
- Add T2080 l2-cache support to mpc85xx (Chris Packham)
- Random other small improvements/cleanups/fixlets
* tag 'edac_for_4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, mce_amd: Print IPID and Syndrome on a separate line
EDAC, amd64: Bump driver version
MAINTAINERS, EDAC: Update email for Thor Thayer
EDAC, fsl_ddr: Make locally used symbols static
EDAC, mpc85xx: Add T2080 l2-cache support
EDAC, amd64: Add x86cpuid sanity check during init
EDAC, amd64: Don't treat ECC disabled as failure
EDAC: Add routine to check if MC devices list is empty
EDAC, amd64: Remove unused printing macros
EDAC, amd64: Rework messages in ecc_enabled()
EDAC, amd64: Move global code out of instance functions
EDAC, amd64: Free unused memory when init_one_instance() fails
EDAC, mce_amd: Give more context to deferred error message
EDAC, i7300: Test for the second channel properly
EDAC, sb_edac: Get rid of ->show_interleave_mode()
EDAC: Expose per-DIMM error counts in sysfs
EDAC, amd64: Save and return err code from probe_one_instance()
EDAC, i82975x: Add ioremap_nocache() error handling
EDAC: Fix typos in enum mem_type comments
EDAC: Make dev_attr_sdram_scrub_rate static
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
that a single __weak function in the core that can be overridden
similarly to the eBPF one. Also remove stale pr_err() mentions
of bpf_jit_compile.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
One fix from Paul, we can not use the radix MMU under a hypervisor for now.
Although the current code checks if the processor supports radix, that is not
sufficient.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYpukgAAoJEFHr6jzI4aWA17UP/jKOZX9G7GcjNAM5VUkL1MZK
pAwZ4heCGeXDT3BhM6UPhwOXPJ/HTpoo9r1hAwAcRAHjvLEbY4WIdwr89s7kGiSD
uO3AfORSpvVNXzJkiD3vyllavkeg0diB+M5G6BktTuV473Ssn3lq3HjS4fjKEWt3
Rfnn4uGx7/KTsAsjEa20bau9lbz48M9csIfvIRs5osYi+PlpEN/3ES0cKWshkgOo
E5doX5jnNxGcySXIFEDhmBQu3jBjVUGIgEambpZExAKHZkIODrhUS3WBKCbTO4U6
kqgP662F8N2Q5jchYWtqbTiJHQJ3dKCcsSzluTrW2BkEE955XSzLxG7+jXQjA2OV
2bIFi/tswKFXDrTbfkHWe8gO4VGeTjmBIFOyfutFrum/F88cxw3eRcgm/n77J8BN
APicI6RVEEXi9s/PbWPOBtw/nPUvsrITDQBz2sC6bUZ+ruk+WgTpo6uSNaqudmFy
3s47M6unX8j9JHHt1xtqPDE6wKYQSum5xQiC1J3xSTGugfakqmuXaMtikm63GK7V
sv9EFX9NZ4pdS8E/mxfnaVVqLNSwpxgpmU1Hxu38IOvTwrKceMpUpRK3R4Z5uqss
XVePIhOllpufZo7yE1WsdVSWd0xViSplrKqlvY6XHjCNeiEdBeCoYpMG3ECJtoKC
qiBro6VMAIfA1BP0vLu+
=Q1P7
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"One fix from Paul: we can not use the radix MMU under a hypervisor for
now.
Although the code checked if the processor supports radix, that is not
sufficient"
* tag 'powerpc-4.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64: Disable use of radix under a hypervisor
Currently, if the kernel is running on a POWER9 processor under a
hypervisor, it may try to use the radix MMU even though it doesn't have
the necessary code to do so (it doesn't negotiate use of radix, and it
doesn't do the H_REGISTER_PROC_TBL hcall). If the hypervisor supports
both radix and HPT, then it will set up the guest to use HPT (since the
guest doesn't request radix in the CAS call), but if the radix feature
bit is set in the ibm,pa-features property (which is valid, since
ibm,pa-features is defined to represent the capabilities of the
processor) the guest will try to use radix, resulting in a crash when
it turns the MMU on.
This makes the minimal fix for the current code, which is to disable
radix unless we are running in hypervisor mode.
Fixes: 2bfd65e45e ("powerpc/mm/radix: Add radix callbacks for early init routines")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We do them at the start of tlb flush, and we are sure a pte update will be
followed by a tlbflush. Hence we can skip the ptesync in pte update helpers.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This helps us to do some optimization for application exit case, where we can
skip the DD1 style pte update sequence.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In the kernel we do follow the below sequence in different code paths.
pte = ptep_get_clear(ptep)
....
set_pte_at(ptep, pte)
We do that for mremap, autonuma protection update and softdirty clearing. This
implies our optimization to skip a tlb flush when clearing a pte update is
not valid, because for DD1 system that followup set_pte_at will be done witout
doing the required tlbflush. Fix that by always doing the dd1 style pte update
irrespective of new_pte value. In a later patch we will optimize the application
exit case.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With radix, we can get page fault with DSISR_PROTFAULT value set in case of
PROT_NONE or autonuma mapping. The PROT_NONE case in handled by the vma check
where we consider the access bad. For autonuma we should fall through and fixup
the access mask correctly.
Without this patch we trigger the WARN_ON() on radix. This code moves that
WARN_ON() within a radix_enabled() check. I also moved the WARN_ON() outside
the if condition making it apply for all type of faults (exec/write/read). It
is also conditionalized for book3s, because BOOK3E can also get a PROTFAULT to
handle the D/I cache sync.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently xmon data-breakpoint feature is broken.
Whenever there is a watchpoint match occurs, hw_breakpoint_handler will
be called by do_break via notifier chains mechanism. If watchpoint is
registered by xmon, hw_breakpoint_handler won't find any associated
perf_event and returns immediately with NOTIFY_STOP. Similarly, do_break
also returns without notifying to xmon.
Solve this by returning NOTIFY_DONE when hw_breakpoint_handler does not
find any perf_event associated with matched watchpoint, rather than
NOTIFY_STOP, which tells the core code to continue calling the other
breakpoint handlers including the xmon one.
Cc: stable@vger.kernel.org
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The recently merged HPT (Hash Page Table) resize support broke the build
when BOOK3S_64=n (ie. 32-bit or 64-bit Book3E) and MEMORY_HOTPLUG=y:
arch/powerpc/mm/mem.o: In function `.arch_add_memory':
(.text+0x4e4): undefined reference to `.resize_hpt_for_hotplug'
Fix it by adding a dummy version.
Fixes: 438cc81a41 ("powerpc/pseries: Automatically resize HPT for memory hot add/remove")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently the build breaks if CMA=n and SPAPR_TCE_IOMMU=y:
arch/powerpc/mm/mmu_context_iommu.c: In function ‘mm_iommu_get’:
arch/powerpc/mm/mmu_context_iommu.c:193:42: error: ‘MIGRATE_CMA’ undeclared (first use in this function)
if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
^~~~~~~~~~~
Fix it by using the existing is_migrate_cma_page(), which evaulates to
false when CMA=n.
Fixes: 2e5bbb5461 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If we enable RADIX but disable HUGETLBFS, the build breaks with:
arch/powerpc/mm/pgtable-radix.c:557:7: error: implicit declaration of function 'pmd_huge'
arch/powerpc/mm/pgtable-radix.c:588:7: error: implicit declaration of function 'pud_huge'
Fix it by stubbing those functions when HUGETLBFS=n.
Fixes: 4b5d62ca17 ("powerpc/mm: add radix__remove_section_mapping()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Four fixes from Ben:
- Userspace was semi-randomly segfaulting on radix due to us incorrectly
handling a fault triggered by autonuma, caused by a patch we merged earlier
in v4.10 to prevent the kernel executing userspace.
- We weren't marking host IPIs properly for KVM in the OPAL ICP backend.
- The ERAT flushing on radix was missing an isync and was incorrectly marked
as DD1 only.
- The powernv CPU hotplug code was missing a wakeup type and failing to flush
the interrupt correctly when using OPAL ICP.
Thanks to:
Benjamin Herrenschmidt.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYnXkMAAoJEFHr6jzI4aWATLoQAIIGjrJ/U5vKHcb8vjhMaJwQ
JiwnIhTiA3+DJShodmReV6LxmUfIfo4CWuhG79NCuTibBi9LrTUkinE8t0DB6Z4H
qmeaULNi+tjX2OFyWNd/WTLrARzuXKm/gs0q2LpwslnrIAahP8xaOJEDzhhqpa4r
XyyVMVhyY1SFZK8d/ULwXeE19ZH+CwpBvanTyVS/zQPaVCY3v8798Z7x/1K6Vtrj
ks8FDY8AqHxSOwaaCmATy8gzGzLqj0I6phUT6D0OVmZIkKn3ucE+XpCzhLYBGpGa
PNc9guKuwXISiKDxQgCQCLyjeZnSOD57o4p83OdBIWEorxiHdaIKtXt8op5C6+3X
uO6MzI91t4ER3N40W4JNSROxxz/wd4R7mr3qLnsi8p3iJdG87T113L/2/pjKsoCy
HrEcD38K77mZTzcd3XPI8fYkaawL0kgnFrAGUJ+sVrpHbnbVgSvKF8c255FYl1YH
K9LiRZ3txSfTA+eA9Xuyw7O8dMEZiA89kdJ1yng5Pe+CBnBDcXJDdeD8Fm4FWCkc
OePb8IXPEbfT0jJ0TZpnu2x2mpzKxU2nEih3GNK90/NmiYd6Pp5drNJo78hQfFQA
Abterj12dXLW7/ynxLz3SMv20lnxOgzfbJZwFRXY90yc9YwYy4QFrpuDy/V/gopP
pGQhqPneUS2z6A8M5+59
=yLmR
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes friom Michael Ellerman:
"Apologies for the late pull request, but Ben has been busy finding bugs.
- Userspace was semi-randomly segfaulting on radix due to us
incorrectly handling a fault triggered by autonuma, caused by a
patch we merged earlier in v4.10 to prevent the kernel executing
userspace.
- We weren't marking host IPIs properly for KVM in the OPAL ICP
backend.
- The ERAT flushing on radix was missing an isync and was incorrectly
marked as DD1 only.
- The powernv CPU hotplug code was missing a wakeup type and failing
to flush the interrupt correctly when using OPAL ICP
Thanks to Benjamin Herrenschmidt"
* tag 'powerpc-4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: Properly set "host-ipi" on IPIs
powerpc/powernv: Fix CPU hotplug to handle waking on HVI
powerpc/mm/radix: Update ERAT flushes when invalidating TLB
powerpc/mm: Fix spurrious segfaults on radix with autonuma
Fix typo in "hotplug_delay" parameter description. This allows modinfo
to match the help text to the parameter.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
... as the generic weak variant will do.
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The final paragraph of the help text is reversed. We want to enable
this option by default, and disable it if the toolchain has a working
-mprofile-kernel.
Fixes: 8c50b72a3b ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently the opal_exit tracepoint usually shows the opcode as 0:
<idle>-0 [047] d.h. 635.654292: opal_entry: opcode=63
<idle>-0 [047] d.h. 635.654296: opal_exit: opcode=0 retval=0
kopald-1209 [019] d... 636.420943: opal_entry: opcode=10
kopald-1209 [019] d... 636.420959: opal_exit: opcode=0 retval=0
This is because we incorrectly load the opcode into r0 before calling
__trace_opal_exit(), whereas it expects the opcode in r3 (first function
parameter). In fact we are leaving the retval in r3, so opcode and
retval will always show the same value.
Instead load the opcode into r3, resulting in:
<idle>-0 [040] d.h. 636.618625: opal_entry: opcode=63
<idle>-0 [040] d.h. 636.618627: opal_exit: opcode=63 retval=0
Fixes: c49f63530b ("powernv: Add OPAL tracepoints")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently we get a warning that _mcount() can't be versioned:
WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.
Add a prototype to asm-prototypes.h to fix it.
The prototype is not really correct, mcount() is not a normal function,
it has a special ABI. But for the purpose of versioning it doesn't
matter.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The generic implementation of of_node_to_nid() is EXPORT_SYMBOL, added
in commit 298535c00a ("of, numa: Add NUMA of binding
implementation.").
The powerpc implementation added in commit 953039c8df ("[PATCH]
powerpc: Allow devices to register with numa topology") is
EXPORT_SYMBOL_GPL.
This creates an inconsistency for of_node_to_nid() callers across
architectures.
Update the powerpc implementation to be exported consistently with the
generic implementation.
Signed-off-by: Shailendra Singh <shailendras@nvidia.com>
Reviewed-by: Andy Ritger <aritger@nvidia.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Kprobe placed on the kretprobe_trampoline() during boot time can be
optimized, since the instruction at probe point is a 'nop'.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Current infrastructure of kprobe uses the unconditional trap instruction
to probe a running kernel. Optprobe allows kprobe to replace the trap
with a branch instruction to a detour buffer. Detour buffer contains
instructions to create an in memory pt_regs. Detour buffer also has a
call to optimized_callback() which in turn call the pre_handler(). After
the execution of the pre-handler, a call is made for instruction
emulation. The NIP is determined in advanced through dummy instruction
emulation and a branch instruction is created to the NIP at the end of
the trampoline.
To address the limitation of branch instruction in POWER architecture,
detour buffer slot is allocated from a reserved area. For the time
being, 64KB is reserved in memory for this purpose.
Instructions which can be emulated using analyse_instr() are the
candidates for optimization. Before optimization ensure that the address
range between the detour buffer allocated and the instruction being
probed is within +/- 32MB.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix two issues with kprobes.h on BE which were exposed with the
optprobes work:
- one, having to do with a missing include for linux/module.h for
MODULE_NAME_LEN -- this didn't show up previously since the only
users of kprobe_lookup_name were in kprobes.c, which included
linux/module.h through other headers, and
- two, with a missing const qualifier for a local variable which ends
up referring a string literal. Again, this is unique to how
kprobe_lookup_name is being invoked in optprobes.c
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To permit the use of relative branch instruction in powerpc, the target
address has to be relatively nearby, since the address is specified in an
immediate field (24 bit filed) in the instruction opcode itself. Here
nearby refers to 32MB on either side of the current instruction.
This patch verifies whether the target address is within +/- 32MB
range or not.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Introduce __PPC_SH64() as a 64-bit variant to encode shift field in some
of the shift and rotate instructions operating on double-words. Convert
some of the BPF instruction macros to use the same.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We've now implemented code in the pseries platform to use the new PAPR
interface to allow resizing the hash page table (HPT) at runtime.
This patch uses that interface to automatically attempt to resize the HPT
when memory is hot added or removed. This tries to always keep the HPT at
a reasonable size for our current memory size.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The hypervisor needs to know a guest is capable of using the HPT resizing
PAPR extension in order to make full advantage of it for memory hotplug.
If the hypervisor knows the guest is HPT resize aware, it can size the
initial HPT based on the initial guest RAM size, relying on the guest to
resize the HPT when more memory is hot-added. Without this, the hypervisor
must size the HPT for the maximum possible guest RAM, which can lead to
a huge waste of space if the guest never actually expends to that maximum
size.
This patch advertises the guest's support for HPT resizing via the
ibm,client-architecture-support OF interface. We use bit 5 of byte 6 of
option vector 5 for this purpose, as defined in the PAPR ACR "HPT
resizing option".
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds support for using two hypercalls to change the size of the
main hash page table while running as a PAPR guest. For now these
hypercalls are only in experimental qemu versions.
The interface is two part: first H_RESIZE_HPT_PREPARE is used to
allocate and prepare the new hash table. This may be slow, but can be
done asynchronously. Then, H_RESIZE_HPT_COMMIT is used to switch to the
new hash table. This requires that no CPUs be concurrently updating the
HPT, and so must be run under stop_machine().
This also adds a debugfs file which can be used to manually control
HPT resizing or testing purposes.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
[mpe: Rename the debugfs file to "hpt_order"]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds the hypercall numbers and wrapper functions for the hash page
table resizing hypercalls.
These hypercall numbers are defined in the PAPR ACR "HPT resizing
option".
It also adds a new firmware feature flag to track the presence of the
HPT resizing calls.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Otherwise KVM will fail to pass them through to the host
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The IPIs come in as HVI not EE, so we need to test the appropriate
SRR1 bits. The encoding is such that it won't have false positives
on P7 and P8 so we can just test it like that. We also need to handle
the icp-opal variant of the flush.
Fixes: d74361881f ("powerpc/xics: Add ICP OPAL backend")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Three tiny changes to the ERAT flushing logic: First don't make
it depend on DD1. It hasn't been decided yet but we might run
DD2 in a mode that also requires explicit flushes for performance
reasons so make it unconditional. We also add a missing isync, and
finally remove the flush from _tlbiel_va as it is only necessary
for congruence-class invalidations (PID, LPID and full TLB), not
targetted invalidations.
Fixes: 96ed1fe511 ("powerpc/mm/radix: Invalidate ERAT on tlbiel for POWER9 DD1")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Recent versions of OPAL can provide names for the various OPAL interrupts,
so let's use them. This also modernises the code that fetches the
interrupt array to use the helpers provided by the generic code instead
of hand-parsing the property.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Free irqs on error, check allocation of names, consolidate error
handling, whitespace.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On some CAPP errors we see console messages that prints unknown HMIs for
which CAPI recovery is in progress. This patch fixes this by printing
correct error info for HMI generated due to CAPP recovery.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
These are common on bare metal machines, so put them in the defconfig.
This adds 216KB to the vmlinux size
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When autonuma (Automatic NUMA balancing) marks a PTE inaccessible it
clears all the protection bits but leave the PTE valid.
With the Radix MMU, an attempt at executing from such a PTE will
take a fault with bit 35 of SRR1 set "SRR1_ISI_N_OR_G".
It is thus incorrect to treat all such faults as errors. We should
pass them to handle_mm_fault() for autonuma to deal with. The case
of pages that are really not executable is handled by the existing
test for VM_EXEC further down.
That leaves us with catching the kernel attempts at executing user
pages. We can catch that earlier, even before we do find_vma.
It is never valid on powerpc for the kernel to take an exec fault
to begin with. So fold that test with the existing test for the
kernel faulting on kernel addresses to bail out early.
Fixes: 1d18ad0268 ("powerpc/mm: Detect instruction fetch denied and report")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
All entry points already read the MSR so they can easily do
the right thing.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The branch from hmi_exception_early to hmi_exception_realmode must use
a "relocatable-style" branch, because it is branching from unrelocated
exception code to beyond __end_interrupts.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Without this we will always find the feature disabled.
Fixes: 984d7a1ec6 ("powerpc/mm: Fixup kernel read only mapping")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
start,size has the benefit of being easier to search for (start,end
usually gives you the preceeding vector from the one you want, as first
result).
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>