The link state and exception interrupts may be masked when we probe.
The firmware should in theory prevent sending (and automasking) those
interrupts if the device is disabled, but if my reading of the FW code
is correct there are firmwares out there with race conditions in this
area. The interrupt may also be masked if previous driver which used
the device was malfunctioning and we didn't load the FW (there is no
other good way to comprehensively reset the PF).
Note that FW unmasks the data interrupts by itself when vNIC is
enabled, such helpful operation is not performed for LSC/EXN interrupts.
Always unmask the auxiliary interrupts after request_irq(). On the
remove path add missing PCI write flush before free_irq().
Fixes: 4c3523623d ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
v2: add table length check.
DC component expect PP to give max engine clock and
memory clock through pp_get_display_mode_validation_clocks
on DGPU as well.
This patch can fix MultiGPU-Display blank
out with 1 IGPU-4k display and 2 DGPU-two 4K
displays.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Just another contact for the amdgpu driver.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Windows added by the BIOS are not marked as 64bit because they are
usually not changeable anyway.
This fixes large BAR support on my new Ryzen build system.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we would completely circumvent that debugging feature.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Roger He <Hongbo.He@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the PDEs after resetting the huge flag.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Necessary for the next patch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Try to lock moved BOs if it's successful we can update the
PTEs directly to the new location.
v2: rebase
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We can actually handle invalid huge pages perfectly fine now.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Suppress warning messages when allocating huge pages fails since we can
always fall back to normal pages.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No one will use this function except ttm_bo_io_mem_pfn() now, so move
the calculation of ttm_bo_default_io_mem_pfn() into ttm_bo_io_mem_pfn()
and do some cleanup.
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The default interface situation has been taken into the framework, so
remove the default set of each module.
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add smu_free_memory when smu fini to prevent memory leakage
v2: squash in typo fix (Yintian) and warning (Harry)
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use adev->vm_manager.id_mgr[0].num_ids rather than hardcoded 16.
v2: use AMDGPU_GFXHUB rather than hardcoded 0 (Christian)
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Noticed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add quirks for handling PX/HG systems. In this case, add
a quirk for a weston dGPU that only seems to properly power
down using ATPX power control rather than HG (_PR3).
v2: append a new weston XT
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com> (v2)
Reviewed-and-Tested-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
A vlan device with vid 0 is allow to creat by not able to be fully
cleaned up by unregister_vlan_dev() which checks for vlan_id!=0.
Also, VLAN 0 is probably not a valid number and it is kinda
"reserved" for HW accelerating devices, but it is probably too
late to reject it from creation even if makes sense. Instead,
just remove the check in unregister_vlan_dev().
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: ad1afb0039 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hopefully the last set of fixes for 4.15.
iwlwifi
* fix DMA mapping regression since v4.14
wcn36xx
* fix dynamic power save which has been broken since the driver was commited
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJaVLeaAAoJEG4XJFUm622bYBUH/2NhQwUJbKIrbxYhYpo0d++8
GK3TjxpzTByCe0nXGUT5iuDaY72i9C2UoLhFQ5smVg04pE0IXQQjZus6vSGx4biz
GUave/SkzL0EruUXjXLBYWiYDND4iynk82gWX2/Lh7qGoT2SQfD5cKz0cMdE5NrW
E7Q1CMiaoB0i9jcksaU2uWA0XPwISxl61kU2dXuKHQOJ1CW1goI/YIHsBajshHmi
ZEfMqxZFE+jz2Kkp4tKhvG/Xva0ylJv8bwK8CMK6MqA8oa3xdgOhv67E9mm7IGpD
ETrRLJNnWGJlnod5u7QOZWcS01gAgT5whCqDl/lTEty0823kvjMQdwckQaIU+AE=
=me16
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-for-davem-2018-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.15
Hopefully the last set of fixes for 4.15.
iwlwifi
* fix DMA mapping regression since v4.14
wcn36xx
* fix dynamic power save which has been broken since the driver was commited
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If one of the child devices is missing the of_mdiobus_register_phy()
call will return -ENODEV. When a missing device is encountered the
registration of the remaining PHYs is stopped and the MDIO bus will
fail to register. Propagate all errors except ENODEV to avoid it.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc-8 reports
net/caif/caif_usb.c: In function 'cfusbl_device_notify':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]
The compiler require that the input param 'len' of strncpy() should be
greater than the length of the src string, so that '\0' is copied as
well. We can just use strlcpy() to avoid this warning.
Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kornilios Kourtis <kou@zurich.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
set_fipers() calling should be protected by spinlock in
case that any interrupt breaks related registers setting
and the function we expect. This patch is to move set_fipers()
to spinlock protecting area in ptp_gianfar_adjtime().
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner says:
====================
sctp: Some sockopt optlen fixes
Hangbin Liu reported that some SCTP sockopt are allowing the user to get
the kernel to allocate really large buffers by not having a ceiling on
optlen.
This patchset address this issue (in patch 2), replace an GFP_ATOMIC
that isn't needed and avoid calculating the option size multiple times
in some setsockopt.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some sockopt handling functions were calculating the length of the
buffer to be written to userspace and then calculating it again when
actually writing the buffer, which could lead to some write not using
an up-to-date length.
This patch updates such places to just make use of the len variable.
Also, replace some sizeof(type) to sizeof(var).
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu reported that some sockopt calls could cause the kernel to log
a warning on memory allocation failure if the user supplied a large optlen
value. That is because some of them called memdup_user() without a ceiling
on optlen, allowing it to try to allocate really large buffers.
This patch adds a ceiling by limiting optlen to the maximum allowed that
would still make sense for these sockopt.
Reported-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So replace it with GFP_USER and also add __GFP_NOWARN.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for devfreq to dynamically control the GPU frequency.
By default try to use the 'simple_ondemand' governor which can
adjust the frequency based on GPU load.
v2: Fix __aeabi_uldivmod issue from the 0 day bot and use
devfreq_recommended_opp() as suggested by Rob.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
A collection of the last-minute small PCM fixes:
- A workaround for the recent regression wrt PulseAudio
- Removal of spurious WARN_ON() that is triggered by syzkaller
- Fixes for aloop, hardening racy accesses
- Fixes in PCM OSS emulation wrt the unabortable loops that may cause
RCU stall
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlpUeWoOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE8rMxAAy+XJNWigvkWHd79ttKeAmndia/u9d+T6Ge/I
VI/SJSy8vhnGO0YNf/AHEs6vtad73XnXP76x1H3TkCsrDxykfhKogCvp0Aat/Ji7
LQFkhQKsaEdACm2TlPxmxpO64sYB8UjvcZBFS82tCmNCldMkwi8T+DDDHocP0A0D
pOQogjffqPBZdk7X1hJxoVKOm95GI1ms09+JPrLl47aa6mLIvNxa81RGnrVK5blE
+kYZQAblweGN8RsMVWqyrnxgRatF59UbV6JIKui/8KD2AXl3Hya/Dn2aFWtMqqH8
p8siLsUI+tACPucNk7tMt9UjHEy7yGK02hClhYVZG6vZ81nSoJsJFTdwXBMKjrfy
Fa1bBb8quM6WfBEHXB7YISulUrrc2nftkPhB/zIa5E9arkHWY4FL7jhdUTEAjkgr
D0Ka3Q/PtdXxmK+NBqUpoiqDHoOQeA5HG+njsz5L0xSbxoxMy8guyxSaoeF4BOnW
KbrVbzcJzSxDWPYGbKmeLEYHW8P3FOKNv9SI/WZErmyjQkeMiq7AuP93yYACFEyj
LhSAxBZ00sStl6IgM4Unw6p4Gi0SOawQfADDG4Arfr/fRA52l9wmpaUwU3uJ1RMn
gLvLfJkBbs/MwBjD5BPxfdjKIuREvMdUBwl/hZk1zp5d2ay0lAl6toNcee//MuHf
DKd1t3I=
=Fz+p
-----END PGP SIGNATURE-----
Merge tag 'sound-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of the last-minute small PCM fixes:
- A workaround for the recent regression wrt PulseAudio
- Removal of spurious WARN_ON() that is triggered by syzkaller
- Fixes for aloop, hardening racy accesses
- Fixes in PCM OSS emulation wrt the unabortable loops that may cause
RCU stall"
* tag 'sound-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
ALSA: pcm: Abort properly at pending signal in OSS read/write loops
ALSA: aloop: Fix racy hw constraints adjustment
ALSA: aloop: Fix inconsistent format due to incomplete rule
ALSA: aloop: Release cable upon open error path
ALSA: pcm: Workaround for weird PulseAudio behavior on rewind error
ALSA: pcm: Add missing error checks in OSS emulation plugin builder
ALSA: pcm: Remove incorrect snd_BUG_ON() usages
The alternatives code checks only the first byte whether it is a NOP, but
with NOPs in front of the payload and having actual instructions after it
breaks the "optimized' test.
Make sure to scan all bytes before deciding to optimize the NOPs in there.
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180110112815.mgciyf5acwacphkq@pd.tnic
Daniel Borkmann says:
====================
pull-request: bpf 2018-01-09
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Prevent out-of-bounds speculation in BPF maps by masking the
index after bounds checks in order to fix spectre v1, and
add an option BPF_JIT_ALWAYS_ON into Kconfig that allows for
removing the BPF interpreter from the kernel in favor of
JIT-only mode to make spectre v2 harder, from Alexei.
2) Remove false sharing of map refcount with max_entries which
was used in spectre v1, from Daniel.
3) Add a missing NULL psock check in sockmap in order to fix
a race, from John.
4) Fix test_align BPF selftest case since a recent change in
verifier rejects the bit-wise arithmetic on pointers
earlier but test_align update was missing, from Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The vmw_view_cmd_to_type() function returns vmw_view_max (3) on error.
It's one element beyond the end of the vmw_view_cotables[] table.
My read on this is that it's possible to hit this failure. header->id
comes from vmw_cmd_check() and it's a user controlled number between
1040 and 1225 so we can hit that error. But I don't have the hardware
to test this code.
Fixes: d80efd5cb3 ("drm/vmwgfx: Initial DX support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: <stable@vger.kernel.org>
Even though the default countable for CP0 is CP_ALWAYS_COUNT (0),
program the selector during HW initialization in an effort to be
up front about which counters are programmed and why.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some 5xx based chipsets have different bins for GPU clock speeds.
Read the fuses (if applicable) and set the appropriate OPP table.
This will only work with OPP v2 tables - the bin will be ignored
for legacy pwrlevel tables.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Move the clock parsing to adreno_gpu_init() to allow for target
specific probing and manipulation of the clock tables.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
We don't need to convert the chipid to an intermediate value and
then back again into a struct adreno_rev.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Remove the downstream bus scaling code. It isn't needed for for
compatibility with a downstream or vendor kernel. Get it out of the
way to clear space for devfreq support.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Calling dev_pm_opp_find_freq_floor() returns the matched frequency
in 'freq'. We don't need to call dev_pm_opp_get_freq() again
to get the frequency value.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
We need to call dev_pm_opp_put() to put back the reference
for the OPP struct after calling the various dev_pm_opp_get_*
functions.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
When cleaning up after a partially successful gntdev_mmap(), unmap the
successfully mapped grant pages otherwise Xen will kill the domain if
in debug mode (Attempt to implicitly unmap a granted PTE) or Linux will
kill the process and emit "BUG: Bad page map in process" if Xen is in
release mode.
This is only needed when use_ptemod is true because gntdev_put_map()
will unmap grant pages itself when use_ptemod is false.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
If the requested range has a hole, the calculation of the number of
pages to unmap is off by one. Fix it.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Since commit f11a04464a ("i2c: gpio: Enable working over slow
can_sleep GPIOs"), probing the i2c RTC connected to an i2c-gpio bus on
r8a7740/armadillo fails with:
rtc-s35390a 0-0030: error resetting chip
rtc-s35390a: probe of 0-0030 failed with error -5
More debug code reveals:
i2c i2c-0: master_xfer[0] R, addr=0x30, len=1
i2c i2c-0: NAK from device addr 0x30 msg #0
s35390a_get_reg: ret = -6
Commit 02e479808b ("gpio: Alter semantics of *raw* operations to
actually be raw") moved open drain/source handling from
gpiod_set_raw_value_commit() to gpiod_set_value(), but forgot to take
into account that gpiod_set_value_cansleep() also needs this handling.
The i2c protocol mandates that i2c signals are open drain, hence i2c
communication fails.
Fix this by adding the missing handling to gpiod_set_value_cansleep(),
using a new common helper gpiod_set_value_nocheck().
Fixes: 02e479808b ("gpio: Alter semantics of *raw* operations to actually be raw")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
[removed underscore syntax, added kerneldoc]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The SOR0 found on Tegra124 and Tegra210 only supports eDP and LVDS and
therefore has a slightly different clock tree than the SOR1 which does
not support eDP, but HDMI and DP instead.
Commit e1335e2f0c ("drm/tegra: sor: Reimplement pad clock") breaks
setups with eDP because the sor->clk_out clock is uninitialized and
therefore setting the parent clock (either the safe clock or either of
the display PLLs) fails, which can cause hangs later on since there is
no clock driving the module.
Fix this by falling back to the module clock for sor->clk_out on those
setups. This guarantees that the module will always be clocked by an
enabled clock and hence prevents those hangs.
Fixes: e1335e2f0c ("drm/tegra: sor: Reimplement pad clock")
Reported-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
New device-tree properties are available which tell the hypervisor
settings related to the RFI flush. Use them to determine the
appropriate flush instruction to use, and whether the flush is
required.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A new hypervisor call is available which tells the guest settings
related to the RFI flush. Use it to query the appropriate flush
instruction(s), and whether the flush is required.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Because there may be some performance overhead of the RFI flush, add
kernel command line options to disable it.
We add a sensibly named 'no_rfi_flush' option, but we also hijack the
x86 option 'nopti'. The RFI flush is not the same as KPTI, but if we
see 'nopti' we can guess that the user is trying to avoid any overhead
of Meltdown mitigations, and it means we don't have to educate every
one about a different command line option.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On some CPUs we can prevent the Meltdown vulnerability by flushing the
L1-D cache on exit from kernel to user mode, and from hypervisor to
guest.
This is known to be the case on at least Power7, Power8 and Power9. At
this time we do not know the status of the vulnerability on other CPUs
such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale
CPUs. As more information comes to light we can enable this, or other
mechanisms on those CPUs.
The vulnerability occurs when the load of an architecturally
inaccessible memory region (eg. userspace load of kernel memory) is
speculatively executed to the point where its result can influence the
address of a subsequent speculatively executed load.
In order for that to happen, the first load must hit in the L1,
because before the load is sent to the L2 the permission check is
performed. Therefore if no kernel addresses hit in the L1 the
vulnerability can not occur. We can ensure that is the case by
flushing the L1 whenever we return to userspace. Similarly for
hypervisor vs guest.
In order to flush the L1-D cache on exit, we add a section of nops at
each (h)rfi location that returns to a lower privileged context, and
patch that with some sequence. Newer firmwares are able to advertise
to us that there is a special nop instruction that flushes the L1-D.
If we do not see that advertised, we fall back to doing a displacement
flush in software.
For guest kernels we support migration between some CPU versions, and
different CPUs may use different flush instructions. So that we are
prepared to migrate to a machine with a different flush instruction
activated, we may have to patch more than one flush instruction at
boot if the hypervisor tells us to.
In the end this patch is mostly the work of Nicholas Piggin and
Michael Ellerman. However a cast of thousands contributed to analysis
of the issue, earlier versions of the patch, back ports testing etc.
Many thanks to all of them.
Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The KVM_PPC_ALLOCATE_HTAB ioctl(), implemented by kvmppc_alloc_reset_hpt()
is supposed to completely clear and reset a guest's Hashed Page Table (HPT)
allocating or re-allocating it if necessary.
In the case where an HPT of the right size already exists and it just
zeroes it, it forces a TLB flush on all guest CPUs, to remove any stale TLB
entries loaded from the old HPT.
However, that situation can arise when the HPT is resizing as well - or
even when switching from an RPT to HPT - so those cases need a TLB flush as
well.
So, move the TLB flush to trigger in all cases except for errors.
Cc: stable@vger.kernel.org # v4.10+
Fixes: f98a8bf9ee ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Commit 96df226 ("KVM: PPC: Book3S PR: Preserve storage control bits")
added code to preserve WIMG bits but it missed 2 special cases:
- a magic page in kvmppc_mmu_book3s_64_xlate() and
- guest real mode in kvmppc_handle_pagefault().
For these ptes, WIMG was 0 and pHyp failed on these causing a guest to
stop in the very beginning at NIP=0x100 (due to bd9166ffe "KVM: PPC:
Book3S PR: Exit KVM on failed mapping").
According to LoPAPR v1.1 14.5.4.1.2 H_ENTER:
The hypervisor checks that the WIMG bits within the PTE are appropriate
for the physical page number else H_Parameter return. (For System Memory
pages WIMG=0010, or, 1110 if the SAO option is enabled, and for IO pages
WIMG=01**.)
This hence initializes WIMG to non-zero value HPTE_R_M (0x10), as expected
by pHyp.
[paulus@ozlabs.org - fix compile for 32-bit]
Cc: stable@vger.kernel.org # v4.11+
Fixes: 96df226 "KVM: PPC: Book3S PR: Preserve storage control bits"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Ruediger Oertel <ro@suse.de>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>