'pvpanic_remove()' is referenced only by a 'devm_add_action_or_reset()'
call in 'devm_pvpanic_probe()'. So, we know that its parameter is non-NULL.
Axe the unneeded check to save a few lines of code.
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/8e425618f4042a8ab8366be4d34026972e77bd40.1622911768.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's possible that during ->exit() the private_data is NULL,
for instance when there was no GPIO device instantiated.
Due to this we may not dereference it. Add a respective check.
Note, for now ->exit() only makes sense when GPIO device
was instantiated, that's why we may use the check for entire
function.
Fixes: 81171e7d31 ("serial: 8250_exar: Constify the software nodes")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20210608144239.12697-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 719e1f561a ("ACPI: Execute platform _OSC also with query bit
clear") makes acpi_bus_osc_negotiate_platform_control() not only query
the platforms capabilities but it also commits the result back to the
firmware to report which capabilities are supported by the OS back to
the firmware
On certain systems the BIOS loads SSDT tables dynamically based on the
capabilities the OS claims to support. However, on these systems the
_OSC actually clears some of the bits (under certain conditions) so what
happens is that now when we call the _OSC twice the second time we pass
the cleared values and that results errors like below to appear on the
system log:
ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC], AE_NOT_FOUND (20210105/psargs-330)
ACPI Error: Aborting method \_PR.PR01._CPC due to previous error (AE_NOT_FOUND) (20210105/psparse-529)
In addition the ACPI 6.4 spec says following [1]:
If the OS declares support of a feature in the Support Field in one
call to _OSC, then it must preserve the set state of that bit
(declaring support for that feature) in all subsequent calls.
Based on the above we can fix the issue by passing the same set of
capabilities to the platform wide _OSC in both calls regardless of the
query flag.
While there drop the context.ret.length checks which were wrong to begin
with (as the length is number of bytes not elements). This is already
checked in acpi_run_osc() that also returns an error in that case.
Includes fixes by Hans de Goede.
[1] https://uefi.org/specs/ACPI/6.4/06_Device_Configuration/Device_Configuration.html#sequence-of-osc-calls
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=213023
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1963717
Fixes: 719e1f561a ("ACPI: Execute platform _OSC also with query bit clear")
Cc: 5.12+ <stable@vger.kernel.org> # 5.12+
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The calclulation of how many bytes we stuff into the
DSI pipeline for video mode panels is off by three
orders of magnitude because we did not account for the
fact that the DRM mode clock is in kilohertz rather
than hertz.
This used to be:
drm_mode_vrefresh(mode) * mode->htotal * mode->vtotal
which would become for example for s6e63m0:
60 x 514 x 831 = 25628040 Hz, but mode->clock is
25628 as it is in kHz.
This affects only the Samsung GT-I8190 "Golden" phone
right now since it is the only MCDE device with a video
mode display.
Curiously some specimen work with this code and wild
settings in the EOL and empty packets at the end of the
display, but I have noticed an eeire flicker until now.
Others were not so lucky and got black screens.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reported-by: Stephan Gerhold <stephan@gerhold.net>
Fixes: 920dd1b142 ("drm/mcde: Use mode->clock instead of reverse calculating it from the vrefresh")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Stephan Gerhold <stephan@gerhold.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20210608213318.3897858-1-linus.walleij@linaro.org
It's currently not possible to select the SC8180x TLMM driver, due to it
selecting PINCTRL_MSM, rather than depending on the same. Fix this.
Fixes: 97423113ec ("pinctrl: qcom: Add sc8180x TLMM driver")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210608180702.2064253-1-bjorn.andersson@linaro.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Lars did not write the ralink-gdma driver. Looks like his name just got
copy&pasted from another similar DMA driver. Remove his name from the
copyright and MODULE_AUTHOR.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Link: https://lore.kernel.org/r/20210607100119.26983-1-lars@metafoo.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It will cause null-ptr-deref if platform_get_resource() returns NULL,
we need check the return value.
Fixes: 517c4c44b3 ("usb: Add driver to allow any GPIO to be used for 7211 USB signals")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20210605080914.2057758-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is no validation of the index from dwc3_wIndex_to_dep() and we might
be referring a non-existing ep and trigger a NULL pointer exception. In
certain configurations we might use fewer eps and the index might wrongly
indicate a larger ep index than existing.
By adding this validation from the patch we can actually report a wrong
index back to the caller.
In our usecase we are using a composite device on an older kernel, but
upstream might use this fix also. Unfortunately, I cannot describe the
hardware for others to reproduce the issue as it is a proprietary
implementation.
[ 82.958261] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a4
[ 82.966891] Mem abort info:
[ 82.969663] ESR = 0x96000006
[ 82.972703] Exception class = DABT (current EL), IL = 32 bits
[ 82.978603] SET = 0, FnV = 0
[ 82.981642] EA = 0, S1PTW = 0
[ 82.984765] Data abort info:
[ 82.987631] ISV = 0, ISS = 0x00000006
[ 82.991449] CM = 0, WnR = 0
[ 82.994409] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000c6210ccc
[ 83.000999] [00000000000000a4] pgd=0000000053aa5003, pud=0000000053aa5003, pmd=0000000000000000
[ 83.009685] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 83.026433] Process irq/62-dwc3 (pid: 303, stack limit = 0x000000003985154c)
[ 83.033470] CPU: 0 PID: 303 Comm: irq/62-dwc3 Not tainted 4.19.124 #1
[ 83.044836] pstate: 60000085 (nZCv daIf -PAN -UAO)
[ 83.049628] pc : dwc3_ep0_handle_feature+0x414/0x43c
[ 83.054558] lr : dwc3_ep0_interrupt+0x3b4/0xc94
...
[ 83.141788] Call trace:
[ 83.144227] dwc3_ep0_handle_feature+0x414/0x43c
[ 83.148823] dwc3_ep0_interrupt+0x3b4/0xc94
[ 83.181546] ---[ end trace aac6b5267d84c32f ]---
Signed-off-by: Marian-Cristian Rotariu <marian.c.rotariu@gmail.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210608162650.58426-1-marian.c.rotariu@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
when skb_clone() or skb_copy_expand() fail,
it should pull skb with lengh indicated by header,
or not it will read network data and check it as header.
Cc: <stable@vger.kernel.org>
Signed-off-by: Linyu Yuan <linyyuan@codeaurora.com>
Link: https://lore.kernel.org/r/20210608233547.3767-1-linyyuan@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For ACPI devices we have a symmetric API to put them, so use it in the driver.
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20210607205007.71458-3-andy.shevchenko@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
devm_ioremap_resource() can return an error, add missed check for it.
Fixes: 43d596e322 ("usb: typec: intel_pmc_mux: Check the port status before connect")
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210607205007.71458-2-andy.shevchenko@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
device_get_next_child_node() bumps a reference counting of a returned variable.
We have to balance it whenever we return to the caller.
Fixes: 6701adfa96 ("usb: typec: driver for Intel PMC mux control")
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210607205007.71458-1-andy.shevchenko@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the VDM responses couldn't be sent successfully, it doesn't need to
finish the AMS until the retry count reaches the limit.
Fixes: 0908c5aca3 ("usb: typec: tcpm: AMS and Collision Avoidance")
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable <stable@vger.kernel.org>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Kyle Tso <kyletso@google.com>
Link: https://lore.kernel.org/r/20210606081452.764032-1-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb_assign_descriptors() is called with 5 parameters,
the last 4 of which are the usb_descriptor_header for:
full-speed (USB1.1 - 12Mbps [including USB1.0 low-speed @ 1.5Mbps),
high-speed (USB2.0 - 480Mbps),
super-speed (USB3.0 - 5Gbps),
super-speed-plus (USB3.1 - 10Gbps).
The differences between full/high/super-speed descriptors are usually
substantial (due to changes in the maximum usb block size from 64 to 512
to 1024 bytes and other differences in the specs), while the difference
between 5 and 10Gbps descriptors may be as little as nothing
(in many cases the same tuning is simply good enough).
However if a gadget driver calls usb_assign_descriptors() with
a NULL descriptor for super-speed-plus and is then used on a max 10gbps
configuration, the kernel will crash with a null pointer dereference,
when a 10gbps capable device port + cable + host port combination shows up.
(This wouldn't happen if the gadget max-speed was set to 5gbps, but
it of course defaults to the maximum, and there's no real reason to
artificially limit it)
The fix is to simply use the 5gbps descriptor as the 10gbps descriptor,
if a 10gbps descriptor wasn't provided.
Obviously this won't fix the problem if the 5gbps descriptor is also
NULL, but such cases can't be so trivially solved (and any such gadgets
are unlikely to be used with USB3 ports any way).
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210609024459.1126080-1-zenczykowski@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This avoids a null pointer dereference in
f_{ecm,eem,hid,loopback,printer,rndis,serial,sourcesink,subset,tcm}
by simply reusing the 5gbps config for 10gbps.
Fixes: eaef50c760 ("usb: gadget: Update usb_assign_descriptors for SuperSpeedPlus")
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Michael R Sweet <msweet@msweet.org>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Pawel Laszczak <pawell@cadence.com>
Cc: Peter Chen <peter.chen@nxp.com>
Cc: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>
Cc: Wei Ming Chen <jj251510319013@gmail.com>
Cc: Will McVicker <willmcvicker@google.com>
Cc: Zqiang <qiang.zhang@windriver.com>
Reviewed-By: Lorenzo Colitti <lorenzo@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20210608044141.3898496-1-zenczykowski@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The XHCI controller is required to enter D3hot rather than D3cold for AMD
s2idle on this hardware generation.
Otherwise, the 'Controller Not Ready' (CNR) bit is not being cleared by
host in resume and eventually this results in xhci resume failures during
the s2idle wakeup.
Link: https://lore.kernel.org/linux-usb/1612527609-7053-1-git-send-email-Prike.Liang@amd.com/
Suggested-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable <stable@vger.kernel.org> # 5.11+
Link: https://lore.kernel.org/r/20210527154534.8900-1-mario.limonciello@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The reasoning for this change is that if we already had
a packet pending, then we also already had a pending timer,
and as such there is no need to reschedule it.
This also prevents packets getting delayed 60 ms worst case
under a tiny packet every 290us transmit load, by keeping the
timeout always relative to the first queued up packet.
(300us delay * 16KB max aggregation / 80 byte packet =~ 60 ms)
As such the first packet is now at most delayed by 300us.
Under low transmit load, this will simply result in us sending
a shorter aggregate, as originally intended.
This patch has the benefit of greatly reducing (by ~10 factor
with 1500 byte frames aggregated into 16 kiB) the number of
(potentially pretty costly) updates to the hrtimer.
Cc: Brooke Basile <brookebasile@gmail.com>
Cc: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20210608085438.813960-1-zenczykowski@gmail.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Giving support for isp1763 made a little revival to this driver, add
entry in the MAINTAINERS file with me as maintainer.
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Rui Miguel Silva <rui.silva@linaro.org>
Link: https://lore.kernel.org/r/20210607170054.220975-1-rui.silva@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEDaZUZmFxRG/wNThrSFkpgVDWcbsFAmC9dVwACgkQSFkpgVDW
cbs+oAf/b4mVmJpYIMTBM5tU086YpsZ80StjH3MVx2Hx/F8bM2ewovOUxOWPAY2o
A6ROuLUlU/0TT8U3LL5PEbQDHhuPsf1n3erv6iz63a72uzOouLeHOBYFP+E6aM5z
1mkr2yHXpwFe+aHpstXvOgeR4ano0kzMkx/I3SPH6P8/pXiBSrpvTiBwnxqvZxbb
VQuP9VfEF5e8vXuDiMF1V5qjea/BYnvjt2Yj34An3JHte13oTNRdFM/6aQiIEvt2
r6g0nBJizmhIHwtNYq30KfgFsNjiC9+dPtOCOVyv80c+Qu+F71EwEyqZSrOVLunh
hfxC/U6H8vy73Zo3gJP3xrXE+xvztw==
=OpDk
-----END PGP SIGNATURE-----
Merge tag 'usb-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb into usb-linus
Peter writes:
Two bug fixes for cdns3 and cdnsp
* tag 'usb-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb:
usb: cdnsp: Fix deadlock issue in cdnsp_thread_irq_handler
usb: cdns3: Enable TDL_CHK only for OUT ep
Here's a fix for some pipe-direction mismatches in the quatech2 driver,
and a couple of new device ids for ftdi_sio and omninet (and a related
trivial cleanup).
All but the ftdi_sio commit have been in linux-next, and with no
reported issues.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQHbPq+cpGvN/peuzMLxc3C7H1lCAUCYLtSqwAKCRALxc3C7H1l
CPj2AQDSM/GeQ658vxYKPIA+8ZDBWubaYhq8KAUDqxL7hBMevAEAkbFhjgtyQt4h
32lqF2YsahxCHckvLXSnxxj1cTJxGAk=
=NKr6
-----END PGP SIGNATURE-----
Merge tag 'usb-serial-5.13-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Jonah writes:
USB-serial fixes for 5.13-rc5
Here's a fix for some pipe-direction mismatches in the quatech2 driver,
and a couple of new device ids for ftdi_sio and omninet (and a related
trivial cleanup).
All but the ftdi_sio commit have been in linux-next, and with no
reported issues.
* tag 'usb-serial-5.13-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: ftdi_sio: add NovaTech OrionMX product ID
USB: serial: omninet: update driver description
USB: serial: omninet: add device id for Zyxel Omni 56K Plus
USB: serial: quatech2: fix control-request directions
array_index_nospec does not work for uint64_t on 32-bit builds.
However, the size of a memory slot must be less than 20 bits wide
on those system, since the memory slot must fit in the user
address space. So just store it in an unsigned long.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
dt_binding_check reports the below error with the latest schema:
Documentation/devicetree/bindings/media/renesas,drif.yaml:
properties:clock-names:maxItems: False schema does not allow 1
Documentation/devicetree/bindings/media/renesas,drif.yaml:
ignoring, error in schema: properties: clock-names: maxItems
This patch fixes the problem.
Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210408202436.3706-1-fabrizio.castro.jz@renesas.com
KVM's mechanism for accessing guest memory translates a guest physical
address (gpa) to a host virtual address using the right-shifted gpa
(also known as gfn) and a struct kvm_memory_slot. The translation is
performed in __gfn_to_hva_memslot using the following formula:
hva = slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE
It is expected that gfn falls within the boundaries of the guest's
physical memory. However, a guest can access invalid physical addresses
in such a way that the gfn is invalid.
__gfn_to_hva_memslot is called from kvm_vcpu_gfn_to_hva_prot, which first
retrieves a memslot through __gfn_to_memslot. While __gfn_to_memslot
does check that the gfn falls within the boundaries of the guest's
physical memory or not, a CPU can speculate the result of the check and
continue execution speculatively using an illegal gfn. The speculation
can result in calculating an out-of-bounds hva. If the resulting host
virtual address is used to load another guest physical address, this
is effectively a Spectre gadget consisting of two consecutive reads,
the second of which is data dependent on the first.
Right now it's not clear if there are any cases in which this is
exploitable. One interesting case was reported by the original author
of this patch, and involves visiting guest page tables on x86. Right
now these are not vulnerable because the hva read goes through get_user(),
which contains an LFENCE speculation barrier. However, there are
patches in progress for x86 uaccess.h to mask kernel addresses instead of
using LFENCE; once these land, a guest could use speculation to read
from the VMM's ring 3 address space. Other architectures such as ARM
already use the address masking method, and would be susceptible to
this same kind of data-dependent access gadgets. Therefore, this patch
proactively protects from these attacks by masking out-of-bounds gfns
in __gfn_to_hva_memslot, which blocks speculation of invalid hvas.
Sean Christopherson noted that this patch does not cover
kvm_read_guest_offset_cached. This however is limited to a few bytes
past the end of the cache, and therefore it is unlikely to be useful in
the context of building a chain of data dependent accesses.
Reported-by: Artemiy Margaritov <artemiy.margaritov@gmail.com>
Co-developed-by: Artemiy Margaritov <artemiy.margaritov@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When using shadow paging, unload the guest MMU when emulating a guest TLB
flush to ensure all roots are synchronized. From the guest's perspective,
flushing the TLB ensures any and all modifications to its PTEs will be
recognized by the CPU.
Note, unloading the MMU is overkill, but is done to mirror KVM's existing
handling of INVPCID(all) and ensure the bug is squashed. Future cleanup
can be done to more precisely synchronize roots when servicing a guest
TLB flush.
If TDP is enabled, synchronizing the MMU is unnecessary even if nested
TDP is in play, as a "legacy" TLB flush from L1 does not invalidate L1's
TDP mappings. For EPT, an explicit INVEPT is required to invalidate
guest-physical mappings; for NPT, guest mappings are always tagged with
an ASID and thus can only be invalidated via the VMCB's ASID control.
This bug has existed since the introduction of KVM_VCPU_FLUSH_TLB.
It was only recently exposed after Linux guests stopped flushing the
local CPU's TLB prior to flushing remote TLBs (see commit 4ce94eabac,
"x86/mm/tlb: Flush remote and local TLBs concurrently"), but is also
visible in Windows 10 guests.
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Fixes: f38a7b7526 ("KVM: X86: support paravirtualized help for TLB shootdowns")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
[sean: massaged comment and changelog]
Message-Id: <20210531172256.2908-1-jiangshanlai@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the cache missing code path of cached device, if a proper location
from the internal B+ tree is matched for a cache miss range, function
cached_dev_cache_miss() will be called in cache_lookup_fn() in the
following code block,
[code block 1]
526 unsigned int sectors = KEY_INODE(k) == s->iop.inode
527 ? min_t(uint64_t, INT_MAX,
528 KEY_START(k) - bio->bi_iter.bi_sector)
529 : INT_MAX;
530 int ret = s->d->cache_miss(b, s, bio, sectors);
Here s->d->cache_miss() is the call backfunction pointer initialized as
cached_dev_cache_miss(), the last parameter 'sectors' is an important
hint to calculate the size of read request to backing device of the
missing cache data.
Current calculation in above code block may generate oversized value of
'sectors', which consequently may trigger 2 different potential kernel
panics by BUG() or BUG_ON() as listed below,
1) BUG_ON() inside bch_btree_insert_key(),
[code block 2]
886 BUG_ON(b->ops->is_extents && !KEY_SIZE(k));
2) BUG() inside biovec_slab(),
[code block 3]
51 default:
52 BUG();
53 return NULL;
All the above panics are original from cached_dev_cache_miss() by the
oversized parameter 'sectors'.
Inside cached_dev_cache_miss(), parameter 'sectors' is used to calculate
the size of data read from backing device for the cache missing. This
size is stored in s->insert_bio_sectors by the following lines of code,
[code block 4]
909 s->insert_bio_sectors = min(sectors, bio_sectors(bio) + reada);
Then the actual key inserting to the internal B+ tree is generated and
stored in s->iop.replace_key by the following lines of code,
[code block 5]
911 s->iop.replace_key = KEY(s->iop.inode,
912 bio->bi_iter.bi_sector + s->insert_bio_sectors,
913 s->insert_bio_sectors);
The oversized parameter 'sectors' may trigger panic 1) by BUG_ON() from
the above code block.
And the bio sending to backing device for the missing data is allocated
with hint from s->insert_bio_sectors by the following lines of code,
[code block 6]
926 cache_bio = bio_alloc_bioset(GFP_NOWAIT,
927 DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS),
928 &dc->disk.bio_split);
The oversized parameter 'sectors' may trigger panic 2) by BUG() from the
agove code block.
Now let me explain how the panics happen with the oversized 'sectors'.
In code block 5, replace_key is generated by macro KEY(). From the
definition of macro KEY(),
[code block 7]
71 #define KEY(inode, offset, size) \
72 ((struct bkey) { \
73 .high = (1ULL << 63) | ((__u64) (size) << 20) | (inode), \
74 .low = (offset) \
75 })
Here 'size' is 16bits width embedded in 64bits member 'high' of struct
bkey. But in code block 1, if "KEY_START(k) - bio->bi_iter.bi_sector" is
very probably to be larger than (1<<16) - 1, which makes the bkey size
calculation in code block 5 is overflowed. In one bug report the value
of parameter 'sectors' is 131072 (= 1 << 17), the overflowed 'sectors'
results the overflowed s->insert_bio_sectors in code block 4, then makes
size field of s->iop.replace_key to be 0 in code block 5. Then the 0-
sized s->iop.replace_key is inserted into the internal B+ tree as cache
missing check key (a special key to detect and avoid a racing between
normal write request and cache missing read request) as,
[code block 8]
915 ret = bch_btree_insert_check_key(b, &s->op, &s->iop.replace_key);
Then the 0-sized s->iop.replace_key as 3rd parameter triggers the bkey
size check BUG_ON() in code block 2, and causes the kernel panic 1).
Another kernel panic is from code block 6, is by the bvecs number
oversized value s->insert_bio_sectors from code block 4,
min(sectors, bio_sectors(bio) + reada)
There are two possibility for oversized reresult,
- bio_sectors(bio) is valid, but bio_sectors(bio) + reada is oversized.
- sectors < bio_sectors(bio) + reada, but sectors is oversized.
From a bug report the result of "DIV_ROUND_UP(s->insert_bio_sectors,
PAGE_SECTORS)" from code block 6 can be 344, 282, 946, 342 and many
other values which larther than BIO_MAX_VECS (a.k.a 256). When calling
bio_alloc_bioset() with such larger-than-256 value as the 2nd parameter,
this value will eventually be sent to biovec_slab() as parameter
'nr_vecs' in following code path,
bio_alloc_bioset() ==> bvec_alloc() ==> biovec_slab()
Because parameter 'nr_vecs' is larger-than-256 value, the panic by BUG()
in code block 3 is triggered inside biovec_slab().
From the above analysis, we know that the 4th parameter 'sector' sent
into cached_dev_cache_miss() may cause overflow in code block 5 and 6,
and finally cause kernel panic in code block 2 and 3. And if result of
bio_sectors(bio) + reada exceeds valid bvecs number, it may also trigger
kernel panic in code block 3 from code block 6.
Now the almost-useless readahead size for cache missing request back to
backing device is removed, this patch can fix the oversized issue with
more simpler method.
- add a local variable size_limit, set it by the minimum value from
the max bkey size and max bio bvecs number.
- set s->insert_bio_sectors by the minimum value from size_limit,
sectors, and the sectors size of bio.
- replace sectors by s->insert_bio_sectors to do bio_next_split.
By the above method with size_limit, s->insert_bio_sectors will never
result oversized replace_key size or bio bvecs number. And split bio
'miss' from bio_next_split() will always match the size of 'cache_bio',
that is the current maximum bio size we can sent to backing device for
fetching the cache missing data.
Current problmatic code can be partially found since Linux v3.13-rc1,
therefore all maintained stable kernels should try to apply this fix.
Reported-by: Alexander Ullrich <ealex1979@gmail.com>
Reported-by: Diego Ercolani <diego.ercolani@gmail.com>
Reported-by: Jan Szubiak <jan.szubiak@linuxpolska.pl>
Reported-by: Marco Rebhan <me@dblsaiko.net>
Reported-by: Matthias Ferdinand <bcache@mfedv.net>
Reported-by: Victor Westerhuis <victor@westerhu.is>
Reported-by: Vojtech Pavlik <vojtech@suse.cz>
Reported-and-tested-by: Rolf Fokkens <rolf@rolffokkens.nl>
Reported-and-tested-by: Thorsten Knabe <linux@thorsten-knabe.de>
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Takashi Iwai <tiwai@suse.com>
Link: https://lore.kernel.org/r/20210607125052.21277-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For read cache missing, bcache defines a readahead size for the read I/O
request to the backing device for the missing data. This readahead size
is initialized to 0, and almost no one uses it to avoid unnecessary read
amplifying onto backing device and write amplifying onto cache device.
Considering upper layer file system code has readahead logic allready
and works fine with readahead_cache_policy sysfile interface, we don't
have to keep bcache self-defined readahead anymore.
This patch removes the bcache self-defined readahead for cache missing
request for backing device, and the readahead sysfs file interfaces are
removed as well.
This is the preparation for next patch to fix potential kernel panic due
to oversized request in a simpler method.
Reported-by: Alexander Ullrich <ealex1979@gmail.com>
Reported-by: Diego Ercolani <diego.ercolani@gmail.com>
Reported-by: Jan Szubiak <jan.szubiak@linuxpolska.pl>
Reported-by: Marco Rebhan <me@dblsaiko.net>
Reported-by: Matthias Ferdinand <bcache@mfedv.net>
Reported-by: Victor Westerhuis <victor@westerhu.is>
Reported-by: Vojtech Pavlik <vojtech@suse.cz>
Reported-and-tested-by: Rolf Fokkens <rolf@rolffokkens.nl>
Reported-and-tested-by: Thorsten Knabe <linux@thorsten-knabe.de>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Takashi Iwai <tiwai@suse.com>
Link: https://lore.kernel.org/r/20210607125052.21277-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It was reported that a bug on arm64 caused a bad ip address to be used for
updating into a nop in ftrace_init(), but the error path (rightfully)
returned -EINVAL and not -EFAULT, as the bug caused more than one error to
occur. But because -EINVAL was returned, the ftrace_bug() tried to report
what was at the location of the ip address, and read it directly. This
caused the machine to panic, as the ip was not pointing to a valid memory
address.
Instead, read the ip address with copy_from_kernel_nofault() to safely
access the memory, and if it faults, report that the address faulted,
otherwise report what was in that location.
Link: https://lore.kernel.org/lkml/20210607032329.28671-1-mark-pk.tsai@mediatek.com/
Cc: stable@vger.kernel.org
Fixes: 05736a427f ("ftrace: warn on failure to disable mcount callers")
Reported-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Since the "fallthrough" is defined only in the kernel, building
lib/bootconfig.c as a part of user-space tools causes a build
error.
Add a dummy fallthrough to avoid the build error.
Link: https://lkml.kernel.org/r/162087519356.442660.11385099982318160180.stgit@devnote2
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 4c1ca831ad ("Revert "lib: Revert use of fallthrough pseudo-keyword in lib/"")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fix to return a negative error code from the error handling case instead
of 0, as done elsewhere in this function.
Link: https://lkml.kernel.org/r/20210508034216.2277-1-thunder.leizhen@huawei.com
Fixes: a995e6bc05 ("tools/bootconfig: Fix to check the write failure correctly")
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Allow creating FDB steering rules only when in switchdev mode.
The only software model where a userspace application can manipulate
FDB entries is when it manages the eswitch. This is only possible in
switchdev mode where we expose a single RDMA device with representors
for all the vports that are connected to the eswitch.
Fixes: 52438be441 ("RDMA/mlx5: Allow inserting a steering rule to the FDB")
Link: https://lore.kernel.org/r/e928ae7c58d07f104716a2a8d730963d1bd01204.1623052923.git.leonro@nvidia.com
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
If a6xx_hw_init() fails before creating the shadow_bo, the a6xx_pm_suspend
code referencing it will crash. Change the condition to one that avoids
this problem (note: creation of shadow_bo is behind this same condition)
Fixes: e8b0b994c3 ("drm/msm/a6xx: Clear shadow on suspend")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Akhil P Oommen <akhilpo@codeaurora.org>
Link: https://lore.kernel.org/r/20210513171431.18632-6-jonathan@marek.ca
Signed-off-by: Rob Clark <robdclark@chromium.org>
Value was shifted in the wrong direction, resulting in the field always
being zero, which is incorrect for A650.
Fixes: d0bac4e9cd ("drm/msm/a6xx: set ubwc config for A640 and A650")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Akhil P Oommen <akhilpo@codeaurora.org>
Link: https://lore.kernel.org/r/20210513171431.18632-4-jonathan@marek.ca
Signed-off-by: Rob Clark <robdclark@chromium.org>
Update CP_PROTECT register programming based on downstream.
A6XX_PROTECT_RW is renamed to A6XX_PROTECT_NORDWR to make things aligned
and also be more clear about what it does.
Note that this required switching to use the CP_ALWAYS_ON_COUNTER as the
GMU counter is not accessible from the cmdstream. Which also means
using the CPU counter for the msm_gpu_submit_flush() tracepoint (as
catapult depends on being able to compare this to the start/end values
captured in cmdstream). This may need to be revisited when IFPC is
enabled.
Also, compared to downstream, this opens up CP_PERFCTR_CP_SEL as the
userspace performance tooling (fdperf and pps-producer) expect to be
able to configure the CP counters.
Fixes: 4b565ca5a2 ("drm/msm: Add A6XX device support")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Akhil P Oommen <akhilpo@codeaurora.org>
Link: https://lore.kernel.org/r/20210513171431.18632-5-jonathan@marek.ca
[switch to CP_ALWAYS_ON_COUNTER, open up CP_PERFCNTR_CP_SEL, and spiff
up commit msg]
Signed-off-by: Rob Clark <robdclark@chromium.org>
I met a gpu addr bug recently and the kernel log
tells me the pc is memcpy/memset and link register is
radeon_uvd_resume.
As we know, in some architectures, optimized memcpy/memset
may not work well on device memory. Trival memcpy_toio/memset_io
can fix this problem.
BTW, amdgpu has already done it in:
commit ba0b2275a6 ("drm/amdgpu: use memcpy_to/fromio for UVD fw upload"),
that's why it has no this issue on the same gpu and platform.
Signed-off-by: Chen Li <chenli@uniontech.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
RLC_CP_SCHEDULERS and RLC_SPARE_INT0 have different
offsets for Sienna Cichlid
Signed-off-by: Rohit Khaire <rohit.khaire@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm_err meant broken user space could spam dmesg.
Fixes: f258907fdd "drm/amdgpu: Verify bo size can fit framebuffer size on init."
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It will cause error when alloc memory larger than 128KB in
amdgpu_bo_create->kzalloc. So it needs to switch kzalloc to kvzalloc.
Call Trace:
alloc_pages_current+0x6a/0xe0
kmalloc_order+0x32/0xb0
kmalloc_order_trace+0x1e/0x80
__kmalloc+0x249/0x2d0
amdgpu_bo_create+0x102/0x500 [amdgpu]
? xas_create+0x264/0x3e0
amdgpu_bo_create_vm+0x32/0x60 [amdgpu]
amdgpu_vm_pt_create+0xf5/0x260 [amdgpu]
amdgpu_vm_init+0x1fd/0x4d0 [amdgpu]
Signed-off-by: Changfeng <Changfeng.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the __string() machinery provided by the tracing subystem to make a
copy of the string literals consumed by the "nested VM-Enter failed"
tracepoint. A complete copy is necessary to ensure that the tracepoint
can't outlive the data/memory it consumes and deference stale memory.
Because the tracepoint itself is defined by kvm, if kvm-intel and/or
kvm-amd are built as modules, the memory holding the string literals
defined by the vendor modules will be freed when the module is unloaded,
whereas the tracepoint and its data in the ring buffer will live until
kvm is unloaded (or "indefinitely" if kvm is built-in).
This bug has existed since the tracepoint was added, but was recently
exposed by a new check in tracing to detect exactly this type of bug.
fmt: '%s%s
' current_buffer: ' vmx_dirty_log_t-140127 [003] .... kvm_nested_vmenter_failed: '
WARNING: CPU: 3 PID: 140134 at kernel/trace/trace.c:3759 trace_check_vprintf+0x3be/0x3e0
CPU: 3 PID: 140134 Comm: less Not tainted 5.13.0-rc1-ce2e73ce600a-req #184
Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
RIP: 0010:trace_check_vprintf+0x3be/0x3e0
Code: <0f> 0b 44 8b 4c 24 1c e9 a9 fe ff ff c6 44 02 ff 00 49 8b 97 b0 20
RSP: 0018:ffffa895cc37bcb0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffa895cc37bd08 RCX: 0000000000000027
RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff9766cfad74f8
RBP: ffffffffc0a041d4 R08: ffff9766cfad74f0 R09: ffffa895cc37bad8
R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffc0a041d4
R13: ffffffffc0f4dba8 R14: 0000000000000000 R15: ffff976409f2c000
FS: 00007f92fa200740(0000) GS:ffff9766cfac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000559bd11b0000 CR3: 000000019fbaa002 CR4: 00000000001726e0
Call Trace:
trace_event_printf+0x5e/0x80
trace_raw_output_kvm_nested_vmenter_failed+0x3a/0x60 [kvm]
print_trace_line+0x1dd/0x4e0
s_show+0x45/0x150
seq_read_iter+0x2d5/0x4c0
seq_read+0x106/0x150
vfs_read+0x98/0x180
ksys_read+0x5f/0xe0
do_syscall_64+0x40/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Cc: Steven Rostedt <rostedt@goodmis.org>
Fixes: 380e0055bc ("KVM: nVMX: trace nested VM-Enter failures detected by H/W")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Message-Id: <20210607175748.674002-1-seanjc@google.com>