Commit Graph

93458 Commits

Author SHA1 Message Date
Peter Maydell
06985cc3fe hw/intc/arm_gicv3_its: Pass CTEntry to update_cte()
Make update_cte() take a CTEntry struct rather than all the fields
of the new CTE as separate arguments.

This brings it into line with the update_dte() API.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220201193207.2771604-6-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
d37cf49b11 hw/intc/arm_gicv3_its: Keep CTEs as a struct, not a raw uint64_t
In the ITS, a CTE is an entry in the collection table, which contains
multiple fields. Currently the function get_cte() which reads one
entry from the collection table returns a success/failure boolean and
passes back the raw 64-bit integer CTE value via a pointer argument.
We then extract fields from the CTE as we need them.

Create a real C struct with the same fields as the CTE, and
populate it in get_cte(), so that that function and update_cte()
are the only ones which need to care about the in-guest-memory
format of the CTE.

This brings get_cte()'s API into line with get_dte().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220201193207.2771604-5-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
22d62b08ba hw/intc/arm_gicv3_its: Pass DTEntry to update_dte()
Make update_dte() take a DTEntry struct rather than all the fields of
the new DTE as separate arguments.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220201193207.2771604-4-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
4acf93e193 hw/intc/arm_gicv3_its: Keep DTEs as a struct, not a raw uint64_t
In the ITS, a DTE is an entry in the device table, which contains
multiple fields. Currently the function get_dte() which reads one
entry from the device table returns it as a raw 64-bit integer,
which we then pass around in that form, only extracting fields
from it as we need them.

Create a real C struct with the same fields as the DTE, and
populate it in get_dte(), so that that function and update_dte()
are the only ones that need to care about the in-guest-memory
format of the DTE.
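
As a rough illustration of this pattern (not the actual QEMU code), the
sketch below keeps a decoded entry in a struct and confines knowledge of the
packed 64-bit format to one decode helper and one encode helper. The bit
layout and the helper names dte_decode() and dte_encode() are assumptions
made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical packed layout, chosen only for this example:
 *   bit 0        valid
 *   bits [5:1]   size
 *   bits [49:6]  ittaddr (44 bits)
 */
#define DTE_VALID_SHIFT   0
#define DTE_SIZE_SHIFT    1
#define DTE_SIZE_MASK     ((1ULL << 5) - 1)
#define DTE_ITTADDR_SHIFT 6
#define DTE_ITTADDR_MASK  ((1ULL << 44) - 1)

typedef struct DTEntry {
    bool valid;
    unsigned size;
    uint64_t ittaddr;
} DTEntry;

/* Decode a raw in-memory entry into the struct. */
static DTEntry dte_decode(uint64_t raw)
{
    DTEntry dte;

    dte.valid = ((raw >> DTE_VALID_SHIFT) & 1) != 0;
    dte.size = (unsigned)((raw >> DTE_SIZE_SHIFT) & DTE_SIZE_MASK);
    dte.ittaddr = (raw >> DTE_ITTADDR_SHIFT) & DTE_ITTADDR_MASK;
    return dte;
}

/* Encode the struct back into the raw in-memory format. */
static uint64_t dte_encode(const DTEntry *dte)
{
    assert(dte->size <= DTE_SIZE_MASK);
    assert(dte->ittaddr <= DTE_ITTADDR_MASK);
    return ((uint64_t)dte->valid << DTE_VALID_SHIFT) |
           ((uint64_t)dte->size << DTE_SIZE_SHIFT) |
           (dte->ittaddr << DTE_ITTADDR_SHIFT);
}
```

Callers then work only with DTEntry fields and never shift or mask the raw
value themselves.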

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220201193207.2771604-3-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
b6f96009ac hw/intc/arm_gicv3_its: Use address_space_map() to access command queue packets
Currently the ITS accesses each 8-byte doubleword in a 4-doubleword
command packet with a separate address_space_ldq_le() call.  This is
awkward because the individual command processing functions have
ended up with code to handle "load more doublewords out of the
packet", which is both unwieldy and also a potential source of bugs
because it's not obvious when looking at a line that pulls a field
out of the 'value' variable which of the 4 doublewords that variable
currently holds.

Switch to using address_space_map() to map the whole command packet
at once and fish the four doublewords out of it.  Then each process_*
function can start with a few lines of code that extract the fields
it cares about.

This requires us to split out the guts of process_its_cmd() into a
new do_process_its_cmd(), because we were previously overloading the
value and offset arguments as a backdoor way to directly pass the
devid and eventid from a write to GITS_TRANSLATER.  The new
do_process_its_cmd() takes those arguments directly, and
process_its_cmd() is just a wrapper that does the "read fields from
command packet" part.
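
The "read the whole packet once, then extract fields" shape might look
roughly like the standalone sketch below. It does not use the QEMU
address_space_map() API: a plain byte buffer stands in for the mapped guest
memory, and the helper names and field positions are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_PKT_DWORDS 4

/* Read one little-endian 64-bit value from a byte buffer. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Fetch all four doublewords of an already-mapped command packet. */
static void read_cmd_packet(const uint8_t *mapped,
                            uint64_t pkt[CMD_PKT_DWORDS])
{
    assert(mapped != NULL);
    for (size_t i = 0; i < CMD_PKT_DWORDS; i++) {
        pkt[i] = load_le64(mapped + i * 8);
    }
}

/* Hypothetical field positions, for illustration only. */
static uint32_t cmd_devid(const uint64_t pkt[CMD_PKT_DWORDS])
{
    return (uint32_t)(pkt[0] >> 32);          /* upper half of doubleword 0 */
}

static uint32_t cmd_eventid(const uint64_t pkt[CMD_PKT_DWORDS])
{
    return (uint32_t)(pkt[1] & 0xffffffffu);  /* lower half of doubleword 1 */
}
```

Because each accessor names the doubleword it reads, it is always clear
which part of the packet a field comes from.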

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220201193207.2771604-2-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Eric Auger
43530095e1 hw/arm/smmuv3: Fix device reset
We currently miss a bunch of register resets in the device reset
function. This sometimes prevents the guest from rebooting after
a system_reset (with virtio-blk-pci). For instance, we may get
the following errors:

invalid STE
smmuv3-iommu-memory-region-0-0 translation failed for iova=0x13a9d2000(SMMU_EVT_C_BAD_STE)
Invalid read at addr 0x13A9D2000, size 2, region '(null)', reason: rejected
invalid STE
smmuv3-iommu-memory-region-0-0 translation failed for iova=0x13a9d2000(SMMU_EVT_C_BAD_STE)
Invalid write at addr 0x13A9D2000, size 2, region '(null)', reason: rejected
invalid STE

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20220202111602.627429-1-eric.auger@redhat.com
Fixes: 10a83cb988 ("hw/arm/smmuv3: Skeleton")
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-08 10:56:28 +00:00
Richard Petri
77cd997161 hw/timer/armv7m_systick: Update clock source before enabling timer
Starting the SysTick timer and changing the clock source at the same time
will result in an error if the previous clock period was zero. For example,
on the mps2-tz platforms, no refclk is present. Right after reset, the
configured ptimer period is zero, and trying to enable it will turn it off
right away. E.g., code running on the platform setting

    SysTick->CTRL  = SysTick_CTRL_CLKSOURCE_Msk | SysTick_CTRL_ENABLE_Msk;

should change the clock source and enable the timer on real hardware, but
resulted in an error in qemu.
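
The ordering issue can be shown with a toy model. This is not the QEMU
ptimer API: ToyTimer, toy_timer_start() and toy_ctrl_write() are
hypothetical names, and the only behaviour modelled is that a timer with a
zero period cannot run.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTRL_ENABLE    (1u << 0)
#define CTRL_CLKSOURCE (1u << 2)

typedef struct ToyTimer {
    uint64_t period_ns;
    bool running;
} ToyTimer;

/* A timer with a zero period stops again immediately. */
static void toy_timer_start(ToyTimer *t)
{
    t->running = (t->period_ns != 0);
}

/* Handle a write to the control register. */
static void toy_ctrl_write(ToyTimer *t, uint32_t ctrl,
                           uint64_t cpuclk_period_ns,
                           uint64_t refclk_period_ns)
{
    /* Step 1: apply the period of the newly selected clock source. */
    t->period_ns = (ctrl & CTRL_CLKSOURCE) ? cpuclk_period_ns
                                           : refclk_period_ns;

    /* Step 2: only now act on the enable bit. */
    if (ctrl & CTRL_ENABLE) {
        toy_timer_start(t);
    } else {
        t->running = false;
    }
}
```

With a zero refclk period, a single write of CLKSOURCE | ENABLE leaves the
toy timer running, because the new period is in place before the timer is
started.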

Signed-off-by: Richard Petri <git@rpls.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20220201192650.289584-1-git@rpls.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-08 10:56:28 +00:00
Alex Bennée
c737d86804 arm: force flag recalculation when messing with DAIF
The recently introduced debug tests in kvm-unit-tests exposed an error
in our handling of singlestep caused by stale hflags. This is caught by
--enable-debug-tcg when running the tests.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reported-by: Andrew Jones <drjones@redhat.com>
Tested-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220202122353.457084-1-alex.bennee@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-08 10:56:28 +00:00
Edgar E. Iglesias
40874a383d hw/arm: versal-virt: Always call arm_load_kernel()
Always call arm_load_kernel() regardless of kernel_filename being
set. This is needed because arm_load_kernel() sets up reset for
the CPUs.

Fixes: 6f16da53ff (hw/arm: versal: Add a virtual Xilinx Versal board)
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 20220130110313.4045351-2-edgar.iglesias@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-08 10:56:28 +00:00
Peter Maydell
e4b0bb8071 hw/arm/boot: Drop existing dtb /psci node rather than retaining it
If we're using PSCI emulation, we add a /psci node to the device tree
we pass to the guest.  At the moment, if the dtb already has a /psci
node in it, we retain it, rather than replacing it. (This behaviour
was added in commit c39770cd63 in 2018.)

This is a problem if the existing node doesn't match our PSCI
emulation.  In particular, it might specify the wrong method (HVC vs
SMC), or wrong function IDs for cpu_suspend/cpu_off/etc, in which
case the guest will not get the behaviour it wants when it makes PSCI
calls.

An example of this is trying to boot the highbank or midway board
models using the device tree supplied in the kernel sources: this
device tree includes a /psci node that specifies function IDs that
don't match the (PSCI 0.2 compliant) IDs that QEMU uses.  The dtb
cpu_suspend function ID happens to match the PSCI 0.2 cpu_off ID, so
the guest hangs after booting when the kernel tries to idle the CPU
and instead it gets turned off.

Instead of retaining an existing /psci node, delete it entirely
and replace it with a node whose properties match QEMU's PSCI
emulation behaviour. This matches the way we handle /memory nodes,
where we also delete any existing nodes and write in ones that
match the way QEMU is going to behave.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20220127154639.2090164-17-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
d6dc926e6e hw/arm/boot: Drop nb_cpus field from arm_boot_info
We use the arm_boot_info::nb_cpus field in only one place, and that
place can easily get the number of CPUs locally rather than relying
on the board code to have set the field correctly.  (At least one
board, xlnx-versal-virt, does not set the field despite having more
than one CPU.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20220127154639.2090164-16-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
45dd668f23 hw/arm/highbank: Drop unused secondary boot stub code
The highbank and midway board code includes boot-stub code for
handling secondary CPU boot which keeps the secondaries in a pen
until the primary writes to a known location with the address they
should jump to.

This code is never used, because the boards enable QEMU's PSCI
emulation, so secondary CPUs are kept powered off until the PSCI call
which turns them on, and then start execution from the address given
by the guest in that PSCI call.  Delete the unreachable code.

(The code was wrong for midway in any case -- on the Cortex-A15 the
GIC CPU interface registers are at a different offset from PERIPHBASE
compared to the Cortex-A9, and the code baked in the offsets for
highbank's A9.)

Note that this commit implicitly depends on the preceding "Don't
write secondary boot stub if using PSCI" commit -- the default
secondary-boot stub code overlaps with one of the highbank-specific
bootcode rom blobs, so we must suppress the secondary-boot
stub code entirely, not merely replace the highbank-specific
version with the default.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20220127154639.2090164-15-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
d4a29ed6db hw/arm/boot: Don't write secondary boot stub if using PSCI
If we're using PSCI emulation to start secondary CPUs, there is no
point in writing the "secondary boot" stub code, because it will
never be used -- secondary CPUs start powered-off, and when powered
on are set to begin execution at the address specified by the guest's
power-on PSCI call, not at the stub.

Move the call to the hook that writes the secondary boot stub code so
that we can do it only if we're starting a Linux kernel and not using
PSCI.

(None of the users of the hook care about the ordering of its call
relative to anything else: they only use it to write a rom blob to
guest memory.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20220127154639.2090164-14-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
dc888dd43b hw/arm/boot: Prevent setting both psci_conduit and secure_board_setup
Now that we have dealt with the one special case (highbank) that needed
to set both psci_conduit and secure_board_setup, we don't need to
allow that combination any more. It doesn't make sense in general,
so use an assertion to ensure we don't add new boards that do it
by accident without thinking through the consequences.
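
A guard of this kind could be sketched as below. BootInfo, PsciConduit and
the two function names are stand-ins invented for the example, not the real
arm_boot_info definitions.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    PSCI_CONDUIT_DISABLED,
    PSCI_CONDUIT_SMC,
    PSCI_CONDUIT_HVC,
} PsciConduit;

typedef struct BootInfo {
    PsciConduit psci_conduit;
    bool secure_board_setup;
} BootInfo;

/* True unless both PSCI emulation and secure board setup are requested. */
static bool boot_info_combination_ok(const BootInfo *info)
{
    return !(info->psci_conduit != PSCI_CONDUIT_DISABLED &&
             info->secure_board_setup);
}

/* Abort early if a board asks for the unsupported combination. */
static void check_boot_info(const BootInfo *info)
{
    assert(boot_info_combination_ok(info));
}
```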

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20220127154639.2090164-13-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
61b82973e7 hw/arm/highbank: Drop use of secure_board_setup
Guest code on highbank may make non-PSCI SMC calls in order to
enable/disable the L2x0 cache controller (see the Linux kernel's
arch/arm/mach-highbank/highbank.c highbank_l2c310_write_sec()
function).  The ABI for this is documented in kernel commit
8e56130dcb as being borrowed from the OMAP44xx ROM.  The OMAP44xx TRM
documents this function ID as having no return value and potentially
trashing all guest registers except SP and PC. For QEMU's purposes
(where our L2x0 model is a stub and enabling or disabling it doesn't
affect the guest behaviour) a simple "do nothing" SMC is fine.

We currently implement this NOP behaviour using a little bit of
Secure code we run before jumping to the guest kernel, which is
written by arm_write_secure_board_setup_dummy_smc().  The code sets
up a set of Secure vectors where the SMC entry point returns without
doing anything.

Now that the PSCI SMC emulation handles all SMC calls (setting r0 to
an error code if the input r0 function identifier is not recognized),
we can use that default behaviour as sufficient for the highbank
cache controller call.  (Because the guest code assumes r0 has no
interesting value on exit it doesn't matter that we set it to the
error code).  We can therefore delete the highbank board code that
sets secure_board_setup to true and writes the secure-code bootstub.

(Note that because the OMAP44xx ABI puts function-identifiers in
r12 and PSCI uses r0, we only avoid a clash because Linux's code
happens to put the function-identifier in both registers. But this
is true also when the kernel is running on real firmware that
implements both ABIs as far as I can see.)

In passing, this change fixes booting on the 'midway' board model,
which has been completely broken since we added support for Hyp
mode to the Cortex-A15 CPU. When we did that boot.c was made to
start running the guest code in Hyp mode; this includes the
board_setup hook, which instantly UNDEFs because the NSACR is
not accessible from Hyp. (Put another way, we never made the
secure_board_setup hook support cope with Hyp mode.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20220127154639.2090164-12-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
3f37979bf5 arm: tcg: Adhere to SMCCC 1.3 section 5.2
The SMCCC 1.3 spec section 5.2 says

  The Unknown SMC Function Identifier is a sign-extended value of (-1)
  that is returned in the R0, W0 or X0 registers. An implementation must
  return this error code when it receives:

    * An SMC or HVC call with an unknown Function Identifier
    * An SMC or HVC call for a removed Function Identifier
    * An SMC64/HVC64 call from AArch32 state

To comply with these statements, let's always return -1 when we encounter
an unknown HVC or SMC call.
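
A minimal sketch of the rule, assuming a dispatcher that only knows the
PSCI 0.2 PSCI_VERSION call. The names handle_call() and
SMCCC_UNKNOWN_FUNCTION are hypothetical, and this is not the QEMU
implementation.

```c
#include <stdint.h>

#define PSCI_0_2_FN_PSCI_VERSION 0x84000000u

/* -1 sign-extended to the full register width. */
#define SMCCC_UNKNOWN_FUNCTION   ((uint64_t)-1)

/* Return the value to place in X0 (or W0/R0 after truncation). */
static uint64_t handle_call(uint32_t function_id)
{
    switch (function_id) {
    case PSCI_0_2_FN_PSCI_VERSION:
        /* PSCI 0.2: major version in bits [31:16], minor in bits [15:0]. */
        return 2;
    default:
        /* Unknown function identifiers always get the error code. */
        return SMCCC_UNKNOWN_FUNCTION;
    }
}
```

Truncating the result to 32 bits gives 0xffffffff, which is the same -1
value as seen in W0 or R0.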

[PMM:
 This is a reinstatement of commit 9fcd15b919, previously
 reverted in commit 4825eaae4fdd56fba0f; we can do this now that we
 have arranged for all the affected board models to not enable the
 PSCI emulation if they are running guest code at EL3. This avoids
 the regressions that caused us to revert the change for 7.0.]

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-08 10:56:28 +00:00
Peter Maydell
33284d482c hw/arm: highbank: For EL3 guests, don't enable PSCI, start all cores
Change the highbank/midway boards to use the new boot.c functionality
to allow us to enable psci-conduit only if the guest is being booted
in EL1 or EL2, so that if the user runs guest EL3 firmware code our
PSCI emulation doesn't get in its way.

To do this we stop setting the psci-conduit and start-powered-off
properties on the CPU objects in the board code, and instead set the
psci_conduit field in the arm_boot_info struct to tell the common
boot loader code that we'd like PSCI if the guest is starting at an
EL that it makes sense with (in which case it will set these
properties).

This means that when running guest code at EL3, all the cores
will start execution at once on poweron. This matches the
real hardware behaviour. (A brief description of the hardware
boot process is in the u-boot documentation for these boards:
https://u-boot.readthedocs.io/en/latest/board/highbank/highbank.html#boot-process
 -- in theory one might run the 'a9boot'/'a15boot' secure monitor
code in QEMU, though we probably don't emulate enough for that.)

This affects the highbank and midway boards.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20220127154639.2090164-10-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
52c235ad75 hw/arm/virt: Let boot.c handle PSCI enablement
Instead of setting the CPU psci-conduit and start-powered-off
properties in the virt board code, set the arm_boot_info psci_conduit
field so that the boot.c code can do it.

This will fix a corner case where we were incorrectly enabling PSCI
emulation when booting guest code into EL3 because it was an ELF file
passed to -kernel or to the generic loader.  (EL3 guest code started
via -bios or -pflash was already being run with PSCI emulation
disabled.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20220127154639.2090164-9-peter.maydell@linaro.org
2022-02-08 10:56:28 +00:00
Peter Maydell
9437a76e10 hw/arm/versal: Let boot.c handle PSCI enablement
Instead of setting the CPU psci-conduit and start-powered-off
properties in the xlnx-versal-virt board code, set the arm_boot_info
psci_conduit field so that the boot.c code can do it.

This will fix a corner case where we were incorrectly enabling PSCI
emulation when booting guest code into EL3 because it was an ELF file
passed to -kernel.  (EL3 guest code started via -bios, -pflash, or
the generic loader was already being run with PSCI emulation
disabled.)

Note that EL3 guest code has no way to turn on the secondary CPUs
because there's no emulated power controller, but this was already
true for EL3 guest code run via -bios, -pflash, or the generic
loader.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20220127154639.2090164-8-peter.maydell@linaro.org
2022-02-08 10:56:27 +00:00
Peter Maydell
50c785f2c7 hw/arm/xlnx-zcu102: Don't enable PSCI conduit when booting guest in EL3
Change the Xilinx ZynqMP-based board xlnx-zcu102 to use the new
boot.c functionality to allow us to enable psci-conduit only if
the guest is being booted in EL1 or EL2, so that if the user runs
guest EL3 firmware code our PSCI emulation doesn't get in its
way.

To do this we stop setting the psci-conduit property on the CPU
objects in the SoC code, and instead set the psci_conduit field in
the arm_boot_info struct to tell the common boot loader code that
we'd like PSCI if the guest is starting at an EL that it makes
sense with.

Note that this means that EL3 guest code will have no way
to power on secondary cores, because we don't model any
kind of power controller that does that on this SoC.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220127154639.2090164-7-peter.maydell@linaro.org
2022-02-08 10:56:27 +00:00
Peter Maydell
49865b9014 hw/arm: allwinner: Don't enable PSCI conduit when booting guest in EL3
Change the allwinner-h3 based board to use the new boot.c
functionality to allow us to enable psci-conduit only if the guest is
being booted in EL1 or EL2, so that if the user runs guest EL3
firmware code our PSCI emulation doesn't get in its way.

To do this we stop setting the psci-conduit property on the CPU
objects in the SoC code, and instead set the psci_conduit field in
the arm_boot_info struct to tell the common boot loader code that
we'd like PSCI if the guest is starting at an EL that it makes sense
with.

This affects the orangepi-pc board.

This commit leaves the secondary CPUs in the powered-down state if
the guest is booting at EL3, which is the same behaviour as before
this commit.  The secondaries can no longer be started by that EL3
code making a PSCI call but can still be started via the CPU
Configuration Module registers (which we model in
hw/misc/allwinner-cpucfg.c).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20220127154639.2090164-6-peter.maydell@linaro.org
2022-02-08 10:56:27 +00:00
Peter Maydell
ae2474f118 hw/arm: imx: Don't enable PSCI conduit when booting guest in EL3
Change the iMX-SoC based boards to use the new boot.c functionality
to allow us to enable psci-conduit only if the guest is being booted
in EL1 or EL2, so that if the user runs guest EL3 firmware code our
PSCI emulation doesn't get in its way.

To do this we stop setting the psci-conduit property on the CPU
objects in the SoC code, and instead set the psci_conduit field in
the arm_boot_info struct to tell the common boot loader code that
we'd like PSCI if the guest is starting at an EL that it makes
sense with.

This affects the mcimx6ul-evk and mcimx7d-sabre boards.

Note that for the mcimx7d board, this means that when running guest
code at EL3 there is currently no way to power on the secondary CPUs,
because we do not currently have a model of the system reset
controller module which should be used to do that for the imx7 SoC,
only for the imx6 SoC.  (Previously EL3 code which knew it was
running on QEMU could use a PSCI call to do this.) This doesn't
affect the imx6ul-evk board because it is uniprocessor.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220127154639.2090164-5-peter.maydell@linaro.org
2022-02-08 10:56:27 +00:00
Peter Maydell
817e2db8ce hw/arm/boot: Support setting psci-conduit based on guest EL
Currently we expect board code to set the psci-conduit property on
CPUs and ensure that secondary CPUs are created with the
start-powered-off property set to true, if the board wishes to use
QEMU's builtin PSCI emulation.  This worked OK for the virt board
where we first wanted to use it, because the virt board directly
creates its CPUs and is in a reasonable position to set those
properties.  For other boards which model real hardware and use a
separate SoC object, however, it is more awkward.  Most PSCI-using
boards just set the psci-conduit property unconditionally.

This was never strictly speaking correct (because you would not be
able to run EL3 guest firmware that itself provided the PSCI
interface, as the QEMU implementation would overrule it), but mostly
worked in practice because for non-PSCI SMC calls QEMU would emulate
the SMC instruction as normal (by trapping to guest EL3).  However,
we would like to make our PSCI emulation follow the part of the SMCCC
specification that mandates that SMC calls with unknown function
identifiers return a failure code, which means that all SMC calls
will be handled by the PSCI code and the "emulate as normal" path
will no longer be taken.

We tried to implement that in commit 9fcd15b919
("arm: tcg: Adhere to SMCCC 1.3 section 5.2"), but this
regressed attempts to run EL3 guest code on the affected boards:
 * mcimx6ul-evk, mcimx7d-sabre, orangepi, xlnx-zcu102
 * for the case only of EL3 code loaded via -kernel (and
   not via -bios or -pflash), virt and xlnx-versal-virt
so for the 7.0 release we reverted it (in commit 4825eaae4f).

This commit provides a mechanism that boards can use to arrange that
psci-conduit is set if running guest code at a low enough EL but not
if it would be running at the same EL that the conduit implies that
the QEMU PSCI implementation is using.  (Later commits will convert
individual board models to use this mechanism.)

We do this by moving the setting of the psci-conduit and
start-powered-off properties to arm_load_kernel().  Boards which want
to potentially use emulated PSCI must set a psci_conduit field in the
arm_boot_info struct to the type of conduit they want to use (SMC or
HVC); arm_load_kernel() will then set the CPUs up accordingly if it
is not going to start the guest code at the same or higher EL as the
fake QEMU firmware would be at.
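
The decision described here can be sketched as a small predicate. This is an
illustration under stated assumptions, not the code in arm_load_kernel():
the Conduit enum and the function names are hypothetical, and the sketch
assumes an SMC conduit implies firmware at EL3 and an HVC conduit implies
firmware at EL2.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CONDUIT_DISABLED,
    CONDUIT_SMC,
    CONDUIT_HVC,
} Conduit;

/* EL at which the emulated firmware is notionally running (assumption). */
static int conduit_firmware_el(Conduit c)
{
    return c == CONDUIT_SMC ? 3 : 2;
}

/*
 * Use emulated PSCI only when the guest starts strictly below the EL that
 * the requested conduit implies for the emulated firmware.
 */
static bool use_emulated_psci(Conduit requested, int boot_el)
{
    assert(boot_el >= 1 && boot_el <= 3);
    if (requested == CONDUIT_DISABLED) {
        return false;
    }
    return boot_el < conduit_firmware_el(requested);
}
```

For example, a board requesting the SMC conduit gets PSCI when the guest
boots at EL1 or EL2, but not when it boots at EL3.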

Board/SoC code which uses this mechanism should no longer set the CPU
psci-conduit property directly.  It should only set the
start-powered-off property for secondaries if EL3 guest firmware
running bare metal expects that rather than the alternative "all CPUs
start executing the firmware at once".

Note that when calculating whether we are going to run guest
code at EL3, we ignore the setting of arm_boot_info::secure_board_setup,
which might cause us to run a stub bit of guest code at EL3 which
does some board-specific setup before dropping to EL2 or EL1 to
run the guest kernel. This is OK because only one board that
enables PSCI sets secure_board_setup (the highbank board), and
the stub code it writes will behave the same way whether the
one SMC call it makes is handled by "emulate the SMC" or by
"PSCI default returns an error code". So we can leave that stub
code in place until after we've changed the PSCI default behaviour;
at that point we will remove it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20220127154639.2090164-4-peter.maydell@linaro.org
2022-02-08 10:56:27 +00:00
Peter Maydell
0c3c25fcda cpu.c: Make start-powered-off settable after realize
The CPU object's start-powered-off property is currently only
settable before the CPU object is realized.  For arm machines this is
awkward, because we would like to decide whether the CPU should be
powered-off based on how we are booting the guest code, which is
something done in the machine model code and in common code called by
the machine model, which runs much later and in completely different
parts of the codebase from the SoC object code that is responsible
for creating and realizing the CPU objects.

Allow start-powered-off to be set after realize.  Since this isn't
something that's supported by the DEFINE_PROP_* macros, we have to
switch the property definition to use the
object_class_property_add_bool() function.

Note that it doesn't conceptually make sense to change the setting of
the property after the machine has been completely initialized,
because this would mean that the behaviour of the machine when first
started would differ from its behaviour when the system is
subsequently reset.  (It would also require the underlying state to
be migrated, which we don't do.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20220127154639.2090164-3-peter.maydell@linaro.org
2022-02-08 10:56:27 +00:00
Peter Maydell
bddd892ef1 target/arm: make psci-conduit settable after realize
We want to allow the psci-conduit property to be set after realize,
because the parts of the code which are best placed to decide if it's
OK to enable QEMU's builtin PSCI emulation (the board code and the
arm_load_kernel() function are distant from the code which creates
and realizes CPUs (typically inside an SoC object's init and realize
method) and run afterwards.

Since the DEFINE_PROP_* macros don't have support for creating
properties which can be changed after realize, change the property to
be created with object_property_add_uint32_ptr(), which is what we
already use in this function for creating settable-after-realize
properties like init-svtor and init-nsvtor.

Note that it doesn't conceptually make sense to change the setting of
the property after the machine has been completely initialized,
because this would mean that the behaviour of the machine when first
started would differ from its behaviour when the system is
subsequently reset.  (It would also require the underlying state to
be migrated, which we don't do.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20220127154639.2090164-2-peter.maydell@linaro.org
2022-02-08 10:56:27 +00:00
Francisco Iglesias
c74ccb5dd6 hw/arm/xlnx-zynqmp: 'Or' the QSPI / QSPI DMA IRQs
'Or' the IRQs coming from the QSPI and QSPI DMA models. This avoids the
situation where one of the models incorrectly deasserts an interrupt
asserted by the other model (which would result in the IRQ being lost and
never reaching guest SW).
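
The effect of the OR gate can be seen in a toy model. OrGate and
or_gate_set_input() are hypothetical names for this sketch and are not the
QEMU device used by the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OR_GATE_INPUTS 2

typedef struct OrGate {
    bool level[OR_GATE_INPUTS];
} OrGate;

/* Record the new level of input n and return the resulting output level. */
static bool or_gate_set_input(OrGate *g, size_t n, bool level)
{
    bool out = false;

    assert(n < OR_GATE_INPUTS);
    g->level[n] = level;
    for (size_t i = 0; i < OR_GATE_INPUTS; i++) {
        out = out || g->level[i];
    }
    return out;
}
```

If input 0 is asserted and input 1 is then deasserted, the output stays
high; it only drops once every input is low.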

Signed-off-by: Francisco Iglesias <francisco.iglesias@xilinx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc@lmichel.fr>
Message-id: 20220203151742.1457-1-francisco.iglesias@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-08 10:56:27 +00:00
Richard Henderson
a7b66ada6e target/arm: Use CPTR_TFP with CPTR_EL3 in fp_exception_el
Use the named bit rather than a bare extract32.
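
The difference in readability can be sketched as follows. The extract32()
here is a local reimplementation for the example, and the value of CPTR_TFP
(bit 10) is an assumption that should be checked against the QEMU headers
and the Arm architecture documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPTR_TFP (1u << 10)   /* assumed position of the TFP bit */

/* Local stand-in: extract 'length' bits of 'value' starting at 'start'. */
static uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0u >> (32 - length));
}

/* Bare bit position: the reader has to know what bit 10 means. */
static bool fp_trapped_bare(uint32_t cptr)
{
    return extract32(cptr, 10, 1) != 0;
}

/* Named bit: the intent is visible at the call site. */
static bool fp_trapped_named(uint32_t cptr)
{
    return (cptr & CPTR_TFP) != 0;
}
```

Both helpers return the same result for any input; only the second says
which bit is being tested.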

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Message-id: 20220127063428.30212-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-08 10:56:27 +00:00
Richard Henderson
d5a6fa2dcf target/arm: Fix {fp, sve}_exception_el for VHE mode running
When HCR_EL2.E2H is set, the format of CPTR_EL2 changes to
look more like CPACR_EL1, with ZEN and FPEN fields instead
of TZ and TFP fields.
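
As a rough illustration of the two layouts, the sketch below decodes the FP trap control from a CPTR_EL2 value in both formats. It is a simplified model, not the code from this commit: the function name is hypothetical, it assumes EL2 is enabled and the access comes from EL0, EL1 or EL2, and it ignores every other trap source.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPTR_EL2_TFP_BIT    10 /* E2H == 0: single trap bit for FP/SIMD */
#define CPTR_EL2_FPEN_SHIFT 20 /* E2H == 1: 2-bit FPEN field, as in CPACR_EL1 */

/* Hypothetical helper: does an FP access from cur_el trap to EL2? */
static inline bool fp_trapped_to_el2(uint64_t cptr_el2, bool e2h, bool tge,
                                     int cur_el)
{
    if (e2h) {
        switch ((cptr_el2 >> CPTR_EL2_FPEN_SHIFT) & 3) {
        case 1:
            /* Only EL0 accesses trap, and only when TGE is also set. */
            return cur_el == 0 && tge;
        case 0:
        case 2:
            return true;
        default:
            return false; /* 0b11: no trap */
        }
    }
    /* E2H == 0: the TFP bit alone decides. */
    return (cptr_el2 >> CPTR_EL2_TFP_BIT) & 1;
}
```

Note that with E2H set, bit 10 no longer has the TFP meaning, so a value that traps in one format may not trap in the other.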

Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20220127063428.30212-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-08 10:56:27 +00:00
Richard Henderson
7701cee545 target/arm: Tidy sve_exception_el for CPACR_EL1 access
Extract entire fields for ZEN and FPEN, rather than testing specific bits.
This makes it easier to follow the code versus the ARM spec.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Message-id: 20220127063428.30212-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-08 10:56:27 +00:00
Richard Henderson
63888fa78b target/arm: Fix sve_zcr_len_for_el for VHE mode running
When HCR_EL2.{E2H,TGE} == '11', ZCR_EL1 is unused.
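
A minimal sketch of the rule, assuming both EL2 and EL3 are implemented and ignoring the CPU's own maximum vector length: the effective LEN is the minimum over the ZCR_ELx registers that apply, and ZCR_EL1 is skipped when E2H and TGE are both set. The function and array layout are hypothetical; only the HCR_EL2 bit positions come from the architecture.

```c
#include <stdint.h>

#define HCR_TGE (UINT64_C(1) << 27)
#define HCR_E2H (UINT64_C(1) << 34)

static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* zcr[1], zcr[2], zcr[3] model ZCR_EL1..ZCR_EL3; zcr[0] is unused. */
static inline uint32_t sve_len_for_el(int el, uint64_t hcr_el2,
                                      const uint64_t zcr[4])
{
    const uint64_t host = HCR_E2H | HCR_TGE;
    uint32_t len = 0xf;

    /* ZCR_EL1 only constrains EL0/EL1 when not in the E2H+TGE regime. */
    if (el <= 1 && (hcr_el2 & host) != host) {
        len = min_u32(len, (uint32_t)(zcr[1] & 0xf));
    }
    if (el <= 2) {
        len = min_u32(len, (uint32_t)(zcr[2] & 0xf));
    }
    return min_u32(len, (uint32_t)(zcr[3] & 0xf));
}
```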

Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Message-id: 20220127063428.30212-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-08 10:56:27 +00:00
Peter Maydell
55ef0b702b Linux-user pull request 20220207
Fix target rlimits for alpha
 Add starttime in /proc/self/stat

Merge remote-tracking branch 'remotes/lvivier-gitlab/tags/linux-user-for-7.0-pull-request' into staging

Linux-user pull request 20220207

Fix target rlimits for alpha
Add starttime in /proc/self/stat

# gpg: Signature made Mon 07 Feb 2022 08:27:27 GMT
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/lvivier-gitlab/tags/linux-user-for-7.0-pull-request:
  linux-user/syscall: Translate TARGET_RLIMIT_RTTIME
  linux-user: Move generic TARGET_RLIMIT* definitions to generic/target_resource.h
  linux-user: Implement starttime field in self stat emulation
  linux-user: sigprocmask check read perms first
  linux-user: rt_sigprocmask, check read perms first
  linux-user: Fix inotify on aarch64
  linux-user/alpha: Fix target rlimits for alpha and rearrange for clarity
  linux-user: Remove unnecessary 'aligned' attribute from TaskState

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-07 10:48:25 +00:00
Peter Maydell
0d564a3e32 virtio,pc: features, cleanups, fixes
Part of ACPI ERST support
 fixes, cleanups
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

virtio,pc: features, cleanups, fixes

Part of ACPI ERST support
fixes, cleanups

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Sun 06 Feb 2022 09:36:24 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (24 commits)
  util/oslib-posix: Fix missing unlock in the error path of os_mem_prealloc()
  ACPI ERST: step 6 of bios-tables-test.c
  ACPI ERST: bios-tables-test testcase
  ACPI ERST: qtest for ERST
  ACPI ERST: create ACPI ERST table for pc/x86 machines
  ACPI ERST: build the ACPI ERST table
  ACPI ERST: support for ACPI ERST feature
  ACPI ERST: header file for ERST
  ACPI ERST: PCI device_id for ERST
  ACPI ERST: bios-tables-test.c steps 1 and 2
  libvhost-user: Map shared RAM with MAP_NORESERVE to support virtio-mem with hugetlb
  libvhost-user: handle removal of identical regions
  libvhost-user: prevent over-running max RAM slots
  libvhost-user: fix VHOST_USER_REM_MEM_REG not closing the fd
  libvhost-user: Simplify VHOST_USER_REM_MEM_REG
  libvhost-user: Add vu_add_mem_reg input validation
  libvhost-user: Add vu_rem_mem_reg input validation
  tests: acpi: test short OEM_ID/OEM_TABLE_ID values in test_oem_fields()
  tests: acpi: update expected blobs
  acpi: fix OEM ID/OEM Table ID padding
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-06 10:46:46 +00:00
David Hildenbrand
dd4fc60585 util/oslib-posix: Fix missing unlock in the error path of os_mem_prealloc()
We're missing an unlock in case installing the signal handler failed.
Fortunately, we barely see this error in real life.

Fixes: a960d6642d ("util/oslib-posix: Support concurrent os_mem_prealloc() invocation")
Fixes: CID 1468941
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta@ionos.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20220111120830.119912-1-david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-06 04:33:50 -05:00
Eric DeVolder
a4752a51f1 ACPI ERST: step 6 of bios-tables-test.c
Following the guidelines in tests/qtest/bios-tables-test.c, this
is step 6.

Below is the disassembly of tests/data/acpi/pc/ERST.acpierst.

 /*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20180508 (64-bit version)
  * Copyright (c) 2000 - 2018 Intel Corporation
  *
  * Disassembly of tests/data/acpi/pc/ERST.acpierst, Thu Dec  2 13:32:07 2021
  *
  * ACPI Data Table [ERST]
  *
  * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
  */

 [000h 0000   4]                    Signature : "ERST"    [Error Record Serialization Table]
 [004h 0004   4]                 Table Length : 00000390
 [008h 0008   1]                     Revision : 01
 [009h 0009   1]                     Checksum : D6
 [00Ah 0010   6]                       Oem ID : "BOCHS "
 [010h 0016   8]                 Oem Table ID : "BXPC    "
 [018h 0024   4]                 Oem Revision : 00000001
 [01Ch 0028   4]              Asl Compiler ID : "BXPC"
 [020h 0032   4]        Asl Compiler Revision : 00000001

 [024h 0036   4]  Serialization Header Length : 00000030
 [028h 0040   4]                     Reserved : 00000000
 [02Ch 0044   4]      Instruction Entry Count : 0000001B

 [030h 0048   1]                       Action : 00 [Begin Write Operation]
 [031h 0049   1]                  Instruction : 03 [Write Register Value]
 [032h 0050   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [033h 0051   1]                     Reserved : 00

 [034h 0052  12]              Register Region : [Generic Address Structure]
 [034h 0052   1]                     Space ID : 00 [SystemMemory]
 [035h 0053   1]                    Bit Width : 20
 [036h 0054   1]                   Bit Offset : 00
 [037h 0055   1]         Encoded Access Width : 03 [DWord Access:32]
 [038h 0056   8]                      Address : 00000000FEBF3000

 [040h 0064   8]                        Value : 0000000000000000
 [048h 0072   8]                         Mask : 00000000000000FF

 [050h 0080   1]                       Action : 01 [Begin Read Operation]
 [051h 0081   1]                  Instruction : 03 [Write Register Value]
 [052h 0082   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [053h 0083   1]                     Reserved : 00

 [054h 0084  12]              Register Region : [Generic Address Structure]
 [054h 0084   1]                     Space ID : 00 [SystemMemory]
 [055h 0085   1]                    Bit Width : 20
 [056h 0086   1]                   Bit Offset : 00
 [057h 0087   1]         Encoded Access Width : 03 [DWord Access:32]
 [058h 0088   8]                      Address : 00000000FEBF3000

 [060h 0096   8]                        Value : 0000000000000001
 [068h 0104   8]                         Mask : 00000000000000FF

 [070h 0112   1]                       Action : 02 [Begin Clear Operation]
 [071h 0113   1]                  Instruction : 03 [Write Register Value]
 [072h 0114   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [073h 0115   1]                     Reserved : 00

 [074h 0116  12]              Register Region : [Generic Address Structure]
 [074h 0116   1]                     Space ID : 00 [SystemMemory]
 [075h 0117   1]                    Bit Width : 20
 [076h 0118   1]                   Bit Offset : 00
 [077h 0119   1]         Encoded Access Width : 03 [DWord Access:32]
 [078h 0120   8]                      Address : 00000000FEBF3000

 [080h 0128   8]                        Value : 0000000000000002
 [088h 0136   8]                         Mask : 00000000000000FF

 [090h 0144   1]                       Action : 03 [End Operation]
 [091h 0145   1]                  Instruction : 03 [Write Register Value]
 [092h 0146   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [093h 0147   1]                     Reserved : 00

 [094h 0148  12]              Register Region : [Generic Address Structure]
 [094h 0148   1]                     Space ID : 00 [SystemMemory]
 [095h 0149   1]                    Bit Width : 20
 [096h 0150   1]                   Bit Offset : 00
 [097h 0151   1]         Encoded Access Width : 03 [DWord Access:32]
 [098h 0152   8]                      Address : 00000000FEBF3000

 [0A0h 0160   8]                        Value : 0000000000000003
 [0A8h 0168   8]                         Mask : 00000000000000FF

 [0B0h 0176   1]                       Action : 04 [Set Record Offset]
 [0B1h 0177   1]                  Instruction : 02 [Write Register]
 [0B2h 0178   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [0B3h 0179   1]                     Reserved : 00

 [0B4h 0180  12]              Register Region : [Generic Address Structure]
 [0B4h 0180   1]                     Space ID : 00 [SystemMemory]
 [0B5h 0181   1]                    Bit Width : 20
 [0B6h 0182   1]                   Bit Offset : 00
 [0B7h 0183   1]         Encoded Access Width : 03 [DWord Access:32]
 [0B8h 0184   8]                      Address : 00000000FEBF3008

 [0C0h 0192   8]                        Value : 0000000000000000
 [0C8h 0200   8]                         Mask : 00000000FFFFFFFF

 [0D0h 0208   1]                       Action : 04 [Set Record Offset]
 [0D1h 0209   1]                  Instruction : 03 [Write Register Value]
 [0D2h 0210   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [0D3h 0211   1]                     Reserved : 00

 [0D4h 0212  12]              Register Region : [Generic Address Structure]
 [0D4h 0212   1]                     Space ID : 00 [SystemMemory]
 [0D5h 0213   1]                    Bit Width : 20
 [0D6h 0214   1]                   Bit Offset : 00
 [0D7h 0215   1]         Encoded Access Width : 03 [DWord Access:32]
 [0D8h 0216   8]                      Address : 00000000FEBF3000

 [0E0h 0224   8]                        Value : 0000000000000004
 [0E8h 0232   8]                         Mask : 00000000000000FF

 [0F0h 0240   1]                       Action : 05 [Execute Operation]
 [0F1h 0241   1]                  Instruction : 03 [Write Register Value]
 [0F2h 0242   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [0F3h 0243   1]                     Reserved : 00

 [0F4h 0244  12]              Register Region : [Generic Address Structure]
 [0F4h 0244   1]                     Space ID : 00 [SystemMemory]
 [0F5h 0245   1]                    Bit Width : 20
 [0F6h 0246   1]                   Bit Offset : 00
 [0F7h 0247   1]         Encoded Access Width : 03 [DWord Access:32]
 [0F8h 0248   8]                      Address : 00000000FEBF3008

 [100h 0256   8]                        Value : 000000000000009C
 [108h 0264   8]                         Mask : 00000000000000FF

 [110h 0272   1]                       Action : 05 [Execute Operation]
 [111h 0273   1]                  Instruction : 03 [Write Register Value]
 [112h 0274   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [113h 0275   1]                     Reserved : 00

 [114h 0276  12]              Register Region : [Generic Address Structure]
 [114h 0276   1]                     Space ID : 00 [SystemMemory]
 [115h 0277   1]                    Bit Width : 20
 [116h 0278   1]                   Bit Offset : 00
 [117h 0279   1]         Encoded Access Width : 03 [DWord Access:32]
 [118h 0280   8]                      Address : 00000000FEBF3000

 [120h 0288   8]                        Value : 0000000000000005
 [128h 0296   8]                         Mask : 00000000000000FF

 [130h 0304   1]                       Action : 06 [Check Busy Status]
 [131h 0305   1]                  Instruction : 03 [Write Register Value]
 [132h 0306   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [133h 0307   1]                     Reserved : 00

 [134h 0308  12]              Register Region : [Generic Address Structure]
 [134h 0308   1]                     Space ID : 00 [SystemMemory]
 [135h 0309   1]                    Bit Width : 20
 [136h 0310   1]                   Bit Offset : 00
 [137h 0311   1]         Encoded Access Width : 03 [DWord Access:32]
 [138h 0312   8]                      Address : 00000000FEBF3000

 [140h 0320   8]                        Value : 0000000000000006
 [148h 0328   8]                         Mask : 00000000000000FF

 [150h 0336   1]                       Action : 06 [Check Busy Status]
 [151h 0337   1]                  Instruction : 01 [Read Register Value]
 [152h 0338   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [153h 0339   1]                     Reserved : 00

 [154h 0340  12]              Register Region : [Generic Address Structure]
 [154h 0340   1]                     Space ID : 00 [SystemMemory]
 [155h 0341   1]                    Bit Width : 20
 [156h 0342   1]                   Bit Offset : 00
 [157h 0343   1]         Encoded Access Width : 03 [DWord Access:32]
 [158h 0344   8]                      Address : 00000000FEBF3008

 [160h 0352   8]                        Value : 0000000000000001
 [168h 0360   8]                         Mask : 00000000000000FF

 [170h 0368   1]                       Action : 07 [Get Command Status]
 [171h 0369   1]                  Instruction : 03 [Write Register Value]
 [172h 0370   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [173h 0371   1]                     Reserved : 00

 [174h 0372  12]              Register Region : [Generic Address Structure]
 [174h 0372   1]                     Space ID : 00 [SystemMemory]
 [175h 0373   1]                    Bit Width : 20
 [176h 0374   1]                   Bit Offset : 00
 [177h 0375   1]         Encoded Access Width : 03 [DWord Access:32]
 [178h 0376   8]                      Address : 00000000FEBF3000

 [180h 0384   8]                        Value : 0000000000000007
 [188h 0392   8]                         Mask : 00000000000000FF

 [190h 0400   1]                       Action : 07 [Get Command Status]
 [191h 0401   1]                  Instruction : 00 [Read Register]
 [192h 0402   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [193h 0403   1]                     Reserved : 00

 [194h 0404  12]              Register Region : [Generic Address Structure]
 [194h 0404   1]                     Space ID : 00 [SystemMemory]
 [195h 0405   1]                    Bit Width : 20
 [196h 0406   1]                   Bit Offset : 00
 [197h 0407   1]         Encoded Access Width : 03 [DWord Access:32]
 [198h 0408   8]                      Address : 00000000FEBF3008

 [1A0h 0416   8]                        Value : 0000000000000000
 [1A8h 0424   8]                         Mask : 00000000000000FF

 [1B0h 0432   1]                       Action : 08 [Get Record Identifier]
 [1B1h 0433   1]                  Instruction : 03 [Write Register Value]
 [1B2h 0434   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [1B3h 0435   1]                     Reserved : 00

 [1B4h 0436  12]              Register Region : [Generic Address Structure]
 [1B4h 0436   1]                     Space ID : 00 [SystemMemory]
 [1B5h 0437   1]                    Bit Width : 20
 [1B6h 0438   1]                   Bit Offset : 00
 [1B7h 0439   1]         Encoded Access Width : 03 [DWord Access:32]
 [1B8h 0440   8]                      Address : 00000000FEBF3000

 [1C0h 0448   8]                        Value : 0000000000000008
 [1C8h 0456   8]                         Mask : 00000000000000FF

 [1D0h 0464   1]                       Action : 08 [Get Record Identifier]
 [1D1h 0465   1]                  Instruction : 00 [Read Register]
 [1D2h 0466   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [1D3h 0467   1]                     Reserved : 00

 [1D4h 0468  12]              Register Region : [Generic Address Structure]
 [1D4h 0468   1]                     Space ID : 00 [SystemMemory]
 [1D5h 0469   1]                    Bit Width : 40
 [1D6h 0470   1]                   Bit Offset : 00
 [1D7h 0471   1]         Encoded Access Width : 04 [QWord Access:64]
 [1D8h 0472   8]                      Address : 00000000FEBF3008

 [1E0h 0480   8]                        Value : 0000000000000000
 [1E8h 0488   8]                         Mask : FFFFFFFFFFFFFFFF

 [1F0h 0496   1]                       Action : 09 [Set Record Identifier]
 [1F1h 0497   1]                  Instruction : 02 [Write Register]
 [1F2h 0498   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [1F3h 0499   1]                     Reserved : 00

 [1F4h 0500  12]              Register Region : [Generic Address Structure]
 [1F4h 0500   1]                     Space ID : 00 [SystemMemory]
 [1F5h 0501   1]                    Bit Width : 40
 [1F6h 0502   1]                   Bit Offset : 00
 [1F7h 0503   1]         Encoded Access Width : 04 [QWord Access:64]
 [1F8h 0504   8]                      Address : 00000000FEBF3008

 [200h 0512   8]                        Value : 0000000000000000
 [208h 0520   8]                         Mask : FFFFFFFFFFFFFFFF

 [210h 0528   1]                       Action : 09 [Set Record Identifier]
 [211h 0529   1]                  Instruction : 03 [Write Register Value]
 [212h 0530   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [213h 0531   1]                     Reserved : 00

 [214h 0532  12]              Register Region : [Generic Address Structure]
 [214h 0532   1]                     Space ID : 00 [SystemMemory]
 [215h 0533   1]                    Bit Width : 20
 [216h 0534   1]                   Bit Offset : 00
 [217h 0535   1]         Encoded Access Width : 03 [DWord Access:32]
 [218h 0536   8]                      Address : 00000000FEBF3000

 [220h 0544   8]                        Value : 0000000000000009
 [228h 0552   8]                         Mask : 00000000000000FF

 [230h 0560   1]                       Action : 0A [Get Record Count]
 [231h 0561   1]                  Instruction : 03 [Write Register Value]
 [232h 0562   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [233h 0563   1]                     Reserved : 00

 [234h 0564  12]              Register Region : [Generic Address Structure]
 [234h 0564   1]                     Space ID : 00 [SystemMemory]
 [235h 0565   1]                    Bit Width : 20
 [236h 0566   1]                   Bit Offset : 00
 [237h 0567   1]         Encoded Access Width : 03 [DWord Access:32]
 [238h 0568   8]                      Address : 00000000FEBF3000

 [240h 0576   8]                        Value : 000000000000000A
 [248h 0584   8]                         Mask : 00000000000000FF

 [250h 0592   1]                       Action : 0A [Get Record Count]
 [251h 0593   1]                  Instruction : 00 [Read Register]
 [252h 0594   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [253h 0595   1]                     Reserved : 00

 [254h 0596  12]              Register Region : [Generic Address Structure]
 [254h 0596   1]                     Space ID : 00 [SystemMemory]
 [255h 0597   1]                    Bit Width : 20
 [256h 0598   1]                   Bit Offset : 00
 [257h 0599   1]         Encoded Access Width : 03 [DWord Access:32]
 [258h 0600   8]                      Address : 00000000FEBF3008

 [260h 0608   8]                        Value : 0000000000000000
 [268h 0616   8]                         Mask : 00000000FFFFFFFF

 [270h 0624   1]                       Action : 0B [Begin Dummy Write]
 [271h 0625   1]                  Instruction : 03 [Write Register Value]
 [272h 0626   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [273h 0627   1]                     Reserved : 00

 [274h 0628  12]              Register Region : [Generic Address Structure]
 [274h 0628   1]                     Space ID : 00 [SystemMemory]
 [275h 0629   1]                    Bit Width : 20
 [276h 0630   1]                   Bit Offset : 00
 [277h 0631   1]         Encoded Access Width : 03 [DWord Access:32]
 [278h 0632   8]                      Address : 00000000FEBF3000

 [280h 0640   8]                        Value : 000000000000000B
 [288h 0648   8]                         Mask : 00000000000000FF

 [290h 0656   1]                       Action : 0D [Get Error Address Range]
 [291h 0657   1]                  Instruction : 03 [Write Register Value]
 [292h 0658   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [293h 0659   1]                     Reserved : 00

 [294h 0660  12]              Register Region : [Generic Address Structure]
 [294h 0660   1]                     Space ID : 00 [SystemMemory]
 [295h 0661   1]                    Bit Width : 20
 [296h 0662   1]                   Bit Offset : 00
 [297h 0663   1]         Encoded Access Width : 03 [DWord Access:32]
 [298h 0664   8]                      Address : 00000000FEBF3000

 [2A0h 0672   8]                        Value : 000000000000000D
 [2A8h 0680   8]                         Mask : 00000000000000FF

 [2B0h 0688   1]                       Action : 0D [Get Error Address Range]
 [2B1h 0689   1]                  Instruction : 00 [Read Register]
 [2B2h 0690   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [2B3h 0691   1]                     Reserved : 00

 [2B4h 0692  12]              Register Region : [Generic Address Structure]
 [2B4h 0692   1]                     Space ID : 00 [SystemMemory]
 [2B5h 0693   1]                    Bit Width : 40
 [2B6h 0694   1]                   Bit Offset : 00
 [2B7h 0695   1]         Encoded Access Width : 04 [QWord Access:64]
 [2B8h 0696   8]                      Address : 00000000FEBF3008

 [2C0h 0704   8]                        Value : 0000000000000000
 [2C8h 0712   8]                         Mask : FFFFFFFFFFFFFFFF

 [2D0h 0720   1]                       Action : 0E [Get Error Address Length]
 [2D1h 0721   1]                  Instruction : 03 [Write Register Value]
 [2D2h 0722   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [2D3h 0723   1]                     Reserved : 00

 [2D4h 0724  12]              Register Region : [Generic Address Structure]
 [2D4h 0724   1]                     Space ID : 00 [SystemMemory]
 [2D5h 0725   1]                    Bit Width : 20
 [2D6h 0726   1]                   Bit Offset : 00
 [2D7h 0727   1]         Encoded Access Width : 03 [DWord Access:32]
 [2D8h 0728   8]                      Address : 00000000FEBF3000

 [2E0h 0736   8]                        Value : 000000000000000E
 [2E8h 0744   8]                         Mask : 00000000000000FF

 [2F0h 0752   1]                       Action : 0E [Get Error Address Length]
 [2F1h 0753   1]                  Instruction : 00 [Read Register]
 [2F2h 0754   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [2F3h 0755   1]                     Reserved : 00

 [2F4h 0756  12]              Register Region : [Generic Address Structure]
 [2F4h 0756   1]                     Space ID : 00 [SystemMemory]
 [2F5h 0757   1]                    Bit Width : 40
 [2F6h 0758   1]                   Bit Offset : 00
 [2F7h 0759   1]         Encoded Access Width : 04 [QWord Access:64]
 [2F8h 0760   8]                      Address : 00000000FEBF3008

 [300h 0768   8]                        Value : 0000000000000000
 [308h 0776   8]                         Mask : 00000000FFFFFFFF

 [310h 0784   1]                       Action : 0F [Get Error Attributes]
 [311h 0785   1]                  Instruction : 03 [Write Register Value]
 [312h 0786   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [313h 0787   1]                     Reserved : 00

 [314h 0788  12]              Register Region : [Generic Address Structure]
 [314h 0788   1]                     Space ID : 00 [SystemMemory]
 [315h 0789   1]                    Bit Width : 20
 [316h 0790   1]                   Bit Offset : 00
 [317h 0791   1]         Encoded Access Width : 03 [DWord Access:32]
 [318h 0792   8]                      Address : 00000000FEBF3000

 [320h 0800   8]                        Value : 000000000000000F
 [328h 0808   8]                         Mask : 00000000000000FF

 [330h 0816   1]                       Action : 0F [Get Error Attributes]
 [331h 0817   1]                  Instruction : 00 [Read Register]
 [332h 0818   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [333h 0819   1]                     Reserved : 00

 [334h 0820  12]              Register Region : [Generic Address Structure]
 [334h 0820   1]                     Space ID : 00 [SystemMemory]
 [335h 0821   1]                    Bit Width : 20
 [336h 0822   1]                   Bit Offset : 00
 [337h 0823   1]         Encoded Access Width : 03 [DWord Access:32]
 [338h 0824   8]                      Address : 00000000FEBF3008

 [340h 0832   8]                        Value : 0000000000000000
 [348h 0840   8]                         Mask : 00000000FFFFFFFF

 [350h 0848   1]                       Action : 10 [Execute Timings]
 [351h 0849   1]                  Instruction : 03 [Write Register Value]
 [352h 0850   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [353h 0851   1]                     Reserved : 00

 [354h 0852  12]              Register Region : [Generic Address Structure]
 [354h 0852   1]                     Space ID : 00 [SystemMemory]
 [355h 0853   1]                    Bit Width : 20
 [356h 0854   1]                   Bit Offset : 00
 [357h 0855   1]         Encoded Access Width : 03 [DWord Access:32]
 [358h 0856   8]                      Address : 00000000FEBF3000

 [360h 0864   8]                        Value : 0000000000000010
 [368h 0872   8]                         Mask : 00000000000000FF

 [370h 0880   1]                       Action : 10 [Execute Timings]
 [371h 0881   1]                  Instruction : 00 [Read Register]
 [372h 0882   1]        Flags (decoded below) : 00
                       Preserve Register Bits : 0
 [373h 0883   1]                     Reserved : 00

 [374h 0884  12]              Register Region : [Generic Address Structure]
 [374h 0884   1]                     Space ID : 00 [SystemMemory]
 [375h 0885   1]                    Bit Width : 40
 [376h 0886   1]                   Bit Offset : 00
 [377h 0887   1]         Encoded Access Width : 04 [QWord Access:64]
 [378h 0888   8]                      Address : 00000000FEBF3008

 [380h 0896   8]                        Value : 0000000000000000
 [388h 0904   8]                         Mask : FFFFFFFFFFFFFFFF

 Raw Table Data: Length 912 (0x390)
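
The layout in the disassembly above can be cross-checked with a little arithmetic. The sketch below is an illustration only: the struct and function names are hypothetical, and the field sizes are read straight off the ByteLength column of the listing (a 36-byte table header, three 4-byte fields at 024h through 02Fh, then 32-byte instruction entries).

```c
#include <stdint.h>

#define ACPI_TABLE_HEADER_LEN 36u /* Signature .. Asl Compiler Revision */
#define ERST_HEADER_FIELDS_LEN 12u /* the three 4-byte fields at 024h..02Fh */

/* One serialization instruction entry, byte for byte as in the listing. */
typedef struct {
    uint8_t action;
    uint8_t instruction;
    uint8_t flags;
    uint8_t reserved;
    uint8_t register_region[12]; /* Generic Address Structure */
    uint8_t value[8];
    uint8_t mask[8];
} ErstEntry;

_Static_assert(sizeof(ErstEntry) == 32, "entries are 0x20 bytes apart");

/* Total table length for a given Instruction Entry Count. */
static inline uint32_t erst_table_length(uint32_t entry_count)
{
    return ACPI_TABLE_HEADER_LEN + ERST_HEADER_FIELDS_LEN +
           entry_count * (uint32_t)sizeof(ErstEntry);
}
```

With the Instruction Entry Count of 0x1B from the listing, erst_table_length(0x1B) gives 36 + 12 + 27 * 32 = 912 (0x390), which matches both the Table Length field and the raw table data length.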

Note that the contents of tests/data/q35/ERST.acpierst and
tests/data/microvm/ERST.pcie are the same except for differences
due to assigned base address.

The files tests/data/pc/DSDT.acpierst and tests/data/acpi/q35/DSDT.acpierst
are new (and are included as a result of the 'make check' process).
Rather than provide the entire content, I am providing the differences
between pc/DSDT and pc/DSDT.acpierst, and the difference between
q35/DSDT and q35/DSDT.acpierst, with an explanation to follow.

diff pc/DSDT pc/DSDT.acpierst:
 @@ -5,13 +5,13 @@
   *
   * Disassembling to symbolic ASL+ operators
   *
 - * Disassembly of tests/data/acpi/pc/DSDT, Thu Dec  2 10:10:13 2021
 + * Disassembly of tests/data/acpi/pc/DSDT.acpierst, Thu Dec  2 12:59:36 2021
   *
   * Original Table Header:
   *     Signature        "DSDT"
 - *     Length           0x00001772 (6002)
 + *     Length           0x00001751 (5969)
   *     Revision         0x01 **** 32-bit table (V1), no 64-bit math support
 - *     Checksum         0x9E
 + *     Checksum         0x95
   *     OEM ID           "BOCHS "
   *     OEM Table ID     "BXPC    "
   *     OEM Revision     0x00000001 (1)
 @@ -964,16 +964,11 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS "

              Device (S18)
              {
 -                Name (_SUN, 0x03)  // _SUN: Slot User Number
                  Name (_ADR, 0x00030000)  // _ADR: Address
 -                Method (_EJ0, 1, NotSerialized)  // _EJx: Eject Device
 -                {
 -                    PCEJ (BSEL, _SUN)
 -                }
 -
 +                Name (ASUN, 0x03)
                  Method (_DSM, 4, Serialized)  // _DSM: Device-Specific Method
                  {
 -                    Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, _SUN))
 +                    Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, ASUN))
                  }
              }

 @@ -1399,11 +1394,6 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS "

              Method (DVNT, 2, NotSerialized)
              {
 -                If ((Arg0 & 0x08))
 -                {
 -                    Notify (S18, Arg1)
 -                }
 -
                  If ((Arg0 & 0x10))
                  {
                      Notify (S20, Arg1)

diff q35/DSDT and q35/DSDT.acpierst:
 @@ -5,13 +5,13 @@
   *
   * Disassembling to symbolic ASL+ operators
   *
 - * Disassembly of tests/data/acpi/q35/DSDT, Thu Dec  2 10:10:13 2021
 + * Disassembly of tests/data/acpi/q35/DSDT.acpierst, Thu Dec  2 12:59:36 2021
   *
   * Original Table Header:
   *     Signature        "DSDT"
 - *     Length           0x00002061 (8289)
 + *     Length           0x00002072 (8306)
   *     Revision         0x01 **** 32-bit table (V1), no 64-bit math support
 - *     Checksum         0xFA
 + *     Checksum         0x9A
   *     OEM ID           "BOCHS "
   *     OEM Table ID     "BXPC    "
   *     OEM Revision     0x00000001 (1)
 @@ -3278,6 +3278,11 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS "
                  }
              }

 +            Device (S10)
 +            {
 +                Name (_ADR, 0x00020000)  // _ADR: Address
 +            }
 +
              Method (PCNT, 0, NotSerialized)
              {
              }

For both pc and q35, there is but a small difference between this
DSDT.acpierst and the corresponding DSDT. In both cases, the changes
occur under the hierarchy:

    Scope (\_SB)
    {
        Scope (PCI0)
        {

which leads me to believe that the change to the DSDT was needed
due to the introduction of the ERST PCI device.

This is explained in detail by Ani Sinha:
I have convinced myself of the changes we see in the DSDT tables.
On i440fx side, we are adding a non-hotpluggable pci device on slot 3.
So the changes we see are basically replacing an empty hotpluggable
slot on the pci root port with a non-hotpluggable device.
On q35, bsel on the pcie root bus is not set (it's not a hotpluggable bus),
so the change basically adds the address enumeration for the device.
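
As a rough illustration of the _ADR values in the diffs above: for a PCI
device, ACPI encodes the device (slot) number in the high 16 bits of _ADR
and the function number in the low 16 bits. The sketch below decodes that
layout. The helper names are hypothetical and are not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not QEMU API): split a PCI _ADR value. */
static inline uint16_t adr_to_slot(uint32_t adr)
{
    return (uint16_t)(adr >> 16);       /* high word: device (slot) number */
}

static inline uint16_t adr_to_function(uint32_t adr)
{
    return (uint16_t)(adr & 0xFFFF);    /* low word: function number */
}

/* Inverse direction: build an _ADR value from slot and function. */
static inline uint32_t make_adr(uint16_t slot, uint16_t function)
{
    assert(slot < 32);                  /* a PCI bus has 32 device slots */
    return ((uint32_t)slot << 16) | function;
}
```

With these, 0x00030000 (Device S18 on pc) decodes to slot 3, function 0,
and 0x00020000 (Device S10 on q35) decodes to slot 2, function 0.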

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <1643402289-22216-11-git-send-email-eric.devolder@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-06 04:33:50 -05:00
Eric DeVolder
646a793cc3 ACPI ERST: bios-tables-test testcase
This change implements the test suite checks for the ERST table.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <1643402289-22216-10-git-send-email-eric.devolder@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-06 04:33:50 -05:00
Eric DeVolder
bd24550e5c ACPI ERST: qtest for ERST
This change provides a qtest that locates and then does a simple
interrogation of the ERST feature within the guest.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <1643402289-22216-9-git-send-email-eric.devolder@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-06 04:33:50 -05:00
Eric DeVolder
8486f12f0b ACPI ERST: create ACPI ERST table for pc/x86 machines
This change exposes ACPI ERST support for x86 guests.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <1643402289-22216-8-git-send-email-eric.devolder@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-06 04:33:50 -05:00
Eric DeVolder
c9cd06ca00 ACPI ERST: build the ACPI ERST table
This builds the ACPI ERST table to inform OSPM how to communicate
with the acpi-erst device.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <1643402289-22216-7-git-send-email-eric.devolder@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-06 04:33:50 -05:00
Eric DeVolder
f7e26ffa59 ACPI ERST: support for ACPI ERST feature
This implements a PCI device for ACPI ERST. This implements the
non-NVRAM "mode" of operation for ERST as it is supported by
Linux and Windows.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <1643402289-22216-6-git-send-email-eric.devolder@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-06 04:33:50 -05:00
Eric DeVolder
fb1c8f8966 ACPI ERST: header file for ERST
This change introduces the public definitions for ACPI ERST.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <1643402289-22216-5-git-send-email-eric.devolder@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-06 04:33:42 -05:00
Eric DeVolder
22874353ea ACPI ERST: PCI device_id for ERST
This change reserves the PCI device_id for the new ACPI ERST
device.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <1643402289-22216-4-git-send-email-eric.devolder@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-06 04:33:42 -05:00
Eric DeVolder
922f48d37a ACPI ERST: bios-tables-test.c steps 1 and 2
Following the guidelines in tests/qtest/bios-tables-test.c, this
change adds empty placeholder files per step 1 for the new ERST
table, and excludes resulting changed files in bios-tables-test-allowed-diff.h
per step 2.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1643402289-22216-2-git-send-email-eric.devolder@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-06 04:33:42 -05:00
David Hildenbrand
eb99baa9b3 libvhost-user: Map shared RAM with MAP_NORESERVE to support virtio-mem with hugetlb
For fd-based shared memory, MAP_NORESERVE is only effective for hugetlb,
otherwise it's ignored. Older Linux versions that didn't support
reservation of huge pages ignored MAP_NORESERVE completely.

The first client to mmap a hugetlb fd without MAP_NORESERVE will
trigger reservation of huge pages for the whole mmapped range. There are
two cases to consider:

1) QEMU mapped RAM without MAP_NORESERVE

We're not dealing with a sparse mapping, huge pages for the whole range
have already been reserved by QEMU. An additional mmap() without
MAP_NORESERVE won't have any effect on the reservation.

2) QEMU mapped RAM with MAP_NORESERVE

We're dealing with a sparse mapping, no huge pages should be reserved.
Further mappings without MAP_NORESERVE should be avoided.

For 1), it doesn't matter if we set MAP_NORESERVE or not, so we can
simply set it. For 2), we'd be overriding QEMU's decision and trigger
reservation of huge pages, which might just fail if there are not
sufficient huge pages around. We must map with MAP_NORESERVE.

This change is required to support virtio-mem with hugetlb: a
virtio-mem device mapped into the guest physical memory corresponds to
a sparse memory mapping and QEMU maps this memory with MAP_NORESERVE.
Whenever memory in that sparse region is to be accessed by the VM, QEMU
populates huge pages for the affected range by preallocating memory
and handling any preallocation errors gracefully.

So let's map shared RAM with MAP_NORESERVE. As libvhost-user only
supports Linux, there shouldn't be anything to take care of with regard to
other OS support.

Without this change, libvhost-user will fail mapping the region if there
are currently not enough huge pages to perform the reservation:
 fv_panic: libvhost-user: region mmap error: Cannot allocate memory
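
A minimal sketch of the kind of mapping described above, assuming a Linux
host. map_shared_noreserve() and demo_noreserve_mapping() are hypothetical
names, not the libvhost-user functions. The demo uses an ordinary temporary
file, for which MAP_NORESERVE is expected to be ignored, so it only shows
that the flag is harmless there. It does not exercise hugetlb reservation.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a shared, fd-backed region without asking the
 * kernel to reserve backing pages up front. Returns MAP_FAILED on error,
 * like mmap() itself.
 */
static void *map_shared_noreserve(int fd, size_t size, off_t offset)
{
    assert(fd >= 0 && size > 0);
    return mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_NORESERVE, fd, offset);
}

/* Demo on a plain temporary file. Returns 0 on success, -1 on failure. */
static int demo_noreserve_mapping(void)
{
    const size_t size = 4096;
    unsigned char *p;
    int fd, ok;
    FILE *tmp = tmpfile();

    if (!tmp) {
        return -1;
    }
    fd = fileno(tmp);
    if (ftruncate(fd, (off_t)size) != 0) {
        fclose(tmp);
        return -1;
    }
    p = map_shared_noreserve(fd, size, 0);
    if (p == MAP_FAILED) {
        fclose(tmp);
        return -1;
    }
    p[0] = 0x5a;                /* touch the mapping */
    ok = (p[0] == 0x5a);
    munmap(p, size);
    fclose(tmp);
    return ok ? 0 : -1;
}
```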

Cc: "Marc-André Lureau" <marcandre.lureau@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20220111123939.132659-1-david@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
2022-02-04 09:07:43 -05:00
Raphael Norwitz
4fafedc9da libvhost-user: handle removal of identical regions
Today if QEMU (or any other VMM) has sent multiple copies of the same
region to a libvhost-user based backend and then attempts to remove the
region, only one instance of the region will be removed, leaving stale
copies of the region in dev->regions[].

This change resolves this by having vu_rem_mem_reg() iterate through all
regions in dev->regions[] and delete all matching regions.
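
The idea can be sketched as follows. This is a simplified illustration,
not the actual vu_rem_mem_reg() code: the Region type, region_equal() and
remove_all_matching() are hypothetical stand-ins, and unmapping of the
removed regions is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a region table entry. */
typedef struct {
    uint64_t gpa;    /* guest physical address */
    uint64_t size;   /* region size in bytes */
    uint64_t uva;    /* userspace (VMM) virtual address */
} Region;

static bool region_equal(const Region *a, const Region *b)
{
    return a->gpa == b->gpa && a->size == b->size && a->uva == b->uva;
}

/*
 * Remove every entry equal to *target from regions[0..*nregions).
 * Returns the number of entries removed.
 */
static size_t remove_all_matching(Region *regions, size_t *nregions,
                                  const Region *target)
{
    size_t removed = 0;
    size_t i = 0;

    assert(regions && nregions && target);
    while (i < *nregions) {
        if (!region_equal(&regions[i], target)) {
            i++;
            continue;
        }
        /* Close the hole at index i by shifting the tail down one slot. */
        memmove(&regions[i], &regions[i + 1],
                (*nregions - i - 1) * sizeof(Region));
        (*nregions)--;
        /* Clear the now-unused last slot so no stale copy remains. */
        memset(&regions[*nregions], 0, sizeof(Region));
        removed++;
        /* Do not advance i: the shifted-in entry must be checked too. */
    }
    return removed;
}
```

Not advancing the index after a removal is what allows adjacent duplicates
to be caught in the same pass.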

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220117041050.19718-7-raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2022-02-04 09:07:43 -05:00
Raphael Norwitz
b906a23c33 libvhost-user: prevent over-running max RAM slots
When VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS support was added to
libvhost-user, no guardrails were added to protect against QEMU
attempting to hot-add too many RAM slots to a VM with a libvhost-user
based backend attached.

This change adds the missing error handling by introducing a check on
the number of RAM slots the device has available before proceeding to
process the VHOST_USER_ADD_MEM_REG message.
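
A minimal sketch of such a guard, under stated assumptions: ExampleDev,
EXAMPLE_MAX_RAM_SLOTS and example_add_region() are hypothetical names, and
the limit of 32 is an arbitrary illustrative value, not the one used by
libvhost-user.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit, chosen only for illustration. */
#define EXAMPLE_MAX_RAM_SLOTS 32

typedef struct {
    size_t nregions;    /* number of region slots currently in use */
} ExampleDev;

/*
 * Check for a free slot before doing any work for an "add region"
 * request. Returns 0 on success, -1 if the table is already full.
 */
static int example_add_region(ExampleDev *dev)
{
    assert(dev != NULL);
    if (dev->nregions >= EXAMPLE_MAX_RAM_SLOTS) {
        return -1;      /* reject instead of writing past the table */
    }
    /* ... map the region and fill in the slot here ... */
    dev->nregions++;
    return 0;
}
```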

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220117041050.19718-6-raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2022-02-04 09:07:43 -05:00
David Hildenbrand
fa3d5483f0 libvhost-user: fix VHOST_USER_REM_MEM_REG not closing the fd
We end up not closing the file descriptor, resulting in leaking one
file descriptor for each VHOST_USER_REM_MEM_REG message.

Fixes: 875b9fd97b ("Support individual region unmap in libvhost-user")
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: "Marc-André Lureau" <marcandre.lureau@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Coiby Xu <coiby.xu@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220117041050.19718-5-raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-04 09:07:43 -05:00
David Hildenbrand
4fd5ca829a libvhost-user: Simplify VHOST_USER_REM_MEM_REG
Let's avoid having to manually copy all elements. Copy only the ones
necessary to close the hole and perform the operation in-place without
a second array.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220117041050.19718-4-raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-04 09:07:43 -05:00
Raphael Norwitz
9f4e63491b libvhost-user: Add vu_add_mem_reg input validation
Today if multiple FDs are sent from the VMM to the backend in a
VHOST_USER_ADD_MEM_REG message, one FD will be mapped and the remaining
FDs will be leaked. Therefore if multiple FDs are sent we report an
error and fail the operation, closing all FDs in the message.

Likewise in case the VMM sends a message with a size less than that
of a memory region descriptor, we add a check to gracefully report an
error and fail the operation rather than crashing.
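
The two checks can be sketched roughly as below. ExampleMsg,
EXAMPLE_MAX_FDS, close_msg_fds() and validate_mem_reg_msg() are
hypothetical names used for illustration, and closing the FDs on the
short-payload path is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define EXAMPLE_MAX_FDS 8   /* hypothetical upper bound on FDs per message */

/* Hypothetical, simplified message: payload size plus attached FDs. */
typedef struct {
    size_t payload_size;
    int fds[EXAMPLE_MAX_FDS];
    size_t fd_num;
} ExampleMsg;

/* Close every FD carried by the message so none of them leaks. */
static void close_msg_fds(ExampleMsg *msg)
{
    for (size_t i = 0; i < msg->fd_num; i++) {
        if (msg->fds[i] >= 0) {
            close(msg->fds[i]);
        }
        msg->fds[i] = -1;
    }
    msg->fd_num = 0;
}

/*
 * Accept the message only if it carries exactly one FD and its payload
 * is at least min_payload bytes. On rejection, all FDs are closed.
 */
static bool validate_mem_reg_msg(ExampleMsg *msg, size_t min_payload)
{
    assert(msg && msg->fd_num <= EXAMPLE_MAX_FDS);
    if (msg->fd_num != 1 || msg->payload_size < min_payload) {
        close_msg_fds(msg);
        return false;
    }
    return true;
}
```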

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220117041050.19718-3-raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2022-02-04 09:07:43 -05:00
Raphael Norwitz
316ee11144 libvhost-user: Add vu_rem_mem_reg input validation
Today if multiple FDs are sent from the VMM to the backend in a
VHOST_USER_REM_MEM_REG message, one FD will be unmapped and the remaining
FDs will be leaked. Therefore if multiple FDs are sent we report an
error and fail the operation, closing all FDs in the message.

Likewise in case the VMM sends a message with a size less than that of a
memory region descriptor, we add a check to gracefully report an error
and fail the operation rather than crashing.

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220117041050.19718-2-raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2022-02-04 09:07:43 -05:00
Igor Mammedov
408ca92634 tests: acpi: test short OEM_ID/OEM_TABLE_ID values in test_oem_fields()
A previous patch [1] added explicit whitespace padding to the
OEM_ID/OEM_TABLE_ID values used in the test_oem_fields() testcase to avoid
false positives and bisection issues when QEMU is switched to '\0'
padding. As a result, the testcase ceased to test values that are shorter
than the maximum possible length.

Update testcase to make sure that it's testing shorter IDs like it
used to before [2].

1) "tests: acpi: manually pad OEM_ID/OEM_TABLE_ID for  test_oem_fields() test"
2) 602b458201 ("acpi: Permit OEM ID and OEM table ID fields to be changed")
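
To make the padding question concrete: the ACPI table header stores the
OEM ID in a 6-byte field and the OEM Table ID in an 8-byte field, so a
shorter value has to be padded with something. The sketch below shows both
space and '\0' padding. fill_oem_field() is a hypothetical helper, not the
QEMU or test-suite code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OEM_ID_LEN       6   /* ACPI OEM ID field width in bytes */
#define OEM_TABLE_ID_LEN 8   /* ACPI OEM Table ID field width in bytes */

/*
 * Hypothetical helper: copy at most `width` bytes of `id` into `field`
 * and fill the remainder with `pad` (' ' or '\0'). The field is a
 * fixed-width byte array and is NOT NUL-terminated.
 */
static void fill_oem_field(char *field, size_t width, const char *id,
                           char pad)
{
    size_t len;

    assert(field && id);
    len = strlen(id);
    if (len > width) {
        len = width;            /* truncate over-long values */
    }
    memcpy(field, id, len);
    memset(field + len, pad, width - len);
}
```

A comparison against the resulting field therefore has to use memcmp()
over the full field width rather than strcmp().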

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220114142641.1727679-1-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-04 09:07:43 -05:00