TYPE_X86_CPU now call cpu_exec_init() on realize, so we don't
need to set cannot_destroy_with_object_finalize_yet anymore.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Assume that KVM would have returned the same on subsequent runs.
Abstract the memoizaiton pattern into macros and call it memorize as
adding the r makes it less obscure.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Cluster x2APIC cannot work without KVM's x2apic API when the maximal
APIC ID is greater than 8 and only KVM's LAPIC can support x2APIC, so we
forbid other APICs and also the old KVM case with less than 9, to
simplify the code.
There is no point in enabling EIM in forbidden APICs, so we keep it
enabled only for the KVM APIC; unconditionally, because making the
option depend on KVM version would be a maintanance burden.
Old QEMUs would enable eim whenever intremap was on, which would trick
guests into thinking that they can enable cluster x2APIC even if any
interrupt destination would get clamped to 8 bits.
Depending on your configuration, QEMU could notice that the destination
LAPIC is not present and report it with a very non-obvious:
KVM: injection failed, MSI lost (Operation not permitted)
Or the guest could say something about unexpected interrupts, because
clamping leads to aliasing so interrupts were being delivered to
incorrect VCPUs.
KVM_X2APIC_API is the feature that allows us to enable EIM for KVM.
QEMU 2.7 allowed EIM whenever interrupt remapping was enabled. In order
to keep backward compatibility, we again allow guests to misbehave in
non-obvious ways, and make it the default for old machine types.
A user can enable the buggy mode it with "x-buggy-eim=on".
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The default (auto) emulates the current behavior.
A user can now control EIM like
-device intel-iommu,intremap=on,eim=off
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
* there no point in configuring the device if realization is going to
fail, so move the check to the beginning,
* create a separate function for the check,
* use error_setg() instead error_report().
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The MMIO interface to APIC only allowed 8 bit addresses, which is not
enough for 32 bit addresses from EIM remapping.
Intel stored upper 24 bits in the high MSI address, so use the same
technique. The technique is also used in KVM MSI interface.
Other APICs are unlikely to handle those upper bits.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The MMIO based interface to APIC doesn't work well with MSIs that have
upper address bits set (remapped x2APIC MSIs). A specialized interface
is a quick and dirty way to avoid the shortcoming.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Every configuration has only up to one APIC class and we'll be extending
the class with a function that can be called without an instanced
object, so a direct access to the class is convenient.
This patch will break compilation if some code uses apic_get_class()
with CONFIG_USER_ONLY.
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
x86_cpu_filter_features() will be reused by code that shouldn't
print any warning. Move the warning code to a new
x86_cpu_report_filtered_features() function, and call it from
x86_cpu_realizefn().
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Add a new optional field to query-cpu-definitions schema:
"unavailable-features". It will contain a list of QOM properties
that prevent the CPU model from running in the current host.
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Michael Mueller <mimu@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: libvir-list@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of treating the FP and SSE bits as special cases, add
them to the x86_ext_save_areas array. This will simplify the code
that calculates the supported xsave components and the size of
the xsave area.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of keeping the aliases inside the feature name arrays and
require parsing the strings, just register alias properties
manually. This simplifies the code for property registration and
lookup.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of translating the feature name entries when adding
property names, store the actual property names in the feature
name array.
For reference, here is the full list of functions that use
FeatureWordInfo::feat_names:
* x86_cpu_get_migratable_flags(): not affected, as it just
check for non-NULL values.
* report_unavailable_features(): informative only. It will
start printing feature names with hyphens.
* x86_cpu_list(): informative only. It will start printing
feature names with hyphens
* x86_cpu_register_feature_bit_props(): not affected, as it
was already calling feat2prop(). Now we can remove the
feat2prop() calls safely.
So, the only user-visible effect of this patch are the new names
being used in help and error messages for users.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of using custom feature name lookup code for
plus_features/minus_features, save the property names used in
"[+-]feature" and use object_property_set_bool() to set them.
We don't need a feat2prop() call because we now have alias
properties for the old names containing underscores.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Registering the actual names containing underscores as aliases
will allow management software to be aware that the old
compatibility names are suported, and will make feat2prop() calls
unnecessary when using feature names.
Also, this will help us avoid making the code support underscores
on feature names that never had them in the first place. e.g.
"+tsc_deadline" was never supported and doesn't need to be
translated to "+tsc-deadline".
In other word: this will require less magic translation of
strings, and simple 1:1 match between the config options and
actual QOM properties.
Note that the underscores are still present in the
FeatureWordInfo::feat_names arrays, because
add_flagname_to_bitmaps() needs them to be kept. The next patches
will remove add_flagname_to_bitmaps() and will allow us to
finally remove the aliases from feat_names.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
VME is already disabled automatically when using TCG. So, instead
of pretending it is there when reporting CPU model data on
query-cpu-* QMP commands (making every CPU model to be reported
as not runnable), we can disable it by default on all CPU models
when using TCG.
Do that by adding a tcg_default_props array that will work like
kvm_default_props.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of using the builtin_x86_defs array, use the QOM subclass
list to list CPU models on "-cpu ?" and "query-cpu-definitions".
Signed-off-by: Andreas Färber <afaerber@suse.de>
[ehabkost: copied code from a patch by Andreas:
"target-i386: QOM'ify CPU", from March 2012]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
When vfio device is reset(encounter FLR, or bus reset), if need to do
bus reset(vfio_pci_hot_reset_one is called), vfio_pci_pre_reset &
vfio_pci_post_reset will be called twice.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Introductory comment for rtl8168 VFIO MSI-X quirk states:
At BAR2 offset 0x70 there is a dword data register,
offset 0x74 is a dword address register.
vfio: vfio_bar_read(0000:05:00.0:BAR2+0x70, 4) = 0xfee00398 // read data
Thus, correct offset for data read is 0x70,
but function vfio_rtl8168_quirk_data_read() wrongfully uses offset 0x74.
Signed-off-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In case the end-user calls qemu with -vfio-pci option without passing
either sysfsdev or host property value, the device is interpreted as
0000:00:00.0. Let's create a specific error message to guide the end-user.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The returned value (either -errno or -1) is not used anymore by the caller,
vfio_realize, since the error now is stored in the error object. So let's
remove it.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The returned value is not used anymore by the caller, vfio_realize,
since the error now is stored in the error object. So let's remove it.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This patch converts VFIO PCI to realize function.
Also original initfn errors now are propagated using QEMU
error objects. All errors are formatted with the same pattern:
"vfio: %s: the error description"
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This patch propagates errors encountered during vfio_base_device_init
up to the realize function.
In case the host value is not set or badly formed we now report an
error.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In case the vfio_init_intp fails we currently do not return an
error value. This patch fixes the bug. The returned value is not
explicit but in practice the error object is the one used to
report the error to the end-user and the actual returned error
value is not used.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Propagate the vfio_populate_device errors up to vfio_base_device_init.
The error object also is passed to vfio_init_intp. At the moment we
only report the error. Subsequent patches will propagate the error
up to the realize function.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pass an error object to prepare for migration to VFIO-PCI realize.
In vfio platform vfio_base_device_init we currently just report the
error. Subsequent patches will propagate the error up to the realize
function.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pass an error object to prepare for migration to VFIO-PCI realize.
For the time being let's just simply report the error in
vfio platform's vfio_base_device_init(). A subsequent patch will
duly propagate the error up to vfio_platform_realize.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The error is currently simply reported in vfio_get_group. Don't
bother too much with the prefix which will be handled at upper level,
later on.
Also return an error value in case container->error is not 0 and
the container is teared down.
On vfio_spapr_remove_window failure, we also report an error whereas
it was silent before.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pass an error object to prepare for migration to VFIO-PCI realize.
In vfio_probe_igd_bar4_quirk, simply report the error.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pass an error object to prepare for migration to VFIO-PCI realize.
The error is cascaded downto vfio_add_std_cap and then vfio_msi(x)_setup,
vfio_setup_pcie_cap.
vfio_add_ext_cap does not return anything else than 0 so let's transform
it into a void function.
Also use pci_add_capability2 which takes an error object.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pass an error object to prepare for migration to VFIO-PCI realize.
The error object is propagated down to vfio_intx_enable_kvm().
The three other callers, vfio_intx_enable_kvm(), vfio_msi_disable_common()
and vfio_pci_post_reset() do not propagate the error and simply call
error_reportf_err() with the ERR_PREFIX formatting.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pass an error object to prepare for migration to VFIO-PCI realize.
The returned value will be removed later on.
We now format an error in case of reading failure for
- the MSIX flags
- the MSIX table,
- the MSIX PBA.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pass an error object to prepare for migration to VFIO-PCI realize.
The returned value will be removed later on.
The case where error recovery cannot be enabled is not converted into
an error object but directly reported through error_report, as before.
Populating an error instead would cause the future realize function to
fail, which is not wanted.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pass an error object to prepare for the same operation in
vfio_populate_device. Eventually this contributes to the migration
to VFIO-PCI realize.
We now report an error on vfio_get_region_info failure.
vfio_probe_igd_bar4_quirk is not involved in the migration to realize
and simply calls error_reportf_err.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
To prepare for migration to realize, let's use a local error
object in vfio_initfn. Also let's use the same error prefix for all
error messages.
On top of the 1-1 conversion, we start using a common error prefix for
all error messages. We also introduce a similar warning prefix which will
be used later on.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
- a patch to add a vdc->reset() handler to virtio-9p
- a bunch of patches to fix various memory leaks (thanks to Li Qiang)
- some code cleanups for 9pfs
-----BEGIN PGP SIGNATURE-----
iEYEABECAAYFAlgE59oACgkQAvw66wEB28IrzgCePLq8RVQHvxJKcx9CO1XQaXCE
Dp4AoJm+CDVuaBd+cojoUmwGaLmZG8lb
=8nIw
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
This pull request contains:
- a patch to add a vdc->reset() handler to virtio-9p
- a bunch of patches to fix various memory leaks (thanks to Li Qiang)
- some code cleanups for 9pfs
# gpg: Signature made Mon 17 Oct 2016 16:01:46 BST
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Greg Kurz <groug@free.fr>"
# gpg: aka "Greg Kurz <gkurz@fr.ibm.com>"
# gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg: aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg: aka "Gregory Kurz (Cimai Technology) <gkurz@cimai.com>"
# gpg: aka "Gregory Kurz (Meiosys Technology) <gkurz@meiosys.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
9pfs: fix memory leak in v9fs_write
9pfs: fix memory leak in v9fs_link
9pfs: fix memory leak in v9fs_xattrcreate
9pfs: fix information leak in xattr read
virtio-9p: add reset handler
9pfs: only free completed request if not flushed
9pfs: drop useless check in pdu_free()
9pfs: use coroutine_fn annotation in hw/9pfs/9p.[ch]
9pfs: use coroutine_fn annotation in hw/9pfs/co*.[ch]
9pfs: fsdev: drop useless extern annotation for functions
9pfs: fix potential host memory leak in v9fs_read
9pfs: allocate space for guest originated empty strings
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
If an error occurs when marshalling the transfer length to the guest, the
v9fs_write() function doesn't free an IO vector, thus leading to a memory
leak. This patch fixes the issue.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, rephrased the changelog]
Signed-off-by: Greg Kurz <groug@kaod.org>
The v9fs_link() function keeps a reference on the source fid object. This
causes a memory leak since the reference never goes down to 0. This patch
fixes the issue.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, rephrased the changelog]
Signed-off-by: Greg Kurz <groug@kaod.org>
The 'fs.xattr.value' field in V9fsFidState object doesn't consider the
situation that this field has been allocated previously. Every time, it
will be allocated directly. This leads to a host memory leak issue if
the client sends another Txattrcreate message with the same fid number
before the fid from the previous time got clunked.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, updated the changelog to indicate how the leak can occur]
Signed-off-by: Greg Kurz <groug@kaod.org>
9pfs uses g_malloc() to allocate the xattr memory space, if the guest
reads this memory before writing to it, this will leak host heap memory
to the guest. This patch avoid this.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Virtio devices should implement the VirtIODevice->reset() function to
perform necessary cleanup actions and to bring the device to a quiescent
state.
In the case of the virtio-9p device, this means:
- emptying the list of active PDUs (i.e. draining all in-flight I/O)
- freeing all fids (i.e. close open file descriptors and free memory)
That's what this patch does.
The reset handler first waits for all active PDUs to complete. Since
completion happens in the QEMU global aio context, we just have to
loop around aio_poll() until the active list is empty.
The freeing part involves some actions to be performed on the backend,
like closing file descriptors or flushing extended attributes to the
underlying filesystem. The virtfs_reset() function already does the
job: it calls free_fid() for all open fids not involved in an ongoing
I/O operation. We are sure this is the case since we have drained
the PDU active list.
The current code implements all backend accesses with coroutines, but we
want to stay synchronous on the reset path. We can either change the
current code to be able to run when not in coroutine context, or create
a coroutine context and wait for virtfs_reset() to complete. This patch
goes for the latter because it results in simpler code.
Note that we also need to create a dummy PDU because it is also an API
to pass the FsContext pointer to all backend callbacks.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
If a PDU has a flush request pending, the current code calls pdu_free()
twice:
1) pdu_complete()->pdu_free() with pdu->cancelled set, which does nothing
2) v9fs_flush()->pdu_free() with pdu->cancelled cleared, which moves the
PDU back to the free list.
This works but it complexifies the logic of pdu_free().
With this patch, pdu_complete() only calls pdu_free() if no flush request
is pending, i.e. qemu_co_queue_next() returns false.
Since pdu_free() is now supposed to be called with pdu->cancelled cleared,
the check in pdu_free() is dropped and replaced by an assertion.
Signed-off-by: Greg Kurz <groug@kaod.org>
All these functions either call the v9fs_co_* functions which have the
coroutine_fn annotation, or pdu_complete() which calls qemu_co_queue_next().
Let's mark them to make it obvious they execute in coroutine context.
Signed-off-by: Greg Kurz <groug@kaod.org>
All these functions use the v9fs_co_run_in_worker() macro, and thus always
call qemu_coroutine_self() and qemu_coroutine_yield().
Let's mark them to make it obvious they execute in coroutine context.
Signed-off-by: Greg Kurz <groug@kaod.org>
In 9pfs read dispatch function, it doesn't free two QEMUIOVector
object thus causing potential memory leak. This patch avoid this.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Greg Kurz <groug@kaod.org>
If a guest sends an empty string paramater to any 9P operation, the current
code unmarshals it into a V9fsString equal to { .size = 0, .data = NULL }.
This is unfortunate because it can cause NULL pointer dereference to happen
at various locations in the 9pfs code. And we don't want to check str->data
everywhere we pass it to strcmp() or any other function which expects a
dereferenceable pointer.
This patch enforces the allocation of genuine C empty strings instead, so
callers don't have to bother.
Out of all v9fs_iov_vunmarshal() users, only v9fs_xattrwalk() checks if
the returned string is empty. It now uses v9fs_string_size() since
name.data cannot be NULL anymore.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
[groug, rewritten title and changelog,
fix empty string check in v9fs_xattrwalk()]
Signed-off-by: Greg Kurz <groug@kaod.org>