Current default round-robin way of distributing VCPUs among
NUMA nodes might be wrong in case on multi-core/threads
CPUs. Making guests confused wrt topology where cores from
the same socket are on different nodes.
Allow a machine to override default mapping by providing
MachineClass::cpu_index_to_socket_id()
callback which would allow it group VCPUs from a socket
on the same NUMA node.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Each CPU can appear in only one NUMA node on the NUMA config. Reject
configuration if a CPU appears in multiple nodes.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
CPU index is always less than max_cpus, as documented at sysemu.h:
> The following shall be true for all CPUs:
> cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS
Reject configuration which uses invalid CPU indexes.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Fix the CPU index check to ensure we don't go beyond the size of the
node_cpu bitmap.
CPU index is always less than MAX_CPUMASK_BITS, as documented at
sysemu.h:
> The following shall be true for all CPUs:
> cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This function does some initialization that needs to be done after
machine init. The function may be eventually removed if we move the
CPUState.numa_node initialization to the CPU init code, but while the
function exists, lets give it a name that makes sense.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Renaming set_numa_nodes() and numa_init_func() to parse_numa_opts() and
parse_numa() makes the purpose of those functions clearer.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This allows us to make numa_init_func() static.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Now the only code that uses the variable is inside numa.c.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Not all sysemu.h users need the NUMA declarations, and keeping them in a
separate file makes it easier to see what are the interfaces provided by
numa.c.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
qerror_report_err() is a transitional interface to help with
converting existing monitor commands to QMP. It should not be used
elsewhere. Replace by error_report_err() in initial startup helper
numa_init_func() and board setup helper
memory_region_allocate_system_memory().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
When do memory hotplug, if there is numa node, we should add
the memory size to the corresponding node memory size.
It affects the result of hmp command "info numa".
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Add parameter errp to memory_region_init_ram and update all call sites
to pass in &error_abort.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
the memdev_list in hmp_info_memdev() is never freed.
so we use existent method qapi_free_MemdevList() to free it.
and also we can use qapi_free_MemdevList() to replace list loops
to clean up the memdev list in error path.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The error messages before and after patch are:
before:
qemu-system-x86_64: total memory for NUMA nodes (134217728) should equal RAM size (20000000)
after:
qemu-system-x86_64: total memory for NUMA nodes (0x8000000) should equal RAM size (0x20000000)
Cc: qemu-stable@nongnu.org
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Specifying the same memory backend twice leads to an assert:
./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object
memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0
-numa node,nodeid=1,memdev=ram0
qemu-system-x86_64: /scm/qemu/memory.c:1506:
memory_region_add_subregion_common: Assertion `!subregion->container'
failed.
Aborted (core dumped)
Detect and exit with an error message instead.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We don't support sparse NUMA node IDs yet, so this changes QEMU to
reject configs where not all nodes are present.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The same nodeid shouldn't appear multiple times in the command-line.
In addition to detecting command-line mistakes, this will fix a bug
where nb_numa_nodes may become larger than MAX_NODES (and cause
out-of-bounds access on the numa_info array).
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Based on "enable sparse node numbering" patch from Nishanth Aravamudan,
but without the code to actually support sparse node IDs. This just adds
the code to keep track of present/non-present nodes on the command-line,
without changing any behavior.
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
[Rename max_numa_node to max_numa_nodeid -Eduardo]
[Initialize max_numa_nodeid to 0 -Eduardo]
[Use MAX() macro when setting max_numa_nodeid -Eduardo]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
when memory_region_init_ram_from_file() fails
memory_region_size() will still return size that was
provided at region init time.
Instead use errp to properly detect error condition.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add qmp command query-memdev to query for information
of memory devices
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A new "share" property can be used with the "memory-file" backend to
map memory with MAP_SHARED instead of MAP_PRIVATE.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Right now, -mem-path will fall back to RAM-based allocation in some
cases. This should never happen with "-object memory-file", prepare
the code by adding correct error propagation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
MST: drop \n at end of error messages
Like the previous patch did in exec.c, split memory_region_init_ram and
memory_region_init_ram_from_file, and push mem_path one step further up.
Other RAM regions than system memory will now be backed by regular RAM.
Also, boards that do not use memory_region_allocate_system_memory will
not support -mem-path anymore. This can be changed before the patches
are merged by migrating boards to use the function.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This option provides the infrastructure for binding guest NUMA nodes
to host NUMA nodes. For example:
-object memory-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
-numa node,nodeid=0,cpus=0,memdev=ram-node0 \
-object memory-ram,size=1024M,policy=interleave,host-nodes=1-3,id=ram-node1 \
-numa node,nodeid=1,cpus=1,memdev=ram-node1
The option replaces "-numa node,mem=".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
MST: conflict resolution
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
MST: resolve conflicts
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add the numa_info structure to contain the numa nodes memory,
VCPUs information and the future added numa nodes host memory
policies.
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
[Fix hw/ppc/spapr.c - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If the total number of the assigned numa nodes memory is not
equal to the assigned ram size, it will write the wrong data
to ACPI table, then the guest will ignore the wrong ACPI table
and recognize all memory to one node. It's buggy, we should
check it to ensure that we write the right data to ACPI table.
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
MST: error message reworded
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
MST: comment tweaks