2019-05-30 07:57:35 +08:00
|
|
|
// SPDX-License-Identifier: GPL-2.0-only
|
2009-07-14 07:02:34 +08:00
|
|
|
/*
|
|
|
|
* Copyright (c) 2009, Microsoft Corporation.
|
|
|
|
*
|
|
|
|
* Authors:
|
|
|
|
* Haiyang Zhang <haiyangz@microsoft.com>
|
|
|
|
* Hank Janssen <hjanssen@microsoft.com>
|
2011-04-30 04:45:15 +08:00
|
|
|
* K. Y. Srinivasan <kys@microsoft.com>
|
2009-07-14 07:02:34 +08:00
|
|
|
*/
|
2011-03-30 04:58:47 +08:00
|
|
|
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
|
|
|
|
|
2009-07-14 07:02:34 +08:00
|
|
|
#include <linux/init.h>
|
|
|
|
#include <linux/module.h>
|
|
|
|
#include <linux/device.h>
|
2023-03-20 15:47:38 +08:00
|
|
|
#include <linux/platform_device.h>
|
2009-07-14 07:02:34 +08:00
|
|
|
#include <linux/interrupt.h>
|
|
|
|
#include <linux/sysctl.h>
|
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
|
|
|
#include <linux/slab.h>
|
2011-04-30 04:45:15 +08:00
|
|
|
#include <linux/acpi.h>
|
2010-05-29 07:22:44 +08:00
|
|
|
#include <linux/completion.h>
|
2011-10-05 03:29:52 +08:00
|
|
|
#include <linux/hyperv.h>
|
2012-12-01 22:46:54 +08:00
|
|
|
#include <linux/kernel_stat.h>
|
2023-03-20 15:47:40 +08:00
|
|
|
#include <linux/of_address.h>
|
2015-01-10 15:54:32 +08:00
|
|
|
#include <linux/clockchips.h>
|
2015-02-28 03:25:51 +08:00
|
|
|
#include <linux/cpu.h>
|
2022-05-27 15:43:59 +08:00
|
|
|
#include <linux/sched/isolation.h>
|
2017-02-09 01:51:37 +08:00
|
|
|
#include <linux/sched/task_stack.h>
|
|
|
|
|
2019-09-06 07:01:20 +08:00
|
|
|
#include <linux/delay.h>
|
2021-07-01 09:54:59 +08:00
|
|
|
#include <linux/panic_notifier.h>
|
2015-03-01 03:39:01 +08:00
|
|
|
#include <linux/ptrace.h>
|
2015-08-05 15:52:37 +08:00
|
|
|
#include <linux/screen_info.h>
|
2016-04-06 01:22:55 +08:00
|
|
|
#include <linux/efi.h>
|
2016-05-02 14:14:34 +08:00
|
|
|
#include <linux/random.h>
|
2020-04-06 23:53:31 +08:00
|
|
|
#include <linux/kernel.h>
|
2019-09-06 07:01:16 +08:00
|
|
|
#include <linux/syscore_ops.h>
|
2021-12-13 15:14:05 +08:00
|
|
|
#include <linux/dma-map-ops.h>
|
2022-08-27 21:03:44 +08:00
|
|
|
#include <linux/pci.h>
|
2019-07-01 12:25:56 +08:00
|
|
|
#include <clocksource/hyperv_timer.h>
|
2022-11-14 05:21:15 +08:00
|
|
|
#include <asm/mshyperv.h>
|
2011-05-13 10:34:28 +08:00
|
|
|
#include "hyperv_vmbus.h"
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2016-12-04 04:34:39 +08:00
|
|
|
struct vmbus_dynid {
|
|
|
|
struct list_head node;
|
|
|
|
struct hv_vmbus_device_id id;
|
|
|
|
};
|
|
|
|
|
2023-03-20 15:47:38 +08:00
|
|
|
static struct device *hv_dev;
|
2011-03-16 06:03:32 +08:00
|
|
|
|
2016-12-08 06:53:11 +08:00
|
|
|
static int hyperv_cpuhp_online;
|
2015-03-01 03:39:01 +08:00
|
|
|
|
2021-03-03 05:38:18 +08:00
|
|
|
static long __percpu *vmbus_evt;
|
|
|
|
|
2020-08-15 03:45:04 +08:00
|
|
|
/* Values parsed from ACPI DSDT */
|
2021-03-03 05:38:18 +08:00
|
|
|
int vmbus_irq;
|
2020-08-15 03:45:04 +08:00
|
|
|
int vmbus_interrupt;
|
|
|
|
|
drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers
Currently Hyper-V guests are among the most relevant users of the panic
infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
both in cleaning-up procedures (closing hypervisor <-> guest connection,
disabling some paravirtualized timer) as well as to data collection
(sending panic information to the hypervisor) and framebuffer management.
The thing is: some notifiers are related to others, ordering matters, some
functionalities are duplicated and there are lots of conditionals behind
sending panic information to the hypervisor. As part of an effort to
clean-up the panic notifiers mechanism and better document things, we
hereby address some of the issues/complexities of Hyper-V panic handling
through the following changes:
(a) We have die and panic notifiers on vmbus_drv.c and both have goals of
sending panic information to the hypervisor, though the panic notifier is
also responsible for a cleaning-up procedure.
This commit clears the code by splitting the panic notifier in two, one
for closing the vmbus connection whereas the other is only for sending
panic info to hypervisor. With that, it was possible to merge the die and
panic notifiers in a single/well-documented function, and clear some
conditional complexities on sending such information to the hypervisor.
(b) There is a Hyper-V framebuffer panic notifier, which relies in doing
a vmbus operation that demands a valid connection. So, we must order this
notifier with the panic notifier from vmbus_drv.c, to guarantee that the
framebuffer code executes before the vmbus connection is unloaded.
Also, this commit removes a useless header.
Although there is code rework and re-ordering, we expect that this change
has no functional regressions but instead optimize the path and increase
panic reliability on Hyper-V. This was tested on Hyper-V with success.
Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Fabio A M Martins <fabiomirmar@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220819221731.480795-11-gpiccoli@igalia.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-08-20 06:17:30 +08:00
|
|
|
/*
|
|
|
|
* The panic notifier below is responsible solely for unloading the
|
|
|
|
* vmbus connection, which is necessary in a panic event.
|
|
|
|
*
|
|
|
|
* Notice an intrincate relation of this notifier with Hyper-V
|
|
|
|
* framebuffer panic notifier exists - we need vmbus connection alive
|
|
|
|
* there in order to succeed, so we need to order both with each other
|
|
|
|
* [see hvfb_on_panic()] - this is done using notifiers' priorities.
|
|
|
|
*/
|
|
|
|
static int hv_panic_vmbus_unload(struct notifier_block *nb, unsigned long val,
|
2015-08-02 07:08:10 +08:00
|
|
|
void *args)
|
|
|
|
{
|
2020-04-06 23:53:26 +08:00
|
|
|
vmbus_initiate_unload(true);
|
2015-03-01 03:39:01 +08:00
|
|
|
return NOTIFY_DONE;
|
|
|
|
}
|
drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers
Currently Hyper-V guests are among the most relevant users of the panic
infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
both in cleaning-up procedures (closing hypervisor <-> guest connection,
disabling some paravirtualized timer) as well as to data collection
(sending panic information to the hypervisor) and framebuffer management.
The thing is: some notifiers are related to others, ordering matters, some
functionalities are duplicated and there are lots of conditionals behind
sending panic information to the hypervisor. As part of an effort to
clean-up the panic notifiers mechanism and better document things, we
hereby address some of the issues/complexities of Hyper-V panic handling
through the following changes:
(a) We have die and panic notifiers on vmbus_drv.c and both have goals of
sending panic information to the hypervisor, though the panic notifier is
also responsible for a cleaning-up procedure.
This commit clears the code by splitting the panic notifier in two, one
for closing the vmbus connection whereas the other is only for sending
panic info to hypervisor. With that, it was possible to merge the die and
panic notifiers in a single/well-documented function, and clear some
conditional complexities on sending such information to the hypervisor.
(b) There is a Hyper-V framebuffer panic notifier, which relies in doing
a vmbus operation that demands a valid connection. So, we must order this
notifier with the panic notifier from vmbus_drv.c, to guarantee that the
framebuffer code executes before the vmbus connection is unloaded.
Also, this commit removes a useless header.
Although there is code rework and re-ordering, we expect that this change
has no functional regressions but instead optimize the path and increase
panic reliability on Hyper-V. This was tested on Hyper-V with success.
Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Fabio A M Martins <fabiomirmar@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220819221731.480795-11-gpiccoli@igalia.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-08-20 06:17:30 +08:00
|
|
|
static struct notifier_block hyperv_panic_vmbus_unload_block = {
|
|
|
|
.notifier_call = hv_panic_vmbus_unload,
|
|
|
|
.priority = INT_MIN + 1, /* almost the latest one to execute */
|
|
|
|
};
|
|
|
|
|
2016-04-06 01:22:55 +08:00
|
|
|
static const char *fb_mmio_name = "fb_range";
|
|
|
|
static struct resource *fb_mmio;
|
2016-09-07 20:39:33 +08:00
|
|
|
static struct resource *hyperv_mmio;
|
2019-11-02 04:00:04 +08:00
|
|
|
static DEFINE_MUTEX(hyperv_mmio_lock);
|
2011-03-16 06:03:44 +08:00
|
|
|
|
2011-12-02 01:59:34 +08:00
|
|
|
static int vmbus_exists(void)
|
|
|
|
{
|
2023-03-20 15:47:38 +08:00
|
|
|
if (hv_dev == NULL)
|
2011-12-02 01:59:34 +08:00
|
|
|
return -ENODEV;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-09-22 11:58:49 +08:00
|
|
|
static u8 channel_monitor_group(const struct vmbus_channel *channel)
|
2013-09-14 02:32:56 +08:00
|
|
|
{
|
|
|
|
return (u8)channel->offermsg.monitorid / 32;
|
|
|
|
}
|
|
|
|
|
2017-09-22 11:58:49 +08:00
|
|
|
static u8 channel_monitor_offset(const struct vmbus_channel *channel)
|
2013-09-14 02:32:56 +08:00
|
|
|
{
|
|
|
|
return (u8)channel->offermsg.monitorid % 32;
|
|
|
|
}
|
|
|
|
|
2017-09-22 11:58:49 +08:00
|
|
|
static u32 channel_pending(const struct vmbus_channel *channel,
|
|
|
|
const struct hv_monitor_page *monitor_page)
|
2013-09-14 02:32:56 +08:00
|
|
|
{
|
|
|
|
u8 monitor_group = channel_monitor_group(channel);
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2013-09-14 02:32:56 +08:00
|
|
|
return monitor_page->trigger_group[monitor_group].pending;
|
|
|
|
}
|
|
|
|
|
2017-09-22 11:58:49 +08:00
|
|
|
static u32 channel_latency(const struct vmbus_channel *channel,
|
|
|
|
const struct hv_monitor_page *monitor_page)
|
2013-09-14 02:32:57 +08:00
|
|
|
{
|
|
|
|
u8 monitor_group = channel_monitor_group(channel);
|
|
|
|
u8 monitor_offset = channel_monitor_offset(channel);
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2013-09-14 02:32:57 +08:00
|
|
|
return monitor_page->latency[monitor_group][monitor_offset];
|
|
|
|
}
|
|
|
|
|
2013-09-14 02:32:58 +08:00
|
|
|
static u32 channel_conn_id(struct vmbus_channel *channel,
|
|
|
|
struct hv_monitor_page *monitor_page)
|
|
|
|
{
|
|
|
|
u8 monitor_group = channel_monitor_group(channel);
|
|
|
|
u8 monitor_offset = channel_monitor_offset(channel);
|
2020-11-16 03:57:31 +08:00
|
|
|
|
2013-09-14 02:32:58 +08:00
|
|
|
return monitor_page->parameter[monitor_group][monitor_offset].connectionid.u.id;
|
|
|
|
}
|
|
|
|
|
2013-09-14 02:32:49 +08:00
|
|
|
static ssize_t id_show(struct device *dev, struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", hv_dev->channel->offermsg.child_relid);
|
2013-09-14 02:32:49 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(id);
|
|
|
|
|
2013-09-14 02:32:50 +08:00
|
|
|
static ssize_t state_show(struct device *dev, struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", hv_dev->channel->state);
|
2013-09-14 02:32:50 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(state);
|
|
|
|
|
2013-09-14 02:32:51 +08:00
|
|
|
static ssize_t monitor_id_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr, char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", hv_dev->channel->offermsg.monitorid);
|
2013-09-14 02:32:51 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(monitor_id);
|
|
|
|
|
2013-09-14 02:32:53 +08:00
|
|
|
static ssize_t class_id_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr, char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "{%pUl}\n",
|
|
|
|
&hv_dev->channel->offermsg.offer.if_type);
|
2013-09-14 02:32:53 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(class_id);
|
|
|
|
|
2013-09-14 02:32:54 +08:00
|
|
|
static ssize_t device_id_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr, char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "{%pUl}\n",
|
|
|
|
&hv_dev->channel->offermsg.offer.if_instance);
|
2013-09-14 02:32:54 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(device_id);
|
|
|
|
|
2013-09-14 02:32:52 +08:00
|
|
|
static ssize_t modalias_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr, char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "vmbus:%*phN\n", UUID_SIZE, &hv_dev->dev_type);
|
2013-09-14 02:32:52 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(modalias);
|
|
|
|
|
2018-07-29 05:58:48 +08:00
|
|
|
#ifdef CONFIG_NUMA
|
|
|
|
static ssize_t numa_node_show(struct device *dev,
|
|
|
|
struct device_attribute *attr, char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", cpu_to_node(hv_dev->channel->target_cpu));
|
2018-07-29 05:58:48 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(numa_node);
|
|
|
|
#endif
|
|
|
|
|
2013-09-14 02:32:56 +08:00
|
|
|
static ssize_t server_monitor_pending_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", channel_pending(hv_dev->channel,
|
|
|
|
vmbus_connection.monitor_pages[0]));
|
2013-09-14 02:32:56 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(server_monitor_pending);
|
|
|
|
|
|
|
|
static ssize_t client_monitor_pending_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", channel_pending(hv_dev->channel,
|
|
|
|
vmbus_connection.monitor_pages[1]));
|
2013-09-14 02:32:56 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(client_monitor_pending);
|
2013-09-14 02:32:53 +08:00
|
|
|
|
2013-09-14 02:32:57 +08:00
|
|
|
static ssize_t server_monitor_latency_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", channel_latency(hv_dev->channel,
|
|
|
|
vmbus_connection.monitor_pages[0]));
|
2013-09-14 02:32:57 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(server_monitor_latency);
|
|
|
|
|
|
|
|
static ssize_t client_monitor_latency_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", channel_latency(hv_dev->channel,
|
|
|
|
vmbus_connection.monitor_pages[1]));
|
2013-09-14 02:32:57 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(client_monitor_latency);
|
|
|
|
|
2013-09-14 02:32:58 +08:00
|
|
|
static ssize_t server_monitor_conn_id_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", channel_conn_id(hv_dev->channel,
|
|
|
|
vmbus_connection.monitor_pages[0]));
|
2013-09-14 02:32:58 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(server_monitor_conn_id);
|
|
|
|
|
|
|
|
static ssize_t client_monitor_conn_id_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", channel_conn_id(hv_dev->channel,
|
|
|
|
vmbus_connection.monitor_pages[1]));
|
2013-09-14 02:32:58 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(client_monitor_conn_id);
|
|
|
|
|
2013-09-14 02:33:01 +08:00
|
|
|
static ssize_t out_intr_mask_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr, char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
struct hv_ring_buffer_debug_info outbound;
|
2018-12-18 04:16:09 +08:00
|
|
|
int ret;
|
2013-09-14 02:33:01 +08:00
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2018-12-18 04:16:09 +08:00
|
|
|
|
|
|
|
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
|
|
|
|
&outbound);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", outbound.current_interrupt_mask);
|
2013-09-14 02:33:01 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(out_intr_mask);
|
|
|
|
|
|
|
|
static ssize_t out_read_index_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr, char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
struct hv_ring_buffer_debug_info outbound;
|
2018-12-18 04:16:09 +08:00
|
|
|
int ret;
|
2013-09-14 02:33:01 +08:00
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2018-12-18 04:16:09 +08:00
|
|
|
|
|
|
|
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
|
|
|
|
&outbound);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", outbound.current_read_index);
|
2013-09-14 02:33:01 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(out_read_index);
|
|
|
|
|
|
|
|
static ssize_t out_write_index_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
struct hv_ring_buffer_debug_info outbound;
|
2018-12-18 04:16:09 +08:00
|
|
|
int ret;
|
2013-09-14 02:33:01 +08:00
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2018-12-18 04:16:09 +08:00
|
|
|
|
|
|
|
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
|
|
|
|
&outbound);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", outbound.current_write_index);
|
2013-09-14 02:33:01 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(out_write_index);
|
|
|
|
|
|
|
|
static ssize_t out_read_bytes_avail_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
struct hv_ring_buffer_debug_info outbound;
|
2018-12-18 04:16:09 +08:00
|
|
|
int ret;
|
2013-09-14 02:33:01 +08:00
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2018-12-18 04:16:09 +08:00
|
|
|
|
|
|
|
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
|
|
|
|
&outbound);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", outbound.bytes_avail_toread);
|
2013-09-14 02:33:01 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(out_read_bytes_avail);
|
|
|
|
|
|
|
|
static ssize_t out_write_bytes_avail_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
struct hv_ring_buffer_debug_info outbound;
|
2018-12-18 04:16:09 +08:00
|
|
|
int ret;
|
2013-09-14 02:33:01 +08:00
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2018-12-18 04:16:09 +08:00
|
|
|
|
|
|
|
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
|
|
|
|
&outbound);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", outbound.bytes_avail_towrite);
|
2013-09-14 02:33:01 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(out_write_bytes_avail);
|
|
|
|
|
|
|
|
static ssize_t in_intr_mask_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr, char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
struct hv_ring_buffer_debug_info inbound;
|
2018-12-18 04:16:09 +08:00
|
|
|
int ret;
|
2013-09-14 02:33:01 +08:00
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2018-12-18 04:16:09 +08:00
|
|
|
|
|
|
|
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", inbound.current_interrupt_mask);
|
2013-09-14 02:33:01 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(in_intr_mask);
|
|
|
|
|
|
|
|
static ssize_t in_read_index_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr, char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
struct hv_ring_buffer_debug_info inbound;
|
2018-12-18 04:16:09 +08:00
|
|
|
int ret;
|
2013-09-14 02:33:01 +08:00
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2018-12-18 04:16:09 +08:00
|
|
|
|
|
|
|
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", inbound.current_read_index);
|
2013-09-14 02:33:01 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(in_read_index);
|
|
|
|
|
|
|
|
static ssize_t in_write_index_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr, char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
struct hv_ring_buffer_debug_info inbound;
|
2018-12-18 04:16:09 +08:00
|
|
|
int ret;
|
2013-09-14 02:33:01 +08:00
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2018-12-18 04:16:09 +08:00
|
|
|
|
|
|
|
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", inbound.current_write_index);
|
2013-09-14 02:33:01 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(in_write_index);
|
|
|
|
|
|
|
|
static ssize_t in_read_bytes_avail_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
struct hv_ring_buffer_debug_info inbound;
|
2018-12-18 04:16:09 +08:00
|
|
|
int ret;
|
2013-09-14 02:33:01 +08:00
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2018-12-18 04:16:09 +08:00
|
|
|
|
|
|
|
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", inbound.bytes_avail_toread);
|
2013-09-14 02:33:01 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(in_read_bytes_avail);
|
|
|
|
|
|
|
|
static ssize_t in_write_bytes_avail_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
struct hv_ring_buffer_debug_info inbound;
|
2018-12-18 04:16:09 +08:00
|
|
|
int ret;
|
2013-09-14 02:33:01 +08:00
|
|
|
|
|
|
|
if (!hv_dev->channel)
|
|
|
|
return -ENODEV;
|
2018-12-18 04:16:09 +08:00
|
|
|
|
|
|
|
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "%d\n", inbound.bytes_avail_towrite);
|
2013-09-14 02:33:01 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(in_write_bytes_avail);
|
|
|
|
|
2015-08-05 15:52:43 +08:00
|
|
|
static ssize_t channel_vp_mapping_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
struct vmbus_channel *channel = hv_dev->channel, *cur_sc;
|
2024-03-19 11:43:50 +08:00
|
|
|
int n_written;
|
2015-08-05 15:52:43 +08:00
|
|
|
struct list_head *cur;
|
|
|
|
|
|
|
|
if (!channel)
|
|
|
|
return -ENODEV;
|
|
|
|
|
2020-06-18 00:46:39 +08:00
|
|
|
mutex_lock(&vmbus_connection.channel_mutex);
|
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
n_written = sysfs_emit(buf, "%u:%u\n",
|
|
|
|
channel->offermsg.child_relid,
|
|
|
|
channel->target_cpu);
|
2015-08-05 15:52:43 +08:00
|
|
|
|
|
|
|
list_for_each(cur, &channel->sc_list) {
|
|
|
|
|
|
|
|
cur_sc = list_entry(cur, struct vmbus_channel, sc_list);
|
2024-03-19 11:43:50 +08:00
|
|
|
n_written += sysfs_emit_at(buf, n_written, "%u:%u\n",
|
|
|
|
cur_sc->offermsg.child_relid,
|
|
|
|
cur_sc->target_cpu);
|
2015-08-05 15:52:43 +08:00
|
|
|
}
|
|
|
|
|
2020-06-18 00:46:39 +08:00
|
|
|
mutex_unlock(&vmbus_connection.channel_mutex);
|
2015-08-05 15:52:43 +08:00
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
return n_written;
|
2015-08-05 15:52:43 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(channel_vp_mapping);
|
|
|
|
|
2015-12-26 12:00:30 +08:00
|
|
|
static ssize_t vendor_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
2020-11-16 03:57:31 +08:00
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "0x%x\n", hv_dev->vendor_id);
|
2015-12-26 12:00:30 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(vendor);
|
|
|
|
|
|
|
|
static ssize_t device_show(struct device *dev,
|
|
|
|
struct device_attribute *dev_attr,
|
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
2020-11-16 03:57:31 +08:00
|
|
|
|
2024-03-19 11:43:50 +08:00
|
|
|
return sysfs_emit(buf, "0x%x\n", hv_dev->device_id);
|
2015-12-26 12:00:30 +08:00
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RO(device);
|
|
|
|
|
2018-08-11 07:06:08 +08:00
|
|
|
static ssize_t driver_override_store(struct device *dev,
|
|
|
|
struct device_attribute *attr,
|
|
|
|
const char *buf, size_t count)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
2022-04-19 19:34:27 +08:00
|
|
|
int ret;
|
2018-08-11 07:06:08 +08:00
|
|
|
|
2022-04-19 19:34:27 +08:00
|
|
|
ret = driver_set_override(dev, &hv_dev->driver_override, buf, count);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2018-08-11 07:06:08 +08:00
|
|
|
|
|
|
|
return count;
|
|
|
|
}
|
|
|
|
|
|
|
|
static ssize_t driver_override_show(struct device *dev,
|
|
|
|
struct device_attribute *attr, char *buf)
|
|
|
|
{
|
|
|
|
struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
ssize_t len;
|
|
|
|
|
|
|
|
device_lock(dev);
|
2024-03-19 11:43:50 +08:00
|
|
|
len = sysfs_emit(buf, "%s\n", hv_dev->driver_override);
|
2018-08-11 07:06:08 +08:00
|
|
|
device_unlock(dev);
|
|
|
|
|
|
|
|
return len;
|
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RW(driver_override);
|
|
|
|
|
2013-09-14 02:33:01 +08:00
|
|
|
/* Set up per device attributes in /sys/bus/vmbus/devices/<bus device> */
|
2016-12-04 04:34:39 +08:00
|
|
|
static struct attribute *vmbus_dev_attrs[] = {
|
2013-09-14 02:32:49 +08:00
|
|
|
&dev_attr_id.attr,
|
2013-09-14 02:32:50 +08:00
|
|
|
&dev_attr_state.attr,
|
2013-09-14 02:32:51 +08:00
|
|
|
&dev_attr_monitor_id.attr,
|
2013-09-14 02:32:53 +08:00
|
|
|
&dev_attr_class_id.attr,
|
2013-09-14 02:32:54 +08:00
|
|
|
&dev_attr_device_id.attr,
|
2013-09-14 02:32:52 +08:00
|
|
|
&dev_attr_modalias.attr,
|
2018-07-29 05:58:48 +08:00
|
|
|
#ifdef CONFIG_NUMA
|
|
|
|
&dev_attr_numa_node.attr,
|
|
|
|
#endif
|
2013-09-14 02:32:56 +08:00
|
|
|
&dev_attr_server_monitor_pending.attr,
|
|
|
|
&dev_attr_client_monitor_pending.attr,
|
2013-09-14 02:32:57 +08:00
|
|
|
&dev_attr_server_monitor_latency.attr,
|
|
|
|
&dev_attr_client_monitor_latency.attr,
|
2013-09-14 02:32:58 +08:00
|
|
|
&dev_attr_server_monitor_conn_id.attr,
|
|
|
|
&dev_attr_client_monitor_conn_id.attr,
|
2013-09-14 02:33:01 +08:00
|
|
|
&dev_attr_out_intr_mask.attr,
|
|
|
|
&dev_attr_out_read_index.attr,
|
|
|
|
&dev_attr_out_write_index.attr,
|
|
|
|
&dev_attr_out_read_bytes_avail.attr,
|
|
|
|
&dev_attr_out_write_bytes_avail.attr,
|
|
|
|
&dev_attr_in_intr_mask.attr,
|
|
|
|
&dev_attr_in_read_index.attr,
|
|
|
|
&dev_attr_in_write_index.attr,
|
|
|
|
&dev_attr_in_read_bytes_avail.attr,
|
|
|
|
&dev_attr_in_write_bytes_avail.attr,
|
2015-08-05 15:52:43 +08:00
|
|
|
&dev_attr_channel_vp_mapping.attr,
|
2015-12-26 12:00:30 +08:00
|
|
|
&dev_attr_vendor.attr,
|
|
|
|
&dev_attr_device.attr,
|
2018-08-11 07:06:08 +08:00
|
|
|
&dev_attr_driver_override.attr,
|
2013-09-14 02:32:49 +08:00
|
|
|
NULL,
|
|
|
|
};
|
2019-03-19 12:04:01 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Device-level attribute_group callback function. Returns the permission for
|
|
|
|
* each attribute, and returns 0 if an attribute is not visible.
|
|
|
|
*/
|
|
|
|
static umode_t vmbus_dev_attr_is_visible(struct kobject *kobj,
|
|
|
|
struct attribute *attr, int idx)
|
|
|
|
{
|
|
|
|
struct device *dev = kobj_to_dev(kobj);
|
|
|
|
const struct hv_device *hv_dev = device_to_hv_device(dev);
|
|
|
|
|
|
|
|
/* Hide the monitor attributes if the monitor mechanism is not used. */
|
|
|
|
if (!hv_dev->channel->offermsg.monitor_allocated &&
|
|
|
|
(attr == &dev_attr_monitor_id.attr ||
|
|
|
|
attr == &dev_attr_server_monitor_pending.attr ||
|
|
|
|
attr == &dev_attr_client_monitor_pending.attr ||
|
|
|
|
attr == &dev_attr_server_monitor_latency.attr ||
|
|
|
|
attr == &dev_attr_client_monitor_latency.attr ||
|
|
|
|
attr == &dev_attr_server_monitor_conn_id.attr ||
|
|
|
|
attr == &dev_attr_client_monitor_conn_id.attr))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return attr->mode;
|
|
|
|
}
|
|
|
|
|
|
|
|
static const struct attribute_group vmbus_dev_group = {
|
|
|
|
.attrs = vmbus_dev_attrs,
|
|
|
|
.is_visible = vmbus_dev_attr_is_visible
|
|
|
|
};
|
|
|
|
__ATTRIBUTE_GROUPS(vmbus_dev);
|
2013-09-14 02:32:49 +08:00
|
|
|
|
2021-01-07 09:45:52 +08:00
|
|
|
/* Set up the attribute for /sys/bus/vmbus/hibernation */
|
2023-03-14 02:29:05 +08:00
|
|
|
static ssize_t hibernation_show(const struct bus_type *bus, char *buf)
|
2021-01-07 09:45:52 +08:00
|
|
|
{
|
|
|
|
return sprintf(buf, "%d\n", !!hv_is_hibernation_supported());
|
|
|
|
}
|
|
|
|
|
|
|
|
static BUS_ATTR_RO(hibernation);
|
|
|
|
|
|
|
|
static struct attribute *vmbus_bus_attrs[] = {
|
|
|
|
&bus_attr_hibernation.attr,
|
|
|
|
NULL,
|
|
|
|
};
|
|
|
|
static const struct attribute_group vmbus_bus_group = {
|
|
|
|
.attrs = vmbus_bus_attrs,
|
|
|
|
};
|
|
|
|
__ATTRIBUTE_GROUPS(vmbus_bus);
|
|
|
|
|
2011-03-16 06:03:37 +08:00
|
|
|
/*
|
|
|
|
* vmbus_uevent - add uevent for our device
|
|
|
|
*
|
|
|
|
* This routine is invoked when a device is added or removed on the vmbus to
|
|
|
|
* generate a uevent to udev in the userspace. The udev will then look at its
|
|
|
|
* rule and the uevent generated here to load the appropriate driver
|
2011-08-26 00:48:38 +08:00
|
|
|
*
|
|
|
|
* The alias string will be of the form vmbus:guid where guid is the string
|
|
|
|
* representation of the device guid (each byte of the guid will be
|
|
|
|
* represented with two hex characters.
|
2011-03-16 06:03:37 +08:00
|
|
|
*/
|
2023-01-11 19:30:17 +08:00
|
|
|
static int vmbus_uevent(const struct device *device, struct kobj_uevent_env *env)
|
2011-03-16 06:03:37 +08:00
|
|
|
{
|
2023-01-11 19:30:17 +08:00
|
|
|
const struct hv_device *dev = device_to_hv_device(device);
|
2020-04-23 21:45:04 +08:00
|
|
|
const char *format = "MODALIAS=vmbus:%*phN";
|
2011-08-26 00:48:38 +08:00
|
|
|
|
2020-04-23 21:45:04 +08:00
|
|
|
return add_uevent_var(env, format, UUID_SIZE, &dev->dev_type);
|
2011-03-16 06:03:37 +08:00
|
|
|
}
|
|
|
|
|
2018-08-11 07:06:08 +08:00
|
|
|
static const struct hv_vmbus_device_id *
|
2019-01-10 22:25:32 +08:00
|
|
|
hv_vmbus_dev_match(const struct hv_vmbus_device_id *id, const guid_t *guid)
|
2018-08-11 07:06:08 +08:00
|
|
|
{
|
|
|
|
if (id == NULL)
|
|
|
|
return NULL; /* empty device table */
|
|
|
|
|
2019-01-10 22:25:32 +08:00
|
|
|
for (; !guid_is_null(&id->guid); id++)
|
|
|
|
if (guid_equal(&id->guid, guid))
|
2018-08-11 07:06:08 +08:00
|
|
|
return id;
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static const struct hv_vmbus_device_id *
|
2019-01-10 22:25:32 +08:00
|
|
|
hv_vmbus_dynid_match(struct hv_driver *drv, const guid_t *guid)
|
2011-09-14 01:59:37 +08:00
|
|
|
{
|
2016-12-04 04:34:39 +08:00
|
|
|
const struct hv_vmbus_device_id *id = NULL;
|
|
|
|
struct vmbus_dynid *dynid;
|
|
|
|
|
|
|
|
spin_lock(&drv->dynids.lock);
|
|
|
|
list_for_each_entry(dynid, &drv->dynids.list, node) {
|
2019-01-10 22:25:32 +08:00
|
|
|
if (guid_equal(&dynid->id.guid, guid)) {
|
2016-12-04 04:34:39 +08:00
|
|
|
id = &dynid->id;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
spin_unlock(&drv->dynids.lock);
|
|
|
|
|
2018-08-11 07:06:08 +08:00
|
|
|
return id;
|
|
|
|
}
|
2016-12-04 04:34:39 +08:00
|
|
|
|
2019-01-10 22:25:32 +08:00
|
|
|
static const struct hv_vmbus_device_id vmbus_device_null;
|
2016-12-04 04:34:39 +08:00
|
|
|
|
2018-08-11 07:06:08 +08:00
|
|
|
/*
|
|
|
|
* Return a matching hv_vmbus_device_id pointer.
|
|
|
|
* If there is no match, return NULL.
|
|
|
|
*/
|
|
|
|
static const struct hv_vmbus_device_id *hv_vmbus_get_id(struct hv_driver *drv,
|
|
|
|
struct hv_device *dev)
|
|
|
|
{
|
2019-01-10 22:25:32 +08:00
|
|
|
const guid_t *guid = &dev->dev_type;
|
2018-08-11 07:06:08 +08:00
|
|
|
const struct hv_vmbus_device_id *id;
|
2011-09-14 01:59:37 +08:00
|
|
|
|
2018-08-11 07:06:08 +08:00
|
|
|
/* When driver_override is set, only bind to the matching driver */
|
|
|
|
if (dev->driver_override && strcmp(dev->driver_override, drv->name))
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
/* Look at the dynamic ids first, before the static ones */
|
|
|
|
id = hv_vmbus_dynid_match(drv, guid);
|
|
|
|
if (!id)
|
|
|
|
id = hv_vmbus_dev_match(drv->id_table, guid);
|
|
|
|
|
|
|
|
/* driver_override will always match, send a dummy id */
|
|
|
|
if (!id && dev->driver_override)
|
|
|
|
id = &vmbus_device_null;
|
|
|
|
|
|
|
|
return id;
|
2011-09-14 01:59:37 +08:00
|
|
|
}
|
|
|
|
|
2016-12-04 04:34:39 +08:00
|
|
|
/* vmbus_add_dynid - add a new device ID to this driver and re-probe devices */
|
2019-01-10 22:25:32 +08:00
|
|
|
static int vmbus_add_dynid(struct hv_driver *drv, guid_t *guid)
|
2016-12-04 04:34:39 +08:00
|
|
|
{
|
|
|
|
struct vmbus_dynid *dynid;
|
|
|
|
|
|
|
|
dynid = kzalloc(sizeof(*dynid), GFP_KERNEL);
|
|
|
|
if (!dynid)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
dynid->id.guid = *guid;
|
|
|
|
|
|
|
|
spin_lock(&drv->dynids.lock);
|
|
|
|
list_add_tail(&dynid->node, &drv->dynids.list);
|
|
|
|
spin_unlock(&drv->dynids.lock);
|
|
|
|
|
|
|
|
return driver_attach(&drv->driver);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vmbus_free_dynids(struct hv_driver *drv)
|
|
|
|
{
|
|
|
|
struct vmbus_dynid *dynid, *n;
|
|
|
|
|
|
|
|
spin_lock(&drv->dynids.lock);
|
|
|
|
list_for_each_entry_safe(dynid, n, &drv->dynids.list, node) {
|
|
|
|
list_del(&dynid->node);
|
|
|
|
kfree(dynid);
|
|
|
|
}
|
|
|
|
spin_unlock(&drv->dynids.lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* store_new_id - sysfs frontend to vmbus_add_dynid()
|
|
|
|
*
|
|
|
|
* Allow GUIDs to be added to an existing driver via sysfs.
|
|
|
|
*/
|
|
|
|
static ssize_t new_id_store(struct device_driver *driver, const char *buf,
|
|
|
|
size_t count)
|
|
|
|
{
|
|
|
|
struct hv_driver *drv = drv_to_hv_drv(driver);
|
2019-01-10 22:25:32 +08:00
|
|
|
guid_t guid;
|
2016-12-04 04:34:39 +08:00
|
|
|
ssize_t retval;
|
|
|
|
|
2019-01-10 22:25:32 +08:00
|
|
|
retval = guid_parse(buf, &guid);
|
2017-05-19 01:46:06 +08:00
|
|
|
if (retval)
|
|
|
|
return retval;
|
2016-12-04 04:34:39 +08:00
|
|
|
|
2018-08-11 07:06:08 +08:00
|
|
|
if (hv_vmbus_dynid_match(drv, &guid))
|
2016-12-04 04:34:39 +08:00
|
|
|
return -EEXIST;
|
|
|
|
|
|
|
|
retval = vmbus_add_dynid(drv, &guid);
|
|
|
|
if (retval)
|
|
|
|
return retval;
|
|
|
|
return count;
|
|
|
|
}
|
|
|
|
static DRIVER_ATTR_WO(new_id);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* store_remove_id - remove a PCI device ID from this driver
|
|
|
|
*
|
|
|
|
* Removes a dynamic pci device ID to this driver.
|
|
|
|
*/
|
|
|
|
static ssize_t remove_id_store(struct device_driver *driver, const char *buf,
|
|
|
|
size_t count)
|
|
|
|
{
|
|
|
|
struct hv_driver *drv = drv_to_hv_drv(driver);
|
|
|
|
struct vmbus_dynid *dynid, *n;
|
2019-01-10 22:25:32 +08:00
|
|
|
guid_t guid;
|
2017-05-19 01:46:06 +08:00
|
|
|
ssize_t retval;
|
2016-12-04 04:34:39 +08:00
|
|
|
|
2019-01-10 22:25:32 +08:00
|
|
|
retval = guid_parse(buf, &guid);
|
2017-05-19 01:46:06 +08:00
|
|
|
if (retval)
|
|
|
|
return retval;
|
2016-12-04 04:34:39 +08:00
|
|
|
|
2017-05-19 01:46:06 +08:00
|
|
|
retval = -ENODEV;
|
2016-12-04 04:34:39 +08:00
|
|
|
spin_lock(&drv->dynids.lock);
|
|
|
|
list_for_each_entry_safe(dynid, n, &drv->dynids.list, node) {
|
|
|
|
struct hv_vmbus_device_id *id = &dynid->id;
|
|
|
|
|
2019-01-10 22:25:32 +08:00
|
|
|
if (guid_equal(&id->guid, &guid)) {
|
2016-12-04 04:34:39 +08:00
|
|
|
list_del(&dynid->node);
|
|
|
|
kfree(dynid);
|
|
|
|
retval = count;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
spin_unlock(&drv->dynids.lock);
|
|
|
|
|
|
|
|
return retval;
|
|
|
|
}
|
|
|
|
static DRIVER_ATTR_WO(remove_id);
|
|
|
|
|
|
|
|
static struct attribute *vmbus_drv_attrs[] = {
|
|
|
|
&driver_attr_new_id.attr,
|
|
|
|
&driver_attr_remove_id.attr,
|
|
|
|
NULL,
|
|
|
|
};
|
|
|
|
ATTRIBUTE_GROUPS(vmbus_drv);
|
2011-09-14 01:59:37 +08:00
|
|
|
|
2011-03-16 06:03:38 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* vmbus_match - Attempt to match the specified device to the specified driver
|
|
|
|
*/
|
|
|
|
static int vmbus_match(struct device *device, struct device_driver *driver)
|
|
|
|
{
|
|
|
|
struct hv_driver *drv = drv_to_hv_drv(driver);
|
2011-06-07 06:50:04 +08:00
|
|
|
struct hv_device *hv_dev = device_to_hv_device(device);
|
2011-03-16 06:03:38 +08:00
|
|
|
|
2016-01-28 14:29:41 +08:00
|
|
|
/* The hv_sock driver handles all hv_sock offers. */
|
|
|
|
if (is_hvsock_channel(hv_dev->channel))
|
|
|
|
return drv->hvsock;
|
|
|
|
|
2018-08-11 07:06:08 +08:00
|
|
|
if (hv_vmbus_get_id(drv, hv_dev))
|
2011-09-14 01:59:37 +08:00
|
|
|
return 1;
|
2011-04-27 00:20:24 +08:00
|
|
|
|
2011-08-26 00:48:39 +08:00
|
|
|
return 0;
|
2011-03-16 06:03:38 +08:00
|
|
|
}
|
|
|
|
|
2011-03-16 06:03:39 +08:00
|
|
|
/*
|
|
|
|
* vmbus_probe - Add the new vmbus's child device
|
|
|
|
*/
|
|
|
|
static int vmbus_probe(struct device *child_device)
|
|
|
|
{
|
|
|
|
int ret = 0;
|
|
|
|
struct hv_driver *drv =
|
|
|
|
drv_to_hv_drv(child_device->driver);
|
2011-04-30 04:45:10 +08:00
|
|
|
struct hv_device *dev = device_to_hv_device(child_device);
|
2011-09-14 01:59:38 +08:00
|
|
|
const struct hv_vmbus_device_id *dev_id;
|
2011-03-16 06:03:39 +08:00
|
|
|
|
2018-08-11 07:06:08 +08:00
|
|
|
dev_id = hv_vmbus_get_id(drv, dev);
|
2011-04-30 04:45:10 +08:00
|
|
|
if (drv->probe) {
|
2011-09-14 01:59:38 +08:00
|
|
|
ret = drv->probe(dev, dev_id);
|
2011-04-30 04:45:03 +08:00
|
|
|
if (ret != 0)
|
2011-03-30 04:58:47 +08:00
|
|
|
pr_err("probe failed for device %s (%d)\n",
|
|
|
|
dev_name(child_device), ret);
|
2011-03-16 06:03:39 +08:00
|
|
|
|
|
|
|
} else {
|
2011-03-30 04:58:47 +08:00
|
|
|
pr_err("probe not set for driver %s\n",
|
|
|
|
dev_name(child_device));
|
2011-06-07 06:50:07 +08:00
|
|
|
ret = -ENODEV;
|
2011-03-16 06:03:39 +08:00
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2022-03-25 00:14:51 +08:00
|
|
|
/*
|
|
|
|
* vmbus_dma_configure -- Configure DMA coherence for VMbus device
|
|
|
|
*/
|
|
|
|
static int vmbus_dma_configure(struct device *child_device)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* On ARM64, propagate the DMA coherence setting from the top level
|
|
|
|
* VMbus ACPI device to the child VMbus device being added here.
|
|
|
|
* On x86/x64 coherence is assumed and these calls have no effect.
|
|
|
|
*/
|
|
|
|
hv_setup_dma_ops(child_device,
|
2023-03-20 15:47:38 +08:00
|
|
|
device_get_dma_attr(hv_dev) == DEV_DMA_COHERENT);
|
2022-03-25 00:14:51 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2011-03-16 06:03:40 +08:00
|
|
|
/*
|
|
|
|
* vmbus_remove - Remove a vmbus device
|
|
|
|
*/
|
2021-07-14 03:35:22 +08:00
|
|
|
static void vmbus_remove(struct device *child_device)
|
2011-03-16 06:03:40 +08:00
|
|
|
{
|
2015-03-01 03:18:16 +08:00
|
|
|
struct hv_driver *drv;
|
2011-04-30 04:45:12 +08:00
|
|
|
struct hv_device *dev = device_to_hv_device(child_device);
|
2011-03-16 06:03:40 +08:00
|
|
|
|
2015-03-01 03:18:16 +08:00
|
|
|
if (child_device->driver) {
|
|
|
|
drv = drv_to_hv_drv(child_device->driver);
|
|
|
|
if (drv->remove)
|
|
|
|
drv->remove(dev);
|
|
|
|
}
|
2011-03-16 06:03:40 +08:00
|
|
|
}
|
|
|
|
|
2011-03-16 06:03:41 +08:00
|
|
|
/*
|
|
|
|
* vmbus_shutdown - Shutdown a vmbus device
|
|
|
|
*/
|
|
|
|
static void vmbus_shutdown(struct device *child_device)
|
|
|
|
{
|
|
|
|
struct hv_driver *drv;
|
2011-04-30 04:45:14 +08:00
|
|
|
struct hv_device *dev = device_to_hv_device(child_device);
|
2011-03-16 06:03:41 +08:00
|
|
|
|
|
|
|
|
|
|
|
/* The device may not be attached yet */
|
|
|
|
if (!child_device->driver)
|
|
|
|
return;
|
|
|
|
|
|
|
|
drv = drv_to_hv_drv(child_device->driver);
|
|
|
|
|
2011-04-30 04:45:14 +08:00
|
|
|
if (drv->shutdown)
|
|
|
|
drv->shutdown(dev);
|
2011-03-16 06:03:41 +08:00
|
|
|
}
|
|
|
|
|
2019-09-20 05:46:12 +08:00
|
|
|
#ifdef CONFIG_PM_SLEEP
|
2019-09-06 07:01:17 +08:00
|
|
|
/*
|
|
|
|
* vmbus_suspend - Suspend a vmbus device
|
|
|
|
*/
|
|
|
|
static int vmbus_suspend(struct device *child_device)
|
|
|
|
{
|
|
|
|
struct hv_driver *drv;
|
|
|
|
struct hv_device *dev = device_to_hv_device(child_device);
|
|
|
|
|
|
|
|
/* The device may not be attached yet */
|
|
|
|
if (!child_device->driver)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
drv = drv_to_hv_drv(child_device->driver);
|
|
|
|
if (!drv->suspend)
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
|
|
|
return drv->suspend(dev);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* vmbus_resume - Resume a vmbus device
|
|
|
|
*/
|
|
|
|
static int vmbus_resume(struct device *child_device)
|
|
|
|
{
|
|
|
|
struct hv_driver *drv;
|
|
|
|
struct hv_device *dev = device_to_hv_device(child_device);
|
|
|
|
|
|
|
|
/* The device may not be attached yet */
|
|
|
|
if (!child_device->driver)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
drv = drv_to_hv_drv(child_device->driver);
|
|
|
|
if (!drv->resume)
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
|
|
|
return drv->resume(dev);
|
|
|
|
}
|
2020-04-12 11:50:35 +08:00
|
|
|
#else
|
|
|
|
#define vmbus_suspend NULL
|
|
|
|
#define vmbus_resume NULL
|
2019-09-20 05:46:12 +08:00
|
|
|
#endif /* CONFIG_PM_SLEEP */
|
2011-03-16 06:03:42 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* vmbus_device_release - Final callback release of the vmbus child device
|
|
|
|
*/
|
|
|
|
static void vmbus_device_release(struct device *device)
|
|
|
|
{
|
2011-06-07 06:50:04 +08:00
|
|
|
struct hv_device *hv_dev = device_to_hv_device(device);
|
Drivers: hv: vmbus: fix rescind-offer handling for device without a driver
In the path vmbus_onoffer_rescind() -> vmbus_device_unregister() ->
device_unregister() -> ... -> __device_release_driver(), we can see for a
device without a driver loaded: dev->driver is NULL, so
dev->bus->remove(dev), namely vmbus_remove(), isn't invoked.
As a result, vmbus_remove() -> hv_process_channel_removal() isn't invoked
and some cleanups(like sending a CHANNELMSG_RELID_RELEASED message to the
host) aren't done.
We can demo the issue this way:
1. rmmod hv_utils;
2. disable the Heartbeat Integration Service in Hyper-V Manager and lsvmbus
shows the device disappears.
3. re-enable the Heartbeat in Hyper-V Manager and modprobe hv_utils, but
lsvmbus shows the device can't appear again.
This is because, the host thinks the VM hasn't released the relid, so can't
re-offer the device to the VM.
We can fix the issue by moving hv_process_channel_removal()
from vmbus_close_internal() to vmbus_device_release(), since the latter is
always invoked on device_unregister(), whether or not the dev has a driver
loaded.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-15 08:01:49 +08:00
|
|
|
struct vmbus_channel *channel = hv_dev->channel;
|
2011-03-16 06:03:42 +08:00
|
|
|
|
2019-10-04 05:01:49 +08:00
|
|
|
hv_debug_rm_dev_dir(hv_dev);
|
|
|
|
|
2017-05-01 07:21:18 +08:00
|
|
|
mutex_lock(&vmbus_connection.channel_mutex);
|
2018-09-15 00:10:15 +08:00
|
|
|
hv_process_channel_removal(channel);
|
2017-05-01 07:21:18 +08:00
|
|
|
mutex_unlock(&vmbus_connection.channel_mutex);
|
2011-06-07 06:50:04 +08:00
|
|
|
kfree(hv_dev);
|
2011-03-16 06:03:42 +08:00
|
|
|
}
|
|
|
|
|
2019-09-06 07:01:17 +08:00
|
|
|
/*
|
2020-04-12 11:50:35 +08:00
|
|
|
* Note: we must use the "noirq" ops: see the comment before vmbus_bus_pm.
|
|
|
|
*
|
|
|
|
* suspend_noirq/resume_noirq are set to NULL to support Suspend-to-Idle: we
|
|
|
|
* shouldn't suspend the vmbus devices upon Suspend-to-Idle, otherwise there
|
|
|
|
* is no way to wake up a Generation-2 VM.
|
|
|
|
*
|
|
|
|
* The other 4 ops are for hibernation.
|
2019-09-06 07:01:17 +08:00
|
|
|
*/
|
2020-04-12 11:50:35 +08:00
|
|
|
|
2019-09-06 07:01:17 +08:00
|
|
|
static const struct dev_pm_ops vmbus_pm = {
|
2020-04-12 11:50:35 +08:00
|
|
|
.suspend_noirq = NULL,
|
|
|
|
.resume_noirq = NULL,
|
|
|
|
.freeze_noirq = vmbus_suspend,
|
|
|
|
.thaw_noirq = vmbus_resume,
|
|
|
|
.poweroff_noirq = vmbus_suspend,
|
|
|
|
.restore_noirq = vmbus_resume,
|
2019-09-06 07:01:17 +08:00
|
|
|
};
|
|
|
|
|
2009-07-28 04:47:24 +08:00
|
|
|
/* The one and only one */
|
2024-02-05 00:38:02 +08:00
|
|
|
static const struct bus_type hv_bus = {
|
2011-04-30 04:45:08 +08:00
|
|
|
.name = "vmbus",
|
|
|
|
.match = vmbus_match,
|
|
|
|
.shutdown = vmbus_shutdown,
|
|
|
|
.remove = vmbus_remove,
|
|
|
|
.probe = vmbus_probe,
|
|
|
|
.uevent = vmbus_uevent,
|
2022-03-25 00:14:51 +08:00
|
|
|
.dma_configure = vmbus_dma_configure,
|
2016-12-04 04:34:39 +08:00
|
|
|
.dev_groups = vmbus_dev_groups,
|
|
|
|
.drv_groups = vmbus_drv_groups,
|
2021-01-07 09:45:52 +08:00
|
|
|
.bus_groups = vmbus_bus_groups,
|
2019-09-06 07:01:17 +08:00
|
|
|
.pm = &vmbus_pm,
|
2009-07-14 07:02:34 +08:00
|
|
|
};
|
|
|
|
|
2010-12-16 02:48:08 +08:00
|
|
|
struct onmessage_work_context {
|
|
|
|
struct work_struct work;
|
2020-04-06 18:41:51 +08:00
|
|
|
struct {
|
|
|
|
struct hv_message_header header;
|
|
|
|
u8 payload[];
|
|
|
|
} msg;
|
2010-12-16 02:48:08 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
static void vmbus_onmessage_work(struct work_struct *work)
|
|
|
|
{
|
|
|
|
struct onmessage_work_context *ctx;
|
|
|
|
|
2015-02-28 03:25:54 +08:00
|
|
|
/* Do not process messages if we're in DISCONNECTED state */
|
|
|
|
if (vmbus_connection.conn_state == DISCONNECTED)
|
|
|
|
return;
|
|
|
|
|
2010-12-16 02:48:08 +08:00
|
|
|
ctx = container_of(work, struct onmessage_work_context,
|
|
|
|
work);
|
2020-04-06 18:41:52 +08:00
|
|
|
vmbus_onmessage((struct vmbus_channel_message_header *)
|
|
|
|
&ctx->msg.payload);
|
2010-12-16 02:48:08 +08:00
|
|
|
kfree(ctx);
|
|
|
|
}
|
|
|
|
|
2016-02-27 07:13:21 +08:00
|
|
|
void vmbus_on_msg_dpc(unsigned long data)
|
2010-12-03 03:59:22 +08:00
|
|
|
{
|
2017-02-12 14:02:19 +08:00
|
|
|
struct hv_per_cpu_context *hv_cpu = (void *)data;
|
|
|
|
void *page_addr = hv_cpu->synic_message_page;
|
2020-12-09 15:08:24 +08:00
|
|
|
struct hv_message msg_copy, *msg = (struct hv_message *)page_addr +
|
2010-12-03 03:59:22 +08:00
|
|
|
VMBUS_MESSAGE_SINT;
|
2015-03-28 00:10:08 +08:00
|
|
|
struct vmbus_channel_message_header *hdr;
|
2020-12-09 15:08:23 +08:00
|
|
|
enum vmbus_channel_message_type msgtype;
|
2017-03-05 09:27:16 +08:00
|
|
|
const struct vmbus_channel_message_table_entry *entry;
|
2010-12-16 02:48:08 +08:00
|
|
|
struct onmessage_work_context *ctx;
|
2020-12-09 15:08:23 +08:00
|
|
|
__u8 payload_size;
|
2020-12-09 15:08:24 +08:00
|
|
|
u32 message_type;
|
2010-12-03 03:59:22 +08:00
|
|
|
|
2020-04-06 18:43:15 +08:00
|
|
|
/*
|
|
|
|
* 'enum vmbus_channel_message_type' is supposed to always be 'u32' as
|
|
|
|
* it is being used in 'struct vmbus_channel_message_header' definition
|
|
|
|
* which is supposed to match hypervisor ABI.
|
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(sizeof(enum vmbus_channel_message_type) != sizeof(u32));
|
|
|
|
|
2020-12-09 15:08:24 +08:00
|
|
|
/*
|
|
|
|
* Since the message is in memory shared with the host, an erroneous or
|
|
|
|
* malicious Hyper-V could modify the message while vmbus_on_msg_dpc()
|
|
|
|
* or individual message handlers are executing; to prevent this, copy
|
|
|
|
* the message into private memory.
|
|
|
|
*/
|
|
|
|
memcpy(&msg_copy, msg, sizeof(struct hv_message));
|
|
|
|
|
|
|
|
message_type = msg_copy.header.message_type;
|
2016-05-01 10:21:34 +08:00
|
|
|
if (message_type == HVMSG_NONE)
|
2016-02-27 07:13:15 +08:00
|
|
|
/* no msg */
|
|
|
|
return;
|
2015-03-28 00:10:08 +08:00
|
|
|
|
2020-12-09 15:08:24 +08:00
|
|
|
hdr = (struct vmbus_channel_message_header *)msg_copy.u.payload;
|
2020-12-09 15:08:23 +08:00
|
|
|
msgtype = hdr->msgtype;
|
2015-03-28 00:10:08 +08:00
|
|
|
|
2017-10-30 03:21:00 +08:00
|
|
|
trace_vmbus_on_msg_dpc(hdr);
|
|
|
|
|
2020-12-09 15:08:23 +08:00
|
|
|
if (msgtype >= CHANNELMSG_COUNT) {
|
|
|
|
WARN_ONCE(1, "unknown msgtype=%d\n", msgtype);
|
2016-02-27 07:13:15 +08:00
|
|
|
goto msg_handled;
|
|
|
|
}
|
2015-03-28 00:10:08 +08:00
|
|
|
|
2020-12-09 15:08:24 +08:00
|
|
|
payload_size = msg_copy.header.payload_size;
|
2020-12-09 15:08:23 +08:00
|
|
|
if (payload_size > HV_MESSAGE_PAYLOAD_BYTE_COUNT) {
|
|
|
|
WARN_ONCE(1, "payload size is too large (%d)\n", payload_size);
|
2020-04-06 18:41:50 +08:00
|
|
|
goto msg_handled;
|
|
|
|
}
|
|
|
|
|
2020-12-09 15:08:23 +08:00
|
|
|
entry = &channel_message_table[msgtype];
|
2020-01-20 07:29:22 +08:00
|
|
|
|
|
|
|
if (!entry->message_handler)
|
|
|
|
goto msg_handled;
|
|
|
|
|
2020-12-09 15:08:23 +08:00
|
|
|
if (payload_size < entry->min_payload_len) {
|
|
|
|
WARN_ONCE(1, "message too short: msgtype=%d len=%d\n", msgtype, payload_size);
|
2020-04-06 18:43:26 +08:00
|
|
|
goto msg_handled;
|
|
|
|
}
|
|
|
|
|
2016-02-27 07:13:15 +08:00
|
|
|
if (entry->handler_type == VMHT_BLOCKING) {
|
2022-01-26 02:01:31 +08:00
|
|
|
ctx = kmalloc(struct_size(ctx, msg.payload, payload_size), GFP_ATOMIC);
|
2016-02-27 07:13:15 +08:00
|
|
|
if (ctx == NULL)
|
|
|
|
return;
|
2015-03-28 00:10:08 +08:00
|
|
|
|
2016-02-27 07:13:15 +08:00
|
|
|
INIT_WORK(&ctx->work, vmbus_onmessage_work);
|
2022-09-28 05:17:36 +08:00
|
|
|
ctx->msg.header = msg_copy.header;
|
|
|
|
memcpy(&ctx->msg.payload, msg_copy.u.payload, payload_size);
|
2015-03-28 00:10:08 +08:00
|
|
|
|
2017-05-01 07:21:18 +08:00
|
|
|
/*
|
|
|
|
* The host can generate a rescind message while we
|
|
|
|
* may still be handling the original offer. We deal with
|
2020-04-06 08:15:05 +08:00
|
|
|
* this condition by relying on the synchronization provided
|
|
|
|
* by offer_in_progress and by channel_mutex. See also the
|
|
|
|
* inline comments in vmbus_onoffer_rescind().
|
2017-05-01 07:21:18 +08:00
|
|
|
*/
|
2020-12-09 15:08:23 +08:00
|
|
|
switch (msgtype) {
|
2017-05-01 07:21:18 +08:00
|
|
|
case CHANNELMSG_RESCIND_CHANNELOFFER:
|
|
|
|
/*
|
|
|
|
* If we are handling the rescind message;
|
|
|
|
* schedule the work on the global work queue.
|
2020-04-06 08:15:04 +08:00
|
|
|
*
|
|
|
|
* The OFFER message and the RESCIND message should
|
|
|
|
* not be handled by the same serialized work queue,
|
|
|
|
* because the OFFER handler may call vmbus_open(),
|
|
|
|
* which tries to open the channel by sending an
|
|
|
|
* OPEN_CHANNEL message to the host and waits for
|
|
|
|
* the host's response; however, if the host has
|
|
|
|
* rescinded the channel before it receives the
|
|
|
|
* OPEN_CHANNEL message, the host just silently
|
|
|
|
* ignores the OPEN_CHANNEL message; as a result,
|
|
|
|
* the guest's OFFER handler hangs for ever, if we
|
|
|
|
* handle the RESCIND message in the same serialized
|
|
|
|
* work queue: the RESCIND handler can not start to
|
|
|
|
* run before the OFFER handler finishes.
|
2017-05-01 07:21:18 +08:00
|
|
|
*/
|
2022-07-11 12:11:47 +08:00
|
|
|
if (vmbus_connection.ignore_any_offer_msg)
|
|
|
|
break;
|
|
|
|
queue_work(vmbus_connection.rescind_work_queue, &ctx->work);
|
2017-05-01 07:21:18 +08:00
|
|
|
break;
|
|
|
|
|
|
|
|
case CHANNELMSG_OFFERCHANNEL:
|
2020-04-06 08:15:05 +08:00
|
|
|
/*
|
|
|
|
* The host sends the offer message of a given channel
|
|
|
|
* before sending the rescind message of the same
|
|
|
|
* channel. These messages are sent to the guest's
|
|
|
|
* connect CPU; the guest then starts processing them
|
|
|
|
* in the tasklet handler on this CPU:
|
|
|
|
*
|
|
|
|
* VMBUS_CONNECT_CPU
|
|
|
|
*
|
|
|
|
* [vmbus_on_msg_dpc()]
|
|
|
|
* atomic_inc() // CHANNELMSG_OFFERCHANNEL
|
|
|
|
* queue_work()
|
|
|
|
* ...
|
|
|
|
* [vmbus_on_msg_dpc()]
|
|
|
|
* schedule_work() // CHANNELMSG_RESCIND_CHANNELOFFER
|
|
|
|
*
|
|
|
|
* We rely on the memory-ordering properties of the
|
|
|
|
* queue_work() and schedule_work() primitives, which
|
|
|
|
* guarantee that the atomic increment will be visible
|
|
|
|
* to the CPUs which will execute the offer & rescind
|
|
|
|
* works by the time these works will start execution.
|
|
|
|
*/
|
2022-07-11 12:11:47 +08:00
|
|
|
if (vmbus_connection.ignore_any_offer_msg)
|
|
|
|
break;
|
2017-05-01 07:21:18 +08:00
|
|
|
atomic_inc(&vmbus_connection.offer_in_progress);
|
2020-04-06 08:15:05 +08:00
|
|
|
fallthrough;
|
2017-05-01 07:21:18 +08:00
|
|
|
|
|
|
|
default:
|
|
|
|
queue_work(vmbus_connection.work_queue, &ctx->work);
|
|
|
|
}
|
2016-02-27 07:13:15 +08:00
|
|
|
} else
|
|
|
|
entry->message_handler(hdr);
|
2010-12-03 03:59:22 +08:00
|
|
|
|
2015-03-28 00:10:08 +08:00
|
|
|
msg_handled:
|
2016-05-01 10:21:34 +08:00
|
|
|
vmbus_signal_eom(msg, message_type);
|
2010-12-03 03:59:22 +08:00
|
|
|
}
|
|
|
|
|
2019-09-20 05:46:12 +08:00
|
|
|
#ifdef CONFIG_PM_SLEEP
|
2019-09-06 07:01:20 +08:00
|
|
|
/*
|
|
|
|
* Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
|
|
|
|
* hibernation, because hv_sock connections can not persist across hibernation.
|
|
|
|
*/
|
|
|
|
static void vmbus_force_channel_rescinded(struct vmbus_channel *channel)
|
|
|
|
{
|
|
|
|
struct onmessage_work_context *ctx;
|
|
|
|
struct vmbus_channel_rescind_offer *rescind;
|
|
|
|
|
|
|
|
WARN_ON(!is_hvsock_channel(channel));
|
|
|
|
|
|
|
|
/*
|
2020-04-06 18:41:51 +08:00
|
|
|
* Allocation size is small and the allocation should really not fail,
|
2019-09-06 07:01:20 +08:00
|
|
|
* otherwise the state of the hv_sock connections ends up in limbo.
|
|
|
|
*/
|
2020-04-06 18:41:51 +08:00
|
|
|
ctx = kzalloc(sizeof(*ctx) + sizeof(*rescind),
|
|
|
|
GFP_KERNEL | __GFP_NOFAIL);
|
2019-09-06 07:01:20 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* So far, these are not really used by Linux. Just set them to the
|
|
|
|
* reasonable values conforming to the definitions of the fields.
|
|
|
|
*/
|
|
|
|
ctx->msg.header.message_type = 1;
|
|
|
|
ctx->msg.header.payload_size = sizeof(*rescind);
|
|
|
|
|
|
|
|
/* These values are actually used by Linux. */
|
2020-04-06 18:41:51 +08:00
|
|
|
rescind = (struct vmbus_channel_rescind_offer *)ctx->msg.payload;
|
2019-09-06 07:01:20 +08:00
|
|
|
rescind->header.msgtype = CHANNELMSG_RESCIND_CHANNELOFFER;
|
|
|
|
rescind->child_relid = channel->offermsg.child_relid;
|
|
|
|
|
|
|
|
INIT_WORK(&ctx->work, vmbus_onmessage_work);
|
|
|
|
|
2020-04-06 08:15:05 +08:00
|
|
|
queue_work(vmbus_connection.work_queue, &ctx->work);
|
2019-09-06 07:01:20 +08:00
|
|
|
}
|
2019-09-20 05:46:12 +08:00
|
|
|
#endif /* CONFIG_PM_SLEEP */
|
2017-02-12 14:02:20 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Schedule all channels with events pending
|
|
|
|
*/
|
|
|
|
static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
|
|
|
|
{
|
|
|
|
unsigned long *recv_int_page;
|
|
|
|
u32 maxbits, relid;
|
|
|
|
|
2022-05-03 00:36:28 +08:00
|
|
|
/*
|
|
|
|
* The event page can be directly checked to get the id of
|
|
|
|
* the channel that has the interrupt pending.
|
|
|
|
*/
|
|
|
|
void *page_addr = hv_cpu->synic_event_page;
|
|
|
|
union hv_synic_event_flags *event
|
|
|
|
= (union hv_synic_event_flags *)page_addr +
|
|
|
|
VMBUS_MESSAGE_SINT;
|
2017-02-12 14:02:20 +08:00
|
|
|
|
2022-05-03 00:36:28 +08:00
|
|
|
maxbits = HV_EVENT_FLAGS_COUNT;
|
|
|
|
recv_int_page = event->flags;
|
2017-02-12 14:02:20 +08:00
|
|
|
|
|
|
|
if (unlikely(!recv_int_page))
|
|
|
|
return;
|
|
|
|
|
|
|
|
for_each_set_bit(relid, recv_int_page, maxbits) {
|
2020-04-06 08:15:09 +08:00
|
|
|
void (*callback_fn)(void *context);
|
2017-02-12 14:02:20 +08:00
|
|
|
struct vmbus_channel *channel;
|
|
|
|
|
|
|
|
if (!sync_test_and_clear_bit(relid, recv_int_page))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* Special case - vmbus channel protocol msg */
|
|
|
|
if (relid == 0)
|
|
|
|
continue;
|
|
|
|
|
2020-04-06 08:15:06 +08:00
|
|
|
/*
|
|
|
|
* Pairs with the kfree_rcu() in vmbus_chan_release().
|
|
|
|
* Guarantees that the channel data structure doesn't
|
|
|
|
* get freed while the channel pointer below is being
|
|
|
|
* dereferenced.
|
|
|
|
*/
|
2017-03-05 09:13:57 +08:00
|
|
|
rcu_read_lock();
|
|
|
|
|
2017-02-12 14:02:20 +08:00
|
|
|
/* Find channel based on relid */
|
2020-04-06 08:15:06 +08:00
|
|
|
channel = relid2channel(relid);
|
|
|
|
if (channel == NULL)
|
|
|
|
goto sched_unlock_rcu;
|
2017-02-12 14:02:21 +08:00
|
|
|
|
2020-04-06 08:15:06 +08:00
|
|
|
if (channel->rescind)
|
|
|
|
goto sched_unlock_rcu;
|
2017-08-12 01:03:59 +08:00
|
|
|
|
2020-04-06 08:15:09 +08:00
|
|
|
/*
|
|
|
|
* Make sure that the ring buffer data structure doesn't get
|
|
|
|
* freed while we dereference the ring buffer pointer. Test
|
|
|
|
* for the channel's onchannel_callback being NULL within a
|
|
|
|
* sched_lock critical section. See also the inline comments
|
|
|
|
* in vmbus_reset_channel_cb().
|
|
|
|
*/
|
|
|
|
spin_lock(&channel->sched_lock);
|
2017-10-30 03:21:16 +08:00
|
|
|
|
2020-04-06 08:15:09 +08:00
|
|
|
callback_fn = channel->onchannel_callback;
|
|
|
|
if (unlikely(callback_fn == NULL))
|
|
|
|
goto sched_unlock;
|
2017-10-30 02:33:40 +08:00
|
|
|
|
2020-04-06 08:15:06 +08:00
|
|
|
trace_vmbus_chan_sched(channel);
|
2017-02-12 14:02:21 +08:00
|
|
|
|
2020-04-06 08:15:06 +08:00
|
|
|
++channel->interrupts;
|
2017-10-30 02:33:40 +08:00
|
|
|
|
2020-04-06 08:15:06 +08:00
|
|
|
switch (channel->callback_mode) {
|
|
|
|
case HV_CALL_ISR:
|
2020-04-06 08:15:09 +08:00
|
|
|
(*callback_fn)(channel->channel_callback_context);
|
2020-04-06 08:15:06 +08:00
|
|
|
break;
|
2017-02-12 14:02:21 +08:00
|
|
|
|
2020-04-06 08:15:06 +08:00
|
|
|
case HV_CALL_BATCHED:
|
|
|
|
hv_begin_read(&channel->inbound);
|
|
|
|
fallthrough;
|
|
|
|
case HV_CALL_DIRECT:
|
|
|
|
tasklet_schedule(&channel->callback_event);
|
2017-02-12 14:02:20 +08:00
|
|
|
}
|
2017-03-05 09:13:57 +08:00
|
|
|
|
2020-04-06 08:15:09 +08:00
|
|
|
sched_unlock:
|
|
|
|
spin_unlock(&channel->sched_lock);
|
2020-04-06 08:15:06 +08:00
|
|
|
sched_unlock_rcu:
|
2017-03-05 09:13:57 +08:00
|
|
|
rcu_read_unlock();
|
2017-02-12 14:02:20 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-03-05 20:42:14 +08:00
|
|
|
static void vmbus_isr(void)
|
2010-12-03 03:59:22 +08:00
|
|
|
{
|
2017-02-12 14:02:19 +08:00
|
|
|
struct hv_per_cpu_context *hv_cpu
|
|
|
|
= this_cpu_ptr(hv_context.cpu_context);
|
2022-05-03 00:36:28 +08:00
|
|
|
void *page_addr;
|
2010-12-03 03:59:22 +08:00
|
|
|
struct hv_message *msg;
|
2011-03-16 06:03:43 +08:00
|
|
|
|
2022-05-03 00:36:28 +08:00
|
|
|
vmbus_chan_sched(hv_cpu);
|
2012-12-01 22:46:49 +08:00
|
|
|
|
2017-02-12 14:02:19 +08:00
|
|
|
page_addr = hv_cpu->synic_message_page;
|
2011-09-01 05:35:56 +08:00
|
|
|
msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
|
|
|
|
|
|
|
|
/* Check if there are actual msgs to be processed */
|
2015-01-10 15:54:32 +08:00
|
|
|
if (msg->header.message_type != HVMSG_NONE) {
|
2019-07-01 12:25:56 +08:00
|
|
|
if (msg->header.message_type == HVMSG_TIMER_EXPIRED) {
|
|
|
|
hv_stimer0_isr();
|
|
|
|
vmbus_signal_eom(msg, HVMSG_TIMER_EXPIRED);
|
|
|
|
} else
|
2017-02-12 14:02:19 +08:00
|
|
|
tasklet_schedule(&hv_cpu->msg_dpc);
|
2015-01-10 15:54:32 +08:00
|
|
|
}
|
2016-05-02 14:14:34 +08:00
|
|
|
|
2021-12-07 20:17:33 +08:00
|
|
|
add_interrupt_randomness(vmbus_interrupt);
|
2021-03-03 05:38:18 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static irqreturn_t vmbus_percpu_isr(int irq, void *dev_id)
|
|
|
|
{
|
|
|
|
vmbus_isr();
|
|
|
|
return IRQ_HANDLED;
|
2011-03-16 06:03:43 +08:00
|
|
|
}
|
|
|
|
|
2010-03-05 06:11:00 +08:00
|
|
|
/*
|
2009-09-02 22:11:14 +08:00
|
|
|
* vmbus_bus_init -Main vmbus driver initialization routine.
|
|
|
|
*
|
|
|
|
* Here, we
|
2010-03-12 06:51:23 +08:00
|
|
|
* - initialize the vmbus driver context
|
|
|
|
* - invoke the vmbus hv main init routine
|
|
|
|
* - retrieve the channel offers
|
2009-09-02 22:11:14 +08:00
|
|
|
*/
|
2015-12-15 08:01:46 +08:00
|
|
|
static int vmbus_bus_init(void)
|
2009-07-14 07:02:34 +08:00
|
|
|
{
|
2009-09-02 22:11:14 +08:00
|
|
|
int ret;
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2010-12-03 04:08:08 +08:00
|
|
|
ret = hv_init();
|
2009-09-02 22:11:14 +08:00
|
|
|
if (ret != 0) {
|
2011-03-30 04:58:47 +08:00
|
|
|
pr_err("Unable to initialize the hypervisor - 0x%x\n", ret);
|
2011-06-07 06:50:08 +08:00
|
|
|
return ret;
|
2009-07-14 07:02:34 +08:00
|
|
|
}
|
|
|
|
|
2011-04-30 04:45:08 +08:00
|
|
|
ret = bus_register(&hv_bus);
|
2011-06-07 06:50:08 +08:00
|
|
|
if (ret)
|
2017-01-29 03:37:14 +08:00
|
|
|
return ret;
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2021-03-03 05:38:18 +08:00
|
|
|
/*
|
|
|
|
* VMbus interrupts are best modeled as per-cpu interrupts. If
|
|
|
|
* on an architecture with support for per-cpu IRQs (e.g. ARM64),
|
|
|
|
* allocate a per-cpu IRQ using standard Linux kernel functionality.
|
|
|
|
* If not on such an architecture (e.g., x86/x64), then rely on
|
|
|
|
* code in the arch-specific portion of the code tree to connect
|
|
|
|
* the VMbus interrupt handler.
|
|
|
|
*/
|
|
|
|
|
|
|
|
if (vmbus_irq == -1) {
|
|
|
|
hv_setup_vmbus_handler(vmbus_isr);
|
|
|
|
} else {
|
|
|
|
vmbus_evt = alloc_percpu(long);
|
|
|
|
ret = request_percpu_irq(vmbus_irq, vmbus_percpu_isr,
|
|
|
|
"Hyper-V VMbus", vmbus_evt);
|
|
|
|
if (ret) {
|
|
|
|
pr_err("Can't request Hyper-V VMbus IRQ %d, Err %d",
|
|
|
|
vmbus_irq, ret);
|
|
|
|
free_percpu(vmbus_evt);
|
|
|
|
goto err_setup;
|
|
|
|
}
|
|
|
|
}
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2013-06-19 11:28:10 +08:00
|
|
|
ret = hv_synic_alloc();
|
|
|
|
if (ret)
|
|
|
|
goto err_alloc;
|
2019-07-01 12:25:56 +08:00
|
|
|
|
2011-03-16 06:03:33 +08:00
|
|
|
/*
|
2019-07-01 12:25:56 +08:00
|
|
|
* Initialize the per-cpu interrupt state and stimer state.
|
|
|
|
* Then connect to the host.
|
2011-03-16 06:03:33 +08:00
|
|
|
*/
|
2017-12-23 02:19:02 +08:00
|
|
|
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online",
|
2016-12-08 06:53:11 +08:00
|
|
|
hv_synic_init, hv_synic_cleanup);
|
|
|
|
if (ret < 0)
|
2023-05-05 06:41:55 +08:00
|
|
|
goto err_alloc;
|
2016-12-08 06:53:11 +08:00
|
|
|
hyperv_cpuhp_online = ret;
|
|
|
|
|
2011-03-16 06:03:33 +08:00
|
|
|
ret = vmbus_connect();
|
2011-09-01 05:35:55 +08:00
|
|
|
if (ret)
|
2015-12-15 08:01:38 +08:00
|
|
|
goto err_connect;
|
2011-03-16 06:03:33 +08:00
|
|
|
|
2020-04-06 23:53:26 +08:00
|
|
|
/*
|
drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers
Currently Hyper-V guests are among the most relevant users of the panic
infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
both in cleaning-up procedures (closing hypervisor <-> guest connection,
disabling some paravirtualized timer) as well as to data collection
(sending panic information to the hypervisor) and framebuffer management.
The thing is: some notifiers are related to others, ordering matters, some
functionalities are duplicated and there are lots of conditionals behind
sending panic information to the hypervisor. As part of an effort to
clean-up the panic notifiers mechanism and better document things, we
hereby address some of the issues/complexities of Hyper-V panic handling
through the following changes:
(a) We have die and panic notifiers on vmbus_drv.c and both have goals of
sending panic information to the hypervisor, though the panic notifier is
also responsible for a cleaning-up procedure.
This commit clears the code by splitting the panic notifier in two, one
for closing the vmbus connection whereas the other is only for sending
panic info to hypervisor. With that, it was possible to merge the die and
panic notifiers in a single/well-documented function, and clear some
conditional complexities on sending such information to the hypervisor.
(b) There is a Hyper-V framebuffer panic notifier, which relies in doing
a vmbus operation that demands a valid connection. So, we must order this
notifier with the panic notifier from vmbus_drv.c, to guarantee that the
framebuffer code executes before the vmbus connection is unloaded.
Also, this commit removes a useless header.
Although there is code rework and re-ordering, we expect that this change
has no functional regressions but instead optimize the path and increase
panic reliability on Hyper-V. This was tested on Hyper-V with success.
Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Fabio A M Martins <fabiomirmar@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220819221731.480795-11-gpiccoli@igalia.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-08-20 06:17:30 +08:00
|
|
|
* Always register the vmbus unload panic notifier because we
|
|
|
|
* need to shut the VMbus channel connection on panic.
|
2020-04-06 23:53:26 +08:00
|
|
|
*/
|
|
|
|
atomic_notifier_chain_register(&panic_notifier_list,
|
drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers
Currently Hyper-V guests are among the most relevant users of the panic
infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
both in cleaning-up procedures (closing hypervisor <-> guest connection,
disabling some paravirtualized timer) as well as to data collection
(sending panic information to the hypervisor) and framebuffer management.
The thing is: some notifiers are related to others, ordering matters, some
functionalities are duplicated and there are lots of conditionals behind
sending panic information to the hypervisor. As part of an effort to
clean-up the panic notifiers mechanism and better document things, we
hereby address some of the issues/complexities of Hyper-V panic handling
through the following changes:
(a) We have die and panic notifiers on vmbus_drv.c and both have goals of
sending panic information to the hypervisor, though the panic notifier is
also responsible for a cleaning-up procedure.
This commit clears the code by splitting the panic notifier in two, one
for closing the vmbus connection whereas the other is only for sending
panic info to hypervisor. With that, it was possible to merge the die and
panic notifiers in a single/well-documented function, and clear some
conditional complexities on sending such information to the hypervisor.
(b) There is a Hyper-V framebuffer panic notifier, which relies in doing
a vmbus operation that demands a valid connection. So, we must order this
notifier with the panic notifier from vmbus_drv.c, to guarantee that the
framebuffer code executes before the vmbus connection is unloaded.
Also, this commit removes a useless header.
Although there is code rework and re-ordering, we expect that this change
has no functional regressions but instead optimize the path and increase
panic reliability on Hyper-V. This was tested on Hyper-V with success.
Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Fabio A M Martins <fabiomirmar@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220819221731.480795-11-gpiccoli@igalia.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-08-20 06:17:30 +08:00
|
|
|
&hyperv_panic_vmbus_unload_block);
|
2020-04-06 23:53:26 +08:00
|
|
|
|
2010-12-03 00:50:58 +08:00
|
|
|
vmbus_request_offers();
|
2010-05-29 07:22:44 +08:00
|
|
|
|
2011-06-07 06:50:08 +08:00
|
|
|
return 0;
|
2011-09-01 05:35:55 +08:00
|
|
|
|
2015-12-15 08:01:38 +08:00
|
|
|
err_connect:
|
2016-12-08 06:53:11 +08:00
|
|
|
cpuhp_remove_state(hyperv_cpuhp_online);
|
x86/hyperv: Initialize clockevents earlier in CPU onlining
Hyper-V has historically initialized stimer-based clockevents late in the
process of onlining a CPU because clockevents depend on stimer
interrupts. In the original Hyper-V design, stimer interrupts generate a
VMbus message, so the VMbus machinery must be running first, and VMbus
can't be initialized until relatively late. On x86/64, LAPIC timer based
clockevents are used during early initialization before VMbus and
stimer-based clockevents are ready, and again during CPU offlining after
the stimer clockevents have been shut down.
Unfortunately, this design creates problems when offlining CPUs for
hibernation or other purposes. stimer-based clockevents are shut down
relatively early in the offlining process, so clockevents_unbind_device()
must be used to fallback to the LAPIC-based clockevents for the remainder
of the offlining process. Furthermore, the late initialization and early
shutdown of stimer-based clockevents doesn't work well on ARM64 since there
is no other timer like the LAPIC to fallback to. So CPU onlining and
offlining doesn't work properly.
Fix this by recognizing that stimer Direct Mode is the normal path for
newer versions of Hyper-V on x86/64, and the only path on other
architectures. With stimer Direct Mode, stimer interrupts don't require any
VMbus machinery. stimer clockevents can be initialized and shut down
consistent with how it is done for other clockevent devices. While the old
VMbus-based stimer interrupts must still be supported for backward
compatibility on x86, that mode of operation can be treated as legacy.
So add a new Hyper-V stimer entry in the CPU hotplug state list, and use
that new state when in Direct Mode. Update the Hyper-V clocksource driver
to allocate and initialize stimer clockevents earlier during boot. Update
Hyper-V initialization and the VMbus driver to use this new design. As a
result, the LAPIC timer is no longer used during boot or CPU
onlining/offlining and clockevents_unbind_device() is not called. But
retain the old design as a legacy implementation for older versions of
Hyper-V that don't support Direct Mode.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lkml.kernel.org/r/1573607467-9456-1-git-send-email-mikelley@microsoft.com
2019-11-13 09:11:49 +08:00
|
|
|
err_alloc:
|
2023-05-05 06:41:55 +08:00
|
|
|
hv_synic_free();
|
2021-03-03 05:38:18 +08:00
|
|
|
if (vmbus_irq == -1) {
|
|
|
|
hv_remove_vmbus_handler();
|
|
|
|
} else {
|
|
|
|
free_percpu_irq(vmbus_irq, vmbus_evt);
|
|
|
|
free_percpu(vmbus_evt);
|
|
|
|
}
|
2020-08-15 03:45:04 +08:00
|
|
|
err_setup:
|
2011-09-01 05:35:55 +08:00
|
|
|
bus_unregister(&hv_bus);
|
|
|
|
return ret;
|
2009-07-14 07:02:34 +08:00
|
|
|
}
|
|
|
|
|
2009-09-02 22:11:14 +08:00
|
|
|
/**
|
2022-09-19 14:38:15 +08:00
|
|
|
* __vmbus_driver_register() - Register a vmbus's driver
|
2015-08-05 15:52:37 +08:00
|
|
|
* @hv_driver: Pointer to driver structure you want to register
|
2011-08-26 06:07:32 +08:00
|
|
|
* @owner: owner module of the drv
|
|
|
|
* @mod_name: module name string
|
2010-03-05 06:11:00 +08:00
|
|
|
*
|
|
|
|
* Registers the given driver with Linux through the 'driver_register()' call
|
2011-08-26 06:07:32 +08:00
|
|
|
* and sets up the hyper-v vmbus handling for this driver.
|
2010-03-05 06:11:00 +08:00
|
|
|
* It will return the state of the 'driver_register()' call.
|
|
|
|
*
|
2009-09-02 22:11:14 +08:00
|
|
|
*/
|
2011-08-26 06:07:32 +08:00
|
|
|
int __vmbus_driver_register(struct hv_driver *hv_driver, struct module *owner, const char *mod_name)
|
2009-07-14 07:02:34 +08:00
|
|
|
{
|
2009-07-28 04:47:36 +08:00
|
|
|
int ret;
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2011-08-26 06:07:32 +08:00
|
|
|
pr_info("registering driver %s\n", hv_driver->name);
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2011-12-02 01:59:34 +08:00
|
|
|
ret = vmbus_exists();
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
2011-08-26 06:07:32 +08:00
|
|
|
hv_driver->driver.name = hv_driver->name;
|
|
|
|
hv_driver->driver.owner = owner;
|
|
|
|
hv_driver->driver.mod_name = mod_name;
|
|
|
|
hv_driver->driver.bus = &hv_bus;
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2016-12-04 04:34:39 +08:00
|
|
|
spin_lock_init(&hv_driver->dynids.lock);
|
|
|
|
INIT_LIST_HEAD(&hv_driver->dynids.list);
|
|
|
|
|
2011-08-26 06:07:32 +08:00
|
|
|
ret = driver_register(&hv_driver->driver);
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2009-07-28 04:47:36 +08:00
|
|
|
return ret;
|
2009-07-14 07:02:34 +08:00
|
|
|
}
|
2011-08-26 06:07:32 +08:00
|
|
|
EXPORT_SYMBOL_GPL(__vmbus_driver_register);
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2009-09-02 22:11:14 +08:00
|
|
|
/**
|
2011-08-26 06:07:32 +08:00
|
|
|
* vmbus_driver_unregister() - Unregister a vmbus's driver
|
2015-08-05 15:52:37 +08:00
|
|
|
* @hv_driver: Pointer to driver structure you want to
|
|
|
|
* un-register
|
2010-03-05 06:11:00 +08:00
|
|
|
*
|
2011-08-26 06:07:32 +08:00
|
|
|
* Un-register the given driver that was previous registered with a call to
|
|
|
|
* vmbus_driver_register()
|
2009-09-02 22:11:14 +08:00
|
|
|
*/
|
2011-08-26 06:07:32 +08:00
|
|
|
void vmbus_driver_unregister(struct hv_driver *hv_driver)
|
2009-07-14 07:02:34 +08:00
|
|
|
{
|
2011-08-26 06:07:32 +08:00
|
|
|
pr_info("unregistering driver %s\n", hv_driver->name);
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2016-12-04 04:34:39 +08:00
|
|
|
if (!vmbus_exists()) {
|
2011-12-28 05:49:37 +08:00
|
|
|
driver_unregister(&hv_driver->driver);
|
2016-12-04 04:34:39 +08:00
|
|
|
vmbus_free_dynids(hv_driver);
|
|
|
|
}
|
2009-07-14 07:02:34 +08:00
|
|
|
}
|
2011-08-26 06:07:32 +08:00
|
|
|
EXPORT_SYMBOL_GPL(vmbus_driver_unregister);
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2017-09-22 11:58:49 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Called when last reference to channel is gone.
|
|
|
|
*/
|
|
|
|
static void vmbus_chan_release(struct kobject *kobj)
|
|
|
|
{
|
|
|
|
struct vmbus_channel *channel
|
|
|
|
= container_of(kobj, struct vmbus_channel, kobj);
|
|
|
|
|
|
|
|
kfree_rcu(channel, rcu);
|
|
|
|
}
|
|
|
|
|
|
|
|
struct vmbus_chan_attribute {
|
|
|
|
struct attribute attr;
|
2019-03-15 04:05:15 +08:00
|
|
|
ssize_t (*show)(struct vmbus_channel *chan, char *buf);
|
2017-09-22 11:58:49 +08:00
|
|
|
ssize_t (*store)(struct vmbus_channel *chan,
|
|
|
|
const char *buf, size_t count);
|
|
|
|
};
|
|
|
|
#define VMBUS_CHAN_ATTR(_name, _mode, _show, _store) \
|
|
|
|
struct vmbus_chan_attribute chan_attr_##_name \
|
|
|
|
= __ATTR(_name, _mode, _show, _store)
|
|
|
|
#define VMBUS_CHAN_ATTR_RW(_name) \
|
|
|
|
struct vmbus_chan_attribute chan_attr_##_name = __ATTR_RW(_name)
|
|
|
|
#define VMBUS_CHAN_ATTR_RO(_name) \
|
|
|
|
struct vmbus_chan_attribute chan_attr_##_name = __ATTR_RO(_name)
|
|
|
|
#define VMBUS_CHAN_ATTR_WO(_name) \
|
|
|
|
struct vmbus_chan_attribute chan_attr_##_name = __ATTR_WO(_name)
|
|
|
|
|
|
|
|
static ssize_t vmbus_chan_attr_show(struct kobject *kobj,
|
|
|
|
struct attribute *attr, char *buf)
|
|
|
|
{
|
|
|
|
const struct vmbus_chan_attribute *attribute
|
|
|
|
= container_of(attr, struct vmbus_chan_attribute, attr);
|
2019-03-15 04:05:15 +08:00
|
|
|
struct vmbus_channel *chan
|
2017-09-22 11:58:49 +08:00
|
|
|
= container_of(kobj, struct vmbus_channel, kobj);
|
|
|
|
|
|
|
|
if (!attribute->show)
|
|
|
|
return -EIO;
|
|
|
|
|
|
|
|
return attribute->show(chan, buf);
|
|
|
|
}
|
|
|
|
|
2020-04-06 08:15:13 +08:00
|
|
|
static ssize_t vmbus_chan_attr_store(struct kobject *kobj,
|
|
|
|
struct attribute *attr, const char *buf,
|
|
|
|
size_t count)
|
|
|
|
{
|
|
|
|
const struct vmbus_chan_attribute *attribute
|
|
|
|
= container_of(attr, struct vmbus_chan_attribute, attr);
|
|
|
|
struct vmbus_channel *chan
|
|
|
|
= container_of(kobj, struct vmbus_channel, kobj);
|
|
|
|
|
|
|
|
if (!attribute->store)
|
|
|
|
return -EIO;
|
|
|
|
|
|
|
|
return attribute->store(chan, buf, count);
|
|
|
|
}
|
|
|
|
|
2017-09-22 11:58:49 +08:00
|
|
|
static const struct sysfs_ops vmbus_chan_sysfs_ops = {
|
|
|
|
.show = vmbus_chan_attr_show,
|
2020-04-06 08:15:13 +08:00
|
|
|
.store = vmbus_chan_attr_store,
|
2017-09-22 11:58:49 +08:00
|
|
|
};
|
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t out_mask_show(struct vmbus_channel *channel, char *buf)
|
2017-09-22 11:58:49 +08:00
|
|
|
{
|
2019-03-15 04:05:15 +08:00
|
|
|
struct hv_ring_buffer_info *rbi = &channel->outbound;
|
|
|
|
ssize_t ret;
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
mutex_lock(&rbi->ring_buffer_mutex);
|
|
|
|
if (!rbi->ring_buffer) {
|
|
|
|
mutex_unlock(&rbi->ring_buffer_mutex);
|
2019-03-15 04:05:00 +08:00
|
|
|
return -EINVAL;
|
2019-03-15 04:05:15 +08:00
|
|
|
}
|
2019-03-15 04:05:00 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
ret = sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
|
|
|
|
mutex_unlock(&rbi->ring_buffer_mutex);
|
|
|
|
return ret;
|
2017-09-22 11:58:49 +08:00
|
|
|
}
|
2018-01-05 06:13:25 +08:00
|
|
|
static VMBUS_CHAN_ATTR_RO(out_mask);
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t in_mask_show(struct vmbus_channel *channel, char *buf)
|
2017-09-22 11:58:49 +08:00
|
|
|
{
|
2019-03-15 04:05:15 +08:00
|
|
|
struct hv_ring_buffer_info *rbi = &channel->inbound;
|
|
|
|
ssize_t ret;
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
mutex_lock(&rbi->ring_buffer_mutex);
|
|
|
|
if (!rbi->ring_buffer) {
|
|
|
|
mutex_unlock(&rbi->ring_buffer_mutex);
|
2019-03-15 04:05:00 +08:00
|
|
|
return -EINVAL;
|
2019-03-15 04:05:15 +08:00
|
|
|
}
|
2019-03-15 04:05:00 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
ret = sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
|
|
|
|
mutex_unlock(&rbi->ring_buffer_mutex);
|
|
|
|
return ret;
|
2017-09-22 11:58:49 +08:00
|
|
|
}
|
2018-01-05 06:13:25 +08:00
|
|
|
static VMBUS_CHAN_ATTR_RO(in_mask);
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t read_avail_show(struct vmbus_channel *channel, char *buf)
|
2017-09-22 11:58:49 +08:00
|
|
|
{
|
2019-03-15 04:05:15 +08:00
|
|
|
struct hv_ring_buffer_info *rbi = &channel->inbound;
|
|
|
|
ssize_t ret;
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
mutex_lock(&rbi->ring_buffer_mutex);
|
|
|
|
if (!rbi->ring_buffer) {
|
|
|
|
mutex_unlock(&rbi->ring_buffer_mutex);
|
2019-03-15 04:05:00 +08:00
|
|
|
return -EINVAL;
|
2019-03-15 04:05:15 +08:00
|
|
|
}
|
2019-03-15 04:05:00 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
ret = sprintf(buf, "%u\n", hv_get_bytes_to_read(rbi));
|
|
|
|
mutex_unlock(&rbi->ring_buffer_mutex);
|
|
|
|
return ret;
|
2017-09-22 11:58:49 +08:00
|
|
|
}
|
2018-01-05 06:13:25 +08:00
|
|
|
static VMBUS_CHAN_ATTR_RO(read_avail);
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t write_avail_show(struct vmbus_channel *channel, char *buf)
|
2017-09-22 11:58:49 +08:00
|
|
|
{
|
2019-03-15 04:05:15 +08:00
|
|
|
struct hv_ring_buffer_info *rbi = &channel->outbound;
|
|
|
|
ssize_t ret;
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
mutex_lock(&rbi->ring_buffer_mutex);
|
|
|
|
if (!rbi->ring_buffer) {
|
|
|
|
mutex_unlock(&rbi->ring_buffer_mutex);
|
2019-03-15 04:05:00 +08:00
|
|
|
return -EINVAL;
|
2019-03-15 04:05:15 +08:00
|
|
|
}
|
2019-03-15 04:05:00 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
ret = sprintf(buf, "%u\n", hv_get_bytes_to_write(rbi));
|
|
|
|
mutex_unlock(&rbi->ring_buffer_mutex);
|
|
|
|
return ret;
|
2017-09-22 11:58:49 +08:00
|
|
|
}
|
2018-01-05 06:13:25 +08:00
|
|
|
static VMBUS_CHAN_ATTR_RO(write_avail);
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2020-04-06 08:15:13 +08:00
|
|
|
static ssize_t target_cpu_show(struct vmbus_channel *channel, char *buf)
|
2017-09-22 11:58:49 +08:00
|
|
|
{
|
|
|
|
return sprintf(buf, "%u\n", channel->target_cpu);
|
|
|
|
}
|
2020-04-06 08:15:13 +08:00
|
|
|
static ssize_t target_cpu_store(struct vmbus_channel *channel,
|
|
|
|
const char *buf, size_t count)
|
|
|
|
{
|
Drivers: hv: vmbus: Resolve more races involving init_vp_index()
init_vp_index() uses the (per-node) hv_numa_map[] masks to record the
CPUs allocated for channel interrupts at a given time, and distribute
the performance-critical channels across the available CPUs: in part.,
the mask of "candidate" target CPUs in a given NUMA node, for a newly
offered channel, is determined by XOR-ing the node's CPU mask and the
node's hv_numa_map. This operation/mechanism assumes that no offline
CPUs is set in the hv_numa_map mask, an assumption that does not hold
since such mask is currently not updated when a channel is removed or
assigned to a different CPU.
To address the issues described above, this adds hooks in the channel
removal path (hv_process_channel_removal()) and in target_cpu_store()
in order to clear, resp. to update, the hv_numa_map[] masks as needed.
This also adds a (missed) update of the masks in init_vp_index() (cf.,
e.g., the memory-allocation failure path in this function).
Like in the case of init_vp_index(), such hooks require to determine
if the given channel is performance critical. init_vp_index() does
this by parsing the channel's offer, it can not rely on the device
data structure (device_obj) to retrieve such information because the
device data structure has not been allocated/linked with the channel
by the time that init_vp_index() executes. A similar situation may
hold in hv_is_alloced_cpu() (defined below); the adopted approach is
to "cache" the device type of the channel, as computed by parsing the
channel's offer, in the channel structure itself.
Fixes: 7527810573436f ("Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type")
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200522171901.204127-3-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-05-23 01:19:01 +08:00
|
|
|
u32 target_cpu, origin_cpu;
|
2020-04-06 08:15:13 +08:00
|
|
|
ssize_t ret = count;
|
|
|
|
|
|
|
|
if (vmbus_proto_version < VERSION_WIN10_V4_1)
|
|
|
|
return -EIO;
|
|
|
|
|
|
|
|
if (sscanf(buf, "%uu", &target_cpu) != 1)
|
|
|
|
return -EIO;
|
|
|
|
|
|
|
|
/* Validate target_cpu for the cpumask_test_cpu() operation below. */
|
|
|
|
if (target_cpu >= nr_cpumask_bits)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2022-05-27 15:43:59 +08:00
|
|
|
if (!cpumask_test_cpu(target_cpu, housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2020-04-06 08:15:13 +08:00
|
|
|
/* No CPUs should come up or down during this. */
|
|
|
|
cpus_read_lock();
|
|
|
|
|
2020-06-18 00:46:37 +08:00
|
|
|
if (!cpu_online(target_cpu)) {
|
2020-04-06 08:15:13 +08:00
|
|
|
cpus_read_unlock();
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Synchronizes target_cpu_store() and channel closure:
|
|
|
|
*
|
|
|
|
* { Initially: state = CHANNEL_OPENED }
|
|
|
|
*
|
|
|
|
* CPU1 CPU2
|
|
|
|
*
|
|
|
|
* [target_cpu_store()] [vmbus_disconnect_ring()]
|
|
|
|
*
|
|
|
|
* LOCK channel_mutex LOCK channel_mutex
|
|
|
|
* LOAD r1 = state LOAD r2 = state
|
|
|
|
* IF (r1 == CHANNEL_OPENED) IF (r2 == CHANNEL_OPENED)
|
|
|
|
* SEND MODIFYCHANNEL STORE state = CHANNEL_OPEN
|
|
|
|
* [...] SEND CLOSECHANNEL
|
|
|
|
* UNLOCK channel_mutex UNLOCK channel_mutex
|
|
|
|
*
|
|
|
|
* Forbids: r1 == r2 == CHANNEL_OPENED (i.e., CPU1's LOCK precedes
|
|
|
|
* CPU2's LOCK) && CPU2's SEND precedes CPU1's SEND
|
|
|
|
*
|
|
|
|
* Note. The host processes the channel messages "sequentially", in
|
|
|
|
* the order in which they are received on a per-partition basis.
|
|
|
|
*/
|
|
|
|
mutex_lock(&vmbus_connection.channel_mutex);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Hyper-V will ignore MODIFYCHANNEL messages for "non-open" channels;
|
|
|
|
* avoid sending the message and fail here for such channels.
|
|
|
|
*/
|
|
|
|
if (channel->state != CHANNEL_OPENED_STATE) {
|
|
|
|
ret = -EIO;
|
|
|
|
goto cpu_store_unlock;
|
|
|
|
}
|
|
|
|
|
Drivers: hv: vmbus: Resolve more races involving init_vp_index()
init_vp_index() uses the (per-node) hv_numa_map[] masks to record the
CPUs allocated for channel interrupts at a given time, and distribute
the performance-critical channels across the available CPUs: in part.,
the mask of "candidate" target CPUs in a given NUMA node, for a newly
offered channel, is determined by XOR-ing the node's CPU mask and the
node's hv_numa_map. This operation/mechanism assumes that no offline
CPUs is set in the hv_numa_map mask, an assumption that does not hold
since such mask is currently not updated when a channel is removed or
assigned to a different CPU.
To address the issues described above, this adds hooks in the channel
removal path (hv_process_channel_removal()) and in target_cpu_store()
in order to clear, resp. to update, the hv_numa_map[] masks as needed.
This also adds a (missed) update of the masks in init_vp_index() (cf.,
e.g., the memory-allocation failure path in this function).
Like in the case of init_vp_index(), such hooks require to determine
if the given channel is performance critical. init_vp_index() does
this by parsing the channel's offer, it can not rely on the device
data structure (device_obj) to retrieve such information because the
device data structure has not been allocated/linked with the channel
by the time that init_vp_index() executes. A similar situation may
hold in hv_is_alloced_cpu() (defined below); the adopted approach is
to "cache" the device type of the channel, as computed by parsing the
channel's offer, in the channel structure itself.
Fixes: 7527810573436f ("Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type")
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200522171901.204127-3-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-05-23 01:19:01 +08:00
|
|
|
origin_cpu = channel->target_cpu;
|
|
|
|
if (target_cpu == origin_cpu)
|
2020-04-06 08:15:13 +08:00
|
|
|
goto cpu_store_unlock;
|
|
|
|
|
2021-04-16 22:34:48 +08:00
|
|
|
if (vmbus_send_modifychannel(channel,
|
2020-04-06 08:15:13 +08:00
|
|
|
hv_cpu_number_to_vp_number(target_cpu))) {
|
|
|
|
ret = -EIO;
|
|
|
|
goto cpu_store_unlock;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2021-04-16 22:34:48 +08:00
|
|
|
* For version before VERSION_WIN10_V5_3, the following warning holds:
|
|
|
|
*
|
2020-04-06 08:15:13 +08:00
|
|
|
* Warning. At this point, there is *no* guarantee that the host will
|
|
|
|
* have successfully processed the vmbus_send_modifychannel() request.
|
|
|
|
* See the header comment of vmbus_send_modifychannel() for more info.
|
|
|
|
*
|
|
|
|
* Lags in the processing of the above vmbus_send_modifychannel() can
|
|
|
|
* result in missed interrupts if the "old" target CPU is taken offline
|
|
|
|
* before Hyper-V starts sending interrupts to the "new" target CPU.
|
|
|
|
* But apart from this offlining scenario, the code tolerates such
|
|
|
|
* lags. It will function correctly even if a channel interrupt comes
|
|
|
|
* in on a CPU that is different from the channel target_cpu value.
|
|
|
|
*/
|
|
|
|
|
|
|
|
channel->target_cpu = target_cpu;
|
|
|
|
|
Drivers: hv: vmbus: Resolve more races involving init_vp_index()
init_vp_index() uses the (per-node) hv_numa_map[] masks to record the
CPUs allocated for channel interrupts at a given time, and distribute
the performance-critical channels across the available CPUs: in part.,
the mask of "candidate" target CPUs in a given NUMA node, for a newly
offered channel, is determined by XOR-ing the node's CPU mask and the
node's hv_numa_map. This operation/mechanism assumes that no offline
CPUs is set in the hv_numa_map mask, an assumption that does not hold
since such mask is currently not updated when a channel is removed or
assigned to a different CPU.
To address the issues described above, this adds hooks in the channel
removal path (hv_process_channel_removal()) and in target_cpu_store()
in order to clear, resp. to update, the hv_numa_map[] masks as needed.
This also adds a (missed) update of the masks in init_vp_index() (cf.,
e.g., the memory-allocation failure path in this function).
Like in the case of init_vp_index(), such hooks require to determine
if the given channel is performance critical. init_vp_index() does
this by parsing the channel's offer, it can not rely on the device
data structure (device_obj) to retrieve such information because the
device data structure has not been allocated/linked with the channel
by the time that init_vp_index() executes. A similar situation may
hold in hv_is_alloced_cpu() (defined below); the adopted approach is
to "cache" the device type of the channel, as computed by parsing the
channel's offer, in the channel structure itself.
Fixes: 7527810573436f ("Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type")
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200522171901.204127-3-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-05-23 01:19:01 +08:00
|
|
|
/* See init_vp_index(). */
|
|
|
|
if (hv_is_perf_channel(channel))
|
2022-01-28 18:34:11 +08:00
|
|
|
hv_update_allocated_cpus(origin_cpu, target_cpu);
|
Drivers: hv: vmbus: Resolve more races involving init_vp_index()
init_vp_index() uses the (per-node) hv_numa_map[] masks to record the
CPUs allocated for channel interrupts at a given time, and distribute
the performance-critical channels across the available CPUs: in part.,
the mask of "candidate" target CPUs in a given NUMA node, for a newly
offered channel, is determined by XOR-ing the node's CPU mask and the
node's hv_numa_map. This operation/mechanism assumes that no offline
CPUs is set in the hv_numa_map mask, an assumption that does not hold
since such mask is currently not updated when a channel is removed or
assigned to a different CPU.
To address the issues described above, this adds hooks in the channel
removal path (hv_process_channel_removal()) and in target_cpu_store()
in order to clear, resp. to update, the hv_numa_map[] masks as needed.
This also adds a (missed) update of the masks in init_vp_index() (cf.,
e.g., the memory-allocation failure path in this function).
Like in the case of init_vp_index(), such hooks require to determine
if the given channel is performance critical. init_vp_index() does
this by parsing the channel's offer, it can not rely on the device
data structure (device_obj) to retrieve such information because the
device data structure has not been allocated/linked with the channel
by the time that init_vp_index() executes. A similar situation may
hold in hv_is_alloced_cpu() (defined below); the adopted approach is
to "cache" the device type of the channel, as computed by parsing the
channel's offer, in the channel structure itself.
Fixes: 7527810573436f ("Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type")
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20200522171901.204127-3-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-05-23 01:19:01 +08:00
|
|
|
|
|
|
|
/* Currently set only for storvsc channels. */
|
|
|
|
if (channel->change_target_cpu_callback) {
|
|
|
|
(*channel->change_target_cpu_callback)(channel,
|
|
|
|
origin_cpu, target_cpu);
|
|
|
|
}
|
|
|
|
|
2020-04-06 08:15:13 +08:00
|
|
|
cpu_store_unlock:
|
|
|
|
mutex_unlock(&vmbus_connection.channel_mutex);
|
|
|
|
cpus_read_unlock();
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
static VMBUS_CHAN_ATTR(cpu, 0644, target_cpu_show, target_cpu_store);
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t channel_pending_show(struct vmbus_channel *channel,
|
2017-09-22 11:58:49 +08:00
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
return sprintf(buf, "%d\n",
|
|
|
|
channel_pending(channel,
|
|
|
|
vmbus_connection.monitor_pages[1]));
|
|
|
|
}
|
2020-11-16 03:57:30 +08:00
|
|
|
static VMBUS_CHAN_ATTR(pending, 0444, channel_pending_show, NULL);
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t channel_latency_show(struct vmbus_channel *channel,
|
2017-09-22 11:58:49 +08:00
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
return sprintf(buf, "%d\n",
|
|
|
|
channel_latency(channel,
|
|
|
|
vmbus_connection.monitor_pages[1]));
|
|
|
|
}
|
2020-11-16 03:57:30 +08:00
|
|
|
static VMBUS_CHAN_ATTR(latency, 0444, channel_latency_show, NULL);
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t channel_interrupts_show(struct vmbus_channel *channel, char *buf)
|
2017-10-30 02:33:40 +08:00
|
|
|
{
|
|
|
|
return sprintf(buf, "%llu\n", channel->interrupts);
|
|
|
|
}
|
2020-11-16 03:57:30 +08:00
|
|
|
static VMBUS_CHAN_ATTR(interrupts, 0444, channel_interrupts_show, NULL);
|
2017-10-30 02:33:40 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t channel_events_show(struct vmbus_channel *channel, char *buf)
|
2017-10-30 02:33:40 +08:00
|
|
|
{
|
|
|
|
return sprintf(buf, "%llu\n", channel->sig_events);
|
|
|
|
}
|
2020-11-16 03:57:30 +08:00
|
|
|
static VMBUS_CHAN_ATTR(events, 0444, channel_events_show, NULL);
|
2017-10-30 02:33:40 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t channel_intr_in_full_show(struct vmbus_channel *channel,
|
2019-02-04 15:13:09 +08:00
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
return sprintf(buf, "%llu\n",
|
|
|
|
(unsigned long long)channel->intr_in_full);
|
|
|
|
}
|
|
|
|
static VMBUS_CHAN_ATTR(intr_in_full, 0444, channel_intr_in_full_show, NULL);
|
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t channel_intr_out_empty_show(struct vmbus_channel *channel,
|
2019-02-04 15:13:09 +08:00
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
return sprintf(buf, "%llu\n",
|
|
|
|
(unsigned long long)channel->intr_out_empty);
|
|
|
|
}
|
|
|
|
static VMBUS_CHAN_ATTR(intr_out_empty, 0444, channel_intr_out_empty_show, NULL);
|
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t channel_out_full_first_show(struct vmbus_channel *channel,
|
2019-02-04 15:13:09 +08:00
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
return sprintf(buf, "%llu\n",
|
|
|
|
(unsigned long long)channel->out_full_first);
|
|
|
|
}
|
|
|
|
static VMBUS_CHAN_ATTR(out_full_first, 0444, channel_out_full_first_show, NULL);
|
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t channel_out_full_total_show(struct vmbus_channel *channel,
|
2019-02-04 15:13:09 +08:00
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
return sprintf(buf, "%llu\n",
|
|
|
|
(unsigned long long)channel->out_full_total);
|
|
|
|
}
|
|
|
|
static VMBUS_CHAN_ATTR(out_full_total, 0444, channel_out_full_total_show, NULL);
|
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t subchannel_monitor_id_show(struct vmbus_channel *channel,
|
2018-01-10 02:29:06 +08:00
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
return sprintf(buf, "%u\n", channel->offermsg.monitorid);
|
|
|
|
}
|
2020-11-16 03:57:30 +08:00
|
|
|
static VMBUS_CHAN_ATTR(monitor_id, 0444, subchannel_monitor_id_show, NULL);
|
2018-01-10 02:29:06 +08:00
|
|
|
|
2019-03-15 04:05:15 +08:00
|
|
|
static ssize_t subchannel_id_show(struct vmbus_channel *channel,
|
2018-01-10 02:29:06 +08:00
|
|
|
char *buf)
|
|
|
|
{
|
|
|
|
return sprintf(buf, "%u\n",
|
|
|
|
channel->offermsg.offer.sub_channel_index);
|
|
|
|
}
|
|
|
|
static VMBUS_CHAN_ATTR_RO(subchannel_id);
|
|
|
|
|
2017-09-22 11:58:49 +08:00
|
|
|
static struct attribute *vmbus_chan_attrs[] = {
|
|
|
|
&chan_attr_out_mask.attr,
|
|
|
|
&chan_attr_in_mask.attr,
|
|
|
|
&chan_attr_read_avail.attr,
|
|
|
|
&chan_attr_write_avail.attr,
|
|
|
|
&chan_attr_cpu.attr,
|
|
|
|
&chan_attr_pending.attr,
|
|
|
|
&chan_attr_latency.attr,
|
2017-10-30 02:33:40 +08:00
|
|
|
&chan_attr_interrupts.attr,
|
|
|
|
&chan_attr_events.attr,
|
2019-02-04 15:13:09 +08:00
|
|
|
&chan_attr_intr_in_full.attr,
|
|
|
|
&chan_attr_intr_out_empty.attr,
|
|
|
|
&chan_attr_out_full_first.attr,
|
|
|
|
&chan_attr_out_full_total.attr,
|
2018-01-10 02:29:06 +08:00
|
|
|
&chan_attr_monitor_id.attr,
|
|
|
|
&chan_attr_subchannel_id.attr,
|
2017-09-22 11:58:49 +08:00
|
|
|
NULL
|
|
|
|
};
|
|
|
|
|
2019-03-19 12:04:01 +08:00
|
|
|
/*
|
|
|
|
* Channel-level attribute_group callback function. Returns the permission for
|
|
|
|
* each attribute, and returns 0 if an attribute is not visible.
|
|
|
|
*/
|
|
|
|
static umode_t vmbus_chan_attr_is_visible(struct kobject *kobj,
|
|
|
|
struct attribute *attr, int idx)
|
|
|
|
{
|
|
|
|
const struct vmbus_channel *channel =
|
|
|
|
container_of(kobj, struct vmbus_channel, kobj);
|
|
|
|
|
|
|
|
/* Hide the monitor attributes if the monitor mechanism is not used. */
|
|
|
|
if (!channel->offermsg.monitor_allocated &&
|
|
|
|
(attr == &chan_attr_pending.attr ||
|
|
|
|
attr == &chan_attr_latency.attr ||
|
|
|
|
attr == &chan_attr_monitor_id.attr))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return attr->mode;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct attribute_group vmbus_chan_group = {
|
|
|
|
.attrs = vmbus_chan_attrs,
|
|
|
|
.is_visible = vmbus_chan_attr_is_visible
|
|
|
|
};
|
|
|
|
|
2017-09-22 11:58:49 +08:00
|
|
|
static struct kobj_type vmbus_chan_ktype = {
|
|
|
|
.sysfs_ops = &vmbus_chan_sysfs_ops,
|
|
|
|
.release = vmbus_chan_release,
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* vmbus_add_channel_kobj - setup a sub-directory under device/channels
|
|
|
|
*/
|
|
|
|
int vmbus_add_channel_kobj(struct hv_device *dev, struct vmbus_channel *channel)
|
|
|
|
{
|
2019-03-19 12:04:01 +08:00
|
|
|
const struct device *device = &dev->device;
|
2017-09-22 11:58:49 +08:00
|
|
|
struct kobject *kobj = &channel->kobj;
|
|
|
|
u32 relid = channel->offermsg.child_relid;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
kobj->kset = dev->channels_kset;
|
|
|
|
ret = kobject_init_and_add(kobj, &vmbus_chan_ktype, NULL,
|
|
|
|
"%u", relid);
|
2022-02-04 01:30:08 +08:00
|
|
|
if (ret) {
|
|
|
|
kobject_put(kobj);
|
2017-09-22 11:58:49 +08:00
|
|
|
return ret;
|
2022-02-04 01:30:08 +08:00
|
|
|
}
|
2017-09-22 11:58:49 +08:00
|
|
|
|
2019-03-19 12:04:01 +08:00
|
|
|
ret = sysfs_create_group(kobj, &vmbus_chan_group);
|
|
|
|
|
|
|
|
if (ret) {
|
|
|
|
/*
|
|
|
|
* The calling functions' error handling paths will cleanup the
|
|
|
|
* empty channel directory.
|
|
|
|
*/
|
2022-02-04 01:30:08 +08:00
|
|
|
kobject_put(kobj);
|
2019-03-19 12:04:01 +08:00
|
|
|
dev_err(device, "Unable to set up channel sysfs files\n");
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-09-22 11:58:49 +08:00
|
|
|
kobject_uevent(kobj, KOBJ_ADD);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-03-19 12:04:01 +08:00
|
|
|
/*
|
|
|
|
* vmbus_remove_channel_attr_group - remove the channel's attribute group
|
|
|
|
*/
|
|
|
|
void vmbus_remove_channel_attr_group(struct vmbus_channel *channel)
|
|
|
|
{
|
|
|
|
sysfs_remove_group(&channel->kobj, &vmbus_chan_group);
|
|
|
|
}
|
|
|
|
|
2010-03-05 06:11:00 +08:00
|
|
|
/*
|
2011-09-08 22:24:12 +08:00
|
|
|
* vmbus_device_create - Creates and registers a new child device
|
2010-03-05 06:11:00 +08:00
|
|
|
* on the vmbus.
|
2009-09-02 22:11:14 +08:00
|
|
|
*/
|
2019-01-10 22:25:32 +08:00
|
|
|
struct hv_device *vmbus_device_create(const guid_t *type,
|
|
|
|
const guid_t *instance,
|
2014-06-03 23:38:15 +08:00
|
|
|
struct vmbus_channel *channel)
|
2009-07-14 07:02:34 +08:00
|
|
|
{
|
2009-07-28 23:32:53 +08:00
|
|
|
struct hv_device *child_device_obj;
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2011-03-08 05:35:48 +08:00
|
|
|
child_device_obj = kzalloc(sizeof(struct hv_device), GFP_KERNEL);
|
|
|
|
if (!child_device_obj) {
|
2011-03-30 04:58:47 +08:00
|
|
|
pr_err("Unable to allocate device object for child device\n");
|
2009-07-14 07:02:34 +08:00
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2010-10-22 00:05:27 +08:00
|
|
|
child_device_obj->channel = channel;
|
2019-01-10 22:25:32 +08:00
|
|
|
guid_copy(&child_device_obj->dev_type, type);
|
|
|
|
guid_copy(&child_device_obj->dev_instance, instance);
|
2022-09-20 06:04:44 +08:00
|
|
|
child_device_obj->vendor_id = PCI_VENDOR_ID_MICROSOFT;
|
2009-07-14 07:02:34 +08:00
|
|
|
|
|
|
|
return child_device_obj;
|
|
|
|
}
|
|
|
|
|
2010-03-05 06:11:00 +08:00
|
|
|
/*
|
2011-09-08 22:24:13 +08:00
|
|
|
* vmbus_device_register - Register the child device
|
2009-09-02 22:11:14 +08:00
|
|
|
*/
|
2011-09-08 22:24:13 +08:00
|
|
|
int vmbus_device_register(struct hv_device *child_device_obj)
|
2009-07-14 07:02:34 +08:00
|
|
|
{
|
2017-09-22 11:58:49 +08:00
|
|
|
struct kobject *kobj = &child_device_obj->device.kobj;
|
|
|
|
int ret;
|
2011-03-08 05:35:48 +08:00
|
|
|
|
2016-11-01 15:01:59 +08:00
|
|
|
dev_set_name(&child_device_obj->device, "%pUl",
|
2020-04-23 21:45:03 +08:00
|
|
|
&child_device_obj->channel->offermsg.offer.if_instance);
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2011-08-28 02:31:39 +08:00
|
|
|
child_device_obj->device.bus = &hv_bus;
|
2023-03-20 15:47:38 +08:00
|
|
|
child_device_obj->device.parent = hv_dev;
|
2011-03-08 05:35:48 +08:00
|
|
|
child_device_obj->device.release = vmbus_device_release;
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2022-03-15 22:10:53 +08:00
|
|
|
child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
|
|
|
|
child_device_obj->device.dma_mask = &child_device_obj->dma_mask;
|
|
|
|
dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64));
|
|
|
|
|
2009-09-02 22:11:14 +08:00
|
|
|
/*
|
|
|
|
* Register with the LDM. This will kick off the driver/device
|
|
|
|
* binding...which will eventually call vmbus_match() and vmbus_probe()
|
|
|
|
*/
|
2011-03-08 05:35:48 +08:00
|
|
|
ret = device_register(&child_device_obj->device);
|
2017-09-22 11:58:49 +08:00
|
|
|
if (ret) {
|
2011-03-30 04:58:47 +08:00
|
|
|
pr_err("Unable to register child device\n");
|
2022-11-19 16:11:35 +08:00
|
|
|
put_device(&child_device_obj->device);
|
2017-09-22 11:58:49 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
child_device_obj->channels_kset = kset_create_and_add("channels",
|
|
|
|
NULL, kobj);
|
|
|
|
if (!child_device_obj->channels_kset) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto err_dev_unregister;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = vmbus_add_channel_kobj(child_device_obj,
|
|
|
|
child_device_obj->channel);
|
|
|
|
if (ret) {
|
|
|
|
pr_err("Unable to register primary channeln");
|
|
|
|
goto err_kset_unregister;
|
|
|
|
}
|
2019-10-04 05:01:49 +08:00
|
|
|
hv_debug_add_dev_dir(child_device_obj);
|
2017-09-22 11:58:49 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
err_kset_unregister:
|
|
|
|
kset_unregister(child_device_obj->channels_kset);
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2017-09-22 11:58:49 +08:00
|
|
|
err_dev_unregister:
|
|
|
|
device_unregister(&child_device_obj->device);
|
2009-07-14 07:02:34 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2010-03-05 06:11:00 +08:00
|
|
|
/*
|
2011-09-08 22:24:14 +08:00
|
|
|
* vmbus_device_unregister - Remove the specified child device
|
2010-03-05 06:11:00 +08:00
|
|
|
* from the vmbus.
|
2009-09-02 22:11:14 +08:00
|
|
|
*/
|
2011-09-08 22:24:14 +08:00
|
|
|
void vmbus_device_unregister(struct hv_device *device_obj)
|
2009-07-14 07:02:34 +08:00
|
|
|
{
|
2013-06-15 07:13:35 +08:00
|
|
|
pr_debug("child device %s unregistered\n",
|
|
|
|
dev_name(&device_obj->device));
|
|
|
|
|
2017-11-14 21:53:32 +08:00
|
|
|
kset_unregister(device_obj->channels_kset);
|
|
|
|
|
2009-09-02 22:11:14 +08:00
|
|
|
/*
|
|
|
|
* Kick off the process of unregistering the device.
|
|
|
|
* This will call vmbus_remove() and eventually vmbus_device_release()
|
|
|
|
*/
|
2011-03-08 05:35:48 +08:00
|
|
|
device_unregister(&device_obj->device);
|
2009-07-14 07:02:34 +08:00
|
|
|
}
|
|
|
|
|
2023-03-20 15:47:40 +08:00
|
|
|
#ifdef CONFIG_ACPI
|
2011-04-30 04:45:15 +08:00
|
|
|
/*
|
2015-08-05 15:52:36 +08:00
|
|
|
* VMBUS is an acpi enumerated device. Get the information we
|
2014-01-30 10:14:39 +08:00
|
|
|
* need from DSDT.
|
2011-04-30 04:45:15 +08:00
|
|
|
*/
|
2014-01-30 10:14:39 +08:00
|
|
|
static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
|
2011-04-30 04:45:15 +08:00
|
|
|
{
|
2015-08-05 15:52:36 +08:00
|
|
|
resource_size_t start = 0;
|
|
|
|
resource_size_t end = 0;
|
|
|
|
struct resource *new_res;
|
|
|
|
struct resource **old_res = &hyperv_mmio;
|
|
|
|
struct resource **prev_res = NULL;
|
2020-08-15 03:45:04 +08:00
|
|
|
struct resource r;
|
2015-08-05 15:52:36 +08:00
|
|
|
|
2014-01-30 10:14:39 +08:00
|
|
|
switch (res->type) {
|
2015-08-05 15:52:36 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* "Address" descriptors are for bus windows. Ignore
|
|
|
|
* "memory" descriptors, which are for registers on
|
|
|
|
* devices.
|
|
|
|
*/
|
|
|
|
case ACPI_RESOURCE_TYPE_ADDRESS32:
|
|
|
|
start = res->data.address32.address.minimum;
|
|
|
|
end = res->data.address32.address.maximum;
|
2014-02-24 21:17:08 +08:00
|
|
|
break;
|
2011-04-30 04:45:15 +08:00
|
|
|
|
2014-01-30 10:14:39 +08:00
|
|
|
case ACPI_RESOURCE_TYPE_ADDRESS64:
|
2015-08-05 15:52:36 +08:00
|
|
|
start = res->data.address64.address.minimum;
|
|
|
|
end = res->data.address64.address.maximum;
|
2014-02-24 21:17:08 +08:00
|
|
|
break;
|
2015-08-05 15:52:36 +08:00
|
|
|
|
2020-08-15 03:45:04 +08:00
|
|
|
/*
|
|
|
|
* The IRQ information is needed only on ARM64, which Hyper-V
|
|
|
|
* sets up in the extended format. IRQ information is present
|
|
|
|
* on x86/x64 in the non-extended format but it is not used by
|
|
|
|
* Linux. So don't bother checking for the non-extended format.
|
|
|
|
*/
|
|
|
|
case ACPI_RESOURCE_TYPE_EXTENDED_IRQ:
|
|
|
|
if (!acpi_dev_resource_interrupt(res, 0, &r)) {
|
|
|
|
pr_err("Unable to parse Hyper-V ACPI interrupt\n");
|
|
|
|
return AE_ERROR;
|
|
|
|
}
|
|
|
|
/* ARM64 INTID for VMbus */
|
|
|
|
vmbus_interrupt = res->data.extended_irq.interrupts[0];
|
|
|
|
/* Linux IRQ number */
|
|
|
|
vmbus_irq = r.start;
|
|
|
|
return AE_OK;
|
|
|
|
|
2015-08-05 15:52:36 +08:00
|
|
|
default:
|
|
|
|
/* Unused resource type */
|
|
|
|
return AE_OK;
|
|
|
|
|
2011-04-30 04:45:15 +08:00
|
|
|
}
|
2015-08-05 15:52:36 +08:00
|
|
|
/*
|
|
|
|
* Ignore ranges that are below 1MB, as they're not
|
|
|
|
* necessary or useful here.
|
|
|
|
*/
|
|
|
|
if (end < 0x100000)
|
|
|
|
return AE_OK;
|
|
|
|
|
|
|
|
new_res = kzalloc(sizeof(*new_res), GFP_ATOMIC);
|
|
|
|
if (!new_res)
|
|
|
|
return AE_NO_MEMORY;
|
|
|
|
|
|
|
|
/* If this range overlaps the virtual TPM, truncate it. */
|
|
|
|
if (end > VTPM_BASE_ADDRESS && start < VTPM_BASE_ADDRESS)
|
|
|
|
end = VTPM_BASE_ADDRESS;
|
|
|
|
|
|
|
|
new_res->name = "hyperv mmio";
|
|
|
|
new_res->flags = IORESOURCE_MEM;
|
|
|
|
new_res->start = start;
|
|
|
|
new_res->end = end;
|
|
|
|
|
2015-12-15 08:01:52 +08:00
|
|
|
/*
|
|
|
|
* If two ranges are adjacent, merge them.
|
|
|
|
*/
|
2015-08-05 15:52:36 +08:00
|
|
|
do {
|
|
|
|
if (!*old_res) {
|
|
|
|
*old_res = new_res;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2015-12-15 08:01:52 +08:00
|
|
|
if (((*old_res)->end + 1) == new_res->start) {
|
|
|
|
(*old_res)->end = new_res->end;
|
|
|
|
kfree(new_res);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((*old_res)->start == new_res->end + 1) {
|
|
|
|
(*old_res)->start = new_res->start;
|
|
|
|
kfree(new_res);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2016-04-06 01:22:53 +08:00
|
|
|
if ((*old_res)->start > new_res->end) {
|
2015-08-05 15:52:36 +08:00
|
|
|
new_res->sibling = *old_res;
|
|
|
|
if (prev_res)
|
|
|
|
(*prev_res)->sibling = new_res;
|
|
|
|
*old_res = new_res;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
prev_res = old_res;
|
|
|
|
old_res = &(*old_res)->sibling;
|
|
|
|
|
|
|
|
} while (1);
|
2011-04-30 04:45:15 +08:00
|
|
|
|
|
|
|
return AE_OK;
|
|
|
|
}
|
2023-03-20 15:47:40 +08:00
|
|
|
#endif
|
2011-04-30 04:45:15 +08:00
|
|
|
|
2023-03-20 15:47:38 +08:00
|
|
|
static void vmbus_mmio_remove(void)
|
2015-08-05 15:52:36 +08:00
|
|
|
{
|
|
|
|
struct resource *cur_res;
|
|
|
|
struct resource *next_res;
|
|
|
|
|
|
|
|
if (hyperv_mmio) {
|
2016-04-06 01:22:55 +08:00
|
|
|
if (fb_mmio) {
|
|
|
|
__release_region(hyperv_mmio, fb_mmio->start,
|
|
|
|
resource_size(fb_mmio));
|
|
|
|
fb_mmio = NULL;
|
|
|
|
}
|
|
|
|
|
2015-08-05 15:52:36 +08:00
|
|
|
for (cur_res = hyperv_mmio; cur_res; cur_res = next_res) {
|
|
|
|
next_res = cur_res->sibling;
|
|
|
|
kfree(cur_res);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-03-20 15:47:40 +08:00
|
|
|
static void __maybe_unused vmbus_reserve_fb(void)
|
2016-04-06 01:22:55 +08:00
|
|
|
{
|
2022-08-27 21:03:44 +08:00
|
|
|
resource_size_t start = 0, size;
|
|
|
|
struct pci_dev *pdev;
|
|
|
|
|
|
|
|
if (efi_enabled(EFI_BOOT)) {
|
|
|
|
/* Gen2 VM: get FB base from EFI framebuffer */
|
2023-10-10 05:18:44 +08:00
|
|
|
if (IS_ENABLED(CONFIG_SYSFB)) {
|
|
|
|
start = screen_info.lfb_base;
|
|
|
|
size = max_t(__u32, screen_info.lfb_size, 0x800000);
|
|
|
|
}
|
2022-08-27 21:03:44 +08:00
|
|
|
} else {
|
|
|
|
/* Gen1 VM: get FB base from PCI */
|
|
|
|
pdev = pci_get_device(PCI_VENDOR_ID_MICROSOFT,
|
|
|
|
PCI_DEVICE_ID_HYPERV_VIDEO, NULL);
|
|
|
|
if (!pdev)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (pdev->resource[0].flags & IORESOURCE_MEM) {
|
|
|
|
start = pci_resource_start(pdev, 0);
|
|
|
|
size = pci_resource_len(pdev, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Release the PCI device so hyperv_drm or hyperv_fb driver can
|
|
|
|
* grab it later.
|
|
|
|
*/
|
|
|
|
pci_dev_put(pdev);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!start)
|
|
|
|
return;
|
|
|
|
|
2016-04-06 01:22:55 +08:00
|
|
|
/*
|
|
|
|
* Make a claim for the frame buffer in the resource tree under the
|
|
|
|
* first node, which will be the one below 4GB. The length seems to
|
|
|
|
* be underreported, particularly in a Generation 1 VM. So start out
|
|
|
|
* reserving a larger area and make it smaller until it succeeds.
|
|
|
|
*/
|
2022-08-27 21:03:44 +08:00
|
|
|
for (; !fb_mmio && (size >= 0x100000); size >>= 1)
|
|
|
|
fb_mmio = __request_region(hyperv_mmio, start, size, fb_mmio_name, 0);
|
2016-04-06 01:22:55 +08:00
|
|
|
}
|
|
|
|
|
2015-08-05 15:52:37 +08:00
|
|
|
/**
|
|
|
|
* vmbus_allocate_mmio() - Pick a memory-mapped I/O range.
|
|
|
|
* @new: If successful, supplied a pointer to the
|
|
|
|
* allocated MMIO space.
|
|
|
|
* @device_obj: Identifies the caller
|
|
|
|
* @min: Minimum guest physical address of the
|
|
|
|
* allocation
|
|
|
|
* @max: Maximum guest physical address
|
|
|
|
* @size: Size of the range to be allocated
|
|
|
|
* @align: Alignment of the range to be allocated
|
|
|
|
* @fb_overlap_ok: Whether this allocation can be allowed
|
|
|
|
* to overlap the video frame buffer.
|
|
|
|
*
|
|
|
|
* This function walks the resources granted to VMBus by the
|
|
|
|
* _CRS object in the ACPI namespace underneath the parent
|
|
|
|
* "bridge" whether that's a root PCI bus in the Generation 1
|
|
|
|
* case or a Module Device in the Generation 2 case. It then
|
|
|
|
* attempts to allocate from the global MMIO pool in a way that
|
|
|
|
* matches the constraints supplied in these parameters and by
|
|
|
|
* that _CRS.
|
|
|
|
*
|
|
|
|
* Return: 0 on success, -errno on failure
|
|
|
|
*/
|
|
|
|
int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
|
|
|
|
resource_size_t min, resource_size_t max,
|
|
|
|
resource_size_t size, resource_size_t align,
|
|
|
|
bool fb_overlap_ok)
|
|
|
|
{
|
2016-04-06 01:22:54 +08:00
|
|
|
struct resource *iter, *shadow;
|
Drivers: hv: Never allocate anything besides framebuffer from framebuffer memory region
Passed through PCI device sometimes misbehave on Gen1 VMs when Hyper-V
DRM driver is also loaded. Looking at IOMEM assignment, we can see e.g.
$ cat /proc/iomem
...
f8000000-fffbffff : PCI Bus 0000:00
f8000000-fbffffff : 0000:00:08.0
f8000000-f8001fff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe
...
fe0000000-fffffffff : PCI Bus 0000:00
fe0000000-fe07fffff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe
fe0000000-fe07fffff : 2ba2:00:02.0
fe0000000-fe07fffff : mlx4_core
the interesting part is the 'f8000000' region as it is actually the
VM's framebuffer:
$ lspci -v
...
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA (prog-if 00 [VGA controller])
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
...
hv_vmbus: registering driver hyperv_drm
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Synthvid Version major 3, minor 5
hyperv_drm 0000:00:08.0: vgaarb: deactivate vga console
hyperv_drm 0000:00:08.0: BAR 0: can't reserve [mem 0xf8000000-0xfbffffff]
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Cannot request framebuffer, boot fb still active?
Note: "Cannot request framebuffer" is not a fatal error in
hyperv_setup_gen1() as the code assumes there's some other framebuffer
device there but we actually have some other PCI device (mlx4 in this
case) config space there!
The problem appears to be that vmbus_allocate_mmio() can use dedicated
framebuffer region to serve any MMIO request from any device. The
semantics one might assume of a parameter named "fb_overlap_ok"
aren't implemented because !fb_overlap_ok essentially has no effect.
The existing semantics are really "prefer_fb_overlap". This patch
implements the expected and needed semantics, which is to not allocate
from the frame buffer space when !fb_overlap_ok.
Note, Gen2 VMs are usually unaffected by the issue because
framebuffer region is already taken by EFI fb (in case kernel supports
it) but Gen1 VMs may have this region unclaimed by the time Hyper-V PCI
pass-through driver tries allocating MMIO space if Hyper-V DRM/FB drivers
load after it. Devices can be brought up in any sequence so let's
resolve the issue by always ignoring 'fb_mmio' region for non-FB
requests, even if the region is unclaimed.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20220827130345.1320254-4-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-08-27 21:03:45 +08:00
|
|
|
resource_size_t range_min, range_max, start, end;
|
2015-08-05 15:52:37 +08:00
|
|
|
const char *dev_n = dev_name(&device_obj->device);
|
2016-04-06 01:22:56 +08:00
|
|
|
int retval;
|
2016-04-06 01:22:50 +08:00
|
|
|
|
|
|
|
retval = -ENXIO;
|
2019-11-02 04:00:04 +08:00
|
|
|
mutex_lock(&hyperv_mmio_lock);
|
2015-08-05 15:52:37 +08:00
|
|
|
|
2016-04-06 01:22:56 +08:00
|
|
|
/*
|
|
|
|
* If overlaps with frame buffers are allowed, then first attempt to
|
|
|
|
* make the allocation from within the reserved region. Because it
|
|
|
|
* is already reserved, no shadow allocation is necessary.
|
|
|
|
*/
|
|
|
|
if (fb_overlap_ok && fb_mmio && !(min > fb_mmio->end) &&
|
|
|
|
!(max < fb_mmio->start)) {
|
|
|
|
|
|
|
|
range_min = fb_mmio->start;
|
|
|
|
range_max = fb_mmio->end;
|
|
|
|
start = (range_min + align - 1) & ~(align - 1);
|
|
|
|
for (; start + size - 1 <= range_max; start += align) {
|
|
|
|
*new = request_mem_region_exclusive(start, size, dev_n);
|
|
|
|
if (*new) {
|
|
|
|
retval = 0;
|
|
|
|
goto exit;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-08-05 15:52:37 +08:00
|
|
|
for (iter = hyperv_mmio; iter; iter = iter->sibling) {
|
|
|
|
if ((iter->start >= max) || (iter->end <= min))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
range_min = iter->start;
|
|
|
|
range_max = iter->end;
|
2016-04-06 01:22:56 +08:00
|
|
|
start = (range_min + align - 1) & ~(align - 1);
|
|
|
|
for (; start + size - 1 <= range_max; start += align) {
|
Drivers: hv: Never allocate anything besides framebuffer from framebuffer memory region
Passed through PCI device sometimes misbehave on Gen1 VMs when Hyper-V
DRM driver is also loaded. Looking at IOMEM assignment, we can see e.g.
$ cat /proc/iomem
...
f8000000-fffbffff : PCI Bus 0000:00
f8000000-fbffffff : 0000:00:08.0
f8000000-f8001fff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe
...
fe0000000-fffffffff : PCI Bus 0000:00
fe0000000-fe07fffff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe
fe0000000-fe07fffff : 2ba2:00:02.0
fe0000000-fe07fffff : mlx4_core
the interesting part is the 'f8000000' region as it is actually the
VM's framebuffer:
$ lspci -v
...
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA (prog-if 00 [VGA controller])
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
...
hv_vmbus: registering driver hyperv_drm
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Synthvid Version major 3, minor 5
hyperv_drm 0000:00:08.0: vgaarb: deactivate vga console
hyperv_drm 0000:00:08.0: BAR 0: can't reserve [mem 0xf8000000-0xfbffffff]
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Cannot request framebuffer, boot fb still active?
Note: "Cannot request framebuffer" is not a fatal error in
hyperv_setup_gen1() as the code assumes there's some other framebuffer
device there but we actually have some other PCI device (mlx4 in this
case) config space there!
The problem appears to be that vmbus_allocate_mmio() can use dedicated
framebuffer region to serve any MMIO request from any device. The
semantics one might assume of a parameter named "fb_overlap_ok"
aren't implemented because !fb_overlap_ok essentially has no effect.
The existing semantics are really "prefer_fb_overlap". This patch
implements the expected and needed semantics, which is to not allocate
from the frame buffer space when !fb_overlap_ok.
Note, Gen2 VMs are usually unaffected by the issue because
framebuffer region is already taken by EFI fb (in case kernel supports
it) but Gen1 VMs may have this region unclaimed by the time Hyper-V PCI
pass-through driver tries allocating MMIO space if Hyper-V DRM/FB drivers
load after it. Devices can be brought up in any sequence so let's
resolve the issue by always ignoring 'fb_mmio' region for non-FB
requests, even if the region is unclaimed.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20220827130345.1320254-4-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-08-27 21:03:45 +08:00
|
|
|
end = start + size - 1;
|
|
|
|
|
|
|
|
/* Skip the whole fb_mmio region if not fb_overlap_ok */
|
|
|
|
if (!fb_overlap_ok && fb_mmio &&
|
|
|
|
(((start >= fb_mmio->start) && (start <= fb_mmio->end)) ||
|
|
|
|
((end >= fb_mmio->start) && (end <= fb_mmio->end))))
|
|
|
|
continue;
|
|
|
|
|
2016-04-06 01:22:56 +08:00
|
|
|
shadow = __request_region(iter, start, size, NULL,
|
|
|
|
IORESOURCE_BUSY);
|
|
|
|
if (!shadow)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
*new = request_mem_region_exclusive(start, size, dev_n);
|
|
|
|
if (*new) {
|
|
|
|
shadow->name = (char *)*new;
|
|
|
|
retval = 0;
|
|
|
|
goto exit;
|
2015-08-05 15:52:37 +08:00
|
|
|
}
|
|
|
|
|
2016-04-06 01:22:56 +08:00
|
|
|
__release_region(iter, start, size);
|
2015-08-05 15:52:37 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-04-06 01:22:50 +08:00
|
|
|
exit:
|
2019-11-02 04:00:04 +08:00
|
|
|
mutex_unlock(&hyperv_mmio_lock);
|
2016-04-06 01:22:50 +08:00
|
|
|
return retval;
|
2015-08-05 15:52:37 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(vmbus_allocate_mmio);
|
|
|
|
|
2016-04-06 01:22:51 +08:00
|
|
|
/**
|
|
|
|
* vmbus_free_mmio() - Free a memory-mapped I/O range.
|
|
|
|
* @start: Base address of region to release.
|
|
|
|
* @size: Size of the range to be allocated
|
|
|
|
*
|
|
|
|
* This function releases anything requested by
|
|
|
|
* vmbus_mmio_allocate().
|
|
|
|
*/
|
|
|
|
void vmbus_free_mmio(resource_size_t start, resource_size_t size)
|
|
|
|
{
|
2016-04-06 01:22:54 +08:00
|
|
|
struct resource *iter;
|
|
|
|
|
2019-11-02 04:00:04 +08:00
|
|
|
mutex_lock(&hyperv_mmio_lock);
|
2016-04-06 01:22:54 +08:00
|
|
|
for (iter = hyperv_mmio; iter; iter = iter->sibling) {
|
|
|
|
if ((iter->start >= start + size) || (iter->end <= start))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
__release_region(iter, start, size);
|
|
|
|
}
|
2016-04-06 01:22:51 +08:00
|
|
|
release_mem_region(start, size);
|
2019-11-02 04:00:04 +08:00
|
|
|
mutex_unlock(&hyperv_mmio_lock);
|
2016-04-06 01:22:51 +08:00
|
|
|
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(vmbus_free_mmio);
|
|
|
|
|
2023-03-20 15:47:40 +08:00
|
|
|
#ifdef CONFIG_ACPI
|
2023-03-20 15:47:38 +08:00
|
|
|
static int vmbus_acpi_add(struct platform_device *pdev)
|
2011-04-30 04:45:15 +08:00
|
|
|
{
|
|
|
|
acpi_status result;
|
2014-01-30 10:14:39 +08:00
|
|
|
int ret_val = -ENODEV;
|
2015-08-05 15:52:36 +08:00
|
|
|
struct acpi_device *ancestor;
|
2023-03-20 15:47:38 +08:00
|
|
|
struct acpi_device *device = ACPI_COMPANION(&pdev->dev);
|
2011-04-30 04:45:15 +08:00
|
|
|
|
2023-03-20 15:47:38 +08:00
|
|
|
hv_dev = &device->dev;
|
2011-06-07 06:49:39 +08:00
|
|
|
|
2022-03-25 00:14:51 +08:00
|
|
|
/*
|
|
|
|
* Older versions of Hyper-V for ARM64 fail to include the _CCA
|
|
|
|
* method on the top level VMbus device in the DSDT. But devices
|
|
|
|
* are hardware coherent in all current Hyper-V use cases, so fix
|
|
|
|
* up the ACPI device to behave as if _CCA is present and indicates
|
|
|
|
* hardware coherence.
|
|
|
|
*/
|
|
|
|
ACPI_COMPANION_SET(&device->dev, device);
|
|
|
|
if (IS_ENABLED(CONFIG_ACPI_CCA_REQUIRED) &&
|
|
|
|
device_get_dma_attr(&device->dev) == DEV_DMA_NOT_SUPPORTED) {
|
|
|
|
pr_info("No ACPI _CCA found; assuming coherent device I/O\n");
|
|
|
|
device->flags.cca_seen = true;
|
|
|
|
device->flags.coherent_dma = true;
|
|
|
|
}
|
|
|
|
|
2011-08-28 02:31:38 +08:00
|
|
|
result = acpi_walk_resources(device->handle, METHOD_NAME__CRS,
|
2014-01-30 10:14:39 +08:00
|
|
|
vmbus_walk_resources, NULL);
|
2011-04-30 04:45:15 +08:00
|
|
|
|
2014-01-30 10:14:39 +08:00
|
|
|
if (ACPI_FAILURE(result))
|
|
|
|
goto acpi_walk_err;
|
|
|
|
/*
|
2015-08-05 15:52:36 +08:00
|
|
|
* Some ancestor of the vmbus acpi device (Gen1 or Gen2
|
|
|
|
* firmware) is the VMOD that has the mmio ranges. Get that.
|
2014-01-30 10:14:39 +08:00
|
|
|
*/
|
2023-08-10 02:40:18 +08:00
|
|
|
for (ancestor = acpi_dev_parent(device);
|
|
|
|
ancestor && ancestor->handle != ACPI_ROOT_OBJECT;
|
2022-08-25 00:59:48 +08:00
|
|
|
ancestor = acpi_dev_parent(ancestor)) {
|
2015-08-05 15:52:36 +08:00
|
|
|
result = acpi_walk_resources(ancestor->handle, METHOD_NAME__CRS,
|
|
|
|
vmbus_walk_resources, NULL);
|
2014-01-30 10:14:39 +08:00
|
|
|
|
|
|
|
if (ACPI_FAILURE(result))
|
2015-08-05 15:52:36 +08:00
|
|
|
continue;
|
2016-04-06 01:22:55 +08:00
|
|
|
if (hyperv_mmio) {
|
|
|
|
vmbus_reserve_fb();
|
2015-08-05 15:52:36 +08:00
|
|
|
break;
|
2016-04-06 01:22:55 +08:00
|
|
|
}
|
2011-04-30 04:45:15 +08:00
|
|
|
}
|
2014-01-30 10:14:39 +08:00
|
|
|
ret_val = 0;
|
|
|
|
|
|
|
|
acpi_walk_err:
|
2015-08-05 15:52:36 +08:00
|
|
|
if (ret_val)
|
2023-03-20 15:47:38 +08:00
|
|
|
vmbus_mmio_remove();
|
2014-01-30 10:14:39 +08:00
|
|
|
return ret_val;
|
2011-04-30 04:45:15 +08:00
|
|
|
}
|
2023-03-20 15:47:40 +08:00
|
|
|
#else
|
|
|
|
static int vmbus_acpi_add(struct platform_device *pdev)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
static int vmbus_device_add(struct platform_device *pdev)
|
|
|
|
{
|
|
|
|
struct resource **cur_res = &hyperv_mmio;
|
|
|
|
struct of_range range;
|
|
|
|
struct of_range_parser parser;
|
|
|
|
struct device_node *np = pdev->dev.of_node;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
hv_dev = &pdev->dev;
|
|
|
|
|
|
|
|
ret = of_range_parser_init(&parser, np);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
for_each_of_range(&parser, &range) {
|
|
|
|
struct resource *res;
|
|
|
|
|
|
|
|
res = kzalloc(sizeof(*res), GFP_KERNEL);
|
|
|
|
if (!res) {
|
|
|
|
vmbus_mmio_remove();
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
res->name = "hyperv mmio";
|
|
|
|
res->flags = range.flags;
|
|
|
|
res->start = range.cpu_addr;
|
|
|
|
res->end = range.cpu_addr + range.size;
|
|
|
|
|
|
|
|
*cur_res = res;
|
|
|
|
cur_res = &res->sibling;
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
2011-04-30 04:45:15 +08:00
|
|
|
|
2023-03-20 15:47:38 +08:00
|
|
|
static int vmbus_platform_driver_probe(struct platform_device *pdev)
|
|
|
|
{
|
2023-03-20 15:47:40 +08:00
|
|
|
if (acpi_disabled)
|
|
|
|
return vmbus_device_add(pdev);
|
|
|
|
else
|
|
|
|
return vmbus_acpi_add(pdev);
|
2023-03-20 15:47:38 +08:00
|
|
|
}
|
|
|
|
|
2024-03-08 16:51:08 +08:00
|
|
|
static void vmbus_platform_driver_remove(struct platform_device *pdev)
|
2023-03-20 15:47:38 +08:00
|
|
|
{
|
|
|
|
vmbus_mmio_remove();
|
|
|
|
}
|
2011-04-30 04:45:15 +08:00
|
|
|
|
2019-09-20 05:46:12 +08:00
|
|
|
#ifdef CONFIG_PM_SLEEP
|
2019-09-06 07:01:19 +08:00
|
|
|
static int vmbus_bus_suspend(struct device *dev)
|
|
|
|
{
|
2022-07-11 12:11:47 +08:00
|
|
|
struct hv_per_cpu_context *hv_cpu = per_cpu_ptr(
|
|
|
|
hv_context.cpu_context, VMBUS_CONNECT_CPU);
|
2019-09-06 07:01:21 +08:00
|
|
|
struct vmbus_channel *channel, *sc;
|
2019-09-06 07:01:20 +08:00
|
|
|
|
2022-07-11 12:11:47 +08:00
|
|
|
tasklet_disable(&hv_cpu->msg_dpc);
|
|
|
|
vmbus_connection.ignore_any_offer_msg = true;
|
|
|
|
/* The tasklet_enable() takes care of providing a memory barrier */
|
|
|
|
tasklet_enable(&hv_cpu->msg_dpc);
|
|
|
|
|
|
|
|
/* Drain all the workqueues as we are in suspend */
|
|
|
|
drain_workqueue(vmbus_connection.rescind_work_queue);
|
|
|
|
drain_workqueue(vmbus_connection.work_queue);
|
|
|
|
drain_workqueue(vmbus_connection.handle_primary_chan_wq);
|
|
|
|
drain_workqueue(vmbus_connection.handle_sub_chan_wq);
|
2019-09-06 07:01:20 +08:00
|
|
|
|
|
|
|
mutex_lock(&vmbus_connection.channel_mutex);
|
|
|
|
list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
|
|
|
|
if (!is_hvsock_channel(channel))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
vmbus_force_channel_rescinded(channel);
|
|
|
|
}
|
|
|
|
mutex_unlock(&vmbus_connection.channel_mutex);
|
|
|
|
|
2019-09-06 07:01:21 +08:00
|
|
|
/*
|
|
|
|
* Wait until all the sub-channels and hv_sock channels have been
|
|
|
|
* cleaned up. Sub-channels should be destroyed upon suspend, otherwise
|
|
|
|
* they would conflict with the new sub-channels that will be created
|
|
|
|
* in the resume path. hv_sock channels should also be destroyed, but
|
|
|
|
* a hv_sock channel of an established hv_sock connection can not be
|
|
|
|
* really destroyed since it may still be referenced by the userspace
|
|
|
|
* application, so we just force the hv_sock channel to be rescinded
|
|
|
|
* by vmbus_force_channel_rescinded(), and the userspace application
|
|
|
|
* will thoroughly destroy the channel after hibernation.
|
|
|
|
*
|
|
|
|
* Note: the counter nr_chan_close_on_suspend may never go above 0 if
|
|
|
|
* the VM has no sub-channel and hv_sock channel, e.g. a 1-vCPU VM.
|
|
|
|
*/
|
|
|
|
if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
|
|
|
|
wait_for_completion(&vmbus_connection.ready_for_suspend_event);
|
|
|
|
|
2020-09-05 10:55:55 +08:00
|
|
|
if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
|
|
|
|
pr_err("Can not suspend due to a previous failed resuming\n");
|
|
|
|
return -EBUSY;
|
|
|
|
}
|
2019-09-06 07:01:22 +08:00
|
|
|
|
2019-09-06 07:01:21 +08:00
|
|
|
mutex_lock(&vmbus_connection.channel_mutex);
|
|
|
|
|
|
|
|
list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
|
2019-09-06 07:01:22 +08:00
|
|
|
/*
|
2020-04-06 08:15:06 +08:00
|
|
|
* Remove the channel from the array of channels and invalidate
|
|
|
|
* the channel's relid. Upon resume, vmbus_onoffer() will fix
|
|
|
|
* up the relid (and other fields, if necessary) and add the
|
|
|
|
* channel back to the array.
|
2019-09-06 07:01:22 +08:00
|
|
|
*/
|
2020-04-06 08:15:06 +08:00
|
|
|
vmbus_channel_unmap_relid(channel);
|
2019-09-06 07:01:22 +08:00
|
|
|
channel->offermsg.child_relid = INVALID_RELID;
|
|
|
|
|
2019-09-06 07:01:21 +08:00
|
|
|
if (is_hvsock_channel(channel)) {
|
|
|
|
if (!channel->rescind) {
|
|
|
|
pr_err("hv_sock channel not rescinded!\n");
|
|
|
|
WARN_ON_ONCE(1);
|
|
|
|
}
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
list_for_each_entry(sc, &channel->sc_list, sc_list) {
|
|
|
|
pr_err("Sub-channel not deleted!\n");
|
|
|
|
WARN_ON_ONCE(1);
|
|
|
|
}
|
2019-09-06 07:01:22 +08:00
|
|
|
|
|
|
|
atomic_inc(&vmbus_connection.nr_chan_fixup_on_resume);
|
2019-09-06 07:01:21 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
mutex_unlock(&vmbus_connection.channel_mutex);
|
|
|
|
|
2019-09-06 07:01:19 +08:00
|
|
|
vmbus_initiate_unload(false);
|
|
|
|
|
2019-09-06 07:01:22 +08:00
|
|
|
/* Reset the event for the next resume. */
|
|
|
|
reinit_completion(&vmbus_connection.ready_for_resume_event);
|
|
|
|
|
2019-09-06 07:01:19 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vmbus_bus_resume(struct device *dev)
|
|
|
|
{
|
|
|
|
struct vmbus_channel_msginfo *msginfo;
|
|
|
|
size_t msgsize;
|
|
|
|
int ret;
|
|
|
|
|
2022-07-11 12:11:47 +08:00
|
|
|
vmbus_connection.ignore_any_offer_msg = false;
|
|
|
|
|
2019-09-06 07:01:19 +08:00
|
|
|
/*
|
|
|
|
* We only use the 'vmbus_proto_version', which was in use before
|
|
|
|
* hibernation, to re-negotiate with the host.
|
|
|
|
*/
|
2019-10-15 19:46:44 +08:00
|
|
|
if (!vmbus_proto_version) {
|
2019-09-06 07:01:19 +08:00
|
|
|
pr_err("Invalid proto version = 0x%x\n", vmbus_proto_version);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
msgsize = sizeof(*msginfo) +
|
|
|
|
sizeof(struct vmbus_channel_initiate_contact);
|
|
|
|
|
|
|
|
msginfo = kzalloc(msgsize, GFP_KERNEL);
|
|
|
|
|
|
|
|
if (msginfo == NULL)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
ret = vmbus_negotiate_version(msginfo, vmbus_proto_version);
|
|
|
|
|
|
|
|
kfree(msginfo);
|
|
|
|
|
|
|
|
if (ret != 0)
|
|
|
|
return ret;
|
|
|
|
|
2019-09-06 07:01:22 +08:00
|
|
|
WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) == 0);
|
|
|
|
|
2019-09-06 07:01:19 +08:00
|
|
|
vmbus_request_offers();
|
|
|
|
|
2020-09-05 10:55:55 +08:00
|
|
|
if (wait_for_completion_timeout(
|
|
|
|
&vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
|
|
|
|
pr_err("Some vmbus device is missing after suspending?\n");
|
2019-09-06 07:01:22 +08:00
|
|
|
|
2019-09-06 07:01:21 +08:00
|
|
|
/* Reset the event for the next suspend. */
|
|
|
|
reinit_completion(&vmbus_connection.ready_for_suspend_event);
|
|
|
|
|
2019-09-06 07:01:19 +08:00
|
|
|
return 0;
|
|
|
|
}
|
2020-04-12 11:50:35 +08:00
|
|
|
#else
|
|
|
|
#define vmbus_bus_suspend NULL
|
|
|
|
#define vmbus_bus_resume NULL
|
2019-09-20 05:46:12 +08:00
|
|
|
#endif /* CONFIG_PM_SLEEP */
|
2019-09-06 07:01:19 +08:00
|
|
|
|
2023-03-20 15:47:40 +08:00
|
|
|
static const __maybe_unused struct of_device_id vmbus_of_match[] = {
|
|
|
|
{
|
|
|
|
.compatible = "microsoft,vmbus",
|
|
|
|
},
|
|
|
|
{
|
|
|
|
/* sentinel */
|
|
|
|
},
|
|
|
|
};
|
|
|
|
MODULE_DEVICE_TABLE(of, vmbus_of_match);
|
|
|
|
|
|
|
|
static const __maybe_unused struct acpi_device_id vmbus_acpi_device_ids[] = {
|
2011-04-30 04:45:15 +08:00
|
|
|
{"VMBUS", 0},
|
2011-06-07 06:49:42 +08:00
|
|
|
{"VMBus", 0},
|
2011-04-30 04:45:15 +08:00
|
|
|
{"", 0},
|
|
|
|
};
|
|
|
|
MODULE_DEVICE_TABLE(acpi, vmbus_acpi_device_ids);
|
|
|
|
|
2019-09-06 07:01:19 +08:00
|
|
|
/*
|
2020-04-12 11:50:35 +08:00
|
|
|
* Note: we must use the "no_irq" ops, otherwise hibernation can not work with
|
|
|
|
* PCI device assignment, because "pci_dev_pm_ops" uses the "noirq" ops: in
|
|
|
|
* the resume path, the pci "noirq" restore op runs before "non-noirq" op (see
|
2019-09-06 07:01:19 +08:00
|
|
|
* resume_target_kernel() -> dpm_resume_start(), and hibernation_restore() ->
|
|
|
|
* dpm_resume_end()). This means vmbus_bus_resume() and the pci-hyperv's
|
2020-04-12 11:50:35 +08:00
|
|
|
* resume callback must also run via the "noirq" ops.
|
|
|
|
*
|
|
|
|
* Set suspend_noirq/resume_noirq to NULL for Suspend-to-Idle: see the comment
|
|
|
|
* earlier in this file before vmbus_pm.
|
2019-09-06 07:01:19 +08:00
|
|
|
*/
|
2020-04-12 11:50:35 +08:00
|
|
|
|
2019-09-06 07:01:19 +08:00
|
|
|
static const struct dev_pm_ops vmbus_bus_pm = {
|
2020-04-12 11:50:35 +08:00
|
|
|
.suspend_noirq = NULL,
|
|
|
|
.resume_noirq = NULL,
|
|
|
|
.freeze_noirq = vmbus_bus_suspend,
|
|
|
|
.thaw_noirq = vmbus_bus_resume,
|
|
|
|
.poweroff_noirq = vmbus_bus_suspend,
|
|
|
|
.restore_noirq = vmbus_bus_resume
|
2019-09-06 07:01:19 +08:00
|
|
|
};
|
|
|
|
|
2023-03-20 15:47:38 +08:00
|
|
|
static struct platform_driver vmbus_platform_driver = {
|
|
|
|
.probe = vmbus_platform_driver_probe,
|
2024-03-08 16:51:08 +08:00
|
|
|
.remove_new = vmbus_platform_driver_remove,
|
2023-03-20 15:47:38 +08:00
|
|
|
.driver = {
|
|
|
|
.name = "vmbus",
|
|
|
|
.acpi_match_table = ACPI_PTR(vmbus_acpi_device_ids),
|
2023-03-20 15:47:40 +08:00
|
|
|
.of_match_table = of_match_ptr(vmbus_of_match),
|
2023-03-20 15:47:38 +08:00
|
|
|
.pm = &vmbus_bus_pm,
|
|
|
|
.probe_type = PROBE_FORCE_SYNCHRONOUS,
|
|
|
|
}
|
2011-04-30 04:45:15 +08:00
|
|
|
};
|
|
|
|
|
2015-08-02 07:08:07 +08:00
|
|
|
static void hv_kexec_handler(void)
|
|
|
|
{
|
2019-07-01 12:25:56 +08:00
|
|
|
hv_stimer_global_cleanup();
|
2016-02-27 07:13:16 +08:00
|
|
|
vmbus_initiate_unload(false);
|
2016-12-08 06:53:12 +08:00
|
|
|
/* Make sure conn_state is set as hv_synic_cleanup checks for it */
|
|
|
|
mb();
|
2016-12-08 06:53:11 +08:00
|
|
|
cpuhp_remove_state(hyperv_cpuhp_online);
|
2015-08-02 07:08:07 +08:00
|
|
|
};
|
|
|
|
|
2015-08-02 07:08:09 +08:00
|
|
|
static void hv_crash_handler(struct pt_regs *regs)
|
|
|
|
{
|
2019-07-01 12:25:56 +08:00
|
|
|
int cpu;
|
|
|
|
|
2016-02-27 07:13:16 +08:00
|
|
|
vmbus_initiate_unload(true);
|
2015-08-02 07:08:09 +08:00
|
|
|
/*
|
|
|
|
* In crash handler we can't schedule synic cleanup for all CPUs,
|
|
|
|
* doing the cleanup for current CPU only. This should be sufficient
|
|
|
|
* for kdump.
|
|
|
|
*/
|
2019-07-01 12:25:56 +08:00
|
|
|
cpu = smp_processor_id();
|
|
|
|
hv_stimer_cleanup(cpu);
|
2019-11-14 14:32:01 +08:00
|
|
|
hv_synic_disable_regs(cpu);
|
2015-08-02 07:08:09 +08:00
|
|
|
};
|
|
|
|
|
2019-09-06 07:01:16 +08:00
|
|
|
static int hv_synic_suspend(void)
|
|
|
|
{
|
|
|
|
/*
|
x86/hyperv: Initialize clockevents earlier in CPU onlining
Hyper-V has historically initialized stimer-based clockevents late in the
process of onlining a CPU because clockevents depend on stimer
interrupts. In the original Hyper-V design, stimer interrupts generate a
VMbus message, so the VMbus machinery must be running first, and VMbus
can't be initialized until relatively late. On x86/64, LAPIC timer based
clockevents are used during early initialization before VMbus and
stimer-based clockevents are ready, and again during CPU offlining after
the stimer clockevents have been shut down.
Unfortunately, this design creates problems when offlining CPUs for
hibernation or other purposes. stimer-based clockevents are shut down
relatively early in the offlining process, so clockevents_unbind_device()
must be used to fallback to the LAPIC-based clockevents for the remainder
of the offlining process. Furthermore, the late initialization and early
shutdown of stimer-based clockevents doesn't work well on ARM64 since there
is no other timer like the LAPIC to fallback to. So CPU onlining and
offlining doesn't work properly.
Fix this by recognizing that stimer Direct Mode is the normal path for
newer versions of Hyper-V on x86/64, and the only path on other
architectures. With stimer Direct Mode, stimer interrupts don't require any
VMbus machinery. stimer clockevents can be initialized and shut down
consistent with how it is done for other clockevent devices. While the old
VMbus-based stimer interrupts must still be supported for backward
compatibility on x86, that mode of operation can be treated as legacy.
So add a new Hyper-V stimer entry in the CPU hotplug state list, and use
that new state when in Direct Mode. Update the Hyper-V clocksource driver
to allocate and initialize stimer clockevents earlier during boot. Update
Hyper-V initialization and the VMbus driver to use this new design. As a
result, the LAPIC timer is no longer used during boot or CPU
onlining/offlining and clockevents_unbind_device() is not called. But
retain the old design as a legacy implementation for older versions of
Hyper-V that don't support Direct Mode.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lkml.kernel.org/r/1573607467-9456-1-git-send-email-mikelley@microsoft.com
2019-11-13 09:11:49 +08:00
|
|
|
* When we reach here, all the non-boot CPUs have been offlined.
|
|
|
|
* If we're in a legacy configuration where stimer Direct Mode is
|
|
|
|
* not enabled, the stimers on the non-boot CPUs have been unbound
|
|
|
|
* in hv_synic_cleanup() -> hv_stimer_legacy_cleanup() ->
|
2019-09-06 07:01:16 +08:00
|
|
|
* hv_stimer_cleanup() -> clockevents_unbind_device().
|
|
|
|
*
|
x86/hyperv: Initialize clockevents earlier in CPU onlining
Hyper-V has historically initialized stimer-based clockevents late in the
process of onlining a CPU because clockevents depend on stimer
interrupts. In the original Hyper-V design, stimer interrupts generate a
VMbus message, so the VMbus machinery must be running first, and VMbus
can't be initialized until relatively late. On x86/64, LAPIC timer based
clockevents are used during early initialization before VMbus and
stimer-based clockevents are ready, and again during CPU offlining after
the stimer clockevents have been shut down.
Unfortunately, this design creates problems when offlining CPUs for
hibernation or other purposes. stimer-based clockevents are shut down
relatively early in the offlining process, so clockevents_unbind_device()
must be used to fallback to the LAPIC-based clockevents for the remainder
of the offlining process. Furthermore, the late initialization and early
shutdown of stimer-based clockevents doesn't work well on ARM64 since there
is no other timer like the LAPIC to fallback to. So CPU onlining and
offlining doesn't work properly.
Fix this by recognizing that stimer Direct Mode is the normal path for
newer versions of Hyper-V on x86/64, and the only path on other
architectures. With stimer Direct Mode, stimer interrupts don't require any
VMbus machinery. stimer clockevents can be initialized and shut down
consistent with how it is done for other clockevent devices. While the old
VMbus-based stimer interrupts must still be supported for backward
compatibility on x86, that mode of operation can be treated as legacy.
So add a new Hyper-V stimer entry in the CPU hotplug state list, and use
that new state when in Direct Mode. Update the Hyper-V clocksource driver
to allocate and initialize stimer clockevents earlier during boot. Update
Hyper-V initialization and the VMbus driver to use this new design. As a
result, the LAPIC timer is no longer used during boot or CPU
onlining/offlining and clockevents_unbind_device() is not called. But
retain the old design as a legacy implementation for older versions of
Hyper-V that don't support Direct Mode.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lkml.kernel.org/r/1573607467-9456-1-git-send-email-mikelley@microsoft.com
2019-11-13 09:11:49 +08:00
|
|
|
* hv_synic_suspend() only runs on CPU0 with interrupts disabled.
|
|
|
|
* Here we do not call hv_stimer_legacy_cleanup() on CPU0 because:
|
|
|
|
* 1) it's unnecessary as interrupts remain disabled between
|
|
|
|
* syscore_suspend() and syscore_resume(): see create_image() and
|
|
|
|
* resume_target_kernel()
|
2019-09-06 07:01:16 +08:00
|
|
|
* 2) the stimer on CPU0 is automatically disabled later by
|
|
|
|
* syscore_suspend() -> timekeeping_suspend() -> tick_suspend() -> ...
|
x86/hyperv: Initialize clockevents earlier in CPU onlining
Hyper-V has historically initialized stimer-based clockevents late in the
process of onlining a CPU because clockevents depend on stimer
interrupts. In the original Hyper-V design, stimer interrupts generate a
VMbus message, so the VMbus machinery must be running first, and VMbus
can't be initialized until relatively late. On x86/64, LAPIC timer based
clockevents are used during early initialization before VMbus and
stimer-based clockevents are ready, and again during CPU offlining after
the stimer clockevents have been shut down.
Unfortunately, this design creates problems when offlining CPUs for
hibernation or other purposes. stimer-based clockevents are shut down
relatively early in the offlining process, so clockevents_unbind_device()
must be used to fallback to the LAPIC-based clockevents for the remainder
of the offlining process. Furthermore, the late initialization and early
shutdown of stimer-based clockevents doesn't work well on ARM64 since there
is no other timer like the LAPIC to fallback to. So CPU onlining and
offlining doesn't work properly.
Fix this by recognizing that stimer Direct Mode is the normal path for
newer versions of Hyper-V on x86/64, and the only path on other
architectures. With stimer Direct Mode, stimer interrupts don't require any
VMbus machinery. stimer clockevents can be initialized and shut down
consistent with how it is done for other clockevent devices. While the old
VMbus-based stimer interrupts must still be supported for backward
compatibility on x86, that mode of operation can be treated as legacy.
So add a new Hyper-V stimer entry in the CPU hotplug state list, and use
that new state when in Direct Mode. Update the Hyper-V clocksource driver
to allocate and initialize stimer clockevents earlier during boot. Update
Hyper-V initialization and the VMbus driver to use this new design. As a
result, the LAPIC timer is no longer used during boot or CPU
onlining/offlining and clockevents_unbind_device() is not called. But
retain the old design as a legacy implementation for older versions of
Hyper-V that don't support Direct Mode.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lkml.kernel.org/r/1573607467-9456-1-git-send-email-mikelley@microsoft.com
2019-11-13 09:11:49 +08:00
|
|
|
* -> clockevents_shutdown() -> ... -> hv_ce_shutdown()
|
|
|
|
* 3) a warning would be triggered if we call
|
|
|
|
* clockevents_unbind_device(), which may sleep, in an
|
|
|
|
* interrupts-disabled context.
|
2019-09-06 07:01:16 +08:00
|
|
|
*/
|
|
|
|
|
|
|
|
hv_synic_disable_regs(0);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void hv_synic_resume(void)
|
|
|
|
{
|
|
|
|
hv_synic_enable_regs(0);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Note: we don't need to call hv_stimer_init(0), because the timer
|
|
|
|
* on CPU0 is not unbound in hv_synic_suspend(), and the timer is
|
|
|
|
* automatically re-enabled in timekeeping_resume().
|
|
|
|
*/
|
|
|
|
}
|
|
|
|
|
|
|
|
/* The callbacks run only on CPU0, with irqs_disabled. */
|
|
|
|
static struct syscore_ops hv_synic_syscore_ops = {
|
|
|
|
.suspend = hv_synic_suspend,
|
|
|
|
.resume = hv_synic_resume,
|
|
|
|
};
|
|
|
|
|
2011-06-07 06:49:39 +08:00
|
|
|
static int __init hv_acpi_init(void)
|
2011-03-16 06:03:32 +08:00
|
|
|
{
|
2022-09-22 02:39:05 +08:00
|
|
|
int ret;
|
2011-04-30 04:45:15 +08:00
|
|
|
|
2017-12-23 02:19:02 +08:00
|
|
|
if (!hv_is_hyperv_initialized())
|
2012-08-17 18:52:43 +08:00
|
|
|
return -ENODEV;
|
|
|
|
|
2023-01-02 15:12:54 +08:00
|
|
|
if (hv_root_partition && !hv_nested)
|
2021-02-03 23:04:22 +08:00
|
|
|
return 0;
|
|
|
|
|
2011-04-30 04:45:15 +08:00
|
|
|
/*
|
2015-12-15 08:01:46 +08:00
|
|
|
* Get ACPI resources first.
|
2011-04-30 04:45:15 +08:00
|
|
|
*/
|
2023-03-20 15:47:38 +08:00
|
|
|
ret = platform_driver_register(&vmbus_platform_driver);
|
2011-04-30 04:45:15 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2023-03-20 15:47:38 +08:00
|
|
|
if (!hv_dev) {
|
2022-09-22 02:39:05 +08:00
|
|
|
ret = -ENODEV;
|
2011-07-16 04:38:56 +08:00
|
|
|
goto cleanup;
|
|
|
|
}
|
2021-03-03 05:38:18 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If we're on an architecture with a hardcoded hypervisor
|
|
|
|
* vector (i.e. x86/x64), override the VMbus interrupt found
|
|
|
|
* in the ACPI tables. Ensure vmbus_irq is not set since the
|
|
|
|
* normal Linux IRQ mechanism is not used in this case.
|
|
|
|
*/
|
|
|
|
#ifdef HYPERVISOR_CALLBACK_VECTOR
|
|
|
|
vmbus_interrupt = HYPERVISOR_CALLBACK_VECTOR;
|
|
|
|
vmbus_irq = -1;
|
|
|
|
#endif
|
|
|
|
|
2019-10-04 05:01:49 +08:00
|
|
|
hv_debug_init();
|
2011-04-30 04:45:15 +08:00
|
|
|
|
2015-12-15 08:01:46 +08:00
|
|
|
ret = vmbus_bus_init();
|
2011-06-17 04:16:38 +08:00
|
|
|
if (ret)
|
2011-07-16 04:38:56 +08:00
|
|
|
goto cleanup;
|
|
|
|
|
2015-08-02 07:08:07 +08:00
|
|
|
hv_setup_kexec_handler(hv_kexec_handler);
|
2015-08-02 07:08:09 +08:00
|
|
|
hv_setup_crash_handler(hv_crash_handler);
|
2015-08-02 07:08:07 +08:00
|
|
|
|
2019-09-06 07:01:16 +08:00
|
|
|
register_syscore_ops(&hv_synic_syscore_ops);
|
|
|
|
|
2011-07-16 04:38:56 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
cleanup:
|
2023-03-20 15:47:38 +08:00
|
|
|
platform_driver_unregister(&vmbus_platform_driver);
|
|
|
|
hv_dev = NULL;
|
2011-06-17 04:16:38 +08:00
|
|
|
return ret;
|
2011-03-16 06:03:32 +08:00
|
|
|
}
|
|
|
|
|
2011-12-13 01:29:17 +08:00
|
|
|
static void __exit vmbus_exit(void)
|
|
|
|
{
|
2015-02-28 03:25:55 +08:00
|
|
|
int cpu;
|
|
|
|
|
2019-09-06 07:01:16 +08:00
|
|
|
unregister_syscore_ops(&hv_synic_syscore_ops);
|
|
|
|
|
2015-08-02 07:08:07 +08:00
|
|
|
hv_remove_kexec_handler();
|
2015-08-02 07:08:09 +08:00
|
|
|
hv_remove_crash_handler();
|
2015-02-28 03:25:54 +08:00
|
|
|
vmbus_connection.conn_state = DISCONNECTED;
|
2019-07-01 12:25:56 +08:00
|
|
|
hv_stimer_global_cleanup();
|
2015-04-23 12:31:32 +08:00
|
|
|
vmbus_disconnect();
|
2021-03-03 05:38:18 +08:00
|
|
|
if (vmbus_irq == -1) {
|
|
|
|
hv_remove_vmbus_handler();
|
|
|
|
} else {
|
|
|
|
free_percpu_irq(vmbus_irq, vmbus_evt);
|
|
|
|
free_percpu(vmbus_evt);
|
|
|
|
}
|
2017-02-12 14:02:19 +08:00
|
|
|
for_each_online_cpu(cpu) {
|
|
|
|
struct hv_per_cpu_context *hv_cpu
|
|
|
|
= per_cpu_ptr(hv_context.cpu_context, cpu);
|
|
|
|
|
|
|
|
tasklet_kill(&hv_cpu->msg_dpc);
|
|
|
|
}
|
2019-10-04 05:01:49 +08:00
|
|
|
hv_debug_rm_all_dir();
|
|
|
|
|
2011-12-13 01:29:17 +08:00
|
|
|
vmbus_free_channels();
|
2020-04-06 08:15:06 +08:00
|
|
|
kfree(vmbus_connection.channels);
|
2017-02-12 14:02:19 +08:00
|
|
|
|
2022-03-16 04:35:35 +08:00
|
|
|
/*
|
drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers
Currently Hyper-V guests are among the most relevant users of the panic
infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
both in cleaning-up procedures (closing hypervisor <-> guest connection,
disabling some paravirtualized timer) as well as to data collection
(sending panic information to the hypervisor) and framebuffer management.
The thing is: some notifiers are related to others, ordering matters, some
functionalities are duplicated and there are lots of conditionals behind
sending panic information to the hypervisor. As part of an effort to
clean-up the panic notifiers mechanism and better document things, we
hereby address some of the issues/complexities of Hyper-V panic handling
through the following changes:
(a) We have die and panic notifiers on vmbus_drv.c and both have goals of
sending panic information to the hypervisor, though the panic notifier is
also responsible for a cleaning-up procedure.
This commit clears the code by splitting the panic notifier in two, one
for closing the vmbus connection whereas the other is only for sending
panic info to hypervisor. With that, it was possible to merge the die and
panic notifiers in a single/well-documented function, and clear some
conditional complexities on sending such information to the hypervisor.
(b) There is a Hyper-V framebuffer panic notifier, which relies in doing
a vmbus operation that demands a valid connection. So, we must order this
notifier with the panic notifier from vmbus_drv.c, to guarantee that the
framebuffer code executes before the vmbus connection is unloaded.
Also, this commit removes a useless header.
Although there is code rework and re-ordering, we expect that this change
has no functional regressions but instead optimize the path and increase
panic reliability on Hyper-V. This was tested on Hyper-V with success.
Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Fabio A M Martins <fabiomirmar@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220819221731.480795-11-gpiccoli@igalia.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-08-20 06:17:30 +08:00
|
|
|
* The vmbus panic notifier is always registered, hence we should
|
2022-03-16 04:35:35 +08:00
|
|
|
* also unconditionally unregister it here as well.
|
|
|
|
*/
|
|
|
|
atomic_notifier_chain_unregister(&panic_notifier_list,
|
drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers
Currently Hyper-V guests are among the most relevant users of the panic
infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
both in cleaning-up procedures (closing hypervisor <-> guest connection,
disabling some paravirtualized timer) as well as to data collection
(sending panic information to the hypervisor) and framebuffer management.
The thing is: some notifiers are related to others, ordering matters, some
functionalities are duplicated and there are lots of conditionals behind
sending panic information to the hypervisor. As part of an effort to
clean-up the panic notifiers mechanism and better document things, we
hereby address some of the issues/complexities of Hyper-V panic handling
through the following changes:
(a) We have die and panic notifiers on vmbus_drv.c and both have goals of
sending panic information to the hypervisor, though the panic notifier is
also responsible for a cleaning-up procedure.
This commit clears the code by splitting the panic notifier in two, one
for closing the vmbus connection whereas the other is only for sending
panic info to hypervisor. With that, it was possible to merge the die and
panic notifiers in a single/well-documented function, and clear some
conditional complexities on sending such information to the hypervisor.
(b) There is a Hyper-V framebuffer panic notifier, which relies in doing
a vmbus operation that demands a valid connection. So, we must order this
notifier with the panic notifier from vmbus_drv.c, to guarantee that the
framebuffer code executes before the vmbus connection is unloaded.
Also, this commit removes a useless header.
Although there is code rework and re-ordering, we expect that this change
has no functional regressions but instead optimize the path and increase
panic reliability on Hyper-V. This was tested on Hyper-V with success.
Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Fabio A M Martins <fabiomirmar@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220819221731.480795-11-gpiccoli@igalia.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-08-20 06:17:30 +08:00
|
|
|
&hyperv_panic_vmbus_unload_block);
|
2022-03-16 04:35:35 +08:00
|
|
|
|
2011-12-13 01:29:17 +08:00
|
|
|
bus_unregister(&hv_bus);
|
2017-02-12 14:02:19 +08:00
|
|
|
|
2016-12-08 06:53:11 +08:00
|
|
|
cpuhp_remove_state(hyperv_cpuhp_online);
|
2015-08-02 07:08:05 +08:00
|
|
|
hv_synic_free();
|
2023-03-20 15:47:38 +08:00
|
|
|
platform_driver_unregister(&vmbus_platform_driver);
|
2011-12-13 01:29:17 +08:00
|
|
|
}
|
|
|
|
|
2011-03-16 06:03:32 +08:00
|
|
|
|
2009-09-02 22:11:14 +08:00
|
|
|
MODULE_LICENSE("GPL");
|
2019-04-23 11:47:27 +08:00
|
|
|
MODULE_DESCRIPTION("Microsoft Hyper-V VMBus Driver");
|
2009-07-14 07:02:34 +08:00
|
|
|
|
2011-10-25 02:28:12 +08:00
|
|
|
subsys_initcall(hv_acpi_init);
|
2011-12-13 01:29:17 +08:00
|
|
|
module_exit(vmbus_exit);
|