linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-19 18:24:14 +08:00

Author	SHA1	Message	Date
Alex Williamson	8b27ee60bf	vfio-pci: PCI hot reset interface The current VFIO_DEVICE_RESET interface only maps to PCI use cases where we can isolate the reset to the individual PCI function. This means the device must support FLR (PCIe or AF), PM reset on D3hot->D0 transition, device specific reset, or be a singleton device on a bus for a secondary bus reset. FLR does not have widespread support, PM reset is not very reliable, and bus topology is dictated by the system and device design. We need to provide a means for a user to induce a bus reset in cases where the existing mechanisms are not available or not reliable. This device specific extension to VFIO provides the user with this ability. Two new ioctls are introduced: - VFIO_DEVICE_PCI_GET_HOT_RESET_INFO - VFIO_DEVICE_PCI_HOT_RESET The first provides the user with information about the extent of devices affected by a hot reset. This is essentially a list of devices and the IOMMU groups they belong to. The user may then initiate a hot reset by calling the second ioctl. We must be careful that the user has ownership of all the affected devices found via the first ioctl, so the second ioctl takes a list of file descriptors for the VFIO groups affected by the reset. Each group must have IOMMU protection established for the ioctl to succeed. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-09-04 11:28:04 -06:00
Alex Williamson	17638db1b8	vfio-pci: Test for extended config space Having PCIe/PCI-X capability isn't enough to assume that there are extended capabilities. Both specs define that the first capability header is all zero if there are no extended capabilities. Testing for this avoids an erroneous message about hiding capability 0x0 at offset 0x100. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-09-04 10:58:52 -06:00
Alex Williamson	20e7745784	vfio-pci: Use fdget() rather than eventfd_fget() eventfd_fget() tests to see whether the file is an eventfd file, which we then immediately pass to eventfd_ctx_fileget(), which again tests whether the file is an eventfd file. Simplify slightly by using fdget() so that we only test that we're looking at an eventfd once. fget() could also be used, but fdget() makes use of fget_light() for another slight optimization. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-08-28 09:49:55 -06:00
Alex Williamson	d24cdbfd28	vfio-pci: Avoid deadlock on remove If an attempt is made to unbind a device from vfio-pci while that device is in use, the request is blocked until the device becomes unused. Unfortunately, that unbind path still grabs the device_lock, which certain things like __pci_reset_function() also want to take. This means we need to try to acquire the locks ourselves and use the pre-locked version, __pci_reset_function_locked(). Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-07-24 16:36:41 -06:00
Al Viro	a47df1518e	vfio: remap_pfn_range() sets all those flags... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-29 12:46:41 +04:00
Linus Torvalds	0b2e3b6bb4	vfio updates for v3.10 Changes include extension to support PCI AER notification to userspace, byte granularity of PCI config space and access to unarchitected PCI config space, better protection around IOMMU driver accesses, default file mode fix, and a few misc cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJRgox9AAoJECObm247sIsizJwP/23KprR6BuRt168Obo+xC/lp kj1E4Hd4XHx6T/5XRexa4GqhmyLqgoqbS19uj/K6ebY9rouVKo0V6OYNFDQiFR/F yEGjYIBKV9/eBZMQI4RbZpZKGDXTeFrXyjsac1m51JrR3st8zsvZlNPE/TjzhSfz jMnsCC99ZwIFjmptw/yWwgInswy1n9H9iICS14Xn1x7v71iyOE32reTG6M9HsPHe Xm5F0K5iLQX8qoISHAomrTnFmCLxp5Y2qhh7nZWmC2gCbPBEnGdqx4prwzJayAaC DAN8osaYldoKQuwI4wFYMICHxxCEFk8xU58FeKCmeQJZgomefVAoRKf86gAEsSLR XHptBQOktWVKNFhzZWyOuAX3iua/hmWcOa05I6inycV1x2ZAGlNhvJnj1iKIphLk +neBFAD9wDcAJZdXkfudq0jJ9jJTSAWBv8d+hozNfkjTVhF+wRnhZ7V6dZfb9+W6 kPGbgwqKApLoOabQbxbZ5Ftr6S7344prB/HAMywGa+xZfoxoYPxBHCi0rSTvUTWy oWLtzrKRbjBDc0qF4eEK9RZlmVt2ZmCUnUB3kbWED4mrJAHu4LlFghnAloePrIZ3 kXlMARWbyw/E+1WIMO4Ito5+1s3zVsJJgzVsAQDJkTQw2aYDhns73/Y25Dj7x7QS BKebrXnh2xVCN6OIu+eX =F0Cc -----END PGP SIGNATURE----- Merge tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio Pull vfio updates from Alex Williamson: "Changes include extension to support PCI AER notification to userspace, byte granularity of PCI config space and access to unarchitected PCI config space, better protection around IOMMU driver accesses, default file mode fix, and a few misc cleanups." * tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio: vfio: Set container device mode vfio: Use down_reads to protect iommu disconnects vfio: Convert container->group_lock to rwsem PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register vfio-pci: Enable raw access to unassigned config space vfio-pci: Use byte granularity in config map vfio: make local function vfio_pci_intx_unmask_handler() static VFIO-AER: Vfio-pci driver changes for supporting AER VFIO: Wrapper for getting reference to vfio_device	2013-05-02 14:02:32 -07:00
Linus Torvalds	96a3e8af5a	PCI changes for the v3.10 merge window: PCI device hotplug - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe) - Make acpiphp builtin only, not modular (Jiang Liu) - Add acpiphp mutual exclusion (Jiang Liu) Power management - Skip "PME enabled/disabled" messages when not supported (Rafael Wysocki) - Fix fallback to PCI_D0 (Rafael Wysocki) Miscellaneous - Factor quirk_io_region (Yinghai Lu) - Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas) - Clean up EISA resource initialization and logging (Bjorn Helgaas) - Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas) - MIPS: Initialize of_node before scanning bus (Gabor Juhos) - Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor Juhos) - Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang) - Fix aer_inject return values (Prarit Bhargava) - Remove PME/ACPI dependency (Andrew Murray) - Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRfhSWAAoJEFmIoMA60/r8GrYQAIHDsyZIuJSf6g+8Td1h+PIC YD3wQhbyrDqQDuKU4+9cz+JsbHmnozUGA4UmlwmOGBxEa/Uauspb6yX1P1+x9Ok1 WD7Ar3BlA5OuYI/1L1mgCiA428MTujwoR4fPnC0+KFy8xk1tBpmhzzeOFohbKyFF hMBO/Xt9tCzPATJ1LhjIH4xAykfDkbnPNHNcUKRoAkRo0CO0lS8gcTk0shXXSNng p9kQ6c4cYZvlRIJTwlawWV09nr7mDsBYa3JClqXYZufUWfEwvIuhisJxCJ57sWi9 t+Ev8dm7VM6Cr5dV+ORArlboBFrq4f/W5U9j9GPFrRplwf+WbNT6tNGSpSDq8XhU Q7JjNgPWVdWXe1vIsMwaO49zi45/bNehuCSFLZiyPZwedMk764tys+iYw+tMRtv1 tBR7lwESSXfagmvWyQAuQOTy6Rj26BPd2T8e2lMsvsuQO9mCyTK6Ey3YyKuqKQK/ l5Gns4vv4eaCjGXqqDGiydUjSes+r/v1bu43XiRnwPQJUKb5kr5SjN5/zSMBuUgm TLT/bnv8qvdFxCpVQJFv4k/uzULARMdbvLtTy8osB14vNHX9jPn+xORjLaZNiO6O 7fFispMU8Om56hNkD6C451r3icRjjGlD7OA8KOlbZ8f876sLzGV9i6P9gwCoRdEB wclDPsN7kAzw/V2sEE60 =bj8i -----END PGP SIGNATURE----- Merge tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v3.10 merge window: PCI device hotplug - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe) - Make acpiphp builtin only, not modular (Jiang Liu) - Add acpiphp mutual exclusion (Jiang Liu) Power management - Skip "PME enabled/disabled" messages when not supported (Rafael Wysocki) - Fix fallback to PCI_D0 (Rafael Wysocki) Miscellaneous - Factor quirk_io_region (Yinghai Lu) - Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas) - Clean up EISA resource initialization and logging (Bjorn Helgaas) - Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas) - MIPS: Initialize of_node before scanning bus (Gabor Juhos) - Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor Juhos) - Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang) - Fix aer_inject return values (Prarit Bhargava) - Remove PME/ACPI dependency (Andrew Murray) - Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan)" * tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (63 commits) vfio-pci: Use cached MSI/MSI-X capabilities vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Remove "extern" from function declarations PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h PCI: Use msix_table_size() directly, drop multi_msix_capable() PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros PCI: Drop is_64bit_address() and is_mask_bit_support() macros PCI: Drop msi_data_reg() macro PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc PCI: Clean up MSI/MSI-X capability #defines PCI: Use cached MSI-X cap while enabling MSI-X PCI: Use cached MSI cap while enabling MSI interrupts PCI: Remove MSI/MSI-X cap check in pci_msi_check_device() PCI: Cache MSI/MSI-X capability offsets in struct pci_dev PCI: Use u8, not int, for PM capability offset [SCSI] megaraid_sas: Use correct #define for MSI-X capability PCI: Remove "extern" from function declarations ...	2013-04-29 09:30:25 -07:00
Bjorn Helgaas	a9047f24df	vfio-pci: Use cached MSI/MSI-X capabilities We now cache the MSI/MSI-X capability offsets in the struct pci_dev, so no need to find the capabilities again. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2013-04-24 11:35:56 -06:00
Bjorn Helgaas	508d1aa602	vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the Table Offset register, not the flags ("Message Control" per spec) register. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2013-04-24 11:35:56 -06:00
Yijing Wang	aa2cba51a0	PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register Currently, we use pcie_flags_reg to cache PCI-E Capabilities Register, because PCI-E Capabilities Register bits are almost read-only. This patch use pcie_caps_reg() instead of another access PCI-E Capabilities Register. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-04-15 08:45:10 -06:00
Alex Williamson	a7d1ea1c11	vfio-pci: Enable raw access to unassigned config space Devices like be2net hide registers between the gaps in capabilities and architected regions of PCI config space. Our choices to support such devices is to either build an ever growing and unmanageable white list or rely on hardware isolation to protect us. These registers are really no different than MMIO or I/O port space registers, which we don't attempt to regulate, so treat PCI config space in the same way. Reported-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>	2013-04-01 09:04:12 -06:00
Alex Williamson	180b138107	vfio-pci: Use byte granularity in config map The config map previously used a byte per dword to map regions of config space to capabilities. Modulo a bug where we round the length of capabilities down instead of up, this theoretically works well and saves space so long as devices don't try to hide registers in the gaps between capabilities. Unfortunately they do exactly that so we need byte granularity on our config space map. Increase the allocation of the config map and split accesses at capability region boundaries. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>	2013-04-01 09:03:44 -06:00
Alex Williamson	904c680c7b	vfio-pci: Fix possible integer overflow The VFIO_DEVICE_SET_IRQS ioctl takes a start and count parameter, both of which are unsigned. We attempt to bounds check these, but fail to account for the case where start is a very large number, allowing start + count to wrap back into the valid range. Bounds check both start and start + count. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-03-26 11:33:16 -06:00
Wei Yongjun	0bced2f728	vfio: make local function vfio_pci_intx_unmask_handler() static vfio_pci_intx_unmask_handler() was not declared. It should be static. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-03-25 11:15:44 -06:00
Arnd Bergmann	25e9789ddd	vfio: include <linux/slab.h> for kmalloc The vfio drivers call kmalloc or kzalloc, but do not include <linux/slab.h>, which causes build errors on ARM. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: kvm@vger.kernel.org	2013-03-15 12:58:20 -06:00
Vijay Mohan Pandarathil	dad9f8972e	VFIO-AER: Vfio-pci driver changes for supporting AER - New VFIO_SET_IRQ ioctl option to pass the eventfd that is signaled when an error occurs in the vfio_pci_device - Register pci_error_handler for the vfio_pci driver - When the device encounters an error, the error handler registered by the vfio_pci driver gets invoked by the AER infrastructure - In the error handler, signal the eventfd registered for the device. - This results in the qemu eventfd handler getting invoked and appropriate action taken for the guest. Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-03-11 09:31:22 -06:00
Kees Cook	d65530fbc7	drivers/vfio: remove depends on CONFIG_EXPERIMENTAL The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-24 09:59:44 -07:00
Alex Williamson	84237a826b	vfio-pci: Add support for VGA region access PCI defines display class VGA regions at I/O port address 0x3b0, 0x3c0 and MMIO address 0xa0000. As these are non-overlapping, we can ignore the I/O port vs MMIO difference and expose them both in a single region. We make use of the VGA arbiter around each access to configure chipset access as necessary. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-18 10:11:13 -07:00
Alex Williamson	2dd1194833	vfio-pci: Manage user power state transitions We give the user access to change the power state of the device but certain transitions result in an uninitialized state which the user cannot resolve. To fix this we need to mark the PowerState field of the PMCSR register read-only and effect the requested change on behalf of the user. This has the added benefit that pdev->current_state remains accurate while controlled by the user. The primary example of this bug is a QEMU guest doing a reboot where the device it put into D3 on shutdown and becomes unusable on the next boot because the device did a soft reset on D3->D0 (NoSoftRst-). Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-18 10:10:33 -07:00
Alex Williamson	906ee99dd2	vfio-pci: Cleanup BAR access We can actually handle MMIO and I/O port from the same access function since PCI already does abstraction of this. The ROM BAR only requires a minor difference, so it gets included too. vfio_pci_config_readwrite gets renamed for consistency. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-14 14:02:12 -07:00
Alex Williamson	5b279a11d3	vfio-pci: Cleanup read/write functions The read and write functions are nearly identical, combine them and convert to a switch statement. This also makes it easy to narrow the scope of when we use the io/mem accessors in case new regions are added. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-14 14:02:12 -07:00
Alex Williamson	5641ade41f	vfio-pci: Enable PCIe extended capabilities on v1 Even PCIe 1.x had extended config space. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2013-02-14 10:45:31 -07:00
Alex Williamson	ec1287e511	vfio-pci: Fix buffer overfill A read from a range hidden from the user (ex. MSI-X vector table) attempts to fill the user buffer up to the end of the excluded range instead of up to the requested count. Fix it. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org	2013-01-15 10:45:26 -07:00
Alex Williamson	9a92c5091a	vfio-pci: Enable device before attempting reset Devices making use of PM reset are getting incorrectly identified as not supporting reset because pci_pm_reset() fails unless the device is in D0 power state. When first attached to vfio_pci devices are typically in an unknown power state. We can fix this by explicitly setting the power state or simply calling pci_enable_device() before attempting a pci_reset_function(). We need to enable the device anyway, so move this up in our vfio_pci_enable() function, which also simplifies the error path a bit. Note that pci_disable_device() does not explicitly set the power state, so there's no need to re-order vfio_pci_disable(). Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-12-07 13:43:51 -07:00
Jiang Liu	05bf3aac93	VFIO: fix out of order labels for error recovery in vfio_pci_init() The two labels for error recovery in function vfio_pci_init() is out of order, so fix it. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-12-07 13:43:51 -07:00
Alex Williamson	2007722a60	vfio-pci: Re-order device reset Move the device reset to the end of our disable path, the device should already be stopped from pci_disable_device(). This also allows us to manipulate the save/restore to avoid the save/reset/restore + save/restore that we had before. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-12-07 13:43:50 -07:00
Fengguang Wu	3a1f7041dd	vfio: simplify kmalloc+copy_from_user to memdup_user Generated by: coccinelle/api/memdup_user.cocci Acked-by: Julia Lawall <julia.lawall@lip6.fr> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-12-07 13:43:49 -07:00
Alex Williamson	899649b7d4	vfio: Fix PCI INTx disable consistency The virq_disabled flag tracks the userspace view of INTx masking across interrupt mode changes, but we're not consistently applying this to the interrupt and masking handler notion of the device. Currently if the user sets DisINTx while in MSI or MSIX mode, then returns to INTx mode (ex. rebooting a qemu guest), the hardware has DisINTx+, but the management of INTx thinks it's enabled, making it impossible to actually clear DisINTx. Fix this by updating the handler state when INTx is re-enabled. Cc: stable@vger.kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-10-10 09:10:32 -06:00
Alex Williamson	9dbdfd23b7	vfio: Move PCI INTx eventfd setting earlier We need to be ready to recieve an interrupt as soon as we call request_irq, so our eventfd context setting needs to be moved earlier. Without this, an interrupt from our device or one sharing the interrupt line can pass a NULL into eventfd_signal and oops. Cc: stable@vger.kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-10-10 09:10:32 -06:00
Alex Williamson	34002f54d2	vfio: Fix PCI mmap after `b3b9c293` Our mmap path mistakely relied on vma->vm_pgoff to get set in remap_pfn_range. After `b3b9c293`, that path only applies to copy-on-write mappings. Set it in our own code. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-10-10 09:10:31 -06:00
Linus Torvalds	547b1e81af	Fix staging driver use of VM_RESERVED The VM_RESERVED flag was killed off in commit `314e51b985` ("mm: kill vma flag VM_RESERVED and mm->reserved_vm counter"), and replaced by the proper semantic flags (eg "don't core-dump" etc). But there was a new use of VM_RESERVED that got missed by the merge. Fix the remaining use of VM_RESERVED in the vfio_pci driver, replacing the VM_RESERVED flag with VM_DONTEXPAND \| VM_DONTDUMP. Signed-off-by: Linus Torvalds <torvalds@linux-foundation,org>	2012-10-09 21:06:41 +09:00
Alex Williamson	b68e7fa879	vfio: Fix virqfd release race vfoi-pci supports a mechanism like KVM's irqfd for unmasking an interrupt through an eventfd. There are two ways to shutdown this interface: 1) close the eventfd, 2) ioctl (such as disabling the interrupt). Both of these do the release through a workqueue, which can result in a segfault if two jobs get queued for the same virqfd. Fix this by protecting the pointer to these virqfds by a spinlock. The vfio pci device will therefore no longer have a reference to it once the release job is queued under lock. On the ioctl side, we still flush the workqueue to ensure that any outstanding releases are completed. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-09-21 10:48:28 -06:00
Alex Williamson	89e1f7d4c6	vfio: Add PCI device driver Add PCI device support for VFIO. PCI devices expose regions for accessing config space, I/O port space, and MMIO areas of the device. PCI config access is virtualized in the kernel, allowing us to ensure the integrity of the system, by preventing various accesses while reducing duplicate support across various userspace drivers. I/O port supports read/write access while MMIO also supports mmap of sufficiently sized regions. Support for INTx, MSI, and MSI-X interrupts are provided using eventfds to userspace. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2012-07-31 08:16:24 -06:00

33 Commits