Commit Graph

209 Commits

Author SHA1 Message Date
Linus Torvalds
09102704c6 virtio: features, fixes
virtio-mem
 doorbell mapping for vdpa
 config interrupt support in ifc
 fixes all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl7fZ6APHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpkDoIAMcBcQx5su1iuX7vT35xzUWZO478eAf1jOMZ
 7KxKUVBeztkcxVFUlRVRu9MR6wOzwHils+1HD6025775Smr5M6x3aJxR6xOORaBj
 RoU6OVGkpDvbzsxlhW+xhONz4O7/RkveKJPCwzGjqHrsFeh92lkfTqroz/EuNpw+
 LZsO0+DhdUf123HbwHQp5lxW8EjyrRabgeZZg/D9VLPhoCP88vCjRhBXU2GPuaUl
 /UNXsQafn4xUgrxPaoN5f4Phn/P46NNrbZ1jmlkw/z/3QhF/DhktGXGaZsIHDCN/
 vicUii0or5QLeBsZpMbKko/BIe2xWHxFjkMRhMOMZOfcBb6sMBI=
 =auUa
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - virtio-mem: paravirtualized memory hotplug

 - support doorbell mapping for vdpa

 - config interrupt support in ifc

 - fixes all over the place

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits)
  vhost/test: fix up after API change
  virtio_mem: convert device block size into 64bit
  virtio-mem: drop unnecessary initialization
  ifcvf: implement config interrupt in IFCVF
  vhost: replace -1 with VHOST_FILE_UNBIND in ioctls
  vhost_vdpa: Support config interrupt in vdpa
  ifcvf: ignore continuous setting same status value
  virtio-mem: Don't rely on implicit compiler padding for requests
  virtio-mem: Try to unplug the complete online memory block first
  virtio-mem: Use -ETXTBSY as error code if the device is busy
  virtio-mem: Unplug subblocks right-to-left
  virtio-mem: Drop manual check for already present memory
  virtio-mem: Add parent resource for all added "System RAM"
  virtio-mem: Better retry handling
  virtio-mem: Offline and remove completely unplugged memory blocks
  mm/memory_hotplug: Introduce offline_and_remove_memory()
  virtio-mem: Allow to offline partially unplugged memory blocks
  mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
  virtio-mem: Paravirtualized memory hotunplug part 2
  virtio-mem: Paravirtualized memory hotunplug part 1
  ...
2020-06-10 13:42:09 -07:00
John Hubbard
690623e1b4 vhost: convert get_user_pages() --> pin_user_pages()
This code was using get_user_pages*(), in approximately a "Case 5"
scenario (accessing the data within a page), using the categorization
from [1].  That means that it's time to convert the get_user_pages*() +
put_page() calls to pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small part
of fixing a long-standing disconnect between pinning pages, and file
systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
    https://lwn.net/Articles/807108/

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200529234309.484480-3-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 11:05:57 -07:00
Zhu Lingshan
e0136c16fa vhost: replace -1 with VHOST_FILE_UNBIND in ioctls
This commit replaces -1 with VHOST_FILE_UNBIND in ioctls since
we have added such a macro in the uapi header for vdpa_host.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/1591352835-22441-5-git-send-email-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-06 16:26:27 -04:00
Guennadi Liakhovetski
002ef18eff vhost: (cosmetic) remove a superfluous variable initialisation
Even the compiler is able to figure out that in this case the
initialisation is superfluous.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Link: https://lore.kernel.org/r/20200527180541.5570-3-guennadi.liakhovetski@linux.intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04 15:36:51 -04:00
Jason Wang
5ce995f313 vhost: use mmgrab() instead of mmget() for non worker device
For the device that doesn't use vhost worker and use_mm(), mmget() is
too heavy weight and it may brings troubles for implementing mmap()
support for vDPA device.

This is because, an reference to the address space was held via
mm_get() in vhost_dev_set_owner() and an reference to the file was
held in mmap(). This means when process exits, the mm can not be
released thus we can not release the file.

This patch tries to use mmgrab() instead of mmget(), which allows the
address space to be destroy in process exit without releasing the mm
structure itself. This is sufficient for vDPA device which pin user
pages and does not depend on the address space to work.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200529080303.15449-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04 15:36:51 -04:00
Jason Wang
01fcb1cbc8 vhost: allow device that does not depend on vhost worker
vDPA device currently relays the eventfd via vhost worker. This is
inefficient due the latency of wakeup and scheduling, so this patch
tries to introduce a use_worker attribute for the vhost device. When
use_worker is not set with vhost_dev_init(), vhost won't try to
allocate a worker thread and the vhost_poll will be processed directly
in the wakeup function.

This help for vDPA since it reduces the latency caused by vhost worker.

In my testing, it saves 0.2 ms in pings between VMs on a mutual host.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200529080303.15449-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04 15:36:51 -04:00
Michael S. Tsirkin
a865e420b9 virtio: force spec specified alignment on types
The ring element addresses are passed between components with different
alignments assumptions. Thus, if guest/userspace selects a pointer and
host then gets and dereferences it, we might need to decrease the
compiler-selected alignment to prevent compiler on the host from
assuming pointer is aligned.

This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration, but it seems safer to handle this
generally.

Note that userspace that allocates the memory is actually OK and does
not need to be fixed, but userspace that gets it from guest or another
process does need to be fixed. The later doesn't generally talk to the
kernel so while it might be buggy it's not talking to the kernel in the
buggy way - it's just using the header in the buggy way - so fixing
header and asking userspace to recompile is the best we can do.

I verified that the produced kernel binary on x86 is exactly identical
before and after the change.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2020-06-02 02:45:13 -04:00
Michael S. Tsirkin
1b0be99f1a vhost: missing __user tags
sparse warns about converting void * to void __user *. This is not new
but only got noticed now that vhost is built on more systems.
This is just a question of __user tags missing in a couple of places,
so fix it up.

Fixes: f889491380 ("vhost: introduce O(1) vq metadata cache")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-05-15 11:36:31 -04:00
Jason Wang
0bbe30668d vhost: factor out IOTLB
This patch factors out IOTLB into a dedicated module in order to be
reused by other modules like vringh. User may choose to enable the
automatic retiring by specifying VHOST_IOTLB_FLAG_RETIRE flag to fit
for the case of vhost device IOTLB implementation.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200326140125.19794-4-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01 12:06:26 -04:00
Jason Wang
792a4f2ed2 vhost: allow per device message handler
This patch allow device to register its own message handler during
vhost_dev_init(). vDPA device will use it to implement its own DMA
mapping logic.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200326140125.19794-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01 12:06:26 -04:00
Andrey Konovalov
8f6a7f96dc vhost, kcov: collect coverage from vhost_worker
Add kcov_remote_start()/kcov_remote_stop() annotations to the
vhost_worker() function, which is responsible for processing vhost
works.

Since vhost_worker() threads are spawned per vhost device instance the
common kcov handle is used for kcov_remote_start()/stop() annotations
(see Documentation/dev-tools/kcov.rst for details).  As the result kcov
can now be used to collect coverage from vhost worker threads.

Link: http://lkml.kernel.org/r/e49d5d154e5da6c9ada521d2b7ce10a49ce9f98b.1572366574.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Alexander Potapenko <glider@google.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Windsor <dwindsor@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-04 19:44:14 -08:00
Michael S. Tsirkin
0d4a3f2abb Revert "vhost: block speculation of translated descriptors"
This reverts commit a89db445fb.

I was hasty to include this patch, and it breaks the build on 32 bit.
Defence in depth is good but let's do it properly.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-09-14 15:21:51 -04:00
yongduan
060423bfde vhost: make sure log_num < in_num
The code assumes log_num < in_num everywhere, and that is true as long as
in_num is incremented by descriptor iov count, and log_num by 1. However
this breaks if there's a zero sized descriptor.

As a result, if a malicious guest creates a vring desc with desc.len = 0,
it may cause the host kernel to crash by overflowing the log array. This
bug can be triggered during the VM migration.

There's no need to log when desc.len = 0, so just don't increment log_num
in this case.

Fixes: 3a4d5c94e9 ("vhost_net: a kernel-level virtio server")
Cc: stable@vger.kernel.org
Reviewed-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: ruippan <ruippan@tencent.com>
Signed-off-by: yongduan <yongduan@tencent.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-09-11 15:15:26 -04:00
Michael S. Tsirkin
a89db445fb vhost: block speculation of translated descriptors
iovec addresses coming from vhost are assumed to be
pre-validated, but in fact can be speculated to a value
out of range.

Userspace address are later validated with array_index_nospec so we can
be sure kernel info does not leak through these addresses, but vhost
must also not leak userspace info outside the allowed memory table to
guests.

Following the defence in depth principle, make sure
the address is not validated out of node range.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
2019-09-11 15:15:07 -04:00
Michael S. Tsirkin
3d2c7d3704 Revert "vhost: access vq metadata through kernel virtual address"
This reverts commit 7f466032dc ("vhost: access vq metadata through
kernel virtual address").  The commit caused a bunch of issues, and
while commit 73f628ec9e ("vhost: disable metadata prefetch
optimization") disabled the optimization it's not nice to keep lots of
dead code around.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-09-04 07:39:48 -04:00
Yunsheng Lin
896fc242bc vhost: Remove unnecessary variable
It is unnecessary to use ret variable to return the error
code, just return the error code directly.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-09-04 06:21:17 -04:00
Linus Torvalds
3a1d5384b7 virtio, vhost: fixes, features, performance
new iommu device
 vhost guest memory access using vmap (just meta-data for now)
 minor fixes
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 
 Note: due to code driver changes the driver-core tree, the following
 patch is needed when merging tree with commit 92ce7e83b4
 ("driver_find_device: Unify the match function with
 class_find_device()") in the driver-core tree:
 
 From: Nathan Chancellor <natechancellor@gmail.com>
 Subject: [PATCH] iommu/virtio: Constify data parameter in viommu_match_node
 
 After commit 92ce7e83b4 ("driver_find_device: Unify the match
 function with class_find_device()") in the driver-core tree.
 
 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 
 ---
  drivers/iommu/virtio-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
 index 4620dd221ffd..433f4d2ee956 100644
 --- a/drivers/iommu/virtio-iommu.c
 +++ b/drivers/iommu/virtio-iommu.c
 @@ -839,7 +839,7 @@ static void viommu_put_resv_regions(struct device *dev, struct list_head *head)
  static struct iommu_ops viommu_ops;
  static struct virtio_driver virtio_iommu_drv;
 
 -static int viommu_match_node(struct device *dev, void *data)
 +static int viommu_match_node(struct device *dev, const void *data)
  {
  	return dev->parent->fwnode == data;
  }
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJdJ5qUAAoJECgfDbjSjVRpQs0H/2qWcIG1zjGKyh9KWrfgOusG
 /QIqeP50d7SC6oqdyd00tzmExqO1xdGLPFzYixdOsU817te1gHBP4Rfmzo01jZRd
 CUzZNnZQ2JRsDshiA6G2ui+wn1/a/cB3RPN4rT1mquDYS53QmsRGDQDnpp84TXMV
 aocB8TS6halbRzKMq3VmaWHIvzNXnt4dwQR542+PyeLLn9bUx2QwWj2ON3QwxixK
 dVRZow3GwLGBhKTA/Z1Z/Bta4fEfOKjUGP2XWgvL6zOr+nZR4eQ8w5WXVJYzR+d6
 1JCfqTxleweT2k6Tu5VwtTNlQkxn/XvQAeisppOiEE6NnPjubyI9wMQIvL7bkpo=
 =uJbC
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio, vhost updates from Michael Tsirkin:
 "Fixes, features, performance:

   - new iommu device

   - vhost guest memory access using vmap (just meta-data for now)

   - minor fixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio-mmio: add error check for platform_get_irq
  scsi: virtio_scsi: Use struct_size() helper
  iommu/virtio: Add event queue
  iommu/virtio: Add probe request
  iommu: Add virtio-iommu driver
  PCI: OF: Initialize dev->fwnode appropriately
  of: Allow the iommu-map property to omit untranslated devices
  dt-bindings: virtio: Add virtio-pci-iommu node
  dt-bindings: virtio-mmio: Add IOMMU description
  vhost: fix clang build warning
  vhost: access vq metadata through kernel virtual address
  vhost: factor out setting vring addr and num
  vhost: introduce helpers to get the size of metadata area
  vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
  vhost: fine grain userspace memory accessors
  vhost: generalize adding used elem
2019-07-17 11:26:09 -07:00
Linus Torvalds
e9a83bd232 It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro.  These create more
    than the usual number of simple but annoying merge conflicts with other
    trees, unfortunately.  He has a lot more of these waiting on the wings
    that, I think, will go to you directly later on.
 
  - A new document on how to use merges and rebases in kernel repos, and one
    on Spectre vulnerabilities.
 
  - Various improvements to the build system, including automatic markup of
    function() references because some people, for reasons I will never
    understand, were of the opinion that :c:func:``function()`` is
    unattractive and not fun to type.
 
  - We now recommend using sphinx 1.7, but still support back to 1.4.
 
  - Lots of smaller improvements, warning fixes, typo fixes, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl0krAEPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Yg98H/AuLqO9LpOgUjF4LhyjxGPdzJkY9RExSJ7km
 gznyreLCZgFaJR+AY6YDsd4Jw6OJlPbu1YM/Qo3C3WrZVFVhgL/s2ebvBgCo50A8
 raAFd8jTf4/mGCHnAqRotAPQ3mETJUk315B66lBJ6Oc+YdpRhwXWq8ZW2bJxInFF
 3HDvoFgMf0KhLuMHUkkL0u3fxH1iA+KvDu8diPbJYFjOdOWENz/CV8wqdVkXRSEW
 DJxIq89h/7d+hIG3d1I7Nw+gibGsAdjSjKv4eRKauZs4Aoxd1Gpl62z0JNk6aT3m
 dtq4joLdwScydonXROD/Twn2jsu4xYTrPwVzChomElMowW/ZBBY=
 =D0eO
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.3' of git://git.lwn.net/linux

Pull Documentation updates from Jonathan Corbet:
 "It's been a relatively busy cycle for docs:

   - A fair pile of RST conversions, many from Mauro. These create more
     than the usual number of simple but annoying merge conflicts with
     other trees, unfortunately. He has a lot more of these waiting on
     the wings that, I think, will go to you directly later on.

   - A new document on how to use merges and rebases in kernel repos,
     and one on Spectre vulnerabilities.

   - Various improvements to the build system, including automatic
     markup of function() references because some people, for reasons I
     will never understand, were of the opinion that
     :c:func:``function()`` is unattractive and not fun to type.

   - We now recommend using sphinx 1.7, but still support back to 1.4.

   - Lots of smaller improvements, warning fixes, typo fixes, etc"

* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
  docs: automarkup.py: ignore exceptions when seeking for xrefs
  docs: Move binderfs to admin-guide
  Disable Sphinx SmartyPants in HTML output
  doc: RCU callback locks need only _bh, not necessarily _irq
  docs: format kernel-parameters -- as code
  Doc : doc-guide : Fix a typo
  platform: x86: get rid of a non-existent document
  Add the RCU docs to the core-api manual
  Documentation: RCU: Add TOC tree hooks
  Documentation: RCU: Rename txt files to rst
  Documentation: RCU: Convert RCU UP systems to reST
  Documentation: RCU: Convert RCU linked list to reST
  Documentation: RCU: Convert RCU basic concepts to reST
  docs: filesystems: Remove uneeded .rst extension on toctables
  scripts/sphinx-pre-install: fix out-of-tree build
  docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
  Documentation: PGP: update for newer HW devices
  Documentation: Add section about CPU vulnerabilities for Spectre
  Documentation: platform: Delete x86-laptop-drivers.txt
  docs: Note that :c:func: should no longer be used
  ...
2019-07-09 12:34:26 -07:00
Thomas Gleixner
7a338472f2 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482
Based on 1 normalized pattern(s):

  this work is licensed under the terms of the gnu gpl version 2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 48 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081204.624030236@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:52 +02:00
Jonathan Corbet
8afecfb0ec Linux 5.2-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlz8fAYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1asH/3ySguxqtqL1MCBa
 4/SZ37PHeWKMerfX6ZyJdgEqK3B+PWlmuLiOMNK5h2bPLzeQQQAmHU/mfKmpXqgB
 dHwUbG9yNnyUtTfsfRqAnCA6vpuw9Yb1oIzTCVQrgJLSWD0j7scBBvmzYqguOkto
 ThwigLUq3AILr8EfR4rh+GM+5Dn9OTEFAxwil9fPHQo7QoczwZxpURhScT6Co9TB
 DqLA3fvXbBvLs/CZy/S5vKM9hKzC+p39ApFTURvFPrelUVnythAM0dPDJg3pIn5u
 g+/+gDxDFa+7ANxvxO2ng1sJPDqJMeY/xmjJYlYyLpA33B7zLNk2vDHhAP06VTtr
 XCMhQ9s=
 =cb80
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc4' into mauro

We need to pick up post-rc1 changes to various document files so they don't
get lost in Mauro's massive RST conversion push.
2019-06-14 14:18:53 -06:00
Mauro Carvalho Chehab
cb1aaebea8 docs: fix broken documentation links
Mostly due to x86 and acpi conversion, several documentation
links are still pointing to the old file. Fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Wolfram Sang <wsa@the-dreams.de>
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-06-08 13:42:13 -06:00
Jason Wang
7f466032dc vhost: access vq metadata through kernel virtual address
It was noticed that the copy_to/from_user() friends that was used to
access virtqueue metdata tends to be very expensive for dataplane
implementation like vhost since it involves lots of software checks,
speculation barriers, hardware feature toggling (e.g SMAP). The
extra cost will be more obvious when transferring small packets since
the time spent on metadata accessing become more significant.

This patch tries to eliminate those overheads by accessing them
through direct mapping of those pages. Invalidation callbacks is
implemented for co-operation with general VM management (swap, KSM,
THP or NUMA balancing). We will try to get the direct mapping of vq
metadata before each round of packet processing if it doesn't
exist. If we fail, we will simplely fallback to copy_to/from_user()
friends.

This invalidation and direct mapping access are synchronized through
spinlock and RCU. All matedata accessing through direct map is
protected by RCU, and the setup or invalidation are done under
spinlock.

This method might does not work for high mem page which requires
temporary mapping so we just fallback to normal
copy_to/from_user() and may not for arch that has virtual tagged cache
since extra cache flushing is needed to eliminate the alias. This will
result complex logic and bad performance. For those archs, this patch
simply go for copy_to/from_user() friends. This is done by ruling out
kernel mapping codes through ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE.

Note that this is only done when device IOTLB is not enabled. We
could use similar method to optimize IOTLB in the future.

Tests shows at most about 23% improvement on TX PPS when using
virtio-user + vhost_net + xdp1 + TAP on 2.6GHz Broadwell:

        SMAP on | SMAP off
Before: 5.2Mpps | 7.1Mpps
After:  6.4Mpps | 8.2Mpps

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-05 21:09:18 -04:00
Jason Wang
feebcaeac7 vhost: factor out setting vring addr and num
Factoring vring address and num setting which needs special care for
accelerating vq metadata accessing.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-05 16:23:53 -04:00
Jason Wang
4942e8254d vhost: introduce helpers to get the size of metadata area
To avoid code duplication since it will be used by kernel VA prefetching.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-05 16:23:53 -04:00
Jason Wang
9b5e830b71 vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
Rename the function to be more accurate since it actually tries to
prefetch vq metadata address in IOTLB. And this will be used by
following patch to prefetch metadata virtual addresses.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-05 16:23:52 -04:00
Jason Wang
7b5d753ebc vhost: fine grain userspace memory accessors
This is used to hide the metadata address from virtqueue helpers. This
will allow to implement a vmap based fast accessing to metadata.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-05 16:23:52 -04:00
Jason Wang
1ab5d1385a vhost: generalize adding used elem
Use one generic vhost_copy_to_user() instead of two dedicated
accessor. This will simplify the conversion to fine grain
accessors. About 2% improvement of PPS were seen during vitio-user
txonly test.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-05 16:23:52 -04:00
Jason Wang
e82b9b0727 vhost: introduce vhost_exceeds_weight()
We used to have vhost_exceeds_weight() for vhost-net to:

- prevent vhost kthread from hogging the cpu
- balance the time spent between TX and RX

This function could be useful for vsock and scsi as well. So move it
to vhost.c. Device must specify a weight which counts the number of
requests, or it can also specific a byte_weight which counts the
number of bytes that has been processed.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-05-27 11:08:22 -04:00
Ira Weiny
73b0140bf0 mm/gup: change GUP fast to use flags rather than a write 'bool'
To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.

NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted.  This breaks the current
GUP call site convention of having the returned pages be the final
parameter.  So the suggestion was rejected.

Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Jason Wang
813dbeb656 vhost: reject zero size iova range
We used to accept zero size iova range which will lead a infinite loop
in translate_desc(). Fixing this by failing the request in this case.

Reported-by: syzbot+d21e6e297322a900c128@syzkaller.appspotmail.com
Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10 22:45:38 -07:00
Arnd Bergmann
cfdbb4ed31 vhost: silence an unused-variable warning
On some architectures, the MMU can be disabled, leading to access_ok()
becoming an empty macro that does not evaluate its size argument,
which in turn produces an unused-variable warning:

drivers/vhost/vhost.c:1191:9: error: unused variable 's' [-Werror,-Wunused-variable]
        size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;

Mark the variable as __maybe_unused to shut up that warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-06 11:32:37 -05:00
Jason Wang
816db76635 vhost: correctly check the return value of translate_desc() in log_used()
When fail, translate_desc() returns negative value, otherwise the
number of iovs. So we should fail when the return value is negative
instead of a blindly check against zero.

Detected by CoverityScan, CID# 1442593:  Control flow issues  (DEADCODE)

Fixes: cc5e710759 ("vhost: log dirty page correctly")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-19 13:14:45 -08:00
Jason Wang
b46a0bf78a vhost: fix OOB in get_rx_bufs()
After batched used ring updating was introduced in commit e2b3b35eb9
("vhost_net: batch used ring update in rx"). We tend to batch heads in
vq->heads for more than one packet. But the quota passed to
get_rx_bufs() was not correctly limited, which can result a OOB write
in vq->heads.

        headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
                    vhost_len, &in, vq_log, &log,
                    likely(mergeable) ? UIO_MAXIOV : 1);

UIO_MAXIOV was still used which is wrong since we could have batched
used in vq->heads, this will cause OOB if the next buffer needs more
than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've
batched 64 (VHOST_NET_BATCH) heads:
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

=============================================================================
BUG kmalloc-8k (Tainted: G    B            ): Redzone overwritten
-----------------------------------------------------------------------------

INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
    kmem_cache_alloc_trace+0xbb/0x140
    alloc_pd+0x22/0x60
    gen8_ppgtt_create+0x11d/0x5f0
    i915_ppgtt_create+0x16/0x80
    i915_gem_create_context+0x248/0x390
    i915_gem_context_create_ioctl+0x4b/0xe0
    drm_ioctl_kernel+0xa5/0xf0
    drm_ioctl+0x2ed/0x3a0
    do_vfs_ioctl+0x9f/0x620
    ksys_ioctl+0x6b/0x80
    __x64_sys_ioctl+0x11/0x20
    do_syscall_64+0x43/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x          (null) flags=0x200000000010201
INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b

Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
vhost-net. This is done through set the limitation through
vhost_dev_init(), then set_owner can allocate the number of iov in a
per device manner.

This fixes CVE-2018-16880.

Fixes: e2b3b35eb9 ("vhost_net: batch used ring update in rx")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 22:53:09 -08:00
Linus Torvalds
7d0ae236ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix endless loop in nf_tables, from Phil Sutter.

 2) Fix cross namespace ip6_gre tunnel hash list corruption, from
    Olivier Matz.

 3) Don't be too strict in phy_start_aneg() otherwise we might not allow
    restarting auto negotiation. From Heiner Kallweit.

 4) Fix various KMSAN uninitialized value cases in tipc, from Ying Xue.

 5) Memory leak in act_tunnel_key, from Davide Caratti.

 6) Handle chip errata of mv88e6390 PHY, from Andrew Lunn.

 7) Remove linear SKB assumption in fou/fou6, from Eric Dumazet.

 8) Missing udplite rehash callbacks, from Alexey Kodanev.

 9) Log dirty pages properly in vhost, from Jason Wang.

10) Use consume_skb() in neigh_probe() as this is a normal free not a
    drop, from Yang Wei. Likewise in macvlan_process_broadcast().

11) Missing device_del() in mdiobus_register() error paths, from Thomas
    Petazzoni.

12) Fix checksum handling of short packets in mlx5, from Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (96 commits)
  bpf: in __bpf_redirect_no_mac pull mac only if present
  virtio_net: bulk free tx skbs
  net: phy: phy driver features are mandatory
  isdn: avm: Fix string plus integer warning from Clang
  net/mlx5e: Fix cb_ident duplicate in indirect block register
  net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
  net/mlx5e: Fix wrong error code return on FEC query failure
  net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
  tools: bpftool: Cleanup license mess
  bpf: fix inner map masking to prevent oob under speculation
  bpf: pull in pkt_sched.h header for tooling to fix bpftool build
  selftests: forwarding: Add a test case for externally learned FDB entries
  selftests: mlxsw: Test FDB offload indication
  mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
  net: bridge: Mark FDB entries that were added by user as such
  mlxsw: spectrum_fid: Update dummy FID index
  mlxsw: pci: Return error on PCI reset timeout
  mlxsw: pci: Increase PCI SW reset timeout
  mlxsw: pci: Ring CQ's doorbell before RDQ's
  MAINTAINERS: update email addresses of liquidio driver maintainers
  ...
2019-01-21 12:52:31 +13:00
Jason Wang
cc5e710759 vhost: log dirty page correctly
Vhost dirty page logging API is designed to sync through GPA. But we
try to log GIOVA when device IOTLB is enabled. This is wrong and may
lead to missing data after migration.

To solve this issue, when logging with device IOTLB enabled, we will:

1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
   get HVA, for writable descriptor, get HVA through iovec. For used
   ring update, translate its GIOVA to HVA
2) traverse the GPA->HVA mapping to get the possible GPA and log
   through GPA. Pay attention this reverse mapping is not guaranteed
   to be unique, so we should log each possible GPA in this case.

This fix the failure of scp to guest during migration. In -next, we
will probably support passing GIOVA->GPA instead of GIOVA->HVA.

Fixes: 6b1e6cc785 ("vhost: new device IOTLB API")
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Cc: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 21:43:24 -08:00
Pavel Tikhomirov
74ad741948 vhost: return EINVAL if iovecs size does not match the message size
We've failed to copy and process vhost_iotlb_msg so let userspace at
least know about it. For instance before these patch the code below runs
without any error:

int main()
{
  struct vhost_msg msg;
  struct iovec iov;
  int fd;

  fd = open("/dev/vhost-net", O_RDWR);
  if (fd == -1) {
    perror("open");
    return 1;
  }

  iov.iov_base = &msg;
  iov.iov_len = sizeof(msg)-4;

  if (writev(fd, &iov,1) == -1) {
    perror("writev");
    return 1;
  }

  return 0;
}

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 20:28:07 -05:00
Linus Torvalds
96d4f267e4 Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access.  But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model.  And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

 - csky still had the old "verify_area()" name as an alias.

 - the iter_iov code had magical hardcoded knowledge of the actual
   values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
   really used it)

 - microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something.  Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03 18:57:57 -08:00
Jason Wang
86a07da343 Revert "net: vhost: lock the vqs one by one"
This reverts commit 78139c94dc. We don't
protect device IOTLB with vq mutex, which will lead e.g use after free
for device IOTLB entries. And since we've switched to use
mutex_trylock() in previous patch, it's safe to revert it without
having deadlock.

Fixes: commit 78139c94dc ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 21:56:20 -08:00
Jason Wang
841df92241 vhost: make sure used idx is seen before log in vhost_add_used_n()
We miss a write barrier that guarantees used idx is updated and seen
before log. This will let userspace sync and copy used ring before
used idx is update. Fix this by adding a barrier before log_write().

Fixes: 8dd014adfe ("vhost-net: mergeable buffers support")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 21:56:19 -08:00
Jean-Philippe Brucker
e3f787189e vhost: fix IOTLB locking
Commit 78139c94dc ("net: vhost: lock the vqs one by one") moved the vq
lock to improve scalability, but introduced a possible deadlock in
vhost-iotlb. vhost_iotlb_notify_vq() now takes vq->mutex while holding
the device's IOTLB spinlock. And on the vhost_iotlb_miss() path, the
spinlock is taken while holding vq->mutex.

Since calling vhost_poll_queue() doesn't require any lock, avoid the
deadlock by not taking vq->mutex.

Fixes: 78139c94dc ("net: vhost: lock the vqs one by one")
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03 15:52:30 -08:00
Jason Wang
ff002269a4 vhost: Fix Spectre V1 vulnerability
The idx in vhost_vring_ioctl() was controlled by userspace, hence a
potential exploitation of the Spectre variant 1 vulnerability.

Fixing this by sanitizing idx before using it to index d->vqs.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:39:15 -07:00
Tonghao Zhang
78139c94dc net: vhost: lock the vqs one by one
This patch changes the way that lock all vqs
at the same, to lock them one by one. It will
be used for next patch to avoid the deadlock.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:25:54 -07:00
Jason Wang
2d66f997f0 vhost: correctly check the iova range when waking virtqueue
We don't wakeup the virtqueue if the first byte of pending iova range
is the last byte of the range we just got updated. This will lead a
virtqueue to wait for IOTLB updating forever. Fixing by correct the
check and wake up the virtqueue in this case.

Fixes: 6b1e6cc785 ("vhost: new device IOTLB API")
Reported-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-25 17:39:41 -07:00
David S. Miller
a736e07468 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Overlapping changes in RXRPC, changing to ktime_get_seconds() whilst
adding some tracepoints.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:52:36 -07:00
Jason Wang
b13f9c6364 vhost: reset metadata cache when initializing new IOTLB
We need to reset metadata cache during new IOTLB initialization,
otherwise the stale pointers to previous IOTLB may be still accessed
which will lead a use after free.

Reported-by: syzbot+c51e6736a1bf614b3272@syzkaller.appspotmail.com
Fixes: f889491380 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08 09:44:39 -07:00
Jason Wang
429711aec2 vhost: switch to use new message format
We use to have message like:

struct vhost_msg {
	int type;
	union {
		struct vhost_iotlb_msg iotlb;
		__u8 padding[64];
	};
};

Unfortunately, there will be a hole of 32bit in 64bit machine because
of the alignment. This leads a different formats between 32bit API and
64bit API. What's more it will break 32bit program running on 64bit
machine.

So fixing this by introducing a new message type with an explicit
32bit reserved field after type like:

struct vhost_msg_v2 {
	__u32 type;
	__u32 reserved;
	union {
		struct vhost_iotlb_msg iotlb;
		__u8 padding[64];
	};
};

We will have a consistent ABI after switching to use this. To enable
this capability, introduce a new ioctl (VHOST_SET_BAKCEND_FEATURE) for
userspace to enable this feature (VHOST_BACKEND_F_IOTLB_V2).

Fixes: 6b1e6cc785 ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06 10:41:04 -07:00
Linus Torvalds
2f3f056685 virtio, vhost: features, fixes
VF support for virtio.
 DMA barriers for virtio strong barriers.
 Bugfixes.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbHykhAAoJECgfDbjSjVRpTAgH/iS2bIo0DOvlC5wPljVMopKV
 fD3n5dPUDOc2yWv2H9wwc3xDO6f3kByMjLnHvn+PM2ZX/ms731QaPd5sTlzUm+jj
 LzvI0gc9cyym8INZcU+xuTLQhiC13wZmZIHuP7X4TRsKBPTSaT+goSRk63qmuJF7
 0V8BJcj2QXaygaWD1P5SczrL4nFK7nn5PWZqZTPk3ohuLcUtgcv6Qb+idj+tCnov
 6osK122JkN6GO/LuVgEPxKamDgi9SB+sXeqNCYSzgKzXEUyC/cMtxyExXKxwqDEI
 MCcfPcoS1IklvII0ZYCTFKJYDTkPCjZ3HQwxF9aVjy4FirJGpRI3NRp5Eqr9rG4=
 =+EYn
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "virtio, vhost: features, fixes

   - PCI virtual function support for virtio

   - DMA barriers for virtio strong barriers

   - bugfixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio: update the comments for transport features
  virtio_pci: support enabling VFs
  vhost: fix info leak due to uninitialized memory
  virtio_ring: switch to dma_XX barriers for rpmsg
2018-06-16 06:35:02 +09:00
Kees Cook
6da2ec5605 treewide: kmalloc() -> kmalloc_array()
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

        kmalloc(a * b, gfp)

with:
        kmalloc_array(a * b, gfp)

as well as handling cases of:

        kmalloc(a * b * c, gfp)

with:

        kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kmalloc(sizeof(THING) * C2, ...)
|
  kmalloc(sizeof(TYPE) * C2, ...)
|
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Matthew Wilcox
b2303d7bf7 Convert vhost to struct_size
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Michael S. Tsirkin
670ae9caac vhost: fix info leak due to uninitialized memory
struct vhost_msg within struct vhost_msg_node is copied to userspace.
Unfortunately it turns out on 64 bit systems vhost_msg has padding after
type which gcc doesn't initialize, leaking 4 uninitialized bytes to
userspace.

This padding also unfortunately means 32 bit users of this interface are
broken on a 64 bit kernel which will need to be fixed separately.

Fixes: CVE-2018-1118
Cc: stable@vger.kernel.org
Reported-by: Kevin Easton <kevin@guarana.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-06-12 04:59:29 +03:00