2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-02 18:54:10 +08:00
linux-next/drivers
Asias He a98755c559 virtio-blk: Add bio-based IO path for virtio-blk
This patch introduces bio-based IO path for virtio-blk.

Compared to request-based IO path, bio-based IO path uses driver
provided ->make_request_fn() method to bypasses the IO scheduler. It
handles the bio to device directly without allocating a request in block
layer. This reduces the IO path in guest kernel to achieve high IOPS
and lower latency. The downside is that guest can not use the IO
scheduler to merge and sort requests. However, this is not a big problem
if the backend disk in host side uses faster disk device.

When the bio-based IO path is not enabled, virtio-blk still uses the
original request-based IO path, no performance difference is observed.

Using a slow device e.g. normal SATA disk, the bio-based IO path for
sequential read and write are slower than req-based IO path due to lack
of merge in guest kernel. So we make the bio-based path optional.

Performance evaluation:
-----------------------------
1) Fio test is performed in a 8 vcpu guest with ramdisk based guest using
kvm tool.

Short version:
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost         : 28%, 24%, 21%, 16%
 Latency improvement: 32%, 17%, 21%, 16%

Long version:
 With bio-based IO path:
  seq-read  : io=2048.0MB, bw=116996KB/s, iops=233991 , runt= 17925msec
  seq-write : io=2048.0MB, bw=100829KB/s, iops=201658 , runt= 20799msec
  rand-read : io=3095.7MB, bw=112134KB/s, iops=224268 , runt= 28269msec
  rand-write: io=3095.7MB, bw=96198KB/s,  iops=192396 , runt= 32952msec
    clat (usec): min=0 , max=2631.6K, avg=58716.99, stdev=191377.30
    clat (usec): min=0 , max=1753.2K, avg=66423.25, stdev=81774.35
    clat (usec): min=0 , max=2915.5K, avg=61685.70, stdev=120598.39
    clat (usec): min=0 , max=1933.4K, avg=76935.12, stdev=96603.45
  cpu : usr=74.08%, sys=703.84%, ctx=29661403, majf=21354, minf=22460954
  cpu : usr=70.92%, sys=702.81%, ctx=77219828, majf=13980, minf=27713137
  cpu : usr=72.23%, sys=695.37%, ctx=88081059, majf=18475, minf=28177648
  cpu : usr=69.69%, sys=654.13%, ctx=145476035, majf=15867, minf=26176375
 With request-based IO path:
  seq-read  : io=2048.0MB, bw=91074KB/s, iops=182147 , runt= 23027msec
  seq-write : io=2048.0MB, bw=80725KB/s, iops=161449 , runt= 25979msec
  rand-read : io=3095.7MB, bw=92106KB/s, iops=184211 , runt= 34416msec
  rand-write: io=3095.7MB, bw=82815KB/s, iops=165630 , runt= 38277msec
    clat (usec): min=0 , max=1932.4K, avg=77824.17, stdev=170339.49
    clat (usec): min=0 , max=2510.2K, avg=78023.96, stdev=146949.15
    clat (usec): min=0 , max=3037.2K, avg=74746.53, stdev=128498.27
    clat (usec): min=0 , max=1363.4K, avg=89830.75, stdev=114279.68
  cpu : usr=53.28%, sys=724.19%, ctx=37988895, majf=17531, minf=23577622
  cpu : usr=49.03%, sys=633.20%, ctx=205935380, majf=18197, minf=27288959
  cpu : usr=55.78%, sys=722.40%, ctx=101525058, majf=19273, minf=28067082
  cpu : usr=56.55%, sys=690.83%, ctx=228205022, majf=18039, minf=26551985

2) Fio test is performed in a 8 vcpu guest with Fusion-IO based guest using
kvm tool.

Short version:
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost         : 11%, 11%, 13%, 10%
 Latency improvement: 10%, 10%, 12%, 10%
Long Version:
 With bio-based IO path:
  read : io=2048.0MB, bw=58920KB/s, iops=117840 , runt= 35593msec
  write: io=2048.0MB, bw=64308KB/s, iops=128616 , runt= 32611msec
  read : io=3095.7MB, bw=59633KB/s, iops=119266 , runt= 53157msec
  write: io=3095.7MB, bw=62993KB/s, iops=125985 , runt= 50322msec
    clat (usec): min=0 , max=1284.3K, avg=128109.01, stdev=71513.29
    clat (usec): min=94 , max=962339 , avg=116832.95, stdev=65836.80
    clat (usec): min=0 , max=1846.6K, avg=128509.99, stdev=89575.07
    clat (usec): min=0 , max=2256.4K, avg=121361.84, stdev=82747.25
  cpu : usr=56.79%, sys=421.70%, ctx=147335118, majf=21080, minf=19852517
  cpu : usr=61.81%, sys=455.53%, ctx=143269950, majf=16027, minf=24800604
  cpu : usr=63.10%, sys=455.38%, ctx=178373538, majf=16958, minf=24822612
  cpu : usr=62.04%, sys=453.58%, ctx=226902362, majf=16089, minf=23278105
 With request-based IO path:
  read : io=2048.0MB, bw=52896KB/s, iops=105791 , runt= 39647msec
  write: io=2048.0MB, bw=57856KB/s, iops=115711 , runt= 36248msec
  read : io=3095.7MB, bw=52387KB/s, iops=104773 , runt= 60510msec
  write: io=3095.7MB, bw=57310KB/s, iops=114619 , runt= 55312msec
    clat (usec): min=0 , max=1532.6K, avg=142085.62, stdev=109196.84
    clat (usec): min=0 , max=1487.4K, avg=129110.71, stdev=114973.64
    clat (usec): min=0 , max=1388.6K, avg=145049.22, stdev=107232.55
    clat (usec): min=0 , max=1465.9K, avg=133585.67, stdev=110322.95
  cpu : usr=44.08%, sys=590.71%, ctx=451812322, majf=14841, minf=17648641
  cpu : usr=48.73%, sys=610.78%, ctx=418953997, majf=22164, minf=26850689
  cpu : usr=45.58%, sys=581.16%, ctx=714079216, majf=21497, minf=22558223
  cpu : usr=48.40%, sys=599.65%, ctx=656089423, majf=16393, minf=23824409

3) Fio test is performed in a 8 vcpu guest with normal SATA based guest
using kvm tool.

Short version:
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost         : -10%, -10%, 4.4%, 0.5%
 Latency improvement: -12%, -15%, 2.5%, 0.8%
Long Version:
 With bio-based IO path:
  read : io=124812KB, bw=36537KB/s, iops=9060 , runt=  3416msec
  write: io=169180KB, bw=24406KB/s, iops=6065 , runt=  6932msec
  read : io=256200KB, bw=2089.3KB/s, iops=520 , runt=122630msec
  write: io=257988KB, bw=1545.7KB/s, iops=384 , runt=166910msec
    clat (msec): min=1 , max=1527 , avg=28.06, stdev=89.54
    clat (msec): min=2 , max=344 , avg=41.12, stdev=38.70
    clat (msec): min=8 , max=1984 , avg=490.63, stdev=207.28
    clat (msec): min=33 , max=4131 , avg=659.19, stdev=304.71
  cpu          : usr=4.85%, sys=17.15%, ctx=31593, majf=0, minf=7
  cpu          : usr=3.04%, sys=11.45%, ctx=39377, majf=0, minf=0
  cpu          : usr=0.47%, sys=1.59%, ctx=262986, majf=0, minf=16
  cpu          : usr=0.47%, sys=1.46%, ctx=337410, majf=0, minf=0

 With request-based IO path:
  read : io=150120KB, bw=40420KB/s, iops=10037 , runt=  3714msec
  write: io=194932KB, bw=27029KB/s, iops=6722 , runt=  7212msec
  read : io=257136KB, bw=2001.1KB/s, iops=498 , runt=128443msec
  write: io=258276KB, bw=1537.2KB/s, iops=382 , runt=168028msec
    clat (msec): min=1 , max=1542 , avg=24.84, stdev=32.45
    clat (msec): min=3 , max=628 , avg=35.62, stdev=39.71
    clat (msec): min=8 , max=2540 , avg=503.28, stdev=236.97
    clat (msec): min=41 , max=4398 , avg=653.88, stdev=302.61
  cpu          : usr=3.91%, sys=15.75%, ctx=26968, majf=0, minf=23
  cpu          : usr=2.50%, sys=10.56%, ctx=19090, majf=0, minf=0
  cpu          : usr=0.16%, sys=0.43%, ctx=20159, majf=0, minf=16
  cpu          : usr=0.18%, sys=0.53%, ctx=81364, majf=0, minf=0

How to use:
-----------------------------
Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk
use_bio=1' to enable ->make_request_fn() based I/O path.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 15:05:13 +09:30
..
accessibility
acpi ACPI / PM: Use KERN_DEBUG when no power resources are found 2012-09-14 20:54:44 +02:00
amba Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm 2012-07-27 15:14:26 -07:00
ata ahci: Add identifiers for ASM106x devices 2012-09-13 00:24:29 -04:00
atm drivers/atm/iphase.c: fix error return code 2012-08-06 13:29:57 -07:00
auxdisplay
base mm: cma: fix alignment requirements for contiguous regions 2012-08-28 21:01:01 +02:00
bcma bcma: BCM43228 support 2012-08-02 13:51:46 -04:00
block virtio-blk: Add bio-based IO path for virtio-blk 2012-09-28 15:05:13 +09:30
bluetooth Bluetooth: Add support for Apple vendor-specific devices 2012-08-27 08:36:42 -05:00
cdrom
char virtio: console: fix error handling in init() function 2012-09-28 15:05:13 +09:30
clk clk: validate pointer in __clk_disable() 2012-07-30 17:25:13 -07:00
clocksource cs5535-clockevt: typo, it's MFGPT, not MFPGT 2012-08-21 16:45:02 -07:00
connector
cpufreq Merge branch 'imx/fixes-for-3.6' of git://git.linaro.org/people/shawnguo/linux-2.6 into fixes 2012-08-23 17:02:42 +02:00
cpuidle cpuidle: Prevent null pointer dereference in cpuidle_coupled_cpu_notify 2012-08-17 19:37:08 +02:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2012-09-12 07:14:17 +08:00
dca
devfreq
dio
dma dma: tegra: enable/disable dma clock 2012-08-13 10:15:22 +05:30
edac Merge branch 'devel' 2012-07-29 21:11:05 -03:00
eisa
extcon This is the remaining MFD fixes for 3.6, with 5 pending fixes: 2012-09-16 13:22:21 -07:00
firewire - Small fixes and optimizations. 2012-07-30 09:32:39 -07:00
firmware This patch series contains a major revamp of how we collect entropy 2012-07-31 19:07:42 -07:00
gpio gpio: rdc321x: Prevent removal of modules exporting active GPIOs 2012-09-01 12:52:24 +02:00
gpu drm/nouveau: fix booting with plymouth + dumb support 2012-09-14 15:45:01 +10:00
hid Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2012-09-07 12:29:38 -07:00
hsi
hv This patch series contains a major revamp of how we collect entropy 2012-07-31 19:07:42 -07:00
hwmon hwmon: (ina2xx) Fix word size register read and write operations 2012-09-12 06:42:11 -07:00
hwspinlock hwspinlock/core: move the dereference below the NULL test 2012-09-10 13:19:25 +03:00
i2c Merge branch 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux 2012-09-14 17:55:57 -07:00
ide ide: fix generic_ide_suspend/resume Oops 2012-08-21 14:54:42 -07:00
idle intel_idle: Check cpu_idle_get_driver() for NULL before dereferencing it. 2012-08-17 19:37:14 +02:00
ieee802154
iio drivers/iio/adc/at91_adc.c: adjust inconsistent IS_ERR and PTR_ERR 2012-08-27 21:15:25 +01:00
infiniband Merge branches 'cxgb4', 'ipoib', 'mlx4', 'ocrdma' and 'qib' into for-next 2012-09-14 10:42:52 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2012-09-08 16:20:59 -07:00
iommu iommu/amd: Fix wrong check for ARRAY_SIZE() 2012-08-10 11:34:08 +02:00
isdn mISDN: Fix wrong usage of flush_work_sync while holding locks 2012-09-13 14:58:54 -04:00
leds leds: renesas: fix error handling 2012-08-13 14:34:02 +08:00
lguest
macintosh
md md/raid10: fix problem with on-stack allocation of r10bio structure. 2012-08-18 09:51:42 +10:00
media Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2012-08-21 16:54:38 -07:00
memory
memstick
message drivers/message/i2o/i2o_config.c: bound allocation 2012-07-30 17:25:17 -07:00
mfd This is the remaining MFD fixes for 3.6, with 5 pending fixes: 2012-09-16 13:22:21 -07:00
misc drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources 2012-08-21 16:45:03 -07:00
mmc mmc: omap: fix broken PIO mode 2012-09-04 13:58:11 -04:00
mtd UBI: fix a horrible memory deallocation bug 2012-09-04 09:40:26 +03:00
net Last set of InfiniBand/RDMA fixes for 3.6: 2012-09-17 13:21:02 -07:00
nfc
nubus
of dt: introduce for_each_available_child_of_node, of_get_next_available_child 2012-08-20 02:16:00 -07:00
oprofile
parisc PCI changes for the 3.6 merge window: 2012-07-24 16:17:07 -07:00
parport
pci PCI: Don't print anything while decoding is disabled 2012-08-23 10:53:08 -06:00
pcmcia Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm 2012-07-27 15:14:26 -07:00
pinctrl pinctrl/nomadik: add kp_b_2 keyboard function group list 2012-08-17 11:09:58 +02:00
platform thinkpad_acpi: buffer overflow in fan_get_status() 2012-09-13 16:46:31 -04:00
pnp
power Merge branch 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc 2012-08-02 11:52:39 -07:00
pps pps: return PTR_ERR on error in device_create 2012-07-30 17:25:21 -07:00
ps3
ptp
pwm pwm: pwm-tiehrpwm: Fix conflicting channel period setting 2012-09-10 17:04:38 +02:00
rapidio rapidio/tsi721: fix unused variable compiler warning 2012-08-21 16:45:03 -07:00
regulator This is the remaining MFD fixes for 3.6, with 5 pending fixes: 2012-09-16 13:22:21 -07:00
remoteproc A batch of remoteproc patches for 3.6: 2012-07-26 16:19:08 -07:00
rpmsg A batch of remoteproc patches for 3.6: 2012-07-26 16:19:08 -07:00
rtc drivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probe 2012-09-17 15:00:38 -07:00
s390 s390/dasd: fix ioctl return value 2012-08-28 10:08:31 +02:00
sbus
scsi [SCSI] Fix 'Device not ready' issue on mpt2sas 2012-08-22 09:42:54 +04:00
sfi
sh sh: intc: Handle domain association for sparseirq pre-allocated vectors. 2012-08-09 13:21:05 +09:00
sn
spi Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2012-08-25 11:45:04 -07:00
ssb
staging This is the remaining MFD fixes for 3.6, with 5 pending fixes: 2012-09-16 13:22:21 -07:00
target target: go through normal processing for zero-length REQUEST_SENSE 2012-09-07 11:32:54 -07:00
tc
thermal The tag contains just a few battery-related changes for v3.6. It's is 2012-07-31 18:08:25 -07:00
tty tty: serial: imx: don't reinit clock in imx_setup_ufcr() 2012-09-05 12:44:44 -07:00
uio
usb Merge branch 'chipidea-stable' into usb-linus 2012-09-12 11:12:31 -07:00
uwb
vfio vfio: grab vfio_device reference *before* exposing the sucker via fd_install() 2012-08-22 10:26:42 -04:00
vhost tcm_vhost: Fix vhost_scsi_target structure alignment 2012-08-20 14:52:11 -07:00
video OMAPFB: fix framebuffer console colors 2012-08-23 12:37:22 +00:00
virt
virtio [SCSI] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning 2012-07-20 08:59:03 +01:00
vlynq
vme
w1 1-Wire: Add support for the maxim ds1825 temperature sensor 2012-08-16 12:33:59 -07:00
watchdog watchdog: da9052: Remove duplicate inclusion of delay.h 2012-08-29 17:13:06 +02:00
xen xen/pciback: Fix proper FLR steps. 2012-09-06 09:22:02 -04:00
zorro
Kconfig vfio: VFIO core 2012-07-31 08:16:22 -06:00
Makefile vfio: VFIO core 2012-07-31 08:16:22 -06:00