korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-25 13:14:07 +08:00

Paolo Valente aee69d78de block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler	2017-04-19 08:29:02 -06:00

We tag as v0 the version of BFQ containing only BFQ's engine plus hierarchical support. BFQ's engine is introduced by this commit, while hierarchical support is added by next commit. We use the v0 tag to distinguish this minimal version of BFQ from the versions containing also the features and the improvements added by next commits. BFQ-v0 coincides with the version of BFQ submitted a few years ago [1], apart from the introduction of preemption, described below.

BFQ is a proportional-share I/O scheduler, whose general structure, plus a lot of code, are borrowed from CFQ.

- Each process doing I/O on a device is associated with a weight and a (bfq_)queue.

- BFQ grants exclusive access to the device, for a while, to one queue (process) at a time, and implements this service model by associating every queue with a budget, measured in number of sectors.

  - After a queue is granted access to the device, the budget of the queue is decremented, on each request dispatch, by the size of the request.

  - The in-service queue is expired, i.e., its service is suspended, only if one of the following events occurs: 1) the queue finishes its budget, 2) the queue empties, 3) a "budget timeout" fires.

    - The budget timeout prevents processes doing random I/O from holding the device for too long and dramatically reducing throughput.

    - Actually, as in CFQ, a queue associated with a process issuing sync requests may not be expired immediately when it empties. In contrast, BFQ may idle the device for a short time interval, giving the process the chance to go on being served if it issues a new request in time. Device idling typically boosts the throughput on rotational devices, if processes do synchronous and sequential I/O. In addition, under BFQ, device idling is also instrumental in guaranteeing the desired throughput fraction to processes issuing sync requests (see [2] for details).

      - With respect to idling for service guarantees, if several processes are competing for the device at the same time, but all processes (and groups, after the following commit) have the same weight, then BFQ guarantees the expected throughput distribution without ever idling the device. Throughput is thus as high as possible in this common scenario.

  - Queues are scheduled according to a variant of WF2Q+, named B-WF2Q+, and implemented using an augmented rb-tree to preserve an O(log N) overall complexity. See [2] for more details. B-WF2Q+ is also ready for hierarchical scheduling. However, for a cleaner logical breakdown, the code that enables and completes hierarchical support is provided in the next commit, which focuses exactly on this feature.

  - B-WF2Q+ guarantees a tight deviation with respect to an ideal, perfectly fair, and smooth service. In particular, B-WF2Q+ guarantees that each queue receives a fraction of the device throughput proportional to its weight, even if the throughput fluctuates, and regardless of: the device parameters, the current workload and the budgets assigned to the queue.

  - The last, budget-independence, property (although probably counterintuitive in the first place) is definitely beneficial, for the following reasons:

    - First, with any proportional-share scheduler, the maximum deviation with respect to an ideal service is proportional to the maximum budget (slice) assigned to queues. As a consequence, BFQ can keep this deviation tight not only because of the accurate service of B-WF2Q+, but also because BFQ does not need to assign a larger budget to a queue to let the queue receive a higher fraction of the device throughput.

    - Second, BFQ is free to choose, for every process (queue), the budget that best fits the needs of the process, or best leverages the I/O pattern of the process. In particular, BFQ updates queue budgets with a simple feedback-loop algorithm that allows a high throughput to be achieved, while still providing tight latency guarantees to time-sensitive applications. When the in-service queue expires, this algorithm computes the next budget of the queue so as to:

      - Let large budgets be eventually assigned to the queues associated with I/O-bound applications performing sequential I/O: in fact, the longer these applications are served once got access to the device, the higher the throughput is.

      - Let small budgets be eventually assigned to the queues associated with time-sensitive applications (which typically perform sporadic and short I/O), because, the smaller the budget assigned to a queue waiting for service is, the sooner B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).

- Weights can be assigned to processes only indirectly, through I/O priorities, and according to the relation: weight = 10 * (IOPRIO_BE_NR - ioprio). The next patch provides, instead, a cgroups interface through which weights can be assigned explicitly.

- If several processes are competing for the device at the same time, but all processes and groups have the same weight, then BFQ guarantees the expected throughput distribution without ever idling the device. It uses preemption instead. Throughput is then much higher in this common scenario.

- ioprio classes are served in strict priority order, i.e., lower-priority queues are not served as long as there are higher-priority queues. Among queues in the same class, the bandwidth is distributed in proportion to the weight of each queue. A very thin extra bandwidth is however guaranteed to the Idle class, to prevent it from starving.

- If the strict_guarantees parameter is set (default: unset), then BFQ

  - always performs idling when the in-service queue becomes empty;

  - forces the device to serve one I/O request at a time, by dispatching a new request only if there is no outstanding request.

  In the presence of differentiated weights or I/O-request sizes, both the above conditions are needed to guarantee that every queue receives its allotted share of the bandwidth (see Documentation/block/bfq-iosched.txt for more details). Setting strict_guarantees may evidently affect throughput.

[1] https://lkml.org/lkml/2008/4/1/234
    https://lkml.org/lkml/2008/11/11/148

[2] P. Valente and M. Andreolini, "Improving Application Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of the 5th Annual International Systems and Storage Conference (SYSTOR '12), June 2012.
    Slightly extended version: http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-results.pdf

Signed-off-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
partitions	partitions/efi: Fix integer overflow in GPT size calculation	2017-01-17 09:02:31 -07:00
badblocks.c	badblocks: badblocks_set/clear update unacked_exist	2016-10-21 15:45:47 -06:00
bfq-iosched.c	block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler	2017-04-19 08:29:02 -06:00
bio-integrity.c	block: remove bio_is_rw	2016-10-28 08:45:17 -06:00
bio.c	block: trace completion of all bios.	2017-04-07 09:40:52 -06:00
blk-cgroup.c	blkcg: allocate struct blkcg_gq outside request queue spinlock	2017-03-29 11:27:19 -06:00
blk-core.c	block: trace completion of all bios.	2017-04-07 09:40:52 -06:00
blk-exec.c	block: introduce blk_rq_is_passthrough	2017-01-31 14:00:34 -07:00
blk-flush.c	block: remove outdated part of blkdev_issue_flush() comment	2017-03-24 15:41:30 -06:00
blk-integrity.c	block: constify struct blk_integrity_profile	2017-03-24 20:34:39 -06:00
blk-ioc.c	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	2017-03-03 10:53:35 -08:00
blk-lib.c	block: remove the discard_zeroes_data flag	2017-04-08 11:25:38 -06:00
blk-map.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h>	2017-03-02 08:42:36 +01:00
blk-merge.c	block: implement splitting of REQ_OP_WRITE_ZEROES bios	2017-04-08 11:25:38 -06:00
blk-mq-cpumap.c	blk-mq: export blk_mq_map_queues	2016-11-08 17:30:00 -05:00
blk-mq-debugfs.c	blk-mq: Show symbolic names for hctx state and flags	2017-04-10 16:13:33 -06:00
blk-mq-pci.c	blk-mq-pci: Fix two spelling mistakes	2017-03-29 11:09:51 -06:00
blk-mq-sched.c	blk-mq-sched: provide hooks for initializing hardware queue data	2017-04-07 12:45:41 -06:00
blk-mq-sched.h	blk-mq-sched: make completed_request() callback more useful	2017-04-14 14:06:57 -06:00
blk-mq-sysfs.c	blk-mq: free hctx->cpumask in release handler of hctx's kobject	2017-03-08 09:56:12 -07:00
blk-mq-tag.c	blk-mq: add shallow depth option for blk_mq_get_tag()	2017-04-14 14:06:54 -06:00
blk-mq-tag.h	blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset	2017-03-02 08:56:04 -07:00
blk-mq-virtio.c	blk-mq: provide a default queue mapping for virtio device	2017-02-27 20:54:05 +02:00
blk-mq.c	blk-mq-sched: make completed_request() callback more useful	2017-04-14 14:06:57 -06:00
blk-mq.h	blk-mq: add shallow depth option for blk_mq_get_tag()	2017-04-14 14:06:54 -06:00
blk-settings.c	block: remove the discard_zeroes_data flag	2017-04-08 11:25:38 -06:00
blk-softirq.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/topology.h>	2017-03-02 08:42:26 +01:00
blk-stat.c	blk-throttle: add a mechanism to estimate IO latency	2017-03-28 08:02:20 -06:00
blk-stat.h	blk-throttle: add a mechanism to estimate IO latency	2017-03-28 08:02:20 -06:00
blk-sysfs.c	block: remove the discard_zeroes_data flag	2017-04-08 11:25:38 -06:00
blk-tag.c	blk-mq-sched: add framework for MQ capable IO schedulers	2017-01-17 10:04:20 -07:00
blk-throttle.c	blk-throttle: add latency target support	2017-03-28 08:02:20 -06:00
blk-timeout.c	block: remove REQ_NO_TIMEOUT flag	2015-12-22 09:38:34 -07:00
blk-wbt.c	block: Fix list corruption of blk stats callback list	2017-04-11 08:09:14 -06:00
blk-wbt.h	block: track request size in blk_issue_stat	2017-03-28 08:02:20 -06:00
blk-zoned.c	block: Rename blk_queue_zone_size and bdev_zone_size	2017-01-12 07:58:32 -07:00
blk.h	blk-throttle: add a mechanism to estimate IO latency	2017-03-28 08:02:20 -06:00
bounce.c	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	2015-09-19 18:57:09 -07:00
bsg-lib.c	block: split scsi_request out of struct request	2017-01-27 15:08:35 -07:00
bsg.c	lib/vsprintf.c: remove %Z support	2017-02-27 18:43:47 -08:00
cfq-iosched.c	cfq: Disable writeback throttling by default	2017-04-05 08:15:08 -06:00
cmdline-parser.c	block: remove unrelated header files and export symbol	2014-01-21 20:18:26 -08:00
compat_ioctl.c	block: remove the discard_zeroes_data flag	2017-04-08 11:25:38 -06:00
deadline-iosched.c	block: enumify ELEVATOR_*_MERGE	2017-02-08 13:43:06 -07:00
elevator.c	blk-mq-sched: fix crash in switch error path	2017-04-07 08:56:48 -06:00
genhd.c	block: Fix oops scsi_disk_get()	2017-03-22 20:11:37 -06:00
ioctl.c	block: remove the discard_zeroes_data flag	2017-04-08 11:25:38 -06:00
ioprio.c	sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
Kconfig	blk-throttle: add configure option for new .low interface	2017-03-28 08:02:20 -06:00
Kconfig.iosched	block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler	2017-04-19 08:29:02 -06:00
kyber-iosched.c	blk-mq: introduce Kyber multiqueue I/O scheduler	2017-04-14 14:06:58 -06:00
Makefile	block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler	2017-04-19 08:29:02 -06:00
mq-deadline.c	block: enumify ELEVATOR_*_MERGE	2017-02-08 13:43:06 -07:00
noop-iosched.c	block: move existing elevator ops to union	2017-01-17 10:03:33 -07:00
opal_proto.h	block/sed-opal: allocate struct opal_dev dynamically	2017-02-17 12:41:47 -07:00
partition-generic.c	block: Rename blk_queue_zone_size and bdev_zone_size	2017-01-12 07:58:32 -07:00
scsi_ioctl.c	block, scsi: move the retries field to struct scsi_request	2017-04-05 12:05:08 -06:00
sed-opal.c	block: sed-opal: Tone down all the pr_* to debugs	2017-04-07 14:24:16 -06:00
t10-pi.c	block: constify struct blk_integrity_profile	2017-03-24 20:34:39 -06:00