linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-11 12:28:41 +08:00

Author	SHA1	Message	Date
Tejun Heo	b243ddcbe9	block: move rq->start_time initialization to blk_rq_init() rq->start_time was initialized in init_request_from_bio() so special requests didn't have start_time set. This has been okay as start_time has been used only for fs requests; however, there is no indication of this actually is the case or not. Set rq->start_time in blk_rq_init() and guarantee that all initialized rq's have its start_time set. This improves consistency at virtually no cost and future changes will make use of the timestamp for !bio requests. [ Impact: rq->start_time is valid for all requests ] Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:35 +02:00
Tejun Heo	2e60e02297	block: clean up request completion API Request completion has gone through several changes and became a bit messy over the time. Clean it up. 1. end_that_request_data() is a thin wrapper around end_that_request_data_first() which checks whether bio is NULL before doing anything and handles bidi completion. blk_update_request() is a thin wrapper around end_that_request_data() which clears nr_sectors on the last iteration but doesn't use the bidi completion. Clean it up by moving the initial bio NULL check and nr_sectors clearing on the last iteration into end_that_request_data() and renaming it to blk_update_request(), which makes blk_end_io() the only user of end_that_request_data(). Collapse end_that_request_data() into blk_end_io(). 2. There are four visible completion variants - blk_end_request(), __blk_end_request(), blk_end_bidi_request() and end_request(). blk_end_request() and blk_end_bidi_request() uses blk_end_request() as the backend but __blk_end_request() and end_request() use separate implementation in __blk_end_request() due to different locking rules. blk_end_bidi_request() is identical to blk_end_io(). Collapse blk_end_io() into blk_end_bidi_request(), separate out request update into internal helper blk_update_bidi_request() and add __blk_end_bidi_request(). Redefine [__]blk_end_request() as thin inline wrappers around [__]blk_end_bidi_request(). 3. As the whole request issue/completion usages are about to be modified and audited, it's a good chance to convert completion functions return bool which better indicates the intended meaning of return values. 4. The function name end_that_request_last() is from the days when it was a public interface and slighly confusing. Give it a proper internal name - blk_finish_request(). 5. Add description explaning that blk_end_bidi_request() can be safely used for uni requests as suggested by Boaz Harrosh. The only visible behavior change is from #1. nr_sectors counts are cleared after the final iteration no matter which function is used to complete the request. I couldn't find any place where the code assumes those nr_sectors counters contain the values for the last segment and this change is good as it makes the API much more consistent as the end result is now same whether a request is completed using [__]blk_end_request() alone or in combination with blk_update_request(). API further cleaned up per Christoph's suggestion. [ Impact: cleanup, rq->*nr_sectors always updated after req completion ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Christoph Hellwig <hch@infradead.org>	2009-04-28 07:37:35 +02:00
Tejun Heo	0b302d5aa7	block: kill blk_end_request_callback() With recent IDE updates, blk_end_request_callback() doesn't have any user now. Kill it. [ Impact: removal of unused convoluted interface ] Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:34 +02:00
Tejun Heo	158dbda006	block: reorganize request fetching functions Impact: code reorganization elv_next_request() and elv_dequeue_request() are public block layer interface than actual elevator implementation. They mostly deal with how requests interact with block layer and low level drivers at the beginning of rqeuest processing whereas __elv_next_request() is the actual eleveator request fetching interface. Move the two functions to blk-core.c. This prepares for further interface cleanup. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:34 +02:00
Tejun Heo	5efccd17ce	block: reorder request completion functions Reorder request completion functions such that * All request completion functions are located together. * Functions which are used by only one caller is put right above the caller. * end_request() is put after other completion functions but before blk_update_request(). This change is for completion function cleanup which will follow. [ Impact: cleanup, code reorganization ] Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:34 +02:00
Tejun Heo	10732f5661	block: cleanup REQ_SOFTBARRIER usages blk_insert_request() doesn't need to worry about REQ_SOFTBARRIER. Don't set it. Combined with recent ide updates, REQ_SOFTBARRIER is now only used in elevator proper and for discard requests. [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:34 +02:00
Tejun Heo	e4025f6c21	block: don't set REQ_NOMERGE unnecessarily RQ_NOMERGE_FLAGS already clears defines which REQ flags aren't mergeable. There is no reason to specify it superflously. It only adds to confusion. Don't set REQ_NOMERGE for barriers and requests with specific queueing directive. REQ_NOMERGE is now exclusively used by the merging code. [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:33 +02:00
Tejun Heo	a7f5579234	block: kill blk_start_queueing() blk_start_queueing() is identical to __blk_run_queue() except that it doesn't check for recursion. None of the current users depends on blk_start_queueing() running request_fn directly. Replace usages of blk_start_queueing() with [__]blk_run_queue() and kill it. [ Impact: removal of mostly duplicate interface function ] Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:33 +02:00
Tejun Heo	a538cd03be	block: merge blk_invoke_request_fn() into __blk_run_queue() __blk_run_queue wraps blk_invoke_request_fn() such that it additionally removes plug and bails out early if the queue is empty. Both extra operations have their own pending mechanisms and don't cause any harm correctness-wise when they are done superflously. The only user of blk_invoke_request_fn() being blk_start_queue(), there isn't much reason to keep both functions around. Merge blk_invoke_request_fn() into __blk_run_queue() and make blk_start_queue() use __blk_run_queue() instead. [ Impact: merge two subtly different internal functions ] Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:33 +02:00
Tejun Heo	924cec7789	block: clear req->errors on bio completion only for fs requests Impact: subtle behavior change For fs requests, rq is only carrier of bios and rq error status as a whole doesn't mean much. This is the reason why rq->errors is being cleared on each partial completion of a request as on each partial completion the error status is transferred to the respective bios. For pc requests, rq->errors is used to carry error status to the issuer and thus __end_that_request_first() doesn't clear it on such cases. The condition was fine till now as only fs and pc requests have used bio and thus the bio completion path. However, future changes will unify data accesses to bio and all non fs users care about rq error status. Clear rq->errors on bio completion only for fs requests. In general, the implicit clearing is a bit too subtle especially as the meaning of rq->errors is completely dependent on low level drivers. Unifying / cleaning up rq->errors usage and letting llds manage it would be better. TODO comment added. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk>	2009-04-28 07:37:28 +02:00
Jerome Marchand	42dad7647a	block: simplify I/O stat accounting This simplifies I/O stat accounting switching code and separates it completely from I/O scheduler switch code. Requests are accounted according to the state of their request queue at the time of the request allocation. There is no need anymore to flush the request queue when switching I/O accounting state. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-24 08:54:21 +02:00
Linus Torvalds	c93f216b5b	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y branch tracer: Fix for enabling branch profiling makes sparse unusable ftrace: Correct a text align for event format output Update /debug/tracing/README tracing/ftrace: alloc the started cpumask for the trace file tracing, x86: remove duplicated #include ftrace: Add check of sched_stopped for probe_sched_wakeup function-graph: add proper initialization for init task tracing/ftrace: fix missing include string.h tracing: fix incorrect return type of ns2usecs() tracing: remove CALLER_ADDR2 from wakeup tracer blktrace: fix pdu_len when tracing packet command requests blktrace: small cleanup in blk_msg_write() blktrace: NUL-terminate user space messages tracing: move scripts/trace/power.pl to scripts/tracing/power.pl	2009-04-07 14:10:10 -07:00
Jens Axboe	2385327725	block: remove unused REQ_UNPLUG The request inherits the unplug flag from the bio, but it isn't actually used. The bio flag stops at __make_request(), which tells it to unplug after submission. Passing it on to the request doesn't make any sense. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-07 08:59:11 +02:00
Jerome Marchand	26308eab69	block: fix inconsistency in I/O stat accounting code This forces in_flight to be zero when turning off or on the I/O stat accounting and stops updating I/O stats in attempt_merge() when accounting is turned off. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-07 08:12:38 +02:00
Jens Axboe	aeb6fafb8f	block: Add flag for telling the IO schedulers NOT to anticipate more IO By default, CFQ will anticipate more IO from a given io context if the previously completed IO was sync. This used to be fine, since the only sync IO was reads and O_DIRECT writes. But with more "normal" sync writes being used now, we don't want to anticipate for those. Add a bio/request flag that informs the IO scheduler that this is a sync request that we should not idle for. Introduce WRITE_ODIRECT specifically for O_DIRECT writes, and make sure that the other sync writes set this flag. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-06 08:04:54 -07:00
Jens Axboe	644b2d99b7	block: enabling plugging on SSD devices that don't do queuing For the older SSD devices that don't do command queuing, we do want to enable plugging to get better merging. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-06 08:04:54 -07:00
Jens Axboe	1faa16d228	block: change the request allocation/congestion logic to be sync/async based This makes sure that we never wait on async IO for sync requests, instead of doing the split on writes vs reads. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-06 08:04:53 -07:00
Li Zefan	e2494e1b42	blktrace: fix pdu_len when tracing packet command requests Impact: output all of packet commands - not just the first 4 / 8 bytes Since commit `d7e3c3249e` ("block: add large command support"), struct request->cmd has been changed from unsinged char cmd[BLK_MAX_CDB] to unsigned char *cmd. v1 -> v2: by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> - make sure rq->cmd_len is always intialized, and then we can use rq->cmd_len instead of BLK_MAX_CDB. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> LKML-Reference: <49D4507E.2060602@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 15:29:26 +02:00
Boaz Harrosh	1cd96c242a	block: WARN in __blk_put_request() for potential bio leak Put a WARN_ON in __blk_put_request if it is about to leak bio(s). This is a serious bug that can happen in error handling code paths. For this to work I have fixed a couple of places in block/ where request->bio != NULL ownership was not honored. And a small cleanup at sg_io() while at it. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-03-26 11:01:23 +01:00
Jens Axboe	50e1749310	block: get rid of unused blkdev_free_rq() define Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-03-24 12:35:16 +01:00
Jens Axboe	f3b144aa7f	block: remove various blk_queue_*() setting functions in blk_init_queue_node() It calls blk_queue_make_request(), which sets the identical set of limits. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-03-24 12:35:16 +01:00
Jens Axboe	fb8ec18c31	block: fix oops in blk_queue_io_stat() Some initial probe requests don't have disk->queue mapped yet, so we can't rely on a non-NULL queue in blk_queue_io_stat(). Wrap it in blk_do_io_stat(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-02 08:42:32 +01:00
Jens Axboe	bc58ba9468	block: add sysfs file for controlling io stats accounting This allows us to turn off disk stat accounting completely, for the cases where the 0.5-1% reduction in system time is important. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:38 +01:00
Jens Axboe	cec0707e40	block: silently error an unsupported barrier bio This fixes a "regression" from 2.6.28, where the barrier probes that file systems may do would trigger additional end request warnings in dmesg. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:37 +01:00
Jens Axboe	213d9417fe	block: seperate bio/request unplug and sync bits Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:37 +01:00
Jens Axboe	a31a97381c	block: don't use plugging on SSD devices We just want to hand the first bits of IO to the device as fast as possible. Gains a few percent on the IOPS rate. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:45 +01:00
Tejun Heo	a7384677b2	block: remove duplicate or unused barrier/discard error paths * Because barrier mode can be changed dynamically, whether barrier is supported or not can be determined only when actually issuing the barrier and there is no point in checking it earlier. Drop barrier support check in generic_make_request() and __make_request(), and update comment around the support check in blk_do_ordered(). * There is no reason to check discard support in both generic_make_request() and __make_request(). Drop the check in __make_request(). While at it, move error action block to the end of the function and add unlikely() to q existence test. * Barrier request, be it empty or not, is never passed to low level driver and thus it's meaningless to try to copy back req->sector to bio->bi_sector on error. In addition, the notion of failed sector doesn't make any sense for empty barrier to begin with. Drop the code block from __end_that_request_first(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:44 +01:00
Cheng Renquan	64d01dc9e1	block: use cancel_work_sync() instead of kblockd_flush_work() After many improvements on kblockd_flush_work, it is now identical to cancel_work_sync, so a direct call to cancel_work_sync is suggested. The only difference is that cancel_work_sync is a GPL symbol, so no non-GPL modules anymore. Signed-off-by: Cheng Renquan <crquan@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:44 +01:00
Keith Mannthey	08bafc0341	block: Supress Buffer I/O errors when SCSI REQ_QUIET flag set Allow the scsi request REQ_QUIET flag to be propagated to the buffer file system layer. The basic ideas is to pass the flag from the scsi request to the bio (block IO) and then to the buffer layer. The buffer layer can then suppress needless printks. This patch declutters the kernel log by removed the 40-50 (per lun) buffer io error messages seen during a boot in my multipath setup . It is a good chance any real errors will be missed in the "noise" it the logs without this patch. During boot I see blocks of messages like " __ratelimit: 211 callbacks suppressed Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242847 Buffer I/O error on device sdm, logical block 1 Buffer I/O error on device sdm, logical block 5242878 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242872 " in my logs. My disk environment is multipath fiber channel using the SCSI_DH_RDAC code and multipathd. This topology includes an "active" and "ghost" path for each lun. IO's to the "ghost" path will never complete and the SCSI layer, via the scsi device handler rdac code, quick returns the IOs to theses paths and sets the REQ_QUIET scsi flag to suppress the scsi layer messages. I am wanting to extend the QUIET behavior to include the buffer file system layer to deal with these errors as well. I have been running this patch for a while now on several boxes without issue. A few runs of bonnie++ show no noticeable difference in performance in my setup. Thanks for John Stultz for the quiet_error finalization. Submitted-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:44 +01:00
Jens Axboe	70ed28b92a	block: leave the request timeout timer running even on an empty list For sync IO, we'll often do them serialized. This means we'll be touching the queue timer for every IO, as opposed to only occasionally like we do for queued IO. Instead of deleting the timer when the last request is removed, just let continue running. If a new request comes up soon we then don't have to readd the timer again. If no new requests arrive, the timer will expire without side effect later. This improves high iops sync IO by ~1%. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:42 +01:00
Ingo Molnar	970987beb9	Merge branches 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/urgent' into tracing/core	2008-12-05 14:45:22 +01:00
Milan Broz	0e435ac26e	block: fix setting of max_segment_size and seg_boundary mask Fix setting of max_segment_size and seg_boundary mask for stacked md/dm devices. When stacking devices (LVM over MD over SCSI) some of the request queue parameters are not set up correctly in some cases by default, namely max_segment_size and and seg_boundary mask. If you create MD device over SCSI, these attributes are zeroed. Problem become when there is over this mapping next device-mapper mapping - queue attributes are set in DM this way: request_queue max_segment_size seg_boundary_mask SCSI 65536 0xffffffff MD RAID1 0 0 LVM 65536 -1 (64bit) Unfortunately bio_add_page (resp. bio_phys_segments) calculates number of physical segments according to these parameters. During the generic_make_request() is segment cout recalculated and can increase bio->bi_phys_segments count over the allowed limit. (After bio_clone() in stack operation.) Thi is specially problem in CCISS driver, where it produce OOPS here BUG_ON(creq->nr_phys_segments > MAXSGENTRIES); (MAXSEGENTRIES is 31 by default.) Sometimes even this command is enough to cause oops: dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10 This command generates bios with 250 sectors, allocated in 32 4k-pages (last page uses only 1024 bytes). For LVM layer, it allocates bio with 31 segments (still OK for CCISS), unfortunatelly on lower layer it is recalculated to 32 segments and this violates CCISS restriction and triggers BUG_ON(). The patch tries to fix it by: * initializing attributes above in queue request constructor blk_queue_make_request() * make sure that blk_queue_stack_limits() inherits setting (DM uses its own function to set the limits because it blk_queue_stack_limits() was introduced later. It should probably switch to use generic stack limit function too.) * sets the default seg_boundary value in one place (blkdev.h) * use this mask as default in DM (instead of -1, which differs in 64bit) Bugs related to this: https://bugzilla.redhat.com/show_bug.cgi?id=471639 http://bugzilla.kernel.org/show_bug.cgi?id=8672 Signed-off-by: Milan Broz <mbroz@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-03 12:55:55 +01:00
Tejun Heo	53a08807c0	block: internal dequeue shouldn't start timer blkdev_dequeue_request() and elv_dequeue_request() are equivalent and both start the timeout timer. Barrier code dequeues the original barrier request but doesn't passes the request itself to lower level driver, only broken down proxy requests; however, as the original barrier code goes through the same dequeue path and timeout timer is started on it. If barrier sequence takes long enough, this timer expires but the low level driver has no idea about this request and oops follows. Timeout timer shouldn't have been started on the original barrier request as it never goes through actual IO. This patch unexports elv_dequeue_request(), which has no external user anyway, and makes it operate on elevator proper w/o adding the timer and make blkdev_dequeue_request() call elv_dequeue_request() and add timer. Internal users which don't pass the request to driver - barrier code and end_that_request_last() - are converted to use elv_dequeue_request(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-03 12:41:26 +01:00
Ingo Molnar	0bfc24559d	blktrace: port to tracepoints, update Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE() sites. Spread them out to the usage sites, as suggested by Mathieu Desnoyers. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>	2008-11-26 13:04:35 +01:00
Arnaldo Carvalho de Melo	5f3ea37c77	blktrace: port to tracepoints This was a forward port of work done by Mathieu Desnoyers, I changed it to encode the 'what' parameter on the tracepoint name, so that one can register interest in specific events and not on classes of events to then check the 'what' parameter. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 12:13:34 +01:00
Mike Anderson	e78042e5b8	blk: move blk_delete_timer call in end_that_request_last Move the calling blk_delete_timer to later in end_that_request_last to address an issue where blkdev_dequeue_request may have add a timer for the request. Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-11-06 08:41:56 +01:00
Linus Torvalds	c53dbf5486	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: remove __generic_unplug_device() from exports block: move q->unplug_work initialization blktrace: pass zfcp driver data blktrace: add support for driver data block: fix current kernel-doc warnings block: only call ->request_fn when the queue is not stopped block: simplify string handling in elv_iosched_store() block: fix kernel-doc for blk_alloc_devt() block: fix nr_phys_segments miscalculation bug block: add partition attribute for partition number block: add BIG FAT WARNING to CONFIG_DEBUG_BLOCK_EXT_DEVT softirq: Add support for triggering softirq work on softirqs.	2008-10-17 09:29:55 -07:00
Jens Axboe	f73e2d13a1	block: remove __generic_unplug_device() from exports The only out-of-core user is IDE, and that should be using blk_start_queueing() instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-17 14:03:08 +02:00
Peter Zijlstra	713ada9ba9	block: move q->unplug_work initialization modprobe loop; rmmod loop effectively creates a blk_queue and destroys it which results in q->unplug_work being canceled without it ever being initialized. Therefore, move the initialization of q->unplug_work from blk_queue_make_request() to blk_alloc_queue*(). Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-17 08:46:57 +02:00
Randy Dunlap	496aa8a98f	block: fix current kernel-doc warnings Fix block kernel-doc warnings: Warning(linux-2.6.27-git4//fs/block_dev.c:1272): No description found for parameter 'path' Warning(linux-2.6.27-git4//block/blk-core.c:1021): No description found for parameter 'cpu' Warning(linux-2.6.27-git4//block/blk-core.c:1021): No description found for parameter 'part' Warning(/var/linsrc/linux-2.6.27-git4//block/genhd.c:544): No description found for parameter 'partno' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-17 08:46:57 +02:00
Jens Axboe	80a4b58e36	block: only call ->request_fn when the queue is not stopped Callers should use either blk_run_queue/__blk_run_queue, or blk_start_queueing() to invoke request handling instead of calling ->request_fn() directly as that does not take the queue stopped flag into account. Also add appropriate comments on the above functions to detail their usage. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-17 08:46:57 +02:00
Mike Christie	6000a368cd	[SCSI] block: separate failfast into multiple bits. Multipath is best at handling transport errors. If it gets a device error then there is not much the multipath layer can do. It will just access the same device but from a different path. This patch breaks up failfast into device, transport and driver errors. The multipath layers (md and dm mutlipath) only ask the lower levels to fast fail transport errors. The user of failfast, read ahead, will ask to fast fail on all errors. Note that blk_noretry_request will return true if any failfast bit is set. This allows drivers that do not support the multipath failfast bits to continue to fail on any failfast error like before. Drivers like scsi that are able to fail fast specific errors can check for the specific fail fast type. In the next patch I will convert scsi. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-10-13 09:28:52 -04:00
Kiyoshi Ueda	d00e29fd99	block: remove end_{queued\|dequeued}_request() This patch removes end_queued_request() and end_dequeued_request(), which are no longer used. As a results, users of __end_request() became only end_request(). So the actual code in __end_request() is moved to end_request() and __end_request() is removed. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:21 +02:00
Kiyoshi Ueda	ef9e3facdf	block: add lld busy state exporting interface This patch adds an new interface, blk_lld_busy(), to check lld's busy state from the block layer. blk_lld_busy() calls down into low-level drivers for the checking if the drivers set q->lld_busy_fn() using blk_queue_lld_busy(). This resolves a performance problem on request stacking devices below. Some drivers like scsi mid layer stop dispatching request when they detect busy state on its low-level device like host/target/device. It allows other requests to stay in the I/O scheduler's queue for a chance of merging. Request stacking drivers like request-based dm should follow the same logic. However, there is no generic interface for the stacked device to check if the underlying device(s) are busy. If the request stacking driver dispatches and submits requests to the busy underlying device, the requests will stay in the underlying device's queue without a chance of merging. This causes performance problem on burst I/O load. With this patch, busy state of the underlying device is exported via q->lld_busy_fn(). So the request stacking driver can check it and stop dispatching requests if busy. The underlying device driver must return the busy state appropriately: 1: when the device driver can't process requests immediately. 0: when the device driver can process requests immediately, including abnormal situations where the device driver needs to kill all requests. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:20 +02:00
Elias Oltmanns	336c3d8ce7	block: Fix blk_start_queueing() to not kick a stopped queue blk_start_queueing() should act like the generic queue unplugging and kicking and ignore a stopped queue. Such a queue may not be run until after a call to blk_start_queue(). Signed-off-by: Elias Oltmanns <eo@nebensachen.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:20 +02:00
Kiyoshi Ueda	4ee5eaf451	block: add a queue flag for request stacking support This patch adds a queue flag to indicate the block device can be used for request stacking. Request stacking drivers need to stack their devices on top of only devices of which q->request_fn is functional. Since bio stacking drivers (e.g. md, loop) basically initialize their queue using blk_alloc_queue() and don't set q->request_fn, the check of (q->request_fn == NULL) looks enough for that purpose. However, dm will become both types of stacking driver (bio-based and request-based). And dm will always set q->request_fn even if the dm device is bio-based of which q->request_fn is not functional actually. So we need something else to distinguish the type of the device. Adding a queue flag is a solution for that. The reason why dm always sets q->request_fn is to keep the compatibility of dm user-space tools. Currently, all dm user-space tools are using bio-based dm without specifying the type of the dm device they use. To use request-based dm without changing such tools, the kernel must decide the type of the dm device automatically. The automatic type decision can't be done at the device creation time and needs to be deferred until such tools load a mapping table, since the actual type is decided by dm target type included in the mapping table. So a dm device has to be initialized using blk_init_queue() so that we can load either type of table. Then, all queue stuffs are set (e.g. q->request_fn) and we have no element to distinguish that it is bio-based or request-based, even after a table is loaded and the type of the device is decided. By the way, some stuffs of the queue (e.g. request_list, elevator) are needless when the dm device is used as bio-based. But the memory size is not so large (about 20[KB] per queue on ia64), so I hope the memory loss can be acceptable for bio-based dm users. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:18 +02:00
Kiyoshi Ueda	82124d6035	block: add request submission interface This patch adds blk_insert_cloned_request(), a generic request submission interface for request stacking drivers. Request-based dm will use it to submit their clones to underlying devices. blk_rq_check_limits() is also added because it is possible that the lower queue has stronger limitations than the upper queue if multiple drivers are stacking at request-level. Not only for blk_insert_cloned_request()'s internal use, the function will be used by request-based dm when the queue limitation is modified (e.g. by replacing dm's table). Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:18 +02:00
Kiyoshi Ueda	32fab448e5	block: add request update interface This patch adds blk_update_request(), which updates struct request with completing its data part, but doesn't complete the struct request itself. Though it looks like end_that_request_first() of older kernels, blk_update_request() should be used only by request stacking drivers. Request-based dm will use it in bio->bi_end_io callback to update the original request when a data part of a cloned request completes. Followings are additional background information of why request-based dm needs this interface. - Request stacking drivers can't use blk_end_request() directly from the lower driver's completion context (bio->bi_end_io or rq->end_io), because some device drivers (e.g. ide) may try to complete their request with queue lock held, and it may cause deadlock. See below for detailed description of possible deadlock: <http://marc.info/?l=linux-kernel&m=120311479108569&w=2> - To solve that, request-based dm offloads the completion of cloned struct request to softirq context (i.e. using blk_complete_request() from rq->end_io). - Though it is possible to use the same solution from bio->bi_end_io, it will delay the notification of bio completion to the original submitter. Also, it will cause inefficient partial completion, because the lower driver can't perform the cloned request anymore and request-based dm needs to requeue and redispatch it to the lower driver again later. That's not good. - So request-based dm needs blk_update_request() to perform the bio completion in the lower driver's completion context, which is more efficient. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:18 +02:00
Jens Axboe	e3335de940	block: blk_cleanup_queue() should call blk_sync_queue() When a driver calls blk_cleanup_queue(), the device should be fully idle. However, the block layer may have pending plugging timers and the IO schedulers may have pending work in the work queues. So quisce the device by waiting for the timer and flushing the work queues. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:18 +02:00
Jens Axboe	242f9dcb8b	block: unify request timeout handling Right now SCSI and others do their own command timeout handling. Move those bits to the block layer. Instead of having a timer per command, we try to be a bit more clever and simply have one per-queue. This avoids the overhead of having to tear down and setup a timer for each command, so it will result in a lot less timer fiddling. Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:13 +02:00

1 2 3

104 Commits