linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-18 17:54:13 +08:00

Author	SHA1	Message	Date
Jens Axboe	d993831fa7	writeback: add name to backing_dev_info This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 09:20:26 +02:00
Nikanth Karthikesan	c295fc0578	block: Allow changing max_sectors_kb above the default 512 The patch "block: Use accessor functions for queue limits" (`ae03bf639a`) changed queue_max_sectors_store() to use blk_queue_max_sectors() instead of directly assigning the value. But blk_queue_max_sectors() differs a bit 1. It sets both max_sectors_kb, and max_hw_sectors_kb 2. Never allows one to change max_sectors_kb above BLK_DEF_MAX_SECTORS. If one specifies a value greater then max_hw_sectors is set to that value but max_sectors is set to BLK_DEF_MAX_SECTORS I am not sure whether blk_queue_max_sectors() should be changed, as it seems to be that way for a long time. And there may be callers dependent on that behaviour. This patch simply reverts to the older way of directly assigning the value to max_sectors as it was before. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-01 22:40:15 +02:00
John Stoffel	14d9fa3525	Make SCSI SG v4 driver enabled by default and remove EXPERIMENTAL dependency, since udev depends on BSG Make Block Layer SG support v4 the default, since recent udev versions depend on this to access serial numbers and other low level info properly. This should be backported to older kernels as well, since most distros have enabled this for a long time. Signed-off-by: John Stoffel <john@stoffel.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-08-04 22:10:17 +02:00
Martin K. Petersen	7e5f5fb09e	block: Update topology documentation Update topology comments and sysfs documentation based upon discussions with Neil Brown. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-08-01 10:24:35 +02:00
Martin K. Petersen	70dd5bf3b9	block: Stack optimal I/O size When stacking block devices ensure that optimal I/O size is scaled accordingly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-08-01 10:24:35 +02:00
Martin K. Petersen	7c958e3264	block: Add a wrapper for setting minimum request size without a queue Introduce blk_limits_io_min() and make blk_queue_io_min() call it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-08-01 10:24:35 +02:00
Martin K. Petersen	fef246672b	block: Make blk_queue_stack_limits use the new stacking interface blk_queue_stack_limits() has been superceded by blk_stack_limits() and disk_stack_limits(). Wrap the function call for now, we'll deprecate it later. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-08-01 10:24:35 +02:00
Jens Axboe	56ad1740d9	block: make the end_io functions be non-GPL exports Prior to the change for more sane end_io functions, we exported the helpers with the normal EXPORT_SYMBOL(). That got changed to _GPL() for the new interface. Revert that particular change, on the basis that this is basic functionality and doesn't dip into internal structures. If these exports can't be non-GPL, then we may as well make EXPORT_SYMBOL() imply GPL for everything. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-28 22:11:24 +02:00
Xiaotian Feng	3839e4b29b	block: fix improper kobject release in blk_integrity_unregister blk_integrity_unregister should use kobject_put to release the kobject, otherwise after bi is freed, memory of bi->kobj->name is leaked. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-28 09:11:14 +02:00
Jens Axboe	a4e7d46407	block: always assign default lock to queues Move the assignment of a default lock below blk_init_queue() to blk_queue_make_request(), so we also get to set the default lock for ->make_request_fn() based drivers. This is important since the queue flag locking requires a lock to be in place. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-28 09:07:29 +02:00
Xiaotian Feng	9cb308ce8d	block: sysfs fix mismatched queue_var_{store,show} in 64bit kernel In blk-sysfs.c, queue_var_store uses unsigned long to store data, but queue_var_show uses unsigned int to show data. This causes, # echo 70000000000 > /sys/block/<dev>/queue/read_ahead_kb # cat /sys/block/<dev>/queue/read_ahead_kb => get wrong value Fix it by using unsigned long. While at it, convert queue_rq_affinity_show() such that it uses bool variable instead of explicit != 0 testing. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2009-07-17 16:54:15 +09:00
Tejun Heo	0a09f4319c	block: fix failfast merge testing in elv_rq_merge_ok() Commit `ab0fd1debe` tries to prevent merge of requests with different failfast settings. In elv_rq_merge_ok(), it compares new bio's failfast flags against the merge target request's. However, the flag testing accessors for bio and blk don't return boolean but the tested bit value directly and FAILFAST on bio and blk don't match, so directly comparing them with == results in false negative unnecessary preventing merge of readahead requests. This patch convert the results to boolean by negating them before comparison. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jeff Garzik <jeff@garzik.org>	2009-07-17 14:50:43 +09:00
Vivek Goyal	32f2e807a3	cfq-iosched: reset oom_cfqq in cfq_set_request() In case memory is scarce, we now default to oom_cfqq. Once memory is available again, we should allocate a new cfqq and stop using oom_cfqq for a particular io context. Once a new request comes in, check if we are using oom_cfqq, and if yes, try to allocate a new cfqq. Tested the patch by forcing the use of oom_cfqq and upon next request thread realized that it was using oom_cfqq and it allocated a new cfqq. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-10 20:31:54 +02:00
FUJITA Tomonori	76da03467a	block: call blk_scsi_ioctl_init() Currently, blk_scsi_ioctl_init() is not called since it lacks an initcall marking. This causes the command table to be unitialized, hence somce commands are block when they should not have been. This fixes a regression introduced by commit `018e044689` Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-10 20:31:53 +02:00
Tejun Heo	ab0fd1debe	block: don't merge requests of different failfast settings Block layer used to merge requests and bios with different failfast settings. This caused regular IOs to fail prematurely when they were merged into failfast requests for readahead. Niel Lambrechts could trigger the problem semi-reliably on ext4 when resuming from STR. ext4 uses readahead when reading inodes and combined with the deterministic extra SATA PHY exception cycle during resume on the specific configuration, non-readahead inode read would fail causing ext4 errors. Please read the following thread for details. http://lkml.org/lkml/2009/5/23/21 This patch makes block layer reject merging if the failfast settings don't match. This is correct but likely to lower IO performance by preventing regular IOs from mingling into surrounding readahead requests. Changes to allow such mixed merges and handle errors correctly will be added later. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Niel Lambrechts <niel.lambrechts@gmail.com> Cc: Theodore Tso <tytso@mit.edu> Signed-off-by: Jens Axboe <axboe@carl.(none)>	2009-07-03 21:06:45 +02:00
Shan Wei	b706f64281	cfq-iosched: remove redundant check for NULL cfqq in cfq_set_request() With the changes for falling back to an oom_cfqq, we never fail to find/allocate a queue in cfq_get_queue(). So remove the check. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-01 12:41:14 +02:00
NeilBrown	db64f680ba	blocK: Restore barrier support for md and probably other virtual devices. The next_ordered flag is only meaningful for devices that use __make_request. So move the test against next_ordered out of generic code and in to __make_request Since this test was added, barriers have not worked on md or any devices that don't use __make_request and so don't bother to set next_ordered. (dm explicitly sets something other than QUEUE_ORDERED_NONE since commit `99360b4c18` but notes in the comments that it is otherwise meaningless). Cc: Ken Milmore <ken.milmore@googlemail.com> Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-01 10:56:26 +02:00
Jens Axboe	018e044689	block: get rid of queue-private command filter The initial patches to support this through sysfs export were broken and have been if 0'ed out in any release. So lets just kill the code and reclaim some space in struct request_queue, if anyone would later like to fixup the sysfs bits, the git history can easily restore the removed bits. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-01 10:56:26 +02:00
Martin K. Petersen	7878cba9f0	block: Create bip slabs with embedded integrity vectors This patch restores stacking ability to the block layer integrity infrastructure by creating a set of dedicated bip slabs. Each bip slab has an embedded bio_vec array at the end. This cuts down on memory allocations and also simplifies the code compared to the original bvec version. Only the largest bip slab is backed by a mempool. The pool is contained in the bio_set so stacking drivers can ensure forward progress. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.(none)>	2009-07-01 10:56:25 +02:00
Jens Axboe	6118b70b3a	cfq-iosched: get rid of the need for __GFP_NOFAIL in cfq_find_alloc_queue() Setup an emergency fallback cfqq that we allocate at IO scheduler init time. If the slab allocation fails in cfq_find_alloc_queue(), we'll just punt IO to that cfqq instead. This ensures that cfq_find_alloc_queue() never fails without having to ensure free memory. On cfqq lookup, always try to allocate a new cfqq if the given cfq io context has the oom_cfqq assigned. This ensures that we only temporarily punt to this shared queue. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-01 10:56:25 +02:00
Jens Axboe	d5036d770f	cfq-iosched: move cfqq initialization out of cfq_find_alloc_queue() We're going to be needing that init code outside of that function to get rid of the __GFP_NOFAIL in cfqq allocation. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-01 10:56:25 +02:00
FUJITA Tomonori	1119865935	block: revert "bsg: setting rq->bio to NULL" The SMP handler (sas_smp_request) was fixed to use the block API properly, so we don't need this workaround to avoid blk_put_request() warning. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-06-21 10:52:42 -05:00
Randy Dunlap	f740f5ca05	Fix kernel-doc parameter name typo in blk-settings.c: Warning(block/blk-settings.c:108): No description found for parameter 'lim' Warning(block/blk-settings.c:108): Excess function parameter 'limits' description in 'blk_set_default_limits' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-19 09:18:32 +02:00
Bartlomiej Zolnierkiewicz	90c699a9ee	block: rename CONFIG_LBD to CONFIG_LBDAF Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-19 08:08:50 +02:00
Martin K. Petersen	3a02c8e814	block: Fix bounce_pfn setting Correct stacking bounce_pfn limit setting and prevent warnings on 32-bit. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-18 09:56:20 +02:00
Linus Torvalds	6fd03301d7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (64 commits) debugfs: use specified mode to possibly mark files read/write only debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem. xen: remove driver_data direct access of struct device from more drivers usb: gadget: at91_udc: remove driver_data direct access of struct device uml: remove driver_data direct access of struct device block/ps3: remove driver_data direct access of struct device s390: remove driver_data direct access of struct device parport: remove driver_data direct access of struct device parisc: remove driver_data direct access of struct device of_serial: remove driver_data direct access of struct device mips: remove driver_data direct access of struct device ipmi: remove driver_data direct access of struct device infiniband: ehca: remove driver_data direct access of struct device ibmvscsi: gadget: at91_udc: remove driver_data direct access of struct device hvcs: remove driver_data direct access of struct device xen block: remove driver_data direct access of struct device thermal: remove driver_data direct access of struct device scsi: remove driver_data direct access of struct device pcmcia: remove driver_data direct access of struct device PCIE: remove driver_data direct access of struct device ... Manually fix up trivial conflicts due to different direct driver_data direct access fixups in drivers/block/{ps3disk.c,ps3vram.c}	2009-06-16 12:57:37 -07:00
Li Zefan	e212d6f250	block: remove some includings of blktrace_api.h When porting blktrace to tracepoints, we changed to trace/block.h for trace prober declarations. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 11:19:36 +02:00
Martin K. Petersen	e475bba2fd	block: Introduce helper to reset queue limits to default values DM reuses the request queue when swapping in a new device table Introduce blk_set_default_limits() which can be used to reset the the queue_limits prior to stacking devices. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 08:23:52 +02:00
Jeff Moyer	6923715ae3	cfq: remove extraneous '\n' in blktrace output I noticed a blank line in blktrace output. This patch fixes that. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 08:21:04 +02:00
Jens Axboe	0989a025d2	block: don't overwrite bdi->state after bdi_init() has been run Move the defaults to where we do the init of the backing_dev_info. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 08:21:03 +02:00
Gui Jianfeng	81be834713	cfq: cleanup for last_end_request in cfq_data Actually, last_end_request in cfq_data isn't used now. So lets just remove it. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 08:21:03 +02:00
Kay Sievers	2bdf914915	Driver Core: bsg: add nodename for bsg driver This adds support to the BSG driver to report the proper device name to userspace for the bsg devices. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:30:26 -07:00
Kay Sievers	b03f38b685	Driver Core: block: add nodename support for block drivers. This adds support for block drivers to report their requested nodename to userspace. It also updates a number of block drivers to provide the needed subdirectory and device name to be used for them. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:30:25 -07:00
Randy Dunlap	8ebf975608	block: fix kernel-doc in recent block/ changes Fix kernel-doc warnings in recently changed block/ source code. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-11 20:14:23 -07:00
Linus Torvalds	c9059598ea	Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c	2009-06-11 11:10:35 -07:00
Linus Torvalds	27951daa71	Merge branch 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (28 commits) ide-tape: fix debug call alim15x3: Remove historical hacks, re-enable init_hwif for PowerPC ide-dma: don't reset request fields on dma_timeout_retry() ide: drop rq->data handling from ide_map_sg() ide-atapi: kill unused fields and callbacks ide-tape: simplify read/write functions ide-tape: use byte size instead of sectors on rw issue functions ide-tape: unify r/w init paths ide-tape: kill idetape_bh ide-tape: use standard data transfer mechanism ide-tape: use single continuous buffer ide-atapi,tape,floppy: allow ->pc_callback() to change rq->data_len ide-tape,floppy: fix failed command completion after request sense ide-pm: don't abuse rq->data ide-cd,atapi: use bio for internal commands ide-atapi: convert ide-{floppy,tape} to using preallocated sense buffer ide-cd: convert to using generic sense request ide: add helpers for preparing sense requests ide-cd: don't abuse rq->buffer ide-atapi: don't abuse rq->buffer ...	2009-06-11 10:00:03 -07:00
Kiyoshi Ueda	b0fd271d5f	block: add request clone interface (v2) This patch adds the following 2 interfaces for request-stacking drivers: - blk_rq_prep_clone(struct request clone, struct request orig, struct bio_set bs, gfp_t gfp_mask, int (bio_ctr)(struct bio , struct bio, void ), void data) * Clones bios in the original request to the clone request (bio_ctr is called for each cloned bios.) * Copies attributes of the original request to the clone request. The actual data parts (e.g. ->cmd, ->buffer, ->sense) are not copied. - blk_rq_unprep_clone(struct request clone) Frees cloned bios from the clone request. Request stacking drivers (e.g. request-based dm) need to make a clone request for a submitted request and dispatch it to other devices. To allocate request for the clone, request stacking drivers may not be able to use blk_get_request() because the allocation may be done in an irq-disabled context. So blk_rq_prep_clone() takes a request allocated by the caller as an argument. For each clone bio in the clone request, request stacking drivers should be able to set up their own completion handler. So blk_rq_prep_clone() takes a callback function which is called for each clone bio, and a pointer for private data which is passed to the callback. NOTE: blk_rq_prep_clone() doesn't copy any actual data of the original request. Pages are shared between original bios and cloned bios. So caller must not complete the original request before the clone request. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-11 13:11:05 +02:00
Linus Torvalds	8623661180	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits) Revert "x86, bts: reenable ptrace branch trace support" tracing: do not translate event helper macros in print format ftrace/documentation: fix typo in function grapher name tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK tracing: add protection around module events unload tracing: add trace_seq_vprint interface tracing: fix the block trace points print size tracing/events: convert block trace points to TRACE_EVENT() ring-buffer: fix ret in rb_add_time_stamp ring-buffer: pass in lockdep class key for reader_lock tracing: add annotation to what type of stack trace is recorded tracing: fix multiple use of __print_flags and __print_symbolic tracing/events: fix output format of user stack tracing/events: fix output format of kernel stack tracing/trace_stack: fix the number of entries in the header ring-buffer: discard timestamps that are at the start of the buffer ring-buffer: try to discard unneeded timestamps ring-buffer: fix bug in ring_buffer_discard_commit ftrace: do not profile functions when disabled tracing: make trace pipe recognize latency format flag ...	2009-06-10 19:53:40 -07:00
Nikanth Karthikesan	d9c7d394a8	block: prevent possible io_context->refcount overflow Currently io_context has an atomic_t(32-bit) as refcount. In the case of cfq, for each device against whcih a task does I/O, a reference to the io_context would be taken. And when there are multiple process sharing io_contexts(CLONE_IO) would also have a reference to the same io_context. Theoretically the possible maximum number of processes sharing the same io_context + the number of disks/cfq_data referring to the same io_context can overflow the 32-bit counter on a very high-end machine. Even though it is an improbable case, let us make it atomic_long_t. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-10 23:07:15 +02:00
Li Zefan	55782138e4	tracing/events: convert block trace points to TRACE_EVENT() TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions ... Cons: - no dev_t info for the output of plug, unplug_timer and unplug_io events. no dev_t info for getrq and sleeprq events if bio == NULL. no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL. This is mainly because we can't get the deivce from a request queue. But this may change in the future. - A packet command is converted to a string in TP_assign, not TP_print. While blktrace do the convertion just before output. Since pc requests should be rather rare, this is not a big issue. - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT has a unique format, which means we have some unused data in a trace entry. The overhead is minimized by using __dynamic_array() instead of __array(). I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing: dd dd + ioctl blktrace dd + TRACE_EVENT (splice) 1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s 2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s 3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s So the overhead of tracing is very small, and no regression when using those trace events vs blktrace. And the binary output of TRACE_EVENT is much smaller than blktrace: # ls -l -h -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out Following are some comparisons between TRACE_EVENT and blktrace: plug: kjournald-480 [000] 303.084981: block_plug: [kjournald] kjournald-480 [000] 303.084981: 8,0 P N [kjournald] unplug_io: kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1 kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1 remap: kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384 kjournald-480 [000] 303.085043: 8,0 A W 102736992 + 8 <- (8,8) 33384 bio_backmerge: kjournald-480 [000] 303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald] kjournald-480 [000] 303.085086: 8,0 M W 102737032 + 8 [kjournald] getrq: kjournald-480 [000] 303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald] kjournald-480 [000] 303.084975: 8,0 G W 102736984 + 8 [kjournald] bash-2066 [001] 1072.953770: 8,0 G N [bash] bash-2066 [001] 1072.953773: block_getrq: 0,0 N 0 + 0 [bash] rq_complete: konsole-2065 [001] 300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0] konsole-2065 [001] 300.053191: 8,0 C W 103669040 + 16 [0] ksoftirqd/1-7 [001] 1072.953811: 8,0 C N (5a 00 08 00 00 00 00 00 24 00) [0] ksoftirqd/1-7 [001] 1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0] rq_insert: kjournald-480 [000] 303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald] kjournald-480 [000] 303.084986: 8,0 I W 102736984 + 8 [kjournald] Changelog from v2 -> v3: - use the newly introduced __dynamic_array(). Changelog from v1 -> v2: - use __string() instead of __array() to minimize the memory required to store hex dump of rq->cmd(). - support large pc requests. - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT. - some cleanups. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-09 12:34:23 -04:00
FUJITA Tomonori	c1d4c41f2f	bsg: setting rq->bio to NULL Due to commit `1cd96c242a` ("block: WARN in __blk_put_request() for potential bio leak"), BSG SMP requests get the false warnings: WARNING: at block/blk-core.c:1068 __blk_put_request+0x52/0xc0() This sets rq->bio to NULL to avoid that false warnings. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-09 15:17:37 +02:00
Martin K. Petersen	77634f33d4	block: Add missing bounce_pfn stacking and fix comments DM no longer needs to set limits explicitly when calling blk_stack_limits. Let the latter automatically deal with bounce_pfn scaling. Fix kerneldoc variable names. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-09 06:23:22 +02:00
Jens Axboe	9df1bb9b51	Revert "block: Fix bounce limit setting in DM" This reverts commit `a05c0205ba`. DM doesn't need to access the bounce_pfn directly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-09 06:22:57 +02:00
FUJITA Tomonori	dbb66c4be0	block: needs to set the residual length of a bidi request Tejun's "block: set rq->resid_len to blk_rq_bytes() on issue" patch seems to be incomplete; It doesn't set rq->resid_len to blk_rq_bytes() for a bidi request (req->next_rq). As a result, all bidi users are broken. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-09 05:47:10 +02:00
Martin K. Petersen	a05c0205ba	block: Fix bounce limit setting in DM blk_queue_bounce_limit() is more than a wrapper about the request queue limits.bounce_pfn variable. Introduce blk_queue_bounce_pfn() which can be called by stacking drivers that wish to set the bounce limit explicitly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-03 09:33:18 +02:00
Kiyoshi Ueda	53c663ce0f	block: fix a possible oops on elv_abort_queue() I found one more mis-conversion to the 'request is always dequeued when completing' model in elv_abort_queue() during code inspection. Although I haven't hit any problem caused by this mis-conversion yet and just done compile/boot test, please apply if you have no problem. Request must be dequeued when it completes. However, elv_abort_queue() completes requests without dequeueing. This will cause oops in the __blk_end_request_all(). This patch fixes the oops. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-02 08:44:01 +02:00
James Bottomley	c143dc903d	block: fix an oops on BLKPREP_KILL Doing a bit of torture testing, I ran across a BUG in the block subsystem (at blk-core.c:2048): the test for if the request is queued. It turns out the trigger was a BLKPREP_KILL coming out of the SCSI prep function. Currently for BLKPREP_KILL requests, we send them straight into __blk_end_request_all() with an error, but they've never been dequeued, so they trip the bug. Fix this by starting requests before killing them. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-30 06:43:49 +02:00
Mike Snitzer	5d85d3247c	block: export blk_stack_limits() DM needs to use blk_stack_limits(), so it needs to be exported. Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-28 11:04:53 +02:00
Kiyoshi Ueda	3c4198e874	block: fix no diskstat problem The commit below in 2.6-block/for-2.6.31 causes no diskstat problem because the blk_discard_rq() check was added with '&&'. It should be 'blk_fs_request() \|\| blk_discard_rq()'. This patch does it and fixes the no diskstat problem. Please review and apply. ------ /proc/diskstat without this patch ------------------------------------- 8 0 sda 0 0 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------ ----- /proc/diskstat with this patch applied --------------------------------- 8 0 sda 4186 303 373621 61600 9578 3859 107468 169479 2 89755 231059 ------------------------------------------------------------------------------ -------------------------------------------------------------------------- commit `c69d48540c` Author: Jens Axboe <jens.axboe@oracle.com> Date: Fri Apr 24 08:12:19 2009 +0200 block: include discard requests in IO accounting We currently don't do merging on discard requests, but we potentially could. If we do, then we need to include discard requests in the IO accounting, or merging would end up decrementing in_flight IO counters for an IO which never incremented them. So enable accounting for discard requests. <snip> static inline int blk_do_io_stat(struct request *rq) { - return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq); + return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq) && + blk_discard_rq(rq); } -------------------------------------------------------------------------- Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-27 14:50:02 +02:00
James Bottomley	ba396a6c10	block: fix oops with block tag queueing commit e8939a50466fd963eb1ba9118c34b9ffb7ff6aa6 Author: Tejun Heo <tj@kernel.org> Date: Fri May 8 11:54:16 2009 +0900 block: implement and enforce request peek/start/fetch Added a BUG_ON(blk_queued_rq(req)) to the top of blk_finish_req(). Unfortunately, this checks whether req->queuelist is empty. This list is doing double duty both as the queue list and the tag list, so tagged requests come in here with this not empty and boom (the tag list is emptied by blk_queue_end_tag() lower down). Fix this by moving the BUG_ON to below the end tag we also seem vulnerable to this in blk_requeue_request() as well. I think all uses of blk_queued_rq() need auditing because the check is clearly wrong in the tagged case. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-27 14:17:08 +02:00

1 2 3 4 5 ...

958 Commits