linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-16 16:54:20 +08:00

Author	SHA1	Message	Date
NeilBrown	52c64152a9	md: bad blocks shouldn't cause a Blocked status on a Faulty device. Once a device is marked Faulty the badblocks - whether acknowledged or not - become irrelevant. So they shouldn't cause the device to be marked as Blocked. Without this patch, a process might write "-blocked" to clear the Blocked status, but while that will correctly fail the device, it won't remove the apparent 'blocked' status. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-08 16:22:48 +11:00
NeilBrown	af8a24347f	md: take a reference to mddev during sysfs access. When we are accessing an mddev via sysfs we know that the mddev cannot disappear because it has an embedded kobj which is refcounted by sysfs. And we also take the mddev_lock. However this is not enough. The final mddev_put could have been called and the mddev_delayed_delete is waiting for sysfs to let go so it can destroy the kobj and mddev. In this state there are a lot of changes that should not be attempted. To guard against this we: - initialise mddev->all_mddevs on last put so the state can be easily detected. - in md_attr_show and md_attr_store, check ->all_mddevs under all_mddevs_lock and mddev_get the mddev if it still appears to be active. This means that if we get to sysfs as the mddev is being deleted we will get -EBUSY. rdev_attr_store and rdev_attr_show are similar but already have sufficient protection. They check that rdev->mddev still points to mddev after taking mddev_lock. As this is cleared before delayed removal, which can only be requested under the mddev_lock, this ensures the rdev and mddev are still alive. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-08 15:49:46 +11:00
NeilBrown	1d23f178d5	md: refine interpretation of "hold_active == UNTIL_IOCTL". We like md devices to disappear when they really are not needed. However it is not possible to tell from the current state whether it is needed or not. We can only tell from recent history of changes. In particular, immediately after we create an md device it looks very similar to how it looks immediately after we have finished with it. So we always preserve a newly created md device until something significant happens. This state is stored in 'hold_active'. The normal case is to keep it until an ioctl happens, as that will normally either activate it, or explicitly de-activate it. If it doesn't, then it was probably created by mistake and it is now time to get rid of it. We can also modify an array via sysfs (instead of via ioctl), and we currently treat any change via sysfs like an ioctl: as a sign that, if the array is not now more active, it should be destroyed. However this is not appropriate, as changes made via sysfs are more gradual, so we should look for a more definitive change. So with this patch, 'hold_active' is cleared from UNTIL_IOCTL only when the array_state is changed via sysfs. Other changes via sysfs are ignored. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-08 15:49:12 +11:00
NeilBrown	7c8f424798	md/lock: ensure updates to page_attrs are properly locked. Page attributes are set using __set_bit rather than set_bit as it is normally called under a spinlock, so the extra atomicity is not needed. However there are two places where we might set or clear page attributes without holding the spinlock. So add the spinlock in those cases. This might be the cause of occasional reports that bits aren't getting cleared properly; the theory is that BITMAP_PAGE_PENDING gets lost when BITMAP_PAGE_NEEDWRITE is set or cleared. This is an inconvenience, not a threat to data safety. Signed-off-by: NeilBrown <neilb@suse.de>	2011-11-23 10:18:52 +11:00
Dan Williams	257a4b42af	md/raid5: STRIPE_ACTIVE has lock semantics, add barriers All updates that occur under STRIPE_ACTIVE should be globally visible when STRIPE_ACTIVE clears. test_and_set_bit() implies a barrier, but clear_bit() does not. This is suitable for 3.1-stable. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org	2011-11-08 16:22:06 +11:00
NeilBrown	9a3f530f39	md/raid5: abort any pending parity operations when array fails. When the number of failed devices exceeds the allowed number we must abort any active parity operations (checks or updates) as they are no longer meaningful, and can lead to a BUG_ON in handle_parity_checks6. This bug was introduced by commit `6c0069c0ae` in 2.6.29. Reported-by: Manish Katiyar <mkatiyar@gmail.com> Tested-by: Manish Katiyar <mkatiyar@gmail.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org	2011-11-08 16:22:01 +11:00
Stephen Rothwell	a84450604d	device-mapper: using EXPORT_SYMBOL in dm-space-map-checker.c needs export.h Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-07 10:29:10 -08:00
Stephen Rothwell	6f66263f8e	device-mapper: dm-bufio.c needs to include module.h since it uses the module facilities. Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-07 10:29:10 -08:00
Paul Gortmaker	1944ce60fe	drivers/md: change module.h -> export.h in persistent-data/dm-* For the files which are not themselves modular, we can change them to include only the smaller export.h since all they are doing is looking for EXPORT_SYMBOL. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-07 10:29:09 -08:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Linus Torvalds	b4fdcb02f1	Merge branch 'for-3.2/core' of git://git.kernel.dk/linux-block * 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits) block: don't call blk_drain_queue() if elevator is not up blk-throttle: use queue_is_locked() instead of lockdep_is_held() blk-throttle: Take blkcg->lock while traversing blkcg->policy_list blk-throttle: Free up policy node associated with deleted rule block: warn if tag is greater than real_max_depth. block: make gendisk hold a reference to its queue blk-flush: move the queue kick into blk-flush: fix invalid BUG_ON in blk_insert_flush block: Remove the control of complete cpu from bio. block: fix a typo in the blk-cgroup.h file block: initialize the bounce pool if high memory may be added later block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown block: drop @tsk from attempt_plug_merge() and explain sync rules block: make get_request[_wait]() fail if queue is dead block: reorganize throtl_get_tg() and blk_throtl_bio() block: reorganize queue draining block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg() block: pass around REQ_* flags instead of broken down booleans during request alloc/free block: move blk_throtl prototypes to block/blk.h block: fix genhd refcounting in blkio_policy_parse_and_set() ... Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion and making the request functions be of type "void" instead of "int" in - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c} - drivers/staging/zram/zram_drv.c	2011-11-04 17:06:58 -07:00
Linus Torvalds	43672a0784	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm * git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm: dm: raid fix device status indicator when array initializing dm log userspace: add log device dependency dm log userspace: fix comment hyphens dm: add thin provisioning target dm: add persistent data library dm: add bufio dm: export dm get md dm table: add immutable feature dm table: add always writeable feature dm table: add singleton feature dm kcopyd: add dm_kcopyd_zero to zero an area dm: remove superfluous smp_mb dm: use local printk ratelimit dm table: propagate non rotational flag	2011-11-02 17:02:37 -07:00
Paul Gortmaker	daaa5f7cbe	md: Add in export.h for files using EXPORT_SYMBOL These files were getting the defines for EXPORT_SYMBOL because device.h was including module.h. But we are going to put an end to that. So add the proper export.h include now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:19 -04:00
Paul Gortmaker	056075c764	md: Add module.h to all files using it implicitly A pending cleanup will mean that module.h won't be implicitly everywhere anymore. Make sure the modular drivers in md dir are actually calling out for <module.h> explicitly in advance. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:18 -04:00
Linus Torvalds	571109f536	Merge branch 'for-linus' of git://neil.brown.name/md * 'for-linus' of git://neil.brown.name/md: md/raid10: Fix bug when activating a hot-spare.	2011-10-31 15:21:29 -07:00
Jonathan E Brassow	2e727c3ca1	dm: raid fix device status indicator when array initializing When devices in a RAID array are not in-sync, they are supposed to be reported as such in the status output as an 'a' character, which means "alive, but not in-sync". But when the entire array is rebuilt 'A' is being used, which is incorrect. This patch corrects this to 'a'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:21:26 +00:00
Jonathan E Brassow	5a25f0eb70	dm log userspace: add log device dependency Allow userspace dm log implementations to register their log device so it is no longer missing from the list of device dependencies. When device mapper targets use a device they normally call dm_get_device which includes it in the device list returned to userspace applications such as LVM through the DM_TABLE_DEPS ioctl. Userspace log devices don't use dm_get_device as userspace opens them so they are missing from the list of dependencies. This patch extends the DM_ULOG_CTR operation to allow userspace to respond with the name of the log device (if appropriate) to be registered via 'dm_get_device'. DM_ULOG_REQUEST_VERSION is incremented. This is backwards compatible. If the kernel and userspace log server have both been updated, the new information will be passed down to the kernel and the device will be registered. If the kernel is new, but the log server is old, the log server will not pass down any device information and the kernel will simply bypass the device registration as before. If the kernel is old but the log server is new, the log server will see the old version number and not pass the device info. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:21:24 +00:00
Jonathan Brassow	b89544575d	dm log userspace: fix comment hyphens Fix comments: clustered-disk needs a hyphen not an underscore. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:21:22 +00:00
Joe Thornber	991d9fa02d	dm: add thin provisioning target Initial EXPERIMENTAL implementation of device-mapper thin provisioning with snapshot support. The 'thin' target is used to create instances of the virtual devices that are hosted in the 'thin-pool' target. The thin-pool target provides data sharing among devices. This sharing is made possible using the persistent-data library in the previous patch. The main highlight of this implementation, compared to the previous implementation of snapshots, is that it allows many virtual devices to be stored on the same data volume, simplifying administration and allowing sharing of data between volumes (thus reducing disk usage). Another big feature is support for arbitrary depth of recursive snapshots (snapshots of snapshots of snapshots ...). The previous implementation of snapshots did this by chaining together lookup tables, and so performance was O(depth). This new implementation uses a single data structure so we don't get this degradation with depth. For further information and examples of how to use this, please read Documentation/device-mapper/thin-provisioning.txt Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:21:18 +00:00
Joe Thornber	3241b1d3e0	dm: add persistent data library The persistent-data library offers a re-usable framework for the storage and management of on-disk metadata in device-mapper targets. It's used by the thin-provisioning target in the next patch and in an upcoming hierarchical storage target. For further information, please read Documentation/device-mapper/persistent-data.txt Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:11 +00:00
Mikulas Patocka	95d402f057	dm: add bufio The dm-bufio interface allows you to do cached I/O on devices, holding recently-read blocks in memory and performing delayed writes. We don't use buffer cache or page cache already present in the kernel, because: * we need to handle block sizes larger than a page * we can't allocate memory to perform reads or we'd have deadlocks Currently, when a cache is required, we limit its size to a fraction of available memory. Usage can be viewed and changed in /sys/module/dm_bufio/parameters/ . The first user is thin provisioning, but more dm users are planned. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:09 +00:00
Alasdair G Kergon	3cf2e4ba74	dm: export dm get md Export dm_get_md() for the new thin provisioning target to use. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:06 +00:00
Alasdair G Kergon	36a0456fbf	dm table: add immutable feature Introduce DM_TARGET_IMMUTABLE to indicate that the target type cannot be mixed with any other target type, and once loaded into a device, it cannot be replaced with a table containing a different type. The thin provisioning pool device will use this. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:04 +00:00
Alasdair G Kergon	cc6cbe141a	dm table: add always writeable feature Add a target feature flag DM_TARGET_ALWAYS_WRITEABLE to indicate that a target does not support read-only mode. The initial implementation of the thin provisioning target uses this. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:02 +00:00
Alasdair G Kergon	3791e2fc0e	dm table: add singleton feature Introduce the concept of a singleton table which contains exactly one target. If a target type sets the DM_TARGET_SINGLETON feature bit device-mapper will ensure that any table that includes that target contains no others. The thin provisioning pool target uses this. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:00 +00:00
Mikulas Patocka	7f06965390	dm kcopyd: add dm_kcopyd_zero to zero an area This patch introduces dm_kcopyd_zero() to make it easy to use kcopyd to write zeros into the requested areas instead of copying. It is implemented by passing a NULL copying source to dm_kcopyd_copy(). The forthcoming thin provisioning target uses this. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:18:58 +00:00
Namhyung Kim	fbdc86f3bd	dm: remove superfluous smp_mb Since set_current_state() contains a memory barrier in it, an additional barrier isn't needed. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:18:56 +00:00
Namhyung Kim	71a16736a1	dm: use local printk ratelimit printk_ratelimit() shares global ratelimiting state with all other subsystems, so its usage is discouraged. Instead, define and use dm's local state. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:18:54 +00:00
Mandeep Singh Baines	4693c9668f	dm table: propagate non rotational flag Allow QUEUE_FLAG_NONROT to propagate up the device stack if all underlying devices are non-rotational. Tools like ureadahead will schedule IOs differently based on the rotational flag. With this patch, I see boot time go from 7.75 s to 7.46 s on my device. Suggested-by: J. Richard Barnette <jrbarnette@chromium.org> Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: dm-devel@redhat.com Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:18:50 +00:00
NeilBrown	7fcc7c8acf	md/raid10: Fix bug when activating a hot-spare. This is a fairly serious bug in RAID10. When a RAID10 array is degraded and a hot-spare is activated, the spare does not take up the empty slot, but rather replaces the first working device. This is likely to make the array non-functional. It would normally be possible to recover the data, but that would need care and is not guaranteed. This bug was introduced in commit `2bb77736ae` which first appeared in 3.1. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-31 12:59:44 +11:00
Linus Torvalds	c3ae1f3356	Merge branch 'for-linus' of git://neil.brown.name/md * 'for-linus' of git://neil.brown.name/md: (34 commits) md: Fix some bugs in recovery_disabled handling. md/raid5: fix bug that could result in reads from a failed device. lib/raid6: Fix filename emitted in generated code md.c: trivial comment fix MD: Allow restarting an interrupted incremental recovery. md: clear In_sync bit on devices added to an active array. md: add proper write-congestion reporting to RAID1 and RAID10. md: rename "mdk_personality" to "md_personality" md/bitmap remove fault injection options. md/raid5: typedef removal: raid5_conf_t -> struct r5conf md/raid1: typedef removal: conf_t -> struct r1conf md/raid10: typedef removal: conf_t -> struct r10conf md/raid0: typedef removal: raid0_conf_t -> struct r0conf md/multipath: typedef removal: multipath_conf_t -> struct mpconf md/linear: typedef removal: linear_conf_t -> struct linear_conf md/faulty: remove typedef: conf_t -> struct faulty_conf md/linear: remove typedefs: dev_info_t -> struct dev_info md: remove typedefs: mirror_info_t -> struct mirror_info md: remove typedefs: r10bio_t -> struct r10bio and r1bio_t -> struct r1bio md: remove typedefs: mdk_thread_t -> struct md_thread ...	2011-10-26 21:39:42 +02:00
NeilBrown	d890fa2b05	md: Fix some bugs in recovery_disabled handling. In 3.0 we changed the way recovery_disabled was handled so that instead of testing against zero, we test an mddev-> value against a conf-> value. Two problems: 1/ one place in raid1 was missed and still sets to '1'. 2/ We didn't explicitly set the conf-> value at array creation time. It defaulted to '0' just like the mddev value does, so they could appear equal and thus disable recovery. This did not affect normal 'md' as it calls bind_rdev_to_array which changes the mddev value. However the dmraid interface doesn't call this and so doesn't change ->recovery_disabled; so at array start all recovery is incorrectly disabled. So initialise the 'conf' value to one less than the mddev value, so they will only be the same when explicitly set that way. Reported-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-26 11:54:39 +11:00
NeilBrown	355840e7a7	md/raid5: fix bug that could result in reads from a failed device. This bug was introduced in `415e72d034` which was in 2.6.36. There is a small window of time between when a device fails and when it is removed from the array. During this time we might still read from it, but we won't write to it - so it is possible that we could read stale data. We didn't need the test of 'Faulty' before because the test on In_sync is sufficient. Since we started allowing reads from the early part of non-In_sync devices we need a test on Faulty too. This is suitable for any kernel from 2.6.36 onwards, though the patch might need a bit of tweaking in 3.0 and earlier. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-26 10:31:04 +11:00
Tao Ma	9562ad9ab3	block: Remove the control of complete cpu from bio. bio originally had the functionality to set the completion cpu, but it is broken. Christoph said that "This code is unused, and from the all the discussions lately pretty obviously broken. The only thing keeping it serves is creating more confusion and possibly more bugs." And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine with leaving cpu control to the request based drivers, they are the only ones that can toggle the setting anyway". So this patch tries to remove all the work of controlling the completion cpu from a bio. Cc: Shaohua Li <shaohua.li@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-10-24 16:11:30 +02:00
Alasdair G Kergon	d136f2efdf	dm kcopyd: fix job_pool leak Fix memory leak introduced by commit `a6e50b409d` (dm snapshot: skip reading origin when overwriting complete chunk). When allocating a set of jobs from kc->job_pool, job->master_job must be set (to point to itself) so that the mempool item gets freed when the master_job completes. master_job was introduced by commit `c6ea41fbbe` (dm kcopyd: preallocate sub jobs to avoid deadlock) Reported-by: Michael Leun <ml@newton.leun.net> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-23 20:55:17 +01:00
Jens Axboe	5c04b426f2	Merge branch 'v3.1-rc10' into for-3.2/core Conflicts: block/blk-core.c include/linux/blkdev.h Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-10-19 14:30:42 +02:00
Chris Dunlop	751e67ca2e	md.c: trivial comment fix Trivial comment fix Signed-off-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-19 17:15:15 +11:00
Andrei Warkentin	d70ed2e4fa	MD: Allow restarting an interrupted incremental recovery. If an incremental recovery was interrupted, a subsequent re-add will result in a full recovery, even though an incremental should be possible (seen with raid1). Solve this problem by not updating the superblock on the recovering device until the array is no longer degraded. Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrei Warkentin <andreiw@vmware.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-18 12:16:48 +11:00
NeilBrown	d30519fc59	md: clear In_sync bit on devices added to an active array. When we add a device to an active array it can be meaningful to set the 'insync' flag. This indicates that the device is in-sync with the array except for locations recorded in the bitmap. A bitmap-based recovery can then bring it completely in-sync. Internally we move that flag to 'saved_raid_disk' but forgot to clear In_sync like we do in add_new_disk. So clear In_sync after moving its value to saved_raid_disk. Reported-by: Andrei Warkentin <andreiw@vmware.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-18 12:13:47 +11:00
NeilBrown	34db0cd60f	md: add proper write-congestion reporting to RAID1 and RAID10. RAID1 and RAID10 handle write requests by queuing them for handling by a separate thread. This is because when a write-intent-bitmap is active we might need to update the bitmap first, so it is good to queue a lot of writes, then do one big bitmap update for them all. However, writeback requires devices to appear congested after a while so it can make some estimate of throughput. The infinite queue defeats that (note that RAID5 already has a finite queue, so it doesn't suffer from this problem). So impose a limit on the number of pending write requests. By default it is 1024, which seems to be generally suitable. Make it configurable via module option just in case someone finds a regression. Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:50:01 +11:00
NeilBrown	84fc4b56db	md: rename "mdk_personality" to "md_personality" "mdk" doesn't mean anything any more. Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:49:58 +11:00
NeilBrown	29d3247ea2	md/bitmap remove fault injection options. These are too hard to use to be much more than noise. Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:49:56 +11:00
NeilBrown	d1688a6d55	md/raid5: typedef removal: raid5_conf_t -> struct r5conf Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:49:52 +11:00
NeilBrown	e809636047	md/raid1: typedef removal: conf_t -> struct r1conf Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:49:05 +11:00
NeilBrown	e879a8793f	md/raid10: typedef removal: conf_t -> struct r10conf Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:49:02 +11:00
NeilBrown	e373ab1091	md/raid0: typedef removal: raid0_conf_t -> struct r0conf Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:48:59 +11:00
NeilBrown	69724e28ca	md/multipath: typedef removal: multipath_conf_t -> struct mpconf Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:48:57 +11:00
NeilBrown	e849b9381f	md/linear: typedef removal: linear_conf_t -> struct linear_conf Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:48:54 +11:00
NeilBrown	8f1ae43dd2	md/faulty: remove typedef: conf_t -> struct faulty_conf Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:48:52 +11:00
NeilBrown	a71207713a	md/linear: remove typedefs: dev_info_t -> struct dev_info Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:48:49 +11:00
NeilBrown	0f6d02d580	md: remove typedefs: mirror_info_t -> struct mirror_info Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:48:46 +11:00
NeilBrown	9f2c9d12bc	md: remove typedefs: r10bio_t -> struct r10bio and r1bio_t -> struct r1bio Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:48:43 +11:00
NeilBrown	2b8bf3451d	md: remove typedefs: mdk_thread_t -> struct md_thread Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:48:23 +11:00
NeilBrown	fd01b88c75	md: remove typedefs: mddev_t -> struct mddev Having mddev_t and 'struct mddev_s' is ugly and not preferred Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:47:53 +11:00
NeilBrown	3cb0300200	md: removing typedefs: mdk_rdev_t -> struct md_rdev The typedefs are just annoying. 'mdk' probably refers to 'md_k.h' which used to be an include file that defined this thing. Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-11 16:45:26 +11:00
NeilBrown	50de8df4ab	md/raid0: convert some printks to pr_debug. When md assembles a RAID0 array it prints out lots of info which is really just for debugging, so convert that to pr_debug. It also prints out the resulting configuration which could be interesting, so keep that as 'printk' but tidy it up a bit. Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-07 14:23:22 +11:00
NeilBrown	36a4e1fe0f	md: remove PRINTK and dprintk debugging and use pr_debug Being able to dynamically enable these makes them much more useful. Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-07 14:23:17 +11:00
NeilBrown	bdc04e6b15	md: remove some old DEBUGging code. This code is not really helpful and is hard to maintain, so just discard it. Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-07 14:23:04 +11:00
NeilBrown	db298e1946	md/raid5: convert two macros into inline functions. More type-safety. Easier to read. Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-07 14:23:00 +11:00
NeilBrown	0fc280f606	md/raid1: avoid bio search in end_sync_read() We know which device we just read from so we don't need to search the bios to find out. Just use ->read_disk. Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-07 14:22:55 +11:00
Namhyung Kim	ba3ae3bee3	md/raid1: factor out common bio handling code When a normal-write or sync-read/write bio completes, we should find out the disk number the bio belongs to. Factor that common code out to a separate function. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-07 14:22:53 +11:00
NeilBrown	e4f869d9de	md/raid5: remove pointless NULL test. In the 'abort' branch of run(), 'conf' cannot possibly be NULL, so remove the test. Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-07 14:22:49 +11:00
NeilBrown	ce550c2059	md/raid1: add documentation to r1_private_data_s data structure. There wasn't much and it is inconsistent. Also rearrange fields to keep related fields together. Reported-by: Aapo Laine <aapo.laine@shiftmail.org> Signed-off-by: NeilBrown <neilb@suse.de>	2011-10-07 14:22:33 +11:00
Linus Torvalds	6367f1775e	Merge branch 'for-linus' of http://people.redhat.com/agk/git/linux-dm * 'for-linus' of http://people.redhat.com/agk/git/linux-dm: dm crypt: always disable discard_zeroes_data dm: raid fix write_mostly arg validation dm table: avoid crash if integrity profile changes dm: flakey fix corrupt_bio_byte error path	2011-10-06 08:31:47 -07:00
Milan Broz	983c7db347	dm crypt: always disable discard_zeroes_data If optional discard support in dm-crypt is enabled, discard requests bypass the crypt queue and blocks of the underlying device are discarded. For the read path, discarded blocks are handled the same as normal ciphertext blocks, thus decrypted. So if the underlying device announces that discarded regions return zeroes, dm-crypt must disable this flag because after decryption there is just random noise instead of zeroes. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-09-25 23:26:21 +01:00
Jonthan Brassow	8232480944	dm: raid fix write_mostly arg validation Fix off-by-one error in validation of write_mostly. The user-supplied value given for the 'write_mostly' argument must be an index starting at 0. The validation of the supplied argument failed to check for 'N' ('>' vs '>='), which would have caused an access beyond the end of the array. Reported-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-09-25 23:26:19 +01:00
Mike Snitzer	876fbba1db	dm table: avoid crash if integrity profile changes Commit `a63a5cf` (dm: improve block integrity support) introduced a two-phase initialization of a DM device's integrity profile. This patch avoids dereferencing a NULL 'template_disk' pointer in blk_integrity_register() if there is an integrity profile mismatch in dm_table_set_integrity(). This can occur if the integrity profiles for stacked devices in a DM table are changed between the call to dm_table_prealloc_integrity() and dm_table_set_integrity(). Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org # 2.6.39	2011-09-25 23:26:17 +01:00
Mike Snitzer	68e58a294f	dm: flakey fix corrupt_bio_byte error path If no arguments were provided to the corrupt_bio_byte feature an error should be returned immediately. Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-09-25 23:26:15 +01:00
Daniel P. Berrange	2dba6a911c	md: don't delay reboot by 1 second if no MD devices exist The md_notify_reboot() method includes a call to mdelay(1000), to deal with "exotic SCSI devices" which are too volatile on reboot. The delay is unconditional. Even if the machine does not have any block devices, let alone MD devices, the kernel shutdown sequence is slowed down. 1 second does not matter much with physical hardware, but with certain virtualization use cases any wasted time in the bootup & shutdown sequence counts for a lot. * drivers/md/md.c: md_notify_reboot() - only impose a delay if there was at least one MD device to be stopped during reboot Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-09-23 19:54:04 +10:00
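The shape of the change can be sketched in user-space C: stop whatever arrays exist, count them, and only pay the settle delay when the count is non-zero. `struct md_array_sketch` and `reboot_delay_ms()` are hypothetical stand-ins; the real code iterates over the md device list and calls mdelay() directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical array descriptor: only tracks whether it is running. */
struct md_array_sketch {
    bool active;
};

/* Stops every active array and returns how long (ms) to wait before reboot. */
static unsigned int reboot_delay_ms(struct md_array_sketch *arrays, size_t n)
{
    size_t stopped = 0;

    for (size_t i = 0; i < n; i++) {
        if (arrays[i].active) {
            arrays[i].active = false;   /* stand-in for stopping the array */
            stopped++;
        }
    }
    /* Only pay the 1 second settle time if something was actually stopped. */
    return stopped ? 1000u : 0u;
}
```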
Wang Sheng-Hui	7e84152626	trivial: md_k.h should be md.h in the beginning comment of file md.h Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-09-21 15:37:46 +10:00
NeilBrown	2585f3ef8c	md/bitmap: improve handling of 'allclean'. The 'allclean' flag is used to cache the fact that there is nothing to do, so we can avoid waking up and scanning the bitmap regularly. The two sorts of pages that might need the attention of the bitmap daemon are BITMAP_PAGE_PENDING and BITMAP_PAGE_NEEDWRITE pages. So make sure allclean reflects exactly when there are none of those. So: - set it before scanning all pages with either bit set - clear it whenever these bits are set - clear it when we decide not to clear one of these bits - don't clear it at any other time. Signed-off-by: NeilBrown <neilb@suse.de>	2011-09-21 15:37:46 +10:00
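The four rules for 'allclean' can be modelled with a small, hypothetical bitmap structure (`struct bitmap_sketch`, `mark_page()` and `daemon_scan()` are illustrative names, not md's). The `busy[]` array is an assumption standing in for cases such as need_sync, where the daemon chooses to keep a page's bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    PAGE_PENDING   = 1u << 0,   /* stand-in for BITMAP_PAGE_PENDING */
    PAGE_NEEDWRITE = 1u << 1,   /* stand-in for BITMAP_PAGE_NEEDWRITE */
};
#define PAGE_ATTN (PAGE_PENDING | PAGE_NEEDWRITE)

struct bitmap_sketch {
    unsigned int *page_flags;
    size_t npages;
    bool allclean;              /* true only when no page has either bit set */
};

/* Rule 2: setting either bit on any page clears allclean. */
static void mark_page(struct bitmap_sketch *b, size_t i, unsigned int flag)
{
    b->page_flags[i] |= flag;
    b->allclean = false;
}

/* busy[i] models "this page's bits cannot be cleared yet" (e.g. need_sync). */
static void daemon_scan(struct bitmap_sketch *b, const bool *busy)
{
    if (b->allclean)
        return;                 /* cached: nothing to do, skip the scan */
    b->allclean = true;         /* rule 1: set it before scanning */
    for (size_t i = 0; i < b->npages; i++) {
        if (!(b->page_flags[i] & PAGE_ATTN))
            continue;
        if (busy[i])
            b->allclean = false; /* rule 3: we chose not to clear a bit */
        else
            b->page_flags[i] &= ~(unsigned int)PAGE_ATTN;
    }
}
```

Because allclean is only ever cleared by rules 2 and 3 (rule 4), a true value reliably means the daemon can skip its scan.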
NeilBrown	5a537df44d	md/bitmap: rename and tidy up BITMAP_PAGE_CLEAN The flag 'BITMAP_PAGE_CLEAN' has a confusing name as it doesn't mean that the page is clean, but rather that there are counters in the page which allow bits in the bitmap to be cleared - i.e. maybe cleaning can happen. So change it to BITMAP_PAGE_PENDING and fix some irregularities: - Don't set it in bitmap_init_from_disk as bitmap_set_memory_bits sets it when needed - in bitmap_daemon_work, if we find a counter that is '1', but need_sync is set, then set BITMAP_PAGE_PENDING again (it was recently cleared) to ensure we don't forget about this bit. Signed-off-by: NeilBrown <neilb@suse.de>	2011-09-21 15:37:46 +10:00
NeilBrown	01f96c0a99	md: Avoid waking up a thread after it has been freed. Two related problems: 1/ some error paths call "md_unregister_thread(mddev->thread)" without subsequently clearing ->thread. A subsequent call to mddev_unlock will try to wake the thread, and crash. 2/ Most calls to md_wakeup_thread are protected against the thread disappearing either by: - holding the ->mutex - having an active request, so something else must be keeping the array active. However mddev_unlock calls md_wakeup_thread after dropping the mutex and without any certainty of an active request, so the ->thread could theoretically disappear. So we need a spinlock to provide some protection. So change md_unregister_thread to take a pointer to the thread pointer, and ensure that it always does the required locking, and clears the pointer properly. Reported-by: "Moshe Melnikov" <moshe@zadarastorage.com> Signed-off-by: NeilBrown <neilb@suse.de> cc: stable@kernel.org	2011-09-21 15:30:20 +10:00
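A user-space sketch of the "pointer to the thread pointer" pattern, assuming a pthread mutex in place of the kernel spinlock and a hypothetical `struct worker` in place of the md thread. Clearing the owner's pointer under the same lock the waker takes means a late wakeup sees NULL rather than freed memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an md thread: only counts wakeups. */
struct worker {
    int wakeups;
};

/* Plays the role of the spinlock the commit adds. */
static pthread_mutex_t worker_lock = PTHREAD_MUTEX_INITIALIZER;

/* Takes a pointer to the owner's pointer so it can clear it under the lock. */
static void unregister_worker(struct worker **wp)
{
    struct worker *w;

    pthread_mutex_lock(&worker_lock);
    w = *wp;
    *wp = NULL;             /* later wakers now see NULL instead of freed memory */
    pthread_mutex_unlock(&worker_lock);
    free(w);                /* free(NULL) is a no-op, so a repeated call is harmless */
}

/* Wakes the worker only if it still exists, checked under the same lock. */
static void wakeup_worker(struct worker **wp)
{
    pthread_mutex_lock(&worker_lock);
    if (*wp)
        (*wp)->wakeups++;
    pthread_mutex_unlock(&worker_lock);
}
```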
Christoph Hellwig	5a7bbad27a	block: remove support for bio remapping from ->make_request There is very little benefit in allowing a ->make_request instance to update the bio's device and sector and loop around it in __generic_make_request when we can achieve the same through calling generic_make_request from the driver and letting the loop in generic_make_request handle it. Note that various drivers got the return value from ->make_request and returned non-zero values for errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-09-12 12:12:01 +02:00
Jens Axboe	c20e8de27f	block: rename __make_request() to blk_queue_bio() Now that it's exported, let's put it in a more sane namespace. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-09-12 12:08:31 +02:00
Christoph Hellwig	166e1f901b	block: export __make_request Avoid the hacks needed for request based device mappers currently by simply exporting the symbol instead of trying to get it through the back door. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-09-12 12:08:27 +02:00
NeilBrown	27a7b260f7	md: Fix handling for devices from 2TB to 4TB in 0.90 metadata. 0.90 metadata uses an unsigned 32bit number to count the number of kilobytes used from each device. This should allow up to 4TB per device. However we multiply this by 2 (to get sectors) before casting to a larger type, so sizes above 2TB get truncated. Also we allow rdev->sectors to be larger than 4TB, so it is possible for the array to be resized larger than the metadata can handle. So make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in use. Also the sanity check at the end of super_90_load should include level 1 as it uses ->size too. (RAID0 and Linear don't use ->size at all). Reported-by: Pim Zandbergen <P.Zandbergen@macroscoop.nl> Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>	2011-09-10 17:21:28 +10:00
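The truncation is ordinary C integer arithmetic: multiplying a 32-bit value by 2 happens in 32 bits, and only the already-wrapped result is widened. A 3 TiB device is 0xC0000000 KiB, which doubles to 0x180000000 sectors but wraps to 0x80000000 (1 TiB). The helper names below are hypothetical; the clamp uses 2 * (2^32 - 1) sectors as the largest size a 32-bit KiB count can describe.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: the multiply is done in 32 bits, then the wrapped result is widened. */
static uint64_t sectors_buggy(uint32_t size_kb)
{
    return size_kb * 2;
}

/* Fixed: widen first, then multiply. */
static uint64_t sectors_fixed(uint32_t size_kb)
{
    return (uint64_t)size_kb * 2;
}

/* Hypothetical clamp: 0.90 metadata cannot describe more than UINT32_MAX KiB. */
static uint64_t clamp_sectors_for_090(uint64_t sectors)
{
    const uint64_t max = (uint64_t)UINT32_MAX * 2;
    return sectors > max ? max : sectors;
}
```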
NeilBrown	079fa166a2	md/raid1,10: Remove use-after-free bug in make_request. A single request to RAID1 or RAID10 might result in multiple requests if there are known bad blocks that need to be avoided. To detect if we need to submit another write request we test: if (sectors_handled < (bio->bi_size >> 9)) { However this is after we call _write_done() so the 'bio' no longer belongs to us - the writes could have completed and the bio freed. So move the _write_done call until after the test against bio->bi_size. This addresses https://bugzilla.kernel.org/show_bug.cgi?id=41862 Reported-by: Bruno Wolff III <bruno@wolff.to> Tested-by: Bruno Wolff III <bruno@wolff.to> Signed-off-by: NeilBrown <neilb@suse.de>	2011-09-10 17:21:23 +10:00
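The fix is an ordering rule: read everything you need from a reference-counted object before dropping your own reference, because the drop may free it. A minimal sketch with a hypothetical `struct write_req` (the real code tests `bio->bi_size >> 9` and then calls the `_write_done()` helper):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical request: 'remaining' includes the submitter's own bias. */
struct write_req {
    int remaining;
    unsigned int size;      /* total sectors in the request */
};

static int freed_count;     /* lets a caller observe that the request was freed */

static void put_write_req(struct write_req *r)
{
    if (--r->remaining == 0) {
        free(r);
        freed_count++;
    }
}

/* Returns true if more sectors still need to be submitted. */
static bool finish_submit(struct write_req *r, unsigned int handled)
{
    bool more = handled < r->size;  /* read r while our reference keeps it alive */
    put_write_req(r);               /* may free r: it must not be touched after this */
    return more;
}
```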
NeilBrown	19d5f834d6	md/raid10: unify handling of write completion. A write can complete at two different places: 1/ when the last member-device write completes, through raid10_end_write_request 2/ in make_request() when we remove the initial bias from ->remaining. These two should do exactly the same thing and the comment says they do, but they don't. So factor the correct code out into a function and call it in both places. This makes the code much more similar to RAID1. The difference is only significant if there is an error, and they usually take a while, so it is unlikely that there will be an error already when make_request is completing, so this is unlikely to cause real problems. Signed-off-by: NeilBrown <neilb@suse.de>	2011-09-10 17:21:17 +10:00
NeilBrown	43220aa0f2	md/raid5: fix a hang on device failure. Waiting for a 'blocked' rdev to become unblocked in the raid5d thread cannot work with internal metadata as it is the raid5d thread which will clear the blocked flag. This wasn't a problem in 3.0 and earlier as we only set the blocked flag when external metadata was used then. However we now set it always, so we need to be more careful. Signed-off-by: NeilBrown <neilb@suse.de>	2011-08-31 12:49:14 +10:00
NeilBrown	7da64a0abc	md: fix clearing of 'blocked' flag in the presence of bad blocks. When the 'blocked' flag on a device is cleared while there are unacknowledged bad blocks we must fail the device. This is needed for backwards compatibility of the interface. The code currently uses the wrong test for "unacknowledged bad blocks exist". Change it to the right test. Signed-off-by: NeilBrown <neilb@suse.de>	2011-08-30 16:20:17 +10:00
NeilBrown	1b6afa1758	md/linear: avoid corrupting structure while waiting for rcu_free to complete. I don't know what I was thinking putting 'rcu' after a dynamically sized array! The array could still be in use when we call rcu_free() (That is the point) so we mustn't corrupt it. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>	2011-08-25 14:43:53 +10:00
Namhyung Kim	a5bf4df0c8	md: use REQ_NOIDLE flag in md_super_write() Queue idling is used for the anticipation of immediate sequential I/Os but md_super_write() is a kind of one-shot operation, coupled with md_super_wait(), so the idling in this case will be just a waste of time. Specifying REQ_NOIDLE prevents it. Instead of adding the flag to submit_bio() directly, use the pre-defined macro WRITE_FLUSH_FUA. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-08-25 14:43:34 +10:00
NeilBrown	aeb9b21184	md: ensure changes to 'write-mostly' are reflected in metadata. The 'write-mostly' flag can be changed through sysfs. With 0.90 metadata, those changes are reflected in the metadata. For 1.x metadata, they aren't. So fix super_1_sync to record 'write-mostly' status. Signed-off-by: NeilBrown <neilb@suse.de>	2011-08-25 14:43:08 +10:00
NeilBrown	5ef56c8fec	md: report failure if a 'set faulty' request doesn't. Sometimes a device will refuse to be set faulty. e.g. RAID1 will never let the last working device become faulty. So check if "md_error()" did manage to set the faulty flag and fail with EBUSY if it didn't. Resolves-Debian-Bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=601198 Reported-by: Mike Hommey <mh+reportbug@glandium.org> Signed-off-by: NeilBrown <neilb@suse.de>	2011-08-25 14:42:51 +10:00
Mike Snitzer	ed8b752bcc	dm table: set flush capability based on underlying devices DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities regardless of whether or not a given DM device's underlying devices also advertised a need for them. Block's flush-merge changes from 2.6.39 have proven to be more costly for DM devices. Performance regressions have been reported even when DM's underlying devices do not advertise that they have a write cache. Fix the performance regressions by configuring a DM device's flushing capabilities based on those of the underlying devices' capabilities. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:08 +01:00
Milan Broz	772ae5f54d	dm crypt: optionally support discard requests Add optional parameter field to dmcrypt table and support "allow_discards" option. Discard requests bypass crypt queue processing. Bio is simply remapped to underlying device. Note that discard will never be enabled by default because of security consequences. It is up to the administrator to enable it for encrypted devices. (Note that userspace cryptsetup does not understand new optional parameters yet. Support for this will come later. Until then, you should use 'dmsetup' to enable and disable this.) Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:08 +01:00
Jonathan Brassow	327372797c	dm raid: add md raid1 support Support the MD RAID1 personality through dm-raid. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:07 +01:00
Jonathan Brassow	b12d437b73	dm raid: support metadata devices Add the ability to parse and use metadata devices to dm-raid. Although not strictly required, without the metadata devices, many features of RAID are unavailable. They are used to store a superblock and bitmap. The role, or position in the array, of each device must be recorded in its superblock. This is to help with fault handling, array reshaping, and sanity checks. RAID 4/5/6 devices must be loaded in a specific order: in this way, the 'array_position' field helps validate the correctness of the mapping when it is loaded. It can be used during reshaping to identify which devices are added/removed. Fault handling is impossible without this field. For example, when a device fails it is recorded in the superblock. If this is a RAID1 device and the offending device is removed from the array, there must be a way during subsequent array assembly to determine that the failed device was the one removed. This is done by correlating the 'array_position' field and the bit-field variable 'failed_devices'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:07 +01:00
Jonathan Brassow	46bed2b5c1	dm raid: add write_mostly parameter Add the write_mostly parameter to RAID1 dm-raid tables. This allows the user to set the WriteMostly flag on a RAID1 device that should normally be avoided for read I/O. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:07 +01:00
Jonathan Brassow	c1084561bb	dm raid: add region_size parameter Allow the user to specify the region_size. Ensures that the supplied value meets md's constraints, viz. the number of regions does not exceed 2^21. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:07 +01:00
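A sketch of the stated constraint only (regions per array must not exceed 2^21), assuming sizes in 512-byte sectors and a hypothetical `check_region_size()`; any further rules dm-raid applies to region_size are not shown here. For a 2^30-sector (512 GiB) array, a 512-sector region gives exactly 2^21 regions and is accepted, while 256 sectors gives 2^22 and is rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_REGIONS (1ULL << 21)   /* limit quoted in the commit message */

/* Hypothetical: returns 0 if region_size keeps the region count within MAX_REGIONS. */
static int check_region_size(uint64_t array_sectors, uint64_t region_size)
{
    if (region_size == 0)
        return -EINVAL;
    /* Round up so a partial last region is still counted. */
    uint64_t regions = (array_sectors + region_size - 1) / region_size;
    return regions > MAX_REGIONS ? -EINVAL : 0;
}
```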
Mikulas Patocka	759dea204c	dm ioctl: forbid multiple device specifiers Exactly one of name, uuid or device must be specified when referencing an existing device. This removes the ambiguity (risking the wrong device being updated) if two conflicting parameters were specified. Previously one parameter got used and any others were ignored silently. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:06 +01:00
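The "exactly one specifier" rule amounts to counting the supplied identifiers and rejecting anything other than one, instead of silently preferring one of them. `struct dev_ref` and `validate_dev_ref()` are hypothetical; the real check lives in the dm ioctl code and operates on its own parameter structure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical lookup request: a device may be named by name, uuid, or dev number. */
struct dev_ref {
    const char *name;
    const char *uuid;
    unsigned long dev;      /* 0 means "not specified" */
};

static int validate_dev_ref(const struct dev_ref *r)
{
    int given = 0;

    if (r->name && r->name[0])
        given++;
    if (r->uuid && r->uuid[0])
        given++;
    if (r->dev)
        given++;
    /* Zero is ambiguous (no device), more than one risks picking the wrong device. */
    return given == 1 ? 0 : -EINVAL;
}
```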
Mikulas Patocka	ba2e19b0f4	dm ioctl: introduce __get_dev_cell Move logic to find device based on major/minor number to a separate function __get_dev_cell (similar to __get_uuid_cell and __get_name_cell). This makes the function __find_device_hash_cell more straightforward. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:06 +01:00
Mikulas Patocka	0ddf9644cc	dm ioctl: fill in device parameters in more ioctls Move parameter filling from find_device to __find_device_hash_cell. This patch causes ioctls using __find_device_hash_cell (DM_DEV_REMOVE_CMD, DM_DEV_SUSPEND_CMD - resume, DM_TABLE_CLEAR_CMD) to return device parameters, bringing them into line with the other ioctls. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:06 +01:00
Mike Snitzer	a3998799fb	dm flakey: add corrupt_bio_byte feature Add corrupt_bio_byte feature to simulate corruption by overwriting a byte at a specified position with a specified value during intervals when the device is "down". Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:06 +01:00
Mike Snitzer	b26f5e3d71	dm flakey: add drop_writes Add 'drop_writes' option to drop writes silently while the device is 'down'. Reads are not touched. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:05 +01:00
Mike Snitzer	dfd068b01f	dm flakey: support feature args Add the ability to specify arbitrary feature flags when creating a flakey target. This code uses the same target argument helpers that the multipath target does. Also remove the superfluous 'dm-flakey' prefixes from the error messages, as they already contain the prefix 'flakey'. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:05 +01:00
Mike Snitzer	30e4171bfe	dm flakey: use dm_target_offset and support discards Use dm_target_offset() and support discards. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:05 +01:00
Mike Snitzer	498f0103ea	dm table: share target argument parsing functions Move multipath target argument parsing code into dm-table so other targets can share it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:04 +01:00
Mikulas Patocka	a6e50b409d	dm snapshot: skip reading origin when overwriting complete chunk If we write a full chunk in the snapshot, skip reading the origin device because the whole chunk will be overwritten anyway. This patch changes the snapshot write logic when a full chunk is written. In this case: 1. allocate the exception 2. dispatch the bio (but don't report the bio completion to device mapper) 3. write the exception record 4. report bio completed Callbacks must be done through the kcopyd thread, because callbacks must not race with each other. So we create two new functions: dm_kcopyd_prepare_callback: allocate a job structure and prepare the callback. (This function must not be called from interrupt context.) dm_kcopyd_do_callback: submit callback. (This function may be called from interrupt context.) Performance test (on snapshots with 4k chunk size): without the patch: non-direct-io sequential write (dd): 17.7MB/s direct-io sequential write (dd): 20.9MB/s non-direct-io random write (mkfs.ext2): 0.44s with the patch: non-direct-io sequential write (dd): 26.5MB/s direct-io sequential write (dd): 33.2MB/s non-direct-io random write (mkfs.ext2): 0.27s Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-08-02 12:32:04 +01:00
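The shortcut only applies when the write replaces the entire chunk, so nothing from the origin would survive in the copied exception. A hypothetical predicate for that case, assuming sizes in 512-byte sectors (a 4k chunk is 8 sectors) and that the write starts on a chunk boundary:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: true if the write covers exactly one whole, aligned chunk. */
static bool overwrites_whole_chunk(uint64_t bio_sector, uint32_t bio_sectors,
                                   uint32_t chunk_sectors)
{
    return chunk_sectors != 0 &&
           bio_sectors == chunk_sectors &&
           bio_sector % chunk_sectors == 0;
}
```

When the predicate is false, the origin chunk must still be read and copied before the partial write is applied.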


2199 Commits