Commit Graph

1207 Commits

Author SHA1 Message Date
NeilBrown
1f59390339 md: support bitmaps on RAID10 arrays larger then 2 terabytes
.. and other arrays with components larger than 2 terabytes.

We use a "long" rather than a "sector_t" in part of the bitmap
size calculations, which is sad.

Reported-by: "Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-04-20 11:50:24 +10:00
NeilBrown
c03f6a1969 md: update sync_completed and reshape_position even more often.
There are circumstances when a user-space process might need to
"oversee" a resync/reshape process.  For example when doing an
in-place reshape of a raid5, it is prudent to take a backup of each
section before reshaping it as this is the only way to provide
safety against an unplanned shutdown (i.e. crash/power failure).

The sync_max sysfs value can be used to stop the resync from
advancing beyond a particular point.
So user-space can:
  suspend IO to the first section and back it up
  set 'sync_max' to the end of the section
  wait for 'sync_completed' to reach that point
  resume IO on the first section and move on to the next section.

However this process requires the kernel and user-space to run in
lock-step which could introduce unnecessary delays.

It would be better if a 'double buffered' approach could be used with
userspace and kernel space working on different sections with the
'next' section always ready when the 'current' section is finished.

One problem with implementing this is that sync_completed is only
guaranteed to be updated when the sync process reaches sync_max.
(it is updated on a time basis at other times, but it is hard to rely
on that).  This defeats some of the double buffering.

With this patch, sync_completed (and reshape_position) get updated as
the current position approaches sync_max, so there is room for
userspace to advance sync_max early without losing updates.

To be precise, sync_completed is updated when the current sync
position reaches half way between the current value of sync_completed
and the value of sync_max.  This will usually be a good time for user
space to update sync_max.

If sync_max does not get updated, the updates to sync_completed
(together with associated metadata updates) will occur at an
exponentially increasing frequency which will get unreasonably fast
(one update every page) immediately before the process hits sync_max
and stops.  So the update rate will be unreasonably fast only for an
insignificant period of time.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-04-17 11:06:30 +10:00
NeilBrown
acb180b0e3 md: improve usefulness and accuracy of sysfs file md/sync_completed.
The sync_completed file reports how much of a resync (or recovery or
reshape) has been completed.
However due to the possibility of out-of-order completion of writes,
it is not certain to be accurate.

We have an internal value - mddev->curr_resync_completed - which is an
accurate value (though it might not always be quite so uptodate).

So:
 - make curr_resync_completed be uptodate a little more often,
   particularly when raid5 reshape updates status in the metadata
 - report curr_resync_completed in the sysfs file
 - allow poll/select to report all updates to md/sync_completed.

This makes sync_completed completed usable by any external metadata
handler that wants to record this status information in its metadata.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-04-14 16:28:34 +10:00
NeilBrown
6d56e27844 md: allow setting newly added device to 'in_sync' via sysfs.
When adding devices to an active array via sysfs, there is currently
no way to mark a device as 'in-sync' which is useful when
incrementally assembling an array.

So add that option.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-04-14 12:01:57 +10:00
Christoph Hellwig
63fe08177f md: tiny md.h cleanups
- update inclusion guard and make sure it covers the whole file
 - remove superflous #ifdef CONFIG_BLOCK
 - make sure all required headers are included so that new users aren't
   required to include others before

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-04-14 12:01:53 +10:00
Mikulas Patocka
340cd44451 dm kcopyd: fix callback race
If the thread calling dm_kcopyd_copy is delayed due to scheduling inside
split_job/segment_complete and the subjobs complete before the loop in
split_job completes, the kcopyd callback could be invoked from the
thread that called dm_kcopyd_copy instead of the kcopyd workqueue.

dm_kcopyd_copy -> split_job -> segment_complete -> job->fn()

Snapshots depend on the fact that callbacks are called from the singlethreaded
kcopyd workqueue and expect that there is no racing between individual
callbacks. The racing between callbacks can lead to corruption of exception
store and it can also mean that exception store callbacks are called twice
for the same exception - a likely reason for crashes reported inside
pending_complete() / remove_exception().

This patch fixes two problems:

1. job->fn being called from the thread that submitted the job (see above).

- Fix: hand over the completion callback to the kcopyd thread.

2. job->fn(read_err, write_err, job->context); in segment_complete
reports the error of the last subjob, not the union of all errors.

- Fix: pass job->write_err to the callback to report all error bits
  (it is done already in run_complete_job)

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09 00:27:17 +01:00
Mikulas Patocka
73830857bc dm kcopyd: prepare for callback race fix
Use a variable in segment_complete() to point to the dm_kcopyd_client
struct and only release job->pages in run_complete_job() if any are
defined.  These changes are needed by the next patch.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09 00:27:16 +01:00
Mikulas Patocka
af7e466a1a dm: implement basic barrier support
Barriers are submitted to a worker thread that issues them in-order.

The thread is modified so that when it sees a barrier request it waits
for all pending IO before the request then submits the barrier and
waits for it.  (We must wait, otherwise it could be intermixed with
following requests.)

Errors from the barrier request are recorded in a per-device barrier_error
variable. There may be only one barrier request in progress at once.

For now, the barrier request is converted to a non-barrier request when
sending it to the underlying device.

This patch guarantees correct barrier behavior if the underlying device
doesn't perform write-back caching. The same requirement existed before
barriers were supported in dm.

Bottom layer barrier support (sending barriers by target drivers) and
handling devices with write-back caches will be done in further patches.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09 00:27:16 +01:00
Mikulas Patocka
92c639021c dm: remove dm_request loop
Remove queue_io return value and a loop in dm_request.

IO may be submitted to a worker thread with queue_io().  queue_io() sets
DMF_QUEUE_IO_TO_THREAD so that all further IO is queued for the thread. When
the thread finishes its work, it clears DMF_QUEUE_IO_TO_THREAD and from this
point on, requests are submitted from dm_request again. This will be used
for processing barriers.

Remove the loop in dm_request. queue_io() can submit I/Os to the worker thread
even if DMF_QUEUE_IO_TO_THREAD was not set.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09 00:27:15 +01:00
Mikulas Patocka
3b00b2036f dm: rework queueing and suspension
Rework shutting down on suspend and document the associated rules.

Drop write lock in __split_and_process_bio to allow more processing
concurrency.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09 00:27:15 +01:00
Alasdair G Kergon
54d9a1b451 dm: simplify dm_request loop
Refactor the code in dm_request().

Require the new DMF_BLOCK_FOR_SUSPEND flag on readahead bios we will
discard so we don't drop such bios while processing a barrier.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09 00:27:14 +01:00
Alasdair G Kergon
1eb787ec18 dm: split DMF_BLOCK_IO flag into two
Split the DMF_BLOCK_IO flag into two.

DMF_BLOCK_IO_FOR_SUSPEND is set when I/O must be blocked while suspending a
device.  DMF_QUEUE_IO_TO_THREAD is set when I/O must be queued to a
worker thread for later processing.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09 00:27:14 +01:00
Alasdair G Kergon
df12ee9963 dm: rearrange dm_wq_work
Refactor dm_wq_work() to make later patch more readable.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09 00:27:13 +01:00
Mikulas Patocka
692d0eb9e0 dm: remove limited barrier support
Prepare for full barrier implementation: first remove the restricted support.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09 00:27:13 +01:00
Martin K. Petersen
9c47008d13 dm: add integrity support
This patch provides support for data integrity passthrough in the device
mapper.

 - If one or more component devices support integrity an integrity
   profile is preallocated for the DM device.

 - If all component devices have compatible profiles the DM device is
   flagged as capable.

 - Handle integrity metadata when splitting and cloning bios.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09 00:27:12 +01:00
Alexander Beregalov
91a9e99d76 md/raid1: fix build breakage
Fix this build error:

  drivers/md/raid1.c: In function 'raid1_congested':
  drivers/md/raid1.c:589: error: 'BDI_write_congested' undeclared

BDI_write_congested was changed in commit 1faa16d228 ("block: change the
request allocation/congestion logic to be sync/async based")

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-06 14:40:07 -07:00
NeilBrown
303a0e11d0 md/raid1 - don't assume newly allocated bvecs are initialised.
Since commit d3f761104b
newly allocated bvecs aren't initialised to NULL, so we have
to be more careful about freeing a bio which only managed
to get a few pages allocated to it.  Otherwise the resync
process crashes.

This patch is appropriate for 2.6.29-stable.

Cc: stable@kernel.org
Cc: "Jens Axboe" <jens.axboe@oracle.com>
Reported-by: Gabriele Tozzi <gabriele@tozzi.eu>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-04-06 14:40:38 +10:00
Linus Torvalds
d9b9be024a Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (36 commits)
  dm: set queue ordered mode
  dm: move wait queue declaration
  dm: merge pushback and deferred bio lists
  dm: allow uninterruptible wait for pending io
  dm: merge __flush_deferred_io into caller
  dm: move bio_io_error into __split_and_process_bio
  dm: rename __split_bio
  dm: remove unnecessary struct dm_wq_req
  dm: remove unnecessary work queue context field
  dm: remove unnecessary work queue type field
  dm: bio list add bio_list_add_head
  dm snapshot: persistent fix dtr cleanup
  dm snapshot: move status to exception store
  dm snapshot: move ctr parsing to exception store
  dm snapshot: use DMEMIT macro for status
  dm snapshot: remove dm_snap header
  dm snapshot: remove dm_snap header use
  dm exception store: move cow pointer
  dm exception store: move chunk_fields
  dm exception store: move dm_target pointer
  ...
2009-04-03 10:02:45 -07:00
Linus Torvalds
223cdea4c4 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (53 commits)
  md/raid5 revise rules for when to update metadata during reshape
  md/raid5: minor code cleanups in make_request.
  md: remove CONFIG_MD_RAID_RESHAPE config option.
  md/raid5: be more careful about write ordering when reshaping.
  md: don't display meaningless values in sysfs files resync_start and sync_speed
  md/raid5: allow layout and chunksize to be changed on active array.
  md/raid5: reshape using largest of old and new chunk size
  md/raid5: prepare for allowing reshape to change layout
  md/raid5: prepare for allowing reshape to change chunksize.
  md/raid5: clearly differentiate 'before' and 'after' stripes during reshape.
  Documentation/md.txt update
  md: allow number of drives in raid5 to be reduced
  md/raid5: change reshape-progress measurement to cope with reshaping backwards.
  md: add explicit method to signal the end of a reshape.
  md/raid5: enhance raid5_size to work correctly with negative delta_disks
  md/raid5: drop qd_idx from r6_state
  md/raid6: move raid6 data processing to raid6_pq.ko
  md: raid5 run(): Fix max_degraded for raid level 4.
  md: 'array_size' sysfs attribute
  md: centralize ->array_sectors modifications
  ...
2009-04-03 09:08:19 -07:00
Mikulas Patocka
99360b4c18 dm: set queue ordered mode
Set queue ordered mode.  It doesn't really matter what we set here
because we don't ever put any requests on the queue.  But we need to set
something other than QUEUE_ORDERED_NONE so that __generic_make_request
passes barrier requests to us.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:39 +01:00
Mikulas Patocka
b44ebeb017 dm: move wait queue declaration
Move wait queue declaration and unplug to dm_wait_for_completion.

The purpose is to minimize duplicate code in the further patches.

The patch reorders functions a little bit. It doesn't change any
functionality. For proper non-deadlock operation, add_wait_queue must
happen before set_current_state(interruptible) and before the test for
!atomic_read(&md->pending).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:39 +01:00
Mikulas Patocka
022c261100 dm: merge pushback and deferred bio lists
Merge pushback and deferred lists into one list - use deferred list
for both deferred and pushed-back bios.

This will be needed for proper support of barrier bios: it is impossible to
support ordering correctly with two lists because the requests on both lists
will be mixed up.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:39 +01:00
Mikulas Patocka
401600dfd3 dm: allow uninterruptible wait for pending io
Allow uninterruptible wait for pending IOs.

Add argument "interruptible" to dm_wait_for_completion that specifies
either interruptible or uninterruptible waiting.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:38 +01:00
Mikulas Patocka
ef2085870e dm: merge __flush_deferred_io into caller
Merge __flush_deferred_io() into the only caller, dm_wq_work().

There's no need to have a function that has only one caller.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:38 +01:00
Mikulas Patocka
f0b9a4502b dm: move bio_io_error into __split_and_process_bio
Move the bio_io_error() calls directly into __split_and_process_bio().

This avoids some code duplication in later patches.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:38 +01:00
Mikulas Patocka
8a53c28db4 dm: rename __split_bio
Rename __split_bio() to __split_and_process_bio() because it not only splits
the bio to serveral parts, but also submits them to target drivers.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:37 +01:00
Mikulas Patocka
53d5914f28 dm: remove unnecessary struct dm_wq_req
Remove struct dm_wq_req and move "work" directly into struct mapped_device.

In the revised implementation, the thread will do just one type of work
(processing the queue).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:37 +01:00
Mikulas Patocka
9a1fb46448 dm: remove unnecessary work queue context field
Remove the context field from struct dm_wq_req because we will no longer
need it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:36 +01:00
Mikulas Patocka
143773965b dm: remove unnecessary work queue type field
Remove "type" field from struct dm_wq_req because we no longer need it
to have more than one value.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:36 +01:00
Mikulas Patocka
99c75e3130 dm: bio list add bio_list_add_head
Introduce a function that adds a bio to the head of the list for
use by the patch that will support barriers.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:36 +01:00
Jonathan Brassow
a32079ce17 dm snapshot: persistent fix dtr cleanup
The persistent exception store destructor does not properly
account for all conditions in which it can be called.  If it
is called after 'ctr' but before 'read_metadata' (e.g. if
something else in 'snapshot_ctr' fails) then it will attempt
to free areas of memory that haven't been allocated yet.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:35 +01:00
Jonathan Brassow
1e302a929e dm snapshot: move status to exception store
Let the exception store types print out their status through
the new API, rather than having the snapshot code do it.

Adjust the buffer position to allow for the preceding DMEMIT in the
arguments to type->status().

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:35 +01:00
Jonathan Brassow
fee1998e9c dm snapshot: move ctr parsing to exception store
First step of having the exception stores parse their own arguments -
generalizing the interface.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:34 +01:00
Jonathan Brassow
2e4a31df2b dm snapshot: use DMEMIT macro for status
Use DMEMIT in place of snprintf.  This makes it easier later when
other modules are helping to populate our status output.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:34 +01:00
Jonathan Brassow
ccc45ea8ae dm snapshot: remove dm_snap header
Move some of the last bits from dm-snap.h into dm-snap.c where they
belong and remove dm-snap.h.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:34 +01:00
Jonathan Brassow
71fab00a6b dm snapshot: remove dm_snap header use
Move useful functions out of dm-snap.h and stop using dm-snap.h.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:33 +01:00
Jonathan Brassow
49beb2b87a dm exception store: move cow pointer
Move COW device from snapshot to exception store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:33 +01:00
Jonathan Brassow
d021684951 dm exception store: move chunk_fields
Move chunk fields from snapshot to exception store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:32 +01:00
Jonathan Brassow
0cea9c7827 dm exception store: move dm_target pointer
Move target pointer from snapshot to exception store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:32 +01:00
Jonathan Brassow
493df71c64 dm exception store: introduce registry
Move exception stores into a registry.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:31 +01:00
Jonathan Brassow
7513c2a761 dm raid1: add is_remote_recovering hook for clusters
The logging API needs an extra function to make cluster mirroring
possible.  This new function allows us to check whether a mirror
region is being recovered on another machine in the cluster.  This
helps us prevent simultaneous recovery I/O and process I/O to the
same locations on disk.

Cluster-aware log modules will implement this function.  Single
machine log modules will not.  So, there is no performance
penalty for single machine mirrors.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Acked-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:30 +01:00
Jonathan Brassow
b2a1146529 dm exception store: separate type from instance
Introduce struct dm_exception_store_type.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:30 +01:00
Mike Snitzer
ec44ab9d66 dm log: remove struct dm_dirty_log_internal
Remove the 'dm_dirty_log_internal' structure.  The resulting cleanup
eliminates extra memory allocations.  Therefore exposing the internal
list_head to the external 'dm_dirty_log_type' structure is a worthwhile
compromise.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:30 +01:00
Mike Snitzer
84e67c9319 dm log: use standard kernel module refcount
Avoid private module usage accounting by removing 'use' from
dm_dirty_log_internal.  The standard module reference counting is
sufficient.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:29 +01:00
Johannes Weiner
b81d6cf79b dm crypt: use kzfree
Use kzfree() instead of memset() + kfree().

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:28 +01:00
Cheng Renquan
45194e4f89 dm target: remove struct tt_internal
The tt_internal is really just a list_head to manage registered target_type
in a double linked list,

Here embed the list_head into target_type directly,
1. to avoid kmalloc/kfree;
2. then tt_internal is really unneeded;

Cc: stable@kernel.org
Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:28 +01:00
Alasdair G Kergon
570b9d968b dm table: fix upgrade mode race
upgrade_mode() sets bdev to NULL temporarily, and does not have any
locking to exclude anything from seeing that NULL.

In dm_table_any_congested() bdev_get_queue() can dereference that NULL and
cause a reported oops.

Fix this by not changing that field during the mode upgrade.

Cc: stable@kernel.org
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:28 +01:00
Jun'ichi Nomura
aea9058801 dm: path selector use module refcount directly
Fix refcount corruption in dm-path-selector

Refcounting with non-atomic ops under shared lock will corrupt the counter
in multi-processor system and may trigger BUG_ON().
Use module refcount.
# same approach as dm-target-use-module-refcount-directly.patch here
# https://www.redhat.com/archives/dm-devel/2008-December/msg00075.html

Typical oops:
  kernel BUG at linux-2.6.29-rc3/drivers/md/dm-path-selector.c:90!
  Pid: 11148, comm: dmsetup Not tainted 2.6.29-rc3-nm #1
  dm_put_path_selector+0x4d/0x61 [dm_multipath]
  Call Trace:
   [<ffffffffa031d3f9>] free_priority_group+0x33/0xb3 [dm_multipath]
   [<ffffffffa031d4aa>] free_multipath+0x31/0x67 [dm_multipath]
   [<ffffffffa031d50d>] multipath_dtr+0x2d/0x32 [dm_multipath]
   [<ffffffffa015d6c2>] dm_table_destroy+0x64/0xd8 [dm_mod]
   [<ffffffffa015b73a>] __unbind+0x46/0x4b [dm_mod]
   [<ffffffffa015b79f>] dm_swap_table+0x60/0x14d [dm_mod]
   [<ffffffffa015f963>] dev_suspend+0xfd/0x177 [dm_mod]
   [<ffffffffa0160250>] dm_ctl_ioctl+0x24c/0x29c [dm_mod]
   [<ffffffff80288cd3>] ? get_page_from_freelist+0x49c/0x61d
   [<ffffffffa015f866>] ? dev_suspend+0x0/0x177 [dm_mod]
   [<ffffffff802bf05c>] vfs_ioctl+0x2a/0x77
   [<ffffffff802bf4f1>] do_vfs_ioctl+0x448/0x4a0
   [<ffffffff802bf5a0>] sys_ioctl+0x57/0x7a
   [<ffffffff8020c05b>] system_call_fastpath+0x16/0x1b

Cc: stable@kernel.org
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:27 +01:00
Cheng Renquan
5642b8a61a dm target: use module refcount directly
The tt_internal's 'use' field is superfluous: the module's refcount can do
the work properly.  An acceptable side-effect is that this increases the
reference counts reported by 'lsmod'.

Remove the superfluous test when removing a target module.

[Crash possible without this on SMP - agk]

Cc: stable@kernel.org
Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
2009-04-02 19:55:27 +01:00
Mikulas Patocka
35bf659b00 dm snapshot: avoid having two exceptions for the same chunk
We need to check if the exception was completed after dropping the lock.

After regaining the lock, __find_pending_exception checks if the exception
was already placed into &s->pending hash.

But we don't check if the exception was already completed and placed into
&s->complete hash. If the process waiting in alloc_pending_exception was
delayed at this point because of a scheduling latency and the exception
was meanwhile completed, we'd miss that and allocate another pending
exception for already completed chunk.

It would lead to a situation where two records for the same chunk exist
and potential data corruption because multiple snapshot I/Os to the
affected chunk could be redirected to different locations in the
snapshot.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-02 19:55:26 +01:00