2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-24 05:04:00 +08:00
Commit Graph

4021 Commits

Author SHA1 Message Date
Linus Torvalds
33caf82acf Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "All kinds of stuff.  That probably should've been 5 or 6 separate
  branches, but by the time I'd realized how large and mixed that bag
  had become it had been too close to -final to play with rebasing.

  Some fs/namei.c cleanups there, memdup_user_nul() introduction and
  switching open-coded instances, burying long-dead code, whack-a-mole
  of various kinds, several new helpers for ->llseek(), assorted
  cleanups and fixes from various people, etc.

  One piece probably deserves special mention - Neil's
  lookup_one_len_unlocked().  Similar to lookup_one_len(), but gets
  called without ->i_mutex and tries to avoid ever taking it.  That, of
  course, means that it's not useful for any directory modifications,
  but things like getting inode attributes in nfds readdirplus are fine
  with that.  I really should've asked for moratorium on lookup-related
  changes this cycle, but since I hadn't done that early enough...  I
  *am* asking for that for the coming cycle, though - I'm going to try
  and get conversion of i_mutex to rwsem with ->lookup() done under lock
  taken shared.

  There will be a patch closer to the end of the window, along the lines
  of the one Linus had posted last May - mechanical conversion of
  ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
  inode_is_locked()/inode_lock_nested().  To quote Linus back then:

    -----
    |    This is an automated patch using
    |
    |        sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    |        sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    |        sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[     ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    |        sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    |        sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    |    with a very few manual fixups
    -----

  I'm going to send that once the ->i_mutex-affecting stuff in -next
  gets mostly merged (or when Linus says he's about to stop taking
  merges)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  nfsd: don't hold i_mutex over userspace upcalls
  fs:affs:Replace time_t with time64_t
  fs/9p: use fscache mutex rather than spinlock
  proc: add a reschedule point in proc_readfd_common()
  logfs: constify logfs_block_ops structures
  fcntl: allow to set O_DIRECT flag on pipe
  fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
  fs: xattr: Use kvfree()
  [s390] page_to_phys() always returns a multiple of PAGE_SIZE
  nbd: use ->compat_ioctl()
  fs: use block_device name vsprintf helper
  lib/vsprintf: add %*pg format specifier
  fs: use gendisk->disk_name where possible
  poll: plug an unused argument to do_poll
  amdkfd: don't open-code memdup_user()
  cdrom: don't open-code memdup_user()
  rsxx: don't open-code memdup_user()
  mtip32xx: don't open-code memdup_user()
  [um] mconsole: don't open-code memdup_user_nul()
  [um] hostaudio: don't open-code memdup_user()
  ...
2016-01-12 17:11:47 -08:00
Linus Torvalds
03891f9c85 - The most significant set of changes this cycle is the Forward Error
Correction (FEC) support that has been added to the DM verity target.
   Google uses DM verity on all Android devices and it is believed that
   this FEC support will enable DM verity to recover from storage
   failures seen since DM verity was first deployed as part of Android.
 
 - A stable fix for a race in the destruction of DM thin pool's workqueue
 
 - A stable fix for hung IO if a DM snapshot copy hit an error
 
 - A few small cleanups in DM core and DM persistent data.
 
 - A couple DM thinp range discard improvements (address atomicity of
   finding a range and the efficiency of discarding a partially mapped
   thin device)
 
 - Add ability to debug DM bufio leaks by recording stack trace when a
   buffer is allocated.  Upon detected leak the recorded stack is dumped.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWlAf0AAoJEMUj8QotnQNaHPIH/2rvnzg71RsP/7IRI/5DHETP
 ubxKhKd7tfqwJEjuQvhiYB1Ubo+gvXEuT51C2G1ug2QzHsjymmE14q/60ElB7+/U
 ++bGisWvqm4ZqWWM9yffqbESzNOfNTn7dLduaxGeLxVG3zVLfzQRfSPOqhk1FiIv
 H35v0Xx/j1NAHQtcocVYzG4P5BwfgmeyuYmUq8BklHNlwa3drBKnMZfIlF4u2216
 Z3K7d+5nLpSsPyejzpQlByHTUt/eVy1Y2ZBgudWITaP5DAcUQwHyLZI4k3skmMiK
 O/xLZ54aeKI9NhtEwH8s8jOd3b7Kvw/oAw5nfPj7jmIDF3if8U2HCU6KgfBVwwU=
 =fOsS
 -----END PGP SIGNATURE-----

Merge tag 'dm-4.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - The most significant set of changes this cycle is the Forward Error
   Correction (FEC) support that has been added to the DM verity target.

   Google uses DM verity on all Android devices and it is believed that
   this FEC support will enable DM verity to recover from storage
   failures seen since DM verity was first deployed as part of Android.

 - A stable fix for a race in the destruction of DM thin pool's
   workqueue

 - A stable fix for hung IO if a DM snapshot copy hit an error

 - A few small cleanups in DM core and DM persistent data.

 - A couple DM thinp range discard improvements (address atomicity of
   finding a range and the efficiency of discarding a partially mapped
   thin device)

 - Add ability to debug DM bufio leaks by recording stack trace when a
   buffer is allocated.  Upon detected leak the recorded stack is
   dumped.

* tag 'dm-4.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm snapshot: fix hung bios when copy error occurs
  dm thin: bump thin and thin-pool target versions
  dm thin: fix race condition when destroying thin pool workqueue
  dm space map metadata: remove unused variable in brb_pop()
  dm verity: add ignore_zero_blocks feature
  dm verity: add support for forward error correction
  dm verity: factor out verity_for_bv_block()
  dm verity: factor out structures and functions useful to separate object
  dm verity: move dm-verity.c to dm-verity-target.c
  dm verity: separate function for parsing opt args
  dm verity: clean up duplicate hashing code
  dm btree: factor out need_insert() helper
  dm bufio: use BUG_ON instead of conditional call to BUG
  dm bufio: store stacktrace in buffers to help find buffer leaks
  dm bufio: return NULL to improve code clarity
  dm block manager: cleanup code that prints stacktrace
  dm: don't save and restore bi_private
  dm thin metadata: make dm_thin_find_mapped_range() atomic
  dm thin metadata: speed up discard of partially mapped volumes
2016-01-11 22:25:00 -08:00
Mikulas Patocka
385277bfb5 dm snapshot: fix hung bios when copy error occurs
When there is an error copying a chunk dm-snapshot can incorrectly hold
associated bios indefinitely, resulting in hung IO.

The function copy_callback sets pe->error if there was error copying the
chunk, and then calls complete_exception.  complete_exception calls
pending_complete on error, otherwise it calls commit_exception with
commit_callback (and commit_callback calls complete_exception).

The persistent exception store (dm-snap-persistent.c) assumes that calls
to prepare_exception and commit_exception are paired.
persistent_prepare_exception increases ps->pending_count and
persistent_commit_exception decreases it.

If there is a copy error, persistent_prepare_exception is called but
persistent_commit_exception is not.  This results in the variable
ps->pending_count never returning to zero and that causes some pending
exceptions (and their associated bios) to be held forever.

Fix this by unconditionally calling commit_exception regardless of
whether the copy was successful.  A new "valid" parameter is added to
commit_exception -- when the copy fails this parameter is set to zero so
that the chunk that failed to copy (and all following chunks) is not
recorded in the snapshot store.  Also, remove commit_callback now that
it is merely a wrapper around pending_complete.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2016-01-08 20:03:05 -05:00
Mike Snitzer
1c2e54e1ed dm thin: bump thin and thin-pool target versions
Commit 3d5f6733 ("dm thin metadata: speed up discard of partially mapped
volumes"), or some other dm-thinp change during the Linux 4.5
development window, really should've bumped these target versions.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-01-06 20:59:40 -05:00
Al Viro
93bbf5831d md: more open-coded offset_in_page()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:29:12 -05:00
Al Viro
756d097b95 dm-bufio: virt_to_phys() doesn't change remainder modulo PAGE_SIZE
... so virt_to_phys(p) & (PAGE_SIZE - 1) is a very odd way to
spell offset_in_page(p).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:29:07 -05:00
NeilBrown
312045eef9 md: remove check for MD_RECOVERY_NEEDED in action_store.
md currently doesn't allow a 'sync_action' such as 'reshape' to be set
while MD_RECOVERY_NEEDED is set.

This s a problem, particularly since commit 738a273806 as that can
cause ->check_shape to call mddev_resume() which sets
MD_RECOVERY_NEEDED.  So by the time we come to start 'reshape' it is
very likely that MD_RECOVERY_NEEDED is still set.

Testing for this flag is not really needed and is in any case very
racy as it can be set at any moment - asynchronously.  Any race
between setting a sync_action and setting MD_RECOVERY_NEEDED must
already be handled properly in some locked code, probably
md_check_recovery(), so remove the test here.

The test on MD_RECOVERY_RUNNING is also racy in the 'reshape' case
so we should test it again after getting mddev_lock().

As this fixes a race and a regression which can cause 'reshape' to
fail, it is suitable for -stable kernels since 4.1

Reported-by: Xiao Ni <xni@redhat.com>
Fixes: 738a273806 ("md/raid5: fix allocation of 'scribble' array.")
Cc: stable@vger.kernel.org (v4.1+)
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-21 11:10:06 +11:00
Goldwyn Rodrigues
cb01c5496d Fix remove_and_add_spares removes drive added as spare in slot_store
Commit 2910ff17d1
introduced a regression which would remove a recently added spare via
slot_store. Revert part of the patch which touches slot_store() and add
the disk directly using pers->hot_add_disk()

Fixes: 2910ff17d1 ("md: remove_and_add_spares() to activate specific
rdev")
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-18 15:19:16 +11:00
Mikulas Patocka
0dc10e50f2 md: fix bug due to nested suspend
The patch c7bfced9a6 committed to 4.4-rc
causes crash in LVM test shell/lvchange-raid.sh. The kernel crashes with
this BUG, the reason is that we attempt to suspend a device that is
already suspended. See also
https://bugzilla.redhat.com/show_bug.cgi?id=1283491

This patch fixes the bug by changing functions mddev_suspend and
mddev_resume to always nest.
The number of nested calls to mddev_nested_suspend is kept in the
variable mddev->suspended.
[neilb: made mddev_suspend() always nest instead of introduce mddev_nested_suspend]

kernel BUG at drivers/md/md.c:317!
CPU: 3 PID: 32754 Comm: lvm Not tainted 4.4.0-rc2 #1
task: 0000000047076040 ti: 0000000047014000 task.ti: 0000000047014000

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001000000000000001111 Not tainted
r00-03  000000000804000f 00000000102c5280 0000000010c7522c 000000007e3d1810
r04-07  0000000010c6f000 000000004ef37f20 000000007e3d1dd0 000000007e3d1810
r08-11  000000007c9f1600 0000000000000000 0000000000000001 ffffffffffffffff
r12-15  0000000010c1d000 0000000000000041 00000000f98d63c8 00000000f98e49e4
r16-19  00000000f98e49e4 00000000c138fd06 00000000f98d63c8 0000000000000001
r20-23  0000000000000002 000000004ef37f00 00000000000000b0 00000000000001d1
r24-27  00000000424783a0 000000007e3d1dd0 000000007e3d1810 00000000102b2000
r28-31  0000000000000001 0000000047014840 0000000047014930 0000000000000001
sr00-03  0000000007040800 0000000000000000 0000000000000000 0000000007040800
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000102c538c 00000000102c5390
 IIR: 03ffe01f    ISR: 0000000000000000  IOR: 00000000102b2748
 CPU:        3   CR30: 0000000047014000 CR31: 0000000000000000
 ORIG_R28: 00000000000000b0
 IAOQ[0]: mddev_suspend+0x10c/0x160 [md_mod]
 IAOQ[1]: mddev_suspend+0x110/0x160 [md_mod]
 RP(r2): raid1_add_disk+0xd4/0x2c0 [raid1]
Backtrace:
 [<0000000010c7522c>] raid1_add_disk+0xd4/0x2c0 [raid1]
 [<0000000010c20078>] raid_resume+0x390/0x418 [dm_raid]
 [<00000000105833e8>] dm_table_resume_targets+0xc0/0x188 [dm_mod]
 [<000000001057f784>] dm_resume+0x144/0x1e0 [dm_mod]
 [<0000000010587dd4>] dev_suspend+0x1e4/0x568 [dm_mod]
 [<0000000010589278>] ctl_ioctl+0x1e8/0x428 [dm_mod]
 [<0000000010589518>] dm_compat_ctl_ioctl+0x18/0x68 [dm_mod]
 [<0000000040377b88>] compat_SyS_ioctl+0xd0/0x1558

Fixes: c7bfced9a6 ("md: suspend i/o during runtime blk_integrity_unregister")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-18 15:19:16 +11:00
Shaohua Li
9b15603dbd MD: change journal disk role to disk 0
Neil pointed out setting journal disk role to raid_disks will confuse
reshape if we support reshape eventually. Switching the role to 0 (we
should be fine as long as the value >=0) and skip sysfs file creation to
avoid error.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-18 15:19:16 +11:00
Artur Paszkiewicz
cc57858831 md/raid10: fix data corruption and crash during resync
The commit c31df25f20 ("md/raid10: make sync_request_write() call
bio_copy_data()") replaced manual data copying with bio_copy_data() but
it doesn't work as intended. The source bio (fbio) is already processed,
so its bvec_iter has bi_size == 0 and bi_idx == bi_vcnt.  Because of
this, bio_copy_data() either does not copy anything, or worse, copies
data from the ->bi_next bio if it is set.  This causes wrong data to be
written to drives during resync and sometimes lockups/crashes in
bio_copy_data():

[  517.338478] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md126_raid10:3319]
[  517.347324] Modules linked in: raid10 xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul cryptd shpchp pcspkr ipmi_si ipmi_msghandler tpm_crb acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sr_mod cdrom sd_mod e1000e ax88179_178a usbnet mii ahci ata_generic crc32c_intel libahci ptp pata_acpi libata pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod
[  517.440555] CPU: 0 PID: 3319 Comm: md126_raid10 Not tainted 4.3.0-rc6+ #1
[  517.448384] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0055.D14.1509221924 09/22/2015
[  517.459768] task: ffff880153773980 ti: ffff880150df8000 task.ti: ffff880150df8000
[  517.468529] RIP: 0010:[<ffffffff812e1888>]  [<ffffffff812e1888>] bio_copy_data+0xc8/0x3c0
[  517.478164] RSP: 0018:ffff880150dfbc98  EFLAGS: 00000246
[  517.484341] RAX: ffff880169356688 RBX: 0000000000001000 RCX: 0000000000000000
[  517.492558] RDX: 0000000000000000 RSI: ffffea0001ac2980 RDI: ffffea0000d835c0
[  517.500773] RBP: ffff880150dfbd08 R08: 0000000000000001 R09: ffff880153773980
[  517.508987] R10: ffff880169356600 R11: 0000000000001000 R12: 0000000000010000
[  517.517199] R13: 000000000000e000 R14: 0000000000000000 R15: 0000000000001000
[  517.525412] FS:  0000000000000000(0000) GS:ffff880174a00000(0000) knlGS:0000000000000000
[  517.534844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  517.541507] CR2: 00007f8a044d5fed CR3: 0000000169504000 CR4: 00000000001406f0
[  517.549722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  517.557929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  517.566144] Stack:
[  517.568626]  ffff880174a16bc0 ffff880153773980 ffff880169356600 0000000000000000
[  517.577659]  0000000000000001 0000000000000001 ffff880153773980 ffff88016a61a800
[  517.586715]  ffff880150dfbcf8 0000000000000001 ffff88016dd209e0 0000000000001000
[  517.595773] Call Trace:
[  517.598747]  [<ffffffffa043ef95>] raid10d+0xfc5/0x1690 [raid10]
[  517.605610]  [<ffffffff816697ae>] ? __schedule+0x29e/0x8e2
[  517.611987]  [<ffffffff814ff206>] md_thread+0x106/0x140
[  517.618072]  [<ffffffff810c1d80>] ? wait_woken+0x80/0x80
[  517.624252]  [<ffffffff814ff100>] ? super_1_load+0x520/0x520
[  517.630817]  [<ffffffff8109ef89>] kthread+0xc9/0xe0
[  517.636506]  [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
[  517.643653]  [<ffffffff8166d99f>] ret_from_fork+0x3f/0x70
[  517.649929]  [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: Shaohua Li <shli@kernel.org>
Cc: stable@vger.kernel.org (v4.2+)
Fixes: c31df25f20 ("md/raid10: make sync_request_write() call bio_copy_data()")
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-18 15:19:16 +11:00
Nikolay Borisov
18d03e8c25 dm thin: fix race condition when destroying thin pool workqueue
When a thin pool is being destroyed delayed work items are
cancelled using cancel_delayed_work(), which doesn't guarantee that on
return the delayed item isn't running.  This can cause the work item to
requeue itself on an already destroyed workqueue.  Fix this by using
cancel_delayed_work_sync() which guarantees that on return the work item
is not running anymore.

Fixes: 905e51b39a ("dm thin: commit outstanding data every second")
Fixes: 85ad643b7e ("dm thin: add timeout to stop out-of-data-space mode holding IO forever")
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-17 15:47:20 -05:00
Mike Snitzer
512167788a dm space map metadata: remove unused variable in brb_pop()
Remove the unused struct block_op pointer that was inadvertantly
introduced, via cut-and-paste of previous brb_op() code, as part of
commit 50dd842ad.

(Cc'ing stable@ because commit 50dd842ad did)

Fixes: 50dd842ad ("dm space map metadata: fix ref counting bug when bootstrapping a new space map")
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-14 09:26:01 -05:00
Sami Tolvanen
0cc37c2df4 dm verity: add ignore_zero_blocks feature
If ignore_zero_blocks is enabled dm-verity will return zeroes for blocks
matching a zero hash without validating the content.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:39:03 -05:00
Sami Tolvanen
a739ff3f54 dm verity: add support for forward error correction
Add support for correcting corrupted blocks using Reed-Solomon.

This code uses RS(255, N) interleaved across data and hash
blocks. Each error-correcting block covers N bytes evenly
distributed across the combined total data, so that each byte is a
maximum distance away from the others. This makes it possible to
recover from several consecutive corrupted blocks with relatively
small space overhead.

In addition, using verity hashes to locate erasures nearly doubles
the effectiveness of error correction. Being able to detect
corrupted blocks also improves performance, because only corrupted
blocks need to corrected.

For a 2 GiB partition, RS(255, 253) (two parity bytes for each
253-byte block) can correct up to 16 MiB of consecutive corrupted
blocks if erasures can be located, and 8 MiB if they cannot, with
16 MiB space overhead.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:39:03 -05:00
Sami Tolvanen
bb4d73ac5e dm verity: factor out verity_for_bv_block()
verity_for_bv_block() will be re-used by optional dm-verity object.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:39:02 -05:00
Sami Tolvanen
ffa393807c dm verity: factor out structures and functions useful to separate object
Prepare for an optional verity object to make use of existing dm-verity
structures and functions.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:39:01 -05:00
Sami Tolvanen
03045cbafa dm verity: move dm-verity.c to dm-verity-target.c
Prepare for extending dm-verity with an optional object.  Follows the
naming convention used by other DM targets (e.g. dm-cache and dm-era).

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:39:01 -05:00
Sami Tolvanen
753c1fd028 dm verity: separate function for parsing opt args
Move optional argument parsing into a separate function to make it
easier to add more of them without making verity_ctr even longer.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:39:00 -05:00
Sami Tolvanen
6dbeda3469 dm verity: clean up duplicate hashing code
Handle dm-verity salting in one place to simplify the code.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:39:00 -05:00
Mike Snitzer
ba503835ad dm btree: factor out need_insert() helper
Eliminates code duplication within insert().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:38:59 -05:00
Anup Limbu
86a49e2dac dm bufio: use BUG_ON instead of conditional call to BUG
Signed-off-by: Anup Limbu <anuplimbu14@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:38:58 -05:00
Mikulas Patocka
86bad0c707 dm bufio: store stacktrace in buffers to help find buffer leaks
The option DM_DEBUG_BLOCK_STACK_TRACING is moved from persistent-data
directory to device mapper directory because it will now be used by
persistent-data and bufio.  When the option is enabled, each bufio buffer
stores the stacktrace of the last dm_bufio_get(), dm_bufio_read() or
dm_bufio_new() call that increased the hold count to 1.  The buffer's
stacktrace is printed if the buffer was not released before the bufio
client is destroyed.

When DM_DEBUG_BLOCK_STACK_TRACING is enabled, any bufio buffer leaks are
considered warnings - i.e. the kernel continues afterwards.  If not
enabled, buffer leaks are considered BUGs and the kernel with crash.
Reasoning on this disposition is: if we only ever warned on buffer leaks
users would generally ignore them and the problematic code would never
get fixed.

Successfully used to find source of bufio leaks fixed with commit
fce079f63c3 ("dm btree: fix bufio buffer leaks in dm_btree_del() error
path").

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:38:58 -05:00
Mikulas Patocka
f98c8f7970 dm bufio: return NULL to improve code clarity
A small code cleanup in new_read() - return NULL instead of b (although
b is NULL at this point).  This function is not returning pointer to the
buffer, it is returning a pointer to the bufffer's data, thus it makes
no sense to return the variable b.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:38:57 -05:00
Mikulas Patocka
313c9b9736 dm block manager: cleanup code that prints stacktrace
There is no need to record stack trace and immediately print it.  Just
use dump_stack() to print the current stack.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:38:56 -05:00
Mikulas Patocka
fe3265b180 dm: don't save and restore bi_private
Device mapper used the field bi_private to point to dm_target_io. However,
since kernel 3.15, the bi_private field is unused, and so the targets do
not need to save and restore this field.

This patch removes code that saves and restores bi_private from dm-cache,
dm-snapshot and dm-verity.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:38:56 -05:00
Joe Thornber
086fbbbda9 dm thin metadata: make dm_thin_find_mapped_range() atomic
Refactor dm_thin_find_mapped_range() so that it takes the read lock on
the metadata's lock; rather than relying on finer grained locking that
is pushed down inside dm_thin_find_next_mapped_block() and
dm_thin_find_block().

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:38:55 -05:00
Joe Thornber
3d5f67332a dm thin metadata: speed up discard of partially mapped volumes
Use dm_btree_lookup_next() to more quickly discard partially mapped
volumes.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10 10:30:56 -05:00
Joe Thornber
ed8b45a367 dm btree: fix bufio buffer leaks in dm_btree_del() error path
If dm_btree_del()'s call to push_frame() fails, e.g. due to
btree_node_validator finding invalid metadata, the dm_btree_del() error
path must unlock all frames (which have active dm-bufio buffers) that
were pushed onto the del_stack.

Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
buffers have leaked, e.g.:
  device-mapper: bufio: leaked buffer 3, hold count 1, list 0

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-10 10:30:18 -05:00
Joe Thornber
50dd842ad8 dm space map metadata: fix ref counting bug when bootstrapping a new space map
When applying block operations (BOPs) do not remove them from the
uncommitted BOP ring-buffer until after they've been applied -- in case
we recurse.

Also, perform BOP_INC operation, in dm_sm_metadata_create() and
sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather
than using direct calls to sm_ll_inc().

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-09 13:27:25 -05:00
Joe Thornber
49e99fc717 dm thin metadata: fix bug when taking a metadata snapshot
When you take a metadata snapshot the btree roots for the mapping and
details tree need to have their reference counts incremented so they
persist for the lifetime of the metadata snap.

The roots being incremented were those currently written in the
superblock, which could possibly be out of date if concurrent IO is
triggering new mappings, breaking of sharing, etc.

Fix this by performing a commit with the metadata lock held while taking
a metadata snapshot.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-09 13:18:12 -05:00
Joe Thornber
993ceab919 dm thin metadata: fix bug in dm_thin_remove_range()
dm_btree_remove_leaves() only unmaps a contiguous region so we need a
loop, in __remove_range(), to handle ranges that contain multiple
regions.

A new btree function, dm_btree_lookup_next(), is introduced which is
more efficiently able to skip over regions of the thin device which
aren't mapped.  __remove_range() uses dm_btree_lookup_next() for each
iteration of __remove_range()'s loop.

Also, improve description of dm_btree_remove_leaves().

Fixes: 6550f075 ("dm thin metadata: add dm_thin_remove_range()")
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.1+
2015-12-02 13:26:49 -05:00
Mike Snitzer
30ce6e1cc5 dm btree: fix leak of bufio-backed block in btree_split_sibling error path
The block allocated at the start of btree_split_sibling() is never
released if later insert_at() fails.

Fix this by releasing the previously allocated bufio block using
unlock_block().

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-02 13:20:34 -05:00
Mike Snitzer
0fcb04d593 dm thin: fix regression in advertised discard limits
When establishing a thin device's discard limits we cannot rely on the
underlying thin-pool device's discard capabilities (which are inherited
from the thin-pool's underlying data device) given that DM thin devices
must provide discard support even when the thin-pool's underlying data
device doesn't support discards.

Users were exposed to this thin device discard limits regression if
their thin-pool's underlying data device does _not_ support discards.
This regression caused all upper-layers that called the
blkdev_issue_discard() interface to not be able to issue discards to
thin devices (because discard_granularity was 0).  This regression
wasn't caught earlier because the device-mapper-test-suite's extensive
'thin-provisioning' discard tests are only ever performed against
thin-pool's with data devices that support discards.

Fix is to have thin_io_hints() test the pool's 'discard_enabled' feature
rather than inferring whether or not a thin device's discard support
should be enabled by looking at the thin-pool's discard_granularity.

Fixes: 216076705 ("dm thin: disable discard support for thin devices if pool's is disabled")
Reported-by: Mike Gerber <mike@sprachgewalt.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.1+
2015-11-23 14:54:46 -05:00
Mikulas Patocka
bcbd94ff48 dm crypt: fix a possible hang due to race condition on exit
A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE),
__add_wait_queue, spin_unlock_irq and then tests kthread_should_stop().
It is possible that the processor reorders memory accesses so that
kthread_should_stop() is executed before __set_current_state().  If such
reordering happens, there is a possible race on thread termination:

CPU 0:
calls kthread_should_stop()
	it tests KTHREAD_SHOULD_STOP bit, returns false
CPU 1:
calls kthread_stop(cc->write_thread)
	sets the KTHREAD_SHOULD_STOP bit
	calls wake_up_process on the kernel thread, that sets the thread
	state to TASK_RUNNING
CPU 0:
sets __set_current_state(TASK_INTERRUPTIBLE)
spin_unlock_irq(&cc->write_thread_wait.lock)
schedule() - and the process is stuck and never terminates, because the
	state is TASK_INTERRUPTIBLE and wake_up_process on CPU 1 already
	terminated

Fix this race condition by using a new flag DM_CRYPT_EXIT_THREAD to
signal that the kernel thread should exit.  The flag is set and tested
while holding cc->write_thread_wait.lock, so there is no possibility of
racy access to the flag.

Also, remove the unnecessary set_task_state(current, TASK_RUNNING)
following the schedule() call.  When the process was woken up, its state
was already set to TASK_RUNNING.  Other kernel code also doesn't set the
state to TASK_RUNNING following schedule() (for example,
do_wait_for_common in completion.c doesn't do it).

Fixes: dc2676210c ("dm crypt: offload writes to thread")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-11-19 13:38:30 -05:00
Junichi Nomura
43e43c9ea6 dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path
In multipath_prepare_ioctl(),
  - pgpath is a path selected from available paths
  - m->queue_io is true if we cannot send a request immediately to
    paths, either because:
      * there is no available path
      * the path group needs activation (pg_init)
          - pg_init is not started
          - pg_init is still running
  - m->queue_if_no_path is true if the device is configured to queue
    I/O if there are no available paths

If !pgpath && !m->queue_if_no_path, the handler should return -EIO.
However in the course of refactoring the condition check has broken
and returns success in that case.  Since bdev points to the dm device
itself, dm_blk_ioctl() calls __blk_dev_driver_ioctl() for itself and
recurses until crash.

You could reproduce the problem like this:

  # dmsetup create mp --table '0 1024 multipath 0 0 0 0'
  # sg_inq /dev/mapper/mp
  <crash>
  [  172.648615] BUG: unable to handle kernel paging request at fffffffc81b10268
  [  172.662843] PGD 19dd067 PUD 0
  [  172.666269] Thread overran stack, or stack corrupted
  [  172.671808] Oops: 0000 [#1] SMP
  ...

Fix the condition check with some clarifications.

Fixes: e56f81e0b0 ("dm: refactor ioctl handling")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-11-17 14:19:00 -05:00
Mike Snitzer
647a20d5ca dm: do not reuse dm_blk_ioctl block_device input as local variable
(Ab)using the @bdev passed to dm_blk_ioctl() opens the potential for
targets' .prepare_ioctl to fail if they go on to check the bdev for
!NULL.

Fixes: e56f81e0b0 ("dm: refactor ioctl handling")
Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-11-17 14:18:49 -05:00
Junichi Nomura
5bbbfdf685 dm: fix ioctl retry termination with signal
dm-mpath retries ioctl, when no path is readily available and the device
is configured to queue I/O in such a case. If you want to stop the retry
before multipathd decides to turn off queueing mode, you could send
signal for the process to exit from the loop.

However the check of fatal signal has not carried along when commit
6c182cd88d ("dm mpath: fix ioctl deadlock when no paths") moved the
loop from dm-mpath to dm core. As a result, we can't terminate such
a process in the retry loop.

Easy reproducer of the situation is:

  # dmsetup create mp --table '0 1024 multipath 0 0 0 0'
  # dmsetup message mp 0 'queue_if_no_path'
  # sg_inq /dev/mapper/mp

then you should be able to terminate sg_inq by pressing Ctrl+C.

Fixes: 6c182cd88d ("dm mpath: fix ioctl deadlock when no paths")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-11-17 14:04:32 -05:00
Mike Snitzer
172c238612 dm thin: restore requested 'error_if_no_space' setting on OODS to WRITE transition
A thin-pool that is in out-of-data-space (OODS) mode may transition back
to write mode -- without the admin adding more space to the thin-pool --
if/when blocks are released (either by deleting thin devices or
discarding provisioned blocks).

But as part of the thin-pool's earlier transition to out-of-data-space
mode the thin-pool may have set the 'error_if_no_space' flag to true if
the no_space_timeout expires without more space having been made
available.  That implementation detail, of changing the pool's
error_if_no_space setting, needs to be reset back to the default that
the user specified when the thin-pool's table was loaded.

Otherwise we'll drop the user requested behaviour on the floor when this
out-of-data-space to write mode transition occurs.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Fixes: 2c43fd26e4 ("dm thin: fix missing out-of-data-space to write mode transition if blocks are released")
Cc: stable@vger.kernel.org
2015-11-16 09:36:08 -05:00
Linus Torvalds
3419b45039 Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block
Pull block IO poll support from Jens Axboe:
 "Various groups have been doing experimentation around IO polling for
  (really) fast devices.  The code has been reviewed and has been
  sitting on the side for a few releases, but this is now good enough
  for coordinated benchmarking and further experimentation.

  Currently O_DIRECT sync read/write are supported.  A framework is in
  the works that allows scalable stats tracking so we can auto-tune
  this.  And we'll add libaio support as well soon.  Fow now, it's an
  opt-in feature for test purposes"

* 'for-4.4/io-poll' of git://git.kernel.dk/linux-block:
  direct-io: be sure to assign dio->bio_bdev for both paths
  directio: add block polling support
  NVMe: add blk polling support
  block: add block polling support
  blk-mq: return tag/queue combo in the make_request_fn handlers
  block: change ->make_request_fn() and users to return a queue cookie
2015-11-10 17:23:49 -08:00
Linus Torvalds
3934bbc044 config fix for md
config dependency needed as md/raid5 now uses crc32c
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWP8lRAAoJEDnsnt1WYoG5AQ0P/REvKfJxP870gS6p5gowMYXN
 1pwOdq9t2MeVkQk0Q5xBOZFGQI2TL2VZjaZdiSEKqaHgd3IOD/aGpl2exLN8nZM4
 mNx+iD3QvEpmSA9mwlCe+V8vTuE0JeTpod9pXk+aQ0DIUx60dSmWh0Lp8ctr33oQ
 inlJ7kXFws6rG0xU0pOaSDM1hI0sC06Nyi2tvRSyZlZbBIMjZWorzJFUQuVX43rD
 7cQJSQ8z2e2x3V7KXYtZf6Kxe+NzEltnq0OAfnDvjz3iw+a9qU6Qg7diAbcwOdyP
 m34/MHJGIYX9GBlTtVo3+j+h3ppaPpLfT4emeYUPTQCkMXg7J9Zbjkwso7OSkvnL
 kovtgIWRVzxFx/k7jhoWzNOsacOhTFz0E4aKDpOeH2cfU7k5H/siIjeIYcqfW3fU
 q8fJXMplHz3XRYp0JhR5DRJZSw87eQnkIRkvSy8wHx8KVoRi7KNm7fv/3Zi7FpaU
 XnbxQa6FrIoqbqeBQ666Rlkn+r2Ftmj50eudAhbj4/PBesRmt6otvr9uCK+b+adh
 ZI748BC2/IOOK34WNktRQCyS2C4b3VLwRVMHReD1xir34rGGcKO04Hci8T7Qb0nP
 uJBAxmE2zue0oE6d+nnrJS3BlEC2ZFKJGzRxKiLCU7nXXoGCznd++IIPRHAkGhZc
 MSMpaS2mqtdTNp4n+9y8
 =dN6h
 -----END PGP SIGNATURE-----

Merge tag 'md/4.4-rc0-fix' of git://neil.brown.name/md

Pull config fix for md from Neil Brown:
 "New config dependency needed as md/raid5 now uses crc32c"

* tag 'md/4.4-rc0-fix' of git://neil.brown.name/md:
  raid5-cache: add crc32c Kconfig dependency
2015-11-10 12:13:00 -08:00
Arnd Bergmann
14f09e2f9b raid5-cache: add crc32c Kconfig dependency
The recent change of the raid5-cache code to use crc32c instead
of crc32 causes link errors when CONFIG_LIBCRC32C is disabled:

drivers/built-in.o: In function crc32c'
core.c:(.text+0x1c6060): undefined reference to `crc32c'

This adds an explicit 'select' statement like all other users
of this function do.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5cb2fbd6ea ("raid5-cache: use crc32c checksum")
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-09 09:09:52 +11:00
Linus Torvalds
ad804a0b2a Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - most of the rest of MM

 - procfs

 - lib/ updates

 - printk updates

 - bitops infrastructure tweaks

 - checkpatch updates

 - nilfs2 update

 - signals

 - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
   dma-debug, dma-mapping, ...

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
  ipc,msg: drop dst nil validation in copy_msg
  include/linux/zutil.h: fix usage example of zlib_adler32()
  panic: release stale console lock to always get the logbuf printed out
  dma-debug: check nents in dma_sync_sg*
  dma-mapping: tidy up dma_parms default handling
  pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
  kexec: use file name as the output message prefix
  fs, seqfile: always allow oom killer
  seq_file: reuse string_escape_str()
  fs/seq_file: use seq_* helpers in seq_hex_dump()
  coredump: change zap_threads() and zap_process() to use for_each_thread()
  coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
  signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
  signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
  signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
  signals: kill block_all_signals() and unblock_all_signals()
  nilfs2: fix gcc uninitialized-variable warnings in powerpc build
  nilfs2: fix gcc unused-but-set-variable warnings
  MAINTAINERS: nilfs2: add header file for tracing
  nilfs2: add tracepoints for analyzing reading and writing metadata files
  ...
2015-11-07 14:32:45 -08:00
Linus Torvalds
75021d2859 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina:
 "Trivial stuff from trivial tree that can be trivially summed up as:

   - treewide drop of spurious unlikely() before IS_ERR() from Viresh
     Kumar

   - cosmetic fixes (that don't really affect basic functionality of the
     driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek

   - various comment / printk fixes and updates all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  bcache: Really show state of work pending bit
  hwmon: applesmc: fix comment typos
  Kconfig: remove comment about scsi_wait_scan module
  class_find_device: fix reference to argument "match"
  debugfs: document that debugfs_remove*() accepts NULL and error values
  net: Drop unlikely before IS_ERR(_OR_NULL)
  mm: Drop unlikely before IS_ERR(_OR_NULL)
  fs: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
  UBI: Update comments to reflect UBI_METAONLY flag
  pktcdvd: drop null test before destroy functions
2015-11-07 13:05:44 -08:00
Jens Axboe
dece16353e block: change ->make_request_fn() and users to return a queue cookie
No functional changes in this patch, but it prepares us for returning
a more useful cookie related to the IO that was queued up.

Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
2015-11-07 10:40:46 -07:00
Mel Gorman
d0164adc89 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Petr Mladek
8d090f4731 bcache: Really show state of work pending bit
WORK_STRUCT_PENDING is a mask for testing the pending bit.
test_bit() expects the number of the bit and we need to
use WORK_STRUCT_PENDING_BIT there.

Also work_data_bits() is defined in workqueues.h now.

I have noticed this just by chance when looking how
WORK_STRUCT_PENDING_BIT is used. The change is compile
tested.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-11-06 15:06:05 +01:00
Linus Torvalds
e0700ce709 - Revert a dm-multipath change that caused a regression for unprivledged
users (e.g. kvm guests) that issued ioctls when a multipath device had
   no available paths.
 
 - Include Christoph's refactoring of DM's ioctl handling and add support
   for passing through persistent reservations with DM multipath.
 
 - All other changes are very simple cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOp04AAoJEMUj8QotnQNaFLsH/AhMEH/jI1ObOfy4J1Wy4rOx
 ujJT91uS/s0H3pc9cGKQYnuGpFkX6WWU4wMiabIyiTn4sAsoXaflfIGutivLiDJr
 HfecrMrGZgnP4ZlpPPB02BmlxFbcPW8yzAU4ma38xBgQ+Pu30RO/HkvX/2vKOppG
 qwPop/XsNxq3KXgFGM44ToytM6c/MPGluhuvOwbaacAO1HviMuen9qsVjk4kwcf3
 jGYTbEPHATxyu5/6oKDTkQTYhzdwg3B2qHCiKMGw3l1kXhaQLFcaOivOLV8Sf3xh
 bj1070pkGe9OpqaVzMnwDtJ8rnsBl/Nt4wj9oiQPxbX71GYZAmcMIYn9WEkcKFI=
 =AR2D
 -----END PGP SIGNATURE-----

Merge tag 'dm-4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:
 "Smaller set of DM changes for this merge.  I've based these changes on
  Jens' for-4.4/reservations branch because the associated DM changes
  required it.

   - Revert a dm-multipath change that caused a regression for
     unprivledged users (e.g. kvm guests) that issued ioctls when a
     multipath device had no available paths.

   - Include Christoph's refactoring of DM's ioctl handling and add
     support for passing through persistent reservations with DM
     multipath.

   - All other changes are very simple cleanups"

* tag 'dm-4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm switch: simplify conditional in alloc_region_table()
  dm delay: document that offsets are specified in sectors
  dm delay: capitalize the start of an delay_ctr() error message
  dm delay: Use DM_MAPIO macros instead of open-coded equivalents
  dm linear: remove redundant target name from error messages
  dm persistent data: eliminate unnecessary return values
  dm: eliminate unused "bioset" process for each bio-based DM device
  dm: convert ffs to __ffs
  dm: drop NULL test before kmem_cache_destroy() and mempool_destroy()
  dm: add support for passing through persistent reservations
  dm: refactor ioctl handling
  Revert "dm mpath: fix stalls when handling invalid ioctls"
  dm: initialize non-blk-mq queue data before queue is used
2015-11-04 21:19:53 -08:00
Linus Torvalds
ac322de6bf md updates for 4.4.
Two major components to this update.
 
 1/ the clustered-raid1 support from SUSE is nearly
   complete.  There are a few outstanding issues being
   worked on.  Maybe half a dozen patches will bring
   this to a usable state.
 
 2/ The first stage of journalled-raid5 support from
    Facebook makes an appearance.  With a journal
    device configured (typically NVRAM or SSD), the
    "RAID5 write hole" should be closed - a crash
    during degraded operations cannot result in data
    corruption.
 
    The next stage will be to use the journal as a
    write-behind cache so that latency can be reduced
    and in some cases throughput increased by
    performing more full-stripe writes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWNX9RAAoJEDnsnt1WYoG5bYMP/jI0pV3wcbs7mZQAa8S/V0lU
 2l25x4MdwDvqVKMfjIc/C5J08QNgcrgSvhiVPCEOK0w18q395vep9f6gFKbMHhu/
 lWU3PLHGw8XBHp5yEnxrpQkN0pRrNjh5NqIdlVMBNyL6u+RZPS2ZuzxJ8wiNAFg1
 MypNkgoUu6s+nBp4DWWnMGYhBc+szBR+gTYAzGiZ8vqOH9uiSJ2SsGG5aRVUN/af
 oMYvJAf9aA6uj+xSzNlXIaLfWJIrshQYS1jU/W4gTm0DwK9yqbTxvubJaE0SGu/o
 73FGU8tmQ6ELYfsp3D/jmfUkE7weiNEQhdVb/4wy1A/SGc+W7Ju9pxfhm8ra57s0
 /BCkfwWZXEvx1flegXfK1mC6EMpMIcGAD2FQEhmQbW6wTdDwtNyEhIePDVGJwD/F
 rhEThFa+Dg9+xnBGnS6OUK3EpXgml2hAeAC7uA3TVSAnWd/9/Mpim6fZhqrB/v9L
 Ik0tZt+H4nxYaheZjKlKhuXUQYcUWGiMb67bGMem/YAlMa4y9C9qF+9mPXxyjVlI
 hBsd5SfZNz99DyB/bO8BumQeIWlTfzLeFzWW67eQ864LRKO6k0/VIbPZHCfn2oVG
 XvyC2fUhNOIURP3IMxcyHYxOA7Mu6EDsVVDTpuqLVbZQ5IPjDEfQ54yB/BLUvbX/
 Gh2/tKn7Xc25HuLAFEbs
 =TD5o
 -----END PGP SIGNATURE-----

Merge tag 'md/4.4' of git://neil.brown.name/md

Pull md updates from Neil Brown:
 "Two major components to this update.

   1) The clustered-raid1 support from SUSE is nearly complete.  There
      are a few outstanding issues being worked on.  Maybe half a dozen
      patches will bring this to a usable state.

   2) The first stage of journalled-raid5 support from Facebook makes an
      appearance.  With a journal device configured (typically NVRAM or
      SSD), the "RAID5 write hole" should be closed - a crash during
      degraded operations cannot result in data corruption.

      The next stage will be to use the journal as a write-behind cache
      so that latency can be reduced and in some cases throughput
      increased by performing more full-stripe writes.

* tag 'md/4.4' of git://neil.brown.name/md: (66 commits)
  MD: when RAID journal is missing/faulty, block RESTART_ARRAY_RW
  MD: set journal disk ->raid_disk
  MD: kick out journal disk if it's not fresh
  raid5-cache: start raid5 readonly if journal is missing
  MD: add new bit to indicate raid array with journal
  raid5-cache: IO error handling
  raid5: journal disk can't be removed
  raid5-cache: add trim support for log
  MD: fix info output for journal disk
  raid5-cache: use bio chaining
  raid5-cache: small log->seq cleanup
  raid5-cache: new helper: r5_reserve_log_entry
  raid5-cache: inline r5l_alloc_io_unit into r5l_new_meta
  raid5-cache: take rdev->data_offset into account early on
  raid5-cache: refactor bio allocation
  raid5-cache: clean up r5l_get_meta
  raid5-cache: simplify state machine when caches flushes are not needed
  raid5-cache: factor out a helper to run all stripes for an I/O unit
  raid5-cache: rename flushed_ios to finished_ios
  raid5-cache: free I/O units earlier
  ...
2015-11-04 21:12:47 -08:00
Linus Torvalds
527d1529e3 Merge branch 'for-4.4/integrity' of git://git.kernel.dk/linux-block
Pull block integrity updates from Jens Axboe:
 ""This is the joint work of Dan and Martin, cleaning up and improving
  the support for block data integrity"

* 'for-4.4/integrity' of git://git.kernel.dk/linux-block:
  block, libnvdimm, nvme: provide a built-in blk_integrity nop profile
  block: blk_flush_integrity() for bio-based drivers
  block: move blk_integrity to request_queue
  block: generic request_queue reference counting
  nvme: suspend i/o during runtime blk_integrity_unregister
  md: suspend i/o during runtime blk_integrity_unregister
  md, dm, scsi, nvme, libnvdimm: drop blk_integrity_unregister() at shutdown
  block: Inline blk_integrity in struct gendisk
  block: Export integrity data interval size in sysfs
  block: Reduce the size of struct blk_integrity
  block: Consolidate static integrity profile properties
  block: Move integrity kobject to struct gendisk
2015-11-04 20:51:48 -08:00