This reverts commit a2b8b2d975.
WQ_SYSFS breaks the ability to reload a DM table due to sysfs kobject
collision (due to active and inactive table). Given lack of
demonstrated need for exposing this workqueue via sysfs: revert
exposing it.
Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
It should be helpful to export sysfs of "kcryptd" workqueue in some
cases, such as setting specific CPU affinity of the workqueue.
Besides, also tweak the name format a little. The slash inside a
directory name will be translate into exclamation mark, such as
/sys/devices/virtual/workqueue/'kcryptd!253:0'.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The only usage of these structs is to assign their address to the
iv_gen_ops field in the crypt config struct, which is a pointer to
const. Make them const like the rest of the static crypt_iv_operations
structs. This allows the compiler to put them in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Don't use crypto drivers that have the flag CRYPTO_ALG_ALLOCATES_MEMORY
set. These drivers allocate memory and thus they are unsuitable for block
I/O processing.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Use the DECLARE_CRYPTO_WAIT() macro to properly initialize the crypto
wait structures declared on stack before their use with
crypto_wait_req().
Fixes: 39d13a1ac4 ("dm crypt: reuse eboiv skcipher for IV generation")
Fixes: bbb1658461 ("dm crypt: Implement Elephant diffuser for Bitlocker compatibility")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
bio-based code so that it follows patterns established by
request-based code.
- Request-based DM core improvement to eliminate unnecessary call to
blk_mq_queue_stopped().
- Add "panic_on_corruption" error handling mode to DM verity target.
- DM bufio fix to to perform buffer cleanup from a workqueue rather
than wait for IO in reclaim context from shrinker.
- DM crypt improvement to optionally avoid async processing via
workqueues for reads and/or writes -- via "no_read_workqueue" and
"no_write_workqueue" features. This more direct IO processing
improves latency and throughput with faster storage. Avoiding
workqueue IO submission for writes (DM_CRYPT_NO_WRITE_WORKQUEUE) is
a requirement for adding zoned block device support to DM crypt.
- Add zoned block device support to DM crypt. Makes use of
DM_CRYPT_NO_WRITE_WORKQUEUE and a new optional feature
(DM_CRYPT_WRITE_INLINE) that allows write completion to wait for
encryption to complete. This allows write ordering to be preserved,
which is needed for zoned block devices.
- Fix DM ebs target's check for REQ_OP_FLUSH.
- Fix DM core's report zones support to not report more zones than
were requested.
- A few small compiler warning fixes.
- DM dust improvements to return output directly to the user rather
than require they scrape the system log for output.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl8tdOQTHHNuaXR6ZXJA
cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWvDlB/sF8svagDqeqs27xTxCiUPykD29cMmS
OGPr0Mp/BntZOBpSaTPM9s5XucP3WJhPsxet5qeoyM3OViSFx+O55PqPjn8C65y0
eGMa4zknd9eO1933+ijmyQu6VNr4sf/6nusX4xSGqv00UR22dJ+3pHtfN9ANDXYX
AAYA0Ve6UuOwAbGUCnRGI/2780aYY0B8Ok+cF21CskqryF+RpmbZ6BsR07+Hk4cy
LX5EaHUqezW12cibLq2f0l7TLLJ86OscvqyU9lGVIxiV57e2i5c2S1HvhKZu+Wn3
6CUmlOhGI0viCKgM1ArekZ+zOw9ROIaAKKPzC5mspqx9yuuCqdY8k8xV
=X3tt
-----END PGP SIGNATURE-----
Merge tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- DM multipath locking fixes around m->flags tests and improvements to
bio-based code so that it follows patterns established by
request-based code.
- Request-based DM core improvement to eliminate unnecessary call to
blk_mq_queue_stopped().
- Add "panic_on_corruption" error handling mode to DM verity target.
- DM bufio fix to to perform buffer cleanup from a workqueue rather
than wait for IO in reclaim context from shrinker.
- DM crypt improvement to optionally avoid async processing via
workqueues for reads and/or writes -- via "no_read_workqueue" and
"no_write_workqueue" features. This more direct IO processing
improves latency and throughput with faster storage. Avoiding
workqueue IO submission for writes (DM_CRYPT_NO_WRITE_WORKQUEUE) is a
requirement for adding zoned block device support to DM crypt.
- Add zoned block device support to DM crypt. Makes use of
DM_CRYPT_NO_WRITE_WORKQUEUE and a new optional feature
(DM_CRYPT_WRITE_INLINE) that allows write completion to wait for
encryption to complete. This allows write ordering to be preserved,
which is needed for zoned block devices.
- Fix DM ebs target's check for REQ_OP_FLUSH.
- Fix DM core's report zones support to not report more zones than were
requested.
- A few small compiler warning fixes.
- DM dust improvements to return output directly to the user rather
than require they scrape the system log for output.
* tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: don't call report zones for more than the user requested
dm ebs: Fix incorrect checking for REQ_OP_FLUSH
dm init: Set file local variable static
dm ioctl: Fix compilation warning
dm raid: Remove empty if statement
dm verity: Fix compilation warning
dm crypt: Enable zoned block device support
dm crypt: add flags to optionally bypass kcryptd workqueues
dm bufio: do buffer cleanup from a workqueue
dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue()
dm dust: add interface to list all badblocks
dm dust: report some message results directly back to user
dm verity: add "panic_on_corruption" error handling mode
dm mpath: use double checked locking in fast path
dm mpath: rename current_pgpath to pgpath in multipath_prepare_ioctl
dm mpath: rework __map_bio()
dm mpath: factor out multipath_queue_bio
dm mpath: push locking down to must_push_back_rq()
dm mpath: take m->lock spinlock when testing QUEUE_IF_NO_PATH
dm mpath: changes from initial m->flags locking audit
As said by Linus:
A symmetric naming is only helpful if it implies symmetries in use.
Otherwise it's actively misleading.
In "kzalloc()", the z is meaningful and an important part of what the
caller wants.
In "kzfree()", the z is actively detrimental, because maybe in the
future we really _might_ want to use that "memfill(0xdeadbeef)" or
something. The "zero" part of the interface isn't even _relevant_.
The main reason that kzfree() exists is to clear sensitive information
that should not be leaked to other future users of the same memory
objects.
Rename kzfree() to kfree_sensitive() to follow the example of the recently
added kvfree_sensitive() and make the intention of the API more explicit.
In addition, memzero_explicit() is used to clear the memory to make sure
that it won't get optimized away by the compiler.
The renaming is done by using the command sequence:
git grep -w --name-only kzfree |\
xargs sed -i 's/kzfree/kfree_sensitive/'
followed by some editing of the kfree_sensitive() kerneldoc and adding
a kzfree backward compatibility macro in slab.h.
[akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
[akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Joe Perches <joe@perches.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
while to come. Changes include:
- Some new Chinese translations
- Progress on the battle against double words words and non-HTTPS URLs
- Some block-mq documentation
- More RST conversions from Mauro. At this point, that task is
essentially complete, so we shouldn't see this kind of churn again for a
while. Unless we decide to switch to asciidoc or something...:)
- Lots of typo fixes, warning fixes, and more.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR
cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY
BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO
7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH
9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY
0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU=
=JQLZ
-----END PGP SIGNATURE-----
Merge tag 'docs-5.9' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It's been a busy cycle for documentation - hopefully the busiest for a
while to come. Changes include:
- Some new Chinese translations
- Progress on the battle against double words words and non-HTTPS
URLs
- Some block-mq documentation
- More RST conversions from Mauro. At this point, that task is
essentially complete, so we shouldn't see this kind of churn again
for a while. Unless we decide to switch to asciidoc or
something...:)
- Lots of typo fixes, warning fixes, and more"
* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
scripts/kernel-doc: optionally treat warnings as errors
docs: ia64: correct typo
mailmap: add entry for <alobakin@marvell.com>
doc/zh_CN: add cpu-load Chinese version
Documentation/admin-guide: tainted-kernels: fix spelling mistake
MAINTAINERS: adjust kprobes.rst entry to new location
devices.txt: document rfkill allocation
PCI: correct flag name
docs: filesystems: vfs: correct flag name
docs: filesystems: vfs: correct sync_mode flag names
docs: path-lookup: markup fixes for emphasis
docs: path-lookup: more markup fixes
docs: path-lookup: fix HTML entity mojibake
CREDITS: Replace HTTP links with HTTPS ones
docs: process: Add an example for creating a fixes tag
doc/zh_CN: add Chinese translation prefer section
doc/zh_CN: add clearing-warn-once Chinese version
doc/zh_CN: add admin-guide index
doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
futex: MAINTAINERS: Re-add selftests directory
...
Enable support for zoned block devices. This is done by:
1) implementing the target report_zones method.
2) adding the DM_TARGET_ZONED_HM flag to the target features.
3) setting DM_CRYPT_NO_WRITE_WORKQUEUE flag to avoid IO
processing via workqueue.
4) Introducing inline write encryption completion to preserve write
ordering.
The last point is implemented by introducing the internal flag
DM_CRYPT_WRITE_INLINE. When set, kcryptd_crypt_write_convert() always
waits inline for the completion of a write request encryption if the
request is not already completed once crypt_convert() returns.
Completion of write request encryption is signaled using the
restart completion by kcryptd_async_done(). This mechanism allows
using ciphers that have an asynchronous implementation, isolating
dm-crypt from any potential request completion reordering for these
ciphers.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This is a follow up to [1] that detailed latency problems associated
with dm-crypt's use of workqueues when processing IO.
Current dm-crypt implementation creates a significant IO performance
overhead (at least on small IO block sizes) for both latency and
throughput. We suspect offloading IO request processing into
workqueues and async threads is more harmful these days with the
modern fast storage. I also did some digging into the dm-crypt git
history and much of this async processing is not needed anymore,
because the reasons it was added are mostly gone from the kernel. More
details can be found in [2] (see "Git archeology" section).
This change adds DM_CRYPT_NO_READ_WORKQUEUE and
DM_CRYPT_NO_WRITE_WORKQUEUE flags for read and write BIOs, which
direct dm-crypt to not offload crypto operations into kcryptd
workqueues. In addition, writes are not buffered to be sorted in the
dm-crypt red-black tree, but dispatched immediately. For cases, where
crypto operations cannot happen (hard interrupt context, for example
the read path of some NVME drivers), we offload the work to a tasklet
rather than a workqueue.
These flags only ensure no async BIO processing in the dm-crypt
module. It is worth noting that some Crypto API implementations may
offload encryption into their own workqueues, which are independent of
the dm-crypt and its configuration. However upon enabling these new
flags dm-crypt will instruct Crypto API not to backlog crypto
requests.
To give an idea of the performance gains for certain workloads,
consider the script, and results when tested against various
devices, detailed here:
https://www.redhat.com/archives/dm-devel/2020-July/msg00138.html
[1]: https://www.spinics.net/lists/dm-crypt/msg07516.html
[2]: https://blog.cloudflare.com/speeding-up-linux-disk-encryption/
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Link: https://lore.kernel.org/r/20200627103138.71885-1-grandmaster@al2klimov.de
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
generic_make_request has always been very confusingly misnamed, so rename
it to submit_bio_noacct to make it clear that it is submit_bio minus
accounting and a few checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
queue_limits::logical_block_size got changed from unsigned short to
unsigned int, but it was forgotten to update crypt_io_hints() to use the
new type. Fix it.
Fixes: ad6bf88a6c ("block: fix an integer overflow in logical block size")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Allow one to use "encrypted" in addition to "user" and "logon" key
types for device encryption.
Signed-off-by: Dmitry Baryshkov <dmitry_baryshkov@mentor.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Replace test_bit(CRYPT_MODE_INTEGRITY_AEAD, XXX) with
crypt_integrity_aead().
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If benbi IV is used in AEAD construction, for example:
cryptsetup luksFormat <device> --cipher twofish-xts-benbi --key-size 512 --integrity=hmac-sha256
the constructor uses wrong skcipher function and crashes:
BUG: kernel NULL pointer dereference, address: 00000014
...
EIP: crypt_iv_benbi_ctr+0x15/0x70 [dm_crypt]
Call Trace:
? crypt_subkey_size+0x20/0x20 [dm_crypt]
crypt_ctr+0x567/0xfc0 [dm_crypt]
dm_table_add_target+0x15f/0x340 [dm_mod]
Fix this by properly using crypt_aead_blocksize() in this case.
Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # v4.12+
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941051
Reported-by: Jerad Simpson <jbsimpson@gmail.com>
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add experimental support for BitLocker encryption with CBC mode and
additional Elephant diffuser.
The mode was used in older Windows systems and it is provided mainly
for compatibility reasons. The userspace support to activate these
devices is being added to cryptsetup utility.
Read-write activation of such a device is very simple, for example:
echo <password> | cryptsetup bitlkOpen bitlk_image.img test
The Elephant diffuser uses two rotations in opposite direction for
data (Diffuser A and B) and also XOR operation with Sector key over
the sector data; Sector key is derived from additional key data. The
original public documentation is available here:
http://download.microsoft.com/download/0/2/3/0238acaf-d3bf-4a6d-b3d6-0a0be4bbb36e/bitlockercipher200608.pdf
The dm-crypt implementation is embedded to "elephant" IV (similar to
tcw IV construction).
Because we cannot modify original bio data for write (before
encryption), an additional internal flag to pre-process data is
added.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
GFP_KERNEL is not supposed to be or'd with GFP_NOFS (the result is
equivalent to GFP_KERNEL). Also, we use GFP_NOIO instead of GFP_NOFS
because we don't want any I/O being submitted in the direct reclaim
path.
Fixes: 39d13a1ac4 ("dm crypt: reuse eboiv skcipher for IV generation")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Only the ESSIV IV generation mode used to use cc->cipher so it could
instantiate the bare cipher used to encrypt the IV. However, this is
now taken care of by the ESSIV template, and so no users of cc->cipher
remain. So remove it altogether.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Replace the explicit ESSIV handling in the dm-crypt driver with calls
into the crypto API, which now possesses the capability to perform
this processing within the crypto subsystem.
Note that we reorder the AEAD cipher_api string parsing with the TFM
instantiation: this is needed because cipher_api is mangled by the
ESSIV handling, and throws off the parsing of "authenc(" otherwise.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Instead of instantiating a separate cipher to perform the encryption
needed to produce the IV, reuse the skcipher used for the block data
and invoke it one additional time for each block to encrypt a zero
vector and use the output as the IV.
For CBC mode, this is equivalent to using the bare block cipher, but
without the risk of ending up with a non-time invariant implementation
of AES when the skcipher itself is time variant (e.g., arm64 without
Crypto Extensions has a NEON based time invariant implementation of
cbc(aes) but no time invariant implementation of the core cipher other
than aes-ti, which is not enabled by default).
This approach is a compromise between dm-crypt API flexibility and
reducing dependence on parts of the crypto API that should not usually
be exposed to other subsystems, such as the bare cipher API.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This IV is used in some BitLocker devices with CBC encryption mode.
IV is encrypted little-endian byte-offset (with the same key and cipher
as the volume).
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The URL is no longer valid and the comment is obsolete anyway
(the plumb IV was never used).
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If a private IV wipe function fails, the code does not set the key
invalid flag. To fix this, move code to after the flag is set to
prevent the device from resuming in an inconsistent state.
Also, this allows using of a randomized key in private wipe function
(to be used in a following commit).
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Use struct_size() to avoid open-coded equivalent that is prone to a type
mistake.
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
locking. Requires some list_bl interface improvements.
- Add ability for DM integrity to use a bitmap mode, that tracks regions
where data and metadata are out of sync, instead of using a journal.
- Improve DM thin provisioning target to not write metadata changes to
disk if the thin-pool and associated thin devices are merely
activated but not used. This avoids metadata corruption due to
concurrent activation of thin devices across different OS instances
(e.g. split brain scenarios, which ultimately would be avoided if
proper device filters were used -- but not having proper filtering has
proven a very common configuration mistake)
- Fix missing call to path selector type->end_io in DM multipath. This
fixes reported performance problems due to inaccurate path selector IO
accounting causing an imbalance of IO (e.g. avoiding issuing IO to
particular path due to it seemingly being heavily used).
- Fix bug in DM cache metadata's loading of its discard bitset that
could lead to all cache blocks being discarded if the very first cache
block was discarded (thankfully in practice the first cache block is
generally in use; be it FS superblock, partition table, disk label,
etc).
- Add testing-only DM dust target which simulates a device that has
failing sectors and/or read failures.
- Fix a DM init error path reference count hang that caused boot hangs
if user supplied malformed input on kernel commandline.
- Fix a couple issues with DM crypt target's logging being overly
verbose or lacking context.
- Various other small fixes to DM init, DM multipath, DM zoned, and DM
crypt.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAlzdcCgTHHNuaXR6ZXJA
cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWsxZB/9idHl8LmwwL1JzBfi/XX7bWxwqDQLo
j1b3ycQ14AKVau4VCkmgDuRIfMDuU6PIAVvsMeVbF3aCE0fZ7zbEV1qHefbtJuCL
MMm//KbrhIT8oMKYUWtlOj7XI9MT6ErFzfActBZ6UF6r21m1N3bohhVGN7kvCnJm
wgmSlnz/m2GLKK8gQx+OisnAh0nlje3PIdIYPu7uWN6t0FF2XRz3UwWTuyw7lYhC
Rx2J+sOIL02CtadhHKLMCG8OutRXWP01cBSohUVJIMGihWfbe6aqvhG5afbqb4bG
UQrXl477ry5zyQ4fAU2JKZ+8qFvc1FoLLknKrZQu+uYPRokUPw/AwiL7
=mOH3
-----END PGP SIGNATURE-----
Merge tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Improve DM snapshot target's scalability by using finer grained
locking. Requires some list_bl interface improvements.
- Add ability for DM integrity to use a bitmap mode, that tracks
regions where data and metadata are out of sync, instead of using a
journal.
- Improve DM thin provisioning target to not write metadata changes to
disk if the thin-pool and associated thin devices are merely
activated but not used. This avoids metadata corruption due to
concurrent activation of thin devices across different OS instances
(e.g. split brain scenarios, which ultimately would be avoided if
proper device filters were used -- but not having proper filtering
has proven a very common configuration mistake)
- Fix missing call to path selector type->end_io in DM multipath. This
fixes reported performance problems due to inaccurate path selector
IO accounting causing an imbalance of IO (e.g. avoiding issuing IO to
particular path due to it seemingly being heavily used).
- Fix bug in DM cache metadata's loading of its discard bitset that
could lead to all cache blocks being discarded if the very first
cache block was discarded (thankfully in practice the first cache
block is generally in use; be it FS superblock, partition table, disk
label, etc).
- Add testing-only DM dust target which simulates a device that has
failing sectors and/or read failures.
- Fix a DM init error path reference count hang that caused boot hangs
if user supplied malformed input on kernel commandline.
- Fix a couple issues with DM crypt target's logging being overly
verbose or lacking context.
- Various other small fixes to DM init, DM multipath, DM zoned, and DM
crypt.
* tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (42 commits)
dm: fix a couple brace coding style issues
dm crypt: print device name in integrity error message
dm crypt: move detailed message into debug level
dm ioctl: fix hang in early create error condition
dm integrity: whitespace, coding style and dead code cleanup
dm integrity: implement synchronous mode for reboot handling
dm integrity: handle machine reboot in bitmap mode
dm integrity: add a bitmap mode
dm integrity: introduce a function add_new_range_and_wait()
dm integrity: allow large ranges to be described
dm ingerity: pass size to dm_integrity_alloc_page_list()
dm integrity: introduce rw_journal_sectors()
dm integrity: update documentation
dm integrity: don't report unused options
dm integrity: don't check null pointer before kvfree and vfree
dm integrity: correctly calculate the size of metadata area
dm dust: Make dm_dust_init and dm_dust_exit static
dm dust: remove redundant unsigned comparison to less than zero
dm mpath: always free attached_handler_name in parse_path()
dm init: fix max devices/targets checks
...
This message should better identify the DM device with the integrity
failure.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The information about tag size should not be printed without debug info
set. Also print device major:minor in the error message to identify the
device instance.
Also use rate limiting and debug level for info about used crypto API
implementaton. This is important because during online reencryption
the existing message saturates syslog (because we are moving hotzone
across the whole device).
Cc: stable@vger.kernel.org
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlzR0AAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpo0MD/47D1kBK9rGzkAwIz1Jkh1Qy/ITVaDJzmHJ
UP5uncQsgKFLKMR1LbRcrWtmk2MwFDNULGbteHFeCYE1ypCrTgpWSp5+SJluKd1Q
hma9krLSAXO9QiSaZ4jafshXFIZxz6IjakOW8c9LrT80Ze47yh7AxiLwDafcp/Jj
x6NW790qB7ENDtfarDkZk14NCS8HGLRHO5B21LB+hT0Kfbh0XZaLzJdj7Mck1wPA
VT8hL9mPuA++AjF7Ra4kUjwSakgmajTa3nS2fpkwTYdztQfas7x5Jiv7FWxrrelb
qbabkNkWKepcHAPEiZR7o53TyfCucGeSK/jG+dsJ9KhNp26kl1ci3frl5T6PfVMP
SPPDjsKIHs+dqFrU9y5rSGhLJqewTs96hHthnLGxyF67+5sRb5+YIy+dcqgiyc/b
TUVyjCD6r0cO2q4v9VhwnhOyeBUA9Rwbu8nl7JV5Q45uG7qI4BC39l1jfubMNDPO
GLNGUUzb6ER7z6lYINjRSF2Jhejsx8SR9P7jhpb1Q7k/VvDDxO1T4FpwvqWFz9+s
Gn+s6//+cA6LL+42eZkQjvwF2CUNE7TaVT8zdb+s5HP1RQkZToqUnsQCGeRTrFni
RqWXfW9o9+awYRp431417oMdX/LvLGq9+ZtifRk9DqDcowXevTaf0W2RpplWSuiX
RcCuPeLAVg==
=Ot0g
-----END PGP SIGNATURE-----
Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
"Nothing major in this series, just fixes and improvements all over the
map. This contains:
- Series of fixes for sed-opal (David, Jonas)
- Fixes and performance tweaks for BFQ (via Paolo)
- Set of fixes for bcache (via Coly)
- Set of fixes for md (via Song)
- Enabling multi-page for passthrough requests (Ming)
- Queue release fix series (Ming)
- Device notification improvements (Martin)
- Propagate underlying device rotational status in loop (Holger)
- Removal of mtip32xx trim support, which has been disabled for years
(Christoph)
- Improvement and cleanup of nvme command handling (Christoph)
- Add block SPDX tags (Christoph)
- Cleanup/hardening of bio/bvec iteration (Christoph)
- A few NVMe pull requests (Christoph)
- Removal of CONFIG_LBDAF (Christoph)
- Various little fixes here and there"
* tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block: (164 commits)
block: fix mismerge in bvec_advance
block: don't drain in-progress dispatch in blk_cleanup_queue()
blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release
blk-mq: always free hctx after request queue is freed
blk-mq: split blk_mq_alloc_and_init_hctx into two parts
blk-mq: free hw queue's resource in hctx's release handler
blk-mq: move cancel of requeue_work into blk_mq_release
blk-mq: grab .q_usage_counter when queuing request from plug code path
block: fix function name in comment
nvmet: protect discovery change log event list iteration
nvme: mark nvme_core_init and nvme_core_exit static
nvme: move command size checks to the core
nvme-fabrics: check more command sizes
nvme-pci: check more command sizes
nvme-pci: remove an unneeded variable initialization
nvme-pci: unquiesce admin queue on shutdown
nvme-pci: shutdown on timeout during deletion
nvme-pci: fix psdt field for single segment sgls
nvme-multipath: don't print ANA group state by default
nvme-multipath: split bios with the ns_head bio_set before submitting
...
We only have two callers that need the integer loop iterator, and they
can easily maintain it themselves.
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The flags field in 'struct shash_desc' never actually does anything.
The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
However, no shash algorithm ever sleeps, making this flag a no-op.
With this being the case, inevitably some users who can't sleep wrongly
pass MAY_SLEEP. These would all need to be fixed if any shash algorithm
actually started sleeping. For example, the shash_ahash_*() functions,
which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
from the ahash API to the shash API. However, the shash functions are
called under kmap_atomic(), so actually they're assumed to never sleep.
Even if it turns out that some users do need preemption points while
hashing large buffers, we could easily provide a helper function
crypto_shash_update_large() which divides the data into smaller chunks
and calls crypto_shash_update() and cond_resched() for each chunk. It's
not necessary to have a flag in 'struct shash_desc', nor is it necessary
to make individual shash algorithms aware of this at all.
Therefore, remove shash_desc::flags, and document that the
crypto_shash_*() functions can be called from any context.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The sector used here is a little endian value, so use the right
type for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlx63XIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpp2vEACfrrQsap7R+Av28mmXpmXi2FPa3g5Tev1t
yYjK2qHvhlMZjPTYw3hCmbYdDDczlF7PEgSE2x2DjdcsYapb8Fy1lZ2X16c7ztBR
HD/t9b5AVSQsczZzKgv3RqsNtTnjzS5V0A8XH8FAP2QRgiwDMwSN6G0FP0JBLbE/
ZgxQrH1Iy1F33Wz4hI3Z7dEghKPZrH1IlegkZCEu47q9SlWS76qUetSy2GEtchOl
3Lgu54mQZyVdI5/QZf9DyMDLF6dIz3tYU2qhuo01AHjGRCC72v86p8sIiXcUr94Q
8pbegJhJ/g8KBol9Qhv3+pWG/QUAZwi/ZwasTkK+MJ4klRXfOrznxPubW1z6t9Vn
QRo39Po5SqqP0QWAscDxCFjESIQlWlKa+LZurJL7DJDCUGrSgzTpnVwFqKwc5zTP
HJa5MT2tEeL2TfUYRYCfh0ZV0elINdHA1y1klDBh38drh4EWr2gW8xdseGYXqRjh
fLgEpoF7VQ8kTvxKN+E4jZXkcZmoLmefp0ZyAbblS6IawpPVC7kXM9Fdn2OU8f2c
fjVjvSiqxfeN6dnpfeLDRbbN9894HwgP/LPropJOQ7KmjCorQq5zMDkAvoh3tElq
qwluRqdBJpWT/F05KweY+XVW8OawIycmUWqt6JrVNoIDAK31auHQv47kR0VA4OvE
DRVVhYpocw==
=VBaU
-----END PGP SIGNATURE-----
Merge tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
"Not a huge amount of changes in this round, the biggest one is that we
finally have Mings multi-page bvec support merged. Apart from that,
this pull request contains:
- Small series that avoids quiescing the queue for sysfs changes that
match what we currently have (Aleksei)
- Series of bcache fixes (via Coly)
- Series of lightnvm fixes (via Mathias)
- NVMe pull request from Christoph. Nothing major, just SPDX/license
cleanups, RR mp policy (Hannes), and little fixes (Bart,
Chaitanya).
- BFQ series (Paolo)
- Save blk-mq cpu -> hw queue mapping, removing a pointer indirection
for the fast path (Jianchao)
- fops->iopoll() added for async IO polling, this is a feature that
the upcoming io_uring interface will use (Christoph, me)
- Partition scan loop fixes (Dongli)
- mtip32xx conversion from managed resource API (Christoph)
- cdrom registration race fix (Guenter)
- MD pull from Song, two minor fixes.
- Various documentation fixes (Marcos)
- Multi-page bvec feature. This brings a lot of nice improvements
with it, like more efficient splitting, larger IOs can be supported
without growing the bvec table size, and so on. (Ming)
- Various little fixes to core and drivers"
* tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits)
block: fix updating bio's front segment size
block: Replace function name in string with __func__
nbd: propagate genlmsg_reply return code
floppy: remove set but not used variable 'q'
null_blk: fix checking for REQ_FUA
block: fix NULL pointer dereference in register_disk
fs: fix guard_bio_eod to check for real EOD errors
blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map
block: optimize bvec iteration in bvec_iter_advance
block: introduce mp_bvec_for_each_page() for iterating over page
block: optimize blk_bio_segment_split for single-page bvec
block: optimize __blk_segment_map_sg() for single-page bvec
block: introduce bvec_nth_page()
iomap: wire up the iopoll method
block: add bio_set_polled() helper
block: wire up block device iopoll method
fs: add an iopoll method to struct file_operations
loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
loop: do not print warn message if partition scan is successful
block: bounce: make sure that bvec table is updated
...
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.
Given it is just one mechannical & simple change on all bio_for_each_segment_all()
users, this patch does tree-wide change in one single patch, so that we can
avoid to use a temporary helper for this conversion.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_sectors() returns the value in the units of 512-byte sectors (no
matter what the real sector size of the device). dm-crypt multiplies
bio_sectors() by on_disk_tag_size to calculate the space allocated for
integrity tags. If dm-crypt is running with sector size larger than
512b, it allocates more data than is needed.
Device Mapper trims the extra space when passing the bio to
dm-integrity, so this bug didn't result in any visible misbehavior.
But it must be fixed to avoid wasteful memory allocation for the block
integrity payload.
Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Reported-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The dm-crypt cipher specification in a mapping table is defined as:
cipher[:keycount]-chainmode-ivmode[:ivopts]
or (new crypt API format):
capi:cipher_api_spec-ivmode[:ivopts]
For ESSIV, the parameter includes hash specification, for example:
aes-cbc-essiv:sha256
The implementation expected that additional IV option to never include
another dash '-' character.
But, with SHA3, there are names like sha3-256; so the mapping table
parser fails:
dmsetup create test --table "0 8 crypt aes-cbc-essiv:sha3-256 9c1185a5c5e9fc54612808977ee8f5b9e 0 /dev/sdb 0"
or (new crypt API format)
dmsetup create test --table "0 8 crypt capi:cbc(aes)-essiv:sha3-256 9c1185a5c5e9fc54612808977ee8f5b9e 0 /dev/sdb 0"
device-mapper: crypt: Ignoring unexpected additional cipher options
device-mapper: table: 253:0: crypt: Error creating IV
device-mapper: ioctl: error adding target to table
Fix the dm-crypt constructor to ignore additional dash in IV options and
also remove a bogus warning (that is ignored anyway).
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Merge misc updates from Andrew Morton:
- large KASAN update to use arm's "software tag-based mode"
- a few misc things
- sh updates
- ocfs2 updates
- just about all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits)
kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
memcg, oom: notify on oom killer invocation from the charge path
mm, swap: fix swapoff with KSM pages
include/linux/gfp.h: fix typo
mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
memory_hotplug: add missing newlines to debugging output
mm: remove __hugepage_set_anon_rmap()
include/linux/vmstat.h: remove unused page state adjustment macro
mm/page_alloc.c: allow error injection
mm: migrate: drop unused argument of migrate_page_move_mapping()
blkdev: avoid migration stalls for blkdev pages
mm: migrate: provide buffer_migrate_page_norefs()
mm: migrate: move migrate_page_lock_buffers()
mm: migrate: lock buffers before migrate_page_move_mapping()
mm: migration: factor out code to compute expected number of page references
mm, page_alloc: enable pcpu_drain with zone capability
kmemleak: add config to select auto scan
mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
...
- Fix DM to allow reads that exceed readahead limits by setting io_pages
in the backing_dev_info.
- A couple code cleanups in request-based DM.
- Fix various DM targets to check for device sector overflow if
CONFIG_LBDAF is not set.
- Use u64 instead of sector_t to store iv_offset in DM crypt; sector_t
isn't large enough on 32bit when CONFIG_LBDAF is not set.
- Performance fixes to DM's kcopyd and the snapshot target focused on
limiting memory use and workqueue stalls.
- Fix typos in the integrity and writecache targets.
- Log which algorithm is used for dm-crypt's encryption and
dm-integrity's hashing.
- Fix false -EBUSY errors in DM raid target's handling of check/repair
messages.
- Fix DM flakey target's corrupt_bio_byte feature to reliably corrupt
the Nth byte in a bio's payload.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJcHu6MAAoJEMUj8QotnQNar9gIAKyJ5hH0ik3mlLhAyPcMkRU+
MPvjTAC8V5V33VY8YqKD/TcDHY8yaGTPKSK0SAnB8Jv2tLz8mp0H18syJbMvNcFc
MS/xWYLAZiXf5xJ4eJpo8ayDvU6DAkKtSbG7VHJ8gnx/WV8bW1b90OCIq8GN+jHo
DPp6RDzy3LvonoZ9OPognQKUDXK1crdPDiv1Wzw9Zjd4JlruIaGgJOc1RLB7YdVK
u5c7UD7cT+MCROEzJrctAZa5oRVhuemd83HPvwE2akRFvCSNtc7aIVtNfJ79nqL7
19oUDfsuzf2zkiJLzO0oU4OrJ+lUGDJf/sPXmU1qlPyKdUqYKFxT+KHvBnhAQXI=
=pCs8
-----END PGP SIGNATURE-----
Merge tag 'for-4.21/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Eliminate a couple indirect calls from bio-based DM core.
- Fix DM to allow reads that exceed readahead limits by setting
io_pages in the backing_dev_info.
- A couple code cleanups in request-based DM.
- Fix various DM targets to check for device sector overflow if
CONFIG_LBDAF is not set.
- Use u64 instead of sector_t to store iv_offset in DM crypt; sector_t
isn't large enough on 32bit when CONFIG_LBDAF is not set.
- Performance fixes to DM's kcopyd and the snapshot target focused on
limiting memory use and workqueue stalls.
- Fix typos in the integrity and writecache targets.
- Log which algorithm is used for dm-crypt's encryption and
dm-integrity's hashing.
- Fix false -EBUSY errors in DM raid target's handling of check/repair
messages.
- Fix DM flakey target's corrupt_bio_byte feature to reliably corrupt
the Nth byte in a bio's payload.
* tag 'for-4.21/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: do not allow readahead to limit IO size
dm raid: fix false -EBUSY when handling check/repair message
dm rq: cleanup leftover code from recently removed q->mq_ops branching
dm verity: log the hash algorithm implementation
dm crypt: log the encryption algorithm implementation
dm integrity: fix spelling mistake in workqueue name
dm flakey: Properly corrupt multi-page bios.
dm: Check for device sector overflow if CONFIG_LBDAF is not set
dm crypt: use u64 instead of sector_t to store iv_offset
dm kcopyd: Fix bug causing workqueue stalls
dm snapshot: Fix excessive memory usage and workqueue stalls
dm bufio: update comment in dm-bufio.c
dm writecache: fix typo in error msg for creating writecache_flush_thread
dm: remove indirect calls from __send_changing_extent_only()
dm mpath: only flush workqueue when needed
dm rq: remove unused arguments from rq_completed()
dm: avoid indirect call in __dm_make_request
totalram_pages and totalhigh_pages are made static inline function.
Main motivation was that managed_page_count_lock handling was complicating
things. It was discussed in length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes
better to remove the lock and convert variables to atomic, with preventing
poteintial store-to-read tearing as a bonus.
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Log the encryption algorithm's driver name when a dm-crypt target is
created. This will help people determine whether the expected
implementation is being used. In some cases we've seen people do
benchmarks and reject using encryption for performance reasons, when in
fact they used a much slower implementation than was possible on the
hardware. It can make an enormous difference; e.g., AES-XTS on ARM can
be over 10x faster with the crypto extensions than without. It can also
be useful to know if an implementation using an external crypto
accelerator is being used instead of a software implementation.
Example message:
[ 29.307629] device-mapper: crypt: xts(aes) using implementation "xts-aes-ce"
We've already found the similar message in fs/crypto/keyinfo.c to be
very useful.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reference to a device in device-mapper table contains offset in sectors.
If the sector_t is 32bit integer (CONFIG_LBDAF is not set), then
several device-mapper targets can overflow this offset and validity
check is then performed on a wrong offset and a wrong table is activated.
See for example (on 32bit without CONFIG_LBDAF) this overflow:
# dmsetup create test --table "0 2048 linear /dev/sdg 4294967297"
# dmsetup table test
0 2048 linear 8:96 1
This patch adds explicit check for overflow if the offset is sector_t type.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The iv_offset in the mapping table of crypt target is a 64bit number when
IV algorithm is plain64, plain64be, essiv or benbi. It will be assigned to
iv_offset of struct crypt_config, cc_sector of struct convert_context and
iv_sector of struct dm_crypt_request. These structures members are defined
as a sector_t. But sector_t is 32bit when CONFIG_LBDAF is not set in 32bit
kernel. In this situation sector_t is not big enough to store the 64bit
iv_offset.
Here is a reproducer.
Prepare test image and device (loop is automatically allocated by cryptsetup):
# dd if=/dev/zero of=tst.img bs=1M count=1
# echo "tst"|cryptsetup open --type plain -c aes-xts-plain64 \
--skip 500000000000000000 tst.img test
On 32bit system (use IV offset value that overflows to 64bit; CONFIG_LBDAF if off)
and device checksum is wrong:
# dmsetup table test --showkeys
0 2048 crypt aes-xts-plain64 dfa7cfe3c481f2239155739c42e539ae8f2d38f304dcc89d20b26f69daaf0933 3551657984 7:0 0
# sha256sum /dev/mapper/test
533e25c09176632b3794f35303488c4a8f3f965dffffa6ec2df347c168cb6c19 /dev/mapper/test
On 64bit system (and on 32bit system with the patch), table and checksum is now correct:
# dmsetup table test --showkeys
0 2048 crypt aes-xts-plain64 dfa7cfe3c481f2239155739c42e539ae8f2d38f304dcc89d20b26f69daaf0933 500000000000000000 7:0 0
# sha256sum /dev/mapper/test
5d16160f9d5f8c33d8051e65fdb4f003cc31cd652b5abb08f03aa6fce0df75fc /dev/mapper/test
Signed-off-by: AliOS system security <alios_sys_security@linux.alibaba.com>
Tested-and-Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
'cipher' algorithms (single block ciphers) are always synchronous, so
passing CRYPTO_ALG_ASYNC in the mask to crypto_alloc_cipher() has no
effect. Many users therefore already don't pass it, but some still do.
This inconsistency can cause confusion, especially since the way the
'mask' argument works is somewhat counterintuitive.
Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.
This patch shouldn't change any actual behavior.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There's a XFS on dm-crypt deadlock, recursing back to itself due to the
crypto subsystems use of GFP_KERNEL, reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=200835
* dm-crypt calls crypt_convert in xts mode
* init_crypt from xts.c calls kmalloc(GFP_KERNEL)
* kmalloc(GFP_KERNEL) recurses into the XFS filesystem, the filesystem
tries to submit some bios and wait for them, causing a deadlock
Fix this by updating both the DM crypt and integrity targets to no
longer use the CRYPTO_TFM_REQ_MAY_SLEEP flag, which will change the
crypto allocations from GFP_KERNEL to GFP_ATOMIC, therefore they can't
recurse into a filesystem. A GFP_ATOMIC allocation can fail, but
init_crypt() in xts.c handles the allocation failure gracefully - it
will fall back to preallocated buffer if the allocation fails.
The crypto API maintainer says that the crypto API only needs to
allocate memory when dealing with unaligned buffers and therefore
turning CRYPTO_TFM_REQ_MAY_SLEEP off is safe (see this discussion:
https://www.redhat.com/archives/dm-devel/2018-August/msg00195.html )
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
dm-crypt should only increase device limits, it should not decrease them.
This fixes a bug where the user could creates a crypt device with 1024
sector size on the top of scsi device that had 4096 logical block size.
The limit 4096 would be lost and the user could incorrectly send
1024-I/Os to the crypt device.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
In preparing to remove all stack VLA usage from the kernel[1], remove
the discouraged use of AHASH_REQUEST_ON_STACK in favor of the smaller
SHASH_DESC_ON_STACK by converting from ahash-wrapped-shash to direct
shash. The stack allocation will be made a fixed size in a later patch
to the crypto subsystem.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This is a small simplification of dm-crypt - use wake_up_process()
instead of a wait queue in a case where only one process may be
waiting. dm-writecache uses a similar pattern.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed (Kees)
-----BEGIN PGP SIGNATURE-----
Comment: Kees Cook <kees@outflux.net>
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsgVtMWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhsJEACLYe2EbwLFJz7emOT1KUGK5R1b
oVxJog0893WyMqgk9XBlA2lvTBRBYzR3tzsadfYo87L3VOBzazUv0YZaweJb65sF
bAvxW3nY06brhKKwTRed1PrMa1iG9R63WISnNAuZAq7+79mN6YgW4G6YSAEF9lW7
oPJoPw93YxcI8JcG+dA8BC9w7pJFKooZH4gvLUSUNl5XKr8Ru5YnWcV8F+8M4vZI
EJtXFmdlmxAledUPxTSCIojO8m/tNOjYTreBJt9K1DXKY6UcgAdhk75TRLEsp38P
fPvMigYQpBDnYz2pi9ourTgvZLkffK1OBZ46PPt8BgUZVf70D6CBg10vK47KO6N2
zreloxkMTrz5XohyjfNjYFRkyyuwV2sSVrRJqF4dpyJ4NJQRjvyywxIP4Myifwlb
ONipCM1EjvQjaEUbdcqKgvlooMdhcyxfshqJWjHzXB6BL22uPzq5jHXXugz8/ol8
tOSM2FuJ2sBLQso+szhisxtMd11PihzIZK9BfxEG3du+/hlI+2XgN7hnmlXuA2k3
BUW6BSDhab41HNd6pp50bDJnL0uKPWyFC6hqSNZw+GOIb46jfFcQqnCB3VZGCwj3
LH53Be1XlUrttc/NrtkvVhm4bdxtfsp4F7nsPFNDuHvYNkalAVoC3An0BzOibtkh
AtfvEeaPHaOyD8/h2Q==
=zUUp
-----END PGP SIGNATURE-----
Merge tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull more overflow updates from Kees Cook:
"The rest of the overflow changes for v4.18-rc1.
This includes the explicit overflow fixes from Silvio, further
struct_size() conversions from Matthew, and a bug fix from Dan.
But the bulk of it is the treewide conversions to use either the
2-factor argument allocators (e.g. kmalloc(a * b, ...) into
kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a *
b) into vmalloc(array_size(a, b)).
Coccinelle was fighting me on several fronts, so I've done a bunch of
manual whitespace updates in the patches as well.
Summary:
- Error path bug fix for overflow tests (Dan)
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed
(Kees)"
* tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
treewide: Use array_size in f2fs_kvzalloc()
treewide: Use array_size() in f2fs_kzalloc()
treewide: Use array_size() in f2fs_kmalloc()
treewide: Use array_size() in sock_kmalloc()
treewide: Use array_size() in kvzalloc_node()
treewide: Use array_size() in vzalloc_node()
treewide: Use array_size() in vzalloc()
treewide: Use array_size() in vmalloc()
treewide: devm_kzalloc() -> devm_kcalloc()
treewide: devm_kmalloc() -> devm_kmalloc_array()
treewide: kvzalloc() -> kvcalloc()
treewide: kvmalloc() -> kvmalloc_array()
treewide: kzalloc_node() -> kcalloc_node()
treewide: kzalloc() -> kcalloc()
treewide: kmalloc() -> kmalloc_array()
mm: Introduce kvcalloc()
video: uvesafb: Fix integer overflow in allocation
UBIFS: Fix potential integer overflow in allocation
leds: Use struct_size() in allocation
Convert intel uncore to struct_size
...
Eliminate most holes in DM data structures that were modified by
commit 6f1c819c21 ("dm: convert to bioset_init()/mempool_init()").
Also prevent structure members from unnecessarily spanning cache
lines.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The counter for the number of allocated pages includes pages in the
mempool's reserve, so checking that the number of allocated pages is 0
needs to happen after we exit the mempool.
Fixes: 6f1c819c21 ("dm: convert to bioset_init()/mempool_init()")
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Fixed to always just use percpu_counter_sum()
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Convert dm to embedded bio sets.
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
dm-crypt consumes an excessive amount memory when the user attempts to
zero a dm-crypt device with "blkdiscard -z". The command "blkdiscard -z"
calls the BLKZEROOUT ioctl, it goes to the function __blkdev_issue_zeroout,
__blkdev_issue_zeroout sends a large amount of write bios that contain
the zero page as their payload.
For each incoming page, dm-crypt allocates another page that holds the
encrypted data, so when processing "blkdiscard -z", dm-crypt tries to
allocate the amount of memory that is equal to the size of the device.
This can trigger OOM killer or cause system crash.
Fix this by limiting the amount of memory that dm-crypt allocates to 2%
of total system memory. This limit is system-wide and is divided by the
number of active dm-crypt devices and each device receives an equal
share.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Could be useful for a target to return stats or other information.
If a target does DMEMIT() anything to @result from its .message method
then it must return 1 to the caller.
Signed-off-By: Mike Snitzer <snitzer@redhat.com>
walk; this is critical to allow forward progress without the need to
use the bioset's BIOSET_NEED_RESCUER.
- Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure.
- DM core cleanups and improvements to make bio-based DM more efficient
(e.g. reduced memory footprint as well leveraging per-bio-data more).
- Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages
the more direct IO submission path in the block layer; this mode is
used by DM multipath and also optimizes targets like DM thin-pool that
stack directly on NVMe data device.
- DM multipath improvements to factor out legacy SCSI-only
(e.g. scsi_dh) code paths to allow for more optimized support for NVMe
multipath.
- A fix for DM multipath path selectors (service-time and queue-length)
to select paths in a more balanced way; largely academic but doesn't
hurt.
- Numerous DM raid target fixes and improvements.
- Add a new DM "unstriped" target that enables Intel to workaround
firmware limitations in some NVMe drives that are striped internally
(this target also works when stacked above the DM "striped" target).
- Various Documentation fixes and improvements.
- Misc. cleanups and fixes across various DM infrastructure and targets
(e.g. bufio, flakey, log-writes, snapshot).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJacgwPAAoJEMUj8QotnQNaEw0H/0XRTcg8/lRuGl46kdeI3PgR
ZxUy4XgUrCLiACWO5yCU/nKipB32+3xTlTDTBcjmaBfX8HolH147Pasb1KdHqLVC
dOWLMpjlFztb5fnuOMitJA05qQAbgRlZ52QdVk/FDo9yWicgWjQZduh8aYX53pHw
6XOYWzSFAXQcaduPdz6TLiPw479xBwIpXxQbrO09f4qt3Ub4bqknEhzFXc+6M7zl
ejmW/bG2Qg6WmsfAuaAhFTV0LpTPSEzvaq9TfR7yqFU3DvDIAi7Yh8eQinIUDo4u
txpOGoESRAMPAMKH0/UJdr/u7jTsfgJox4QEavWfnViPvkouah5KdjVOL1veZ5U=
=R3dN
-----END PGP SIGNATURE-----
Merge tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- DM core fixes to ensure that bio submission follows a depth-first
tree walk; this is critical to allow forward progress without the
need to use the bioset's BIOSET_NEED_RESCUER.
- Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure.
- DM core cleanups and improvements to make bio-based DM more efficient
(e.g. reduced memory footprint as well leveraging per-bio-data more).
- Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages
the more direct IO submission path in the block layer; this mode is
used by DM multipath and also optimizes targets like DM thin-pool
that stack directly on NVMe data device.
- DM multipath improvements to factor out legacy SCSI-only (e.g.
scsi_dh) code paths to allow for more optimized support for NVMe
multipath.
- A fix for DM multipath path selectors (service-time and queue-length)
to select paths in a more balanced way; largely academic but doesn't
hurt.
- Numerous DM raid target fixes and improvements.
- Add a new DM "unstriped" target that enables Intel to workaround
firmware limitations in some NVMe drives that are striped internally
(this target also works when stacked above the DM "striped" target).
- Various Documentation fixes and improvements.
- Misc cleanups and fixes across various DM infrastructure and targets
(e.g. bufio, flakey, log-writes, snapshot).
* tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (69 commits)
dm cache: Documentation: update default migration_throttling value
dm mpath selector: more evenly distribute ties
dm unstripe: fix target length versus number of stripes size check
dm thin: fix trailing semicolon in __remap_and_issue_shared_cell
dm table: fix NVMe bio-based dm_table_determine_type() validation
dm: various cleanups to md->queue initialization code
dm mpath: delay the retry of a request if the target responded as busy
dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIRED
dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failure
dm log writes: fix max length used for kstrndup
dm: backfill missing calls to mutex_destroy()
dm snapshot: use mutex instead of rw_semaphore
dm flakey: check for null arg_name in parse_features()
dm thin: extend thinpool status format string with omitted fields
dm thin: fixes in thin-provisioning.txt
dm thin: document representation of <highest mapped sector> when there is none
dm thin: fix documentation relative to low water mark threshold
dm cache: be consistent in specifying sectors and SI units in cache.txt
dm cache: delete obsoleted paragraph in cache.txt
dm cache: fix grammar in cache-policies.txt
...
Pull block updates from Jens Axboe:
"This is the main pull request for block IO related changes for the
4.16 kernel. Nothing major in this pull request, but a good amount of
improvements and fixes all over the map. This contains:
- BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
Paolo.
- Support for SMR zones for deadline and mq-deadline from Damien and
Christoph.
- Set of fixes for bcache by way of Michael Lyle, including fixes
from himself, Kent, Rui, Tang, and Coly.
- Series from Matias for lightnvm with fixes from Hans Holmberg,
Javier, and Matias. Mostly centered around pblk, and the removing
rrpc 1.2 in preparation for supporting 2.0.
- A couple of NVMe pull requests from Christoph. Nothing major in
here, just fixes and cleanups, and support for command tracing from
Johannes.
- Support for blk-throttle for tracking reads and writes separately.
From Joseph Qi. A few cleanups/fixes also for blk-throttle from
Weiping.
- Series from Mike Snitzer that enables dm to register its queue more
logically, something that's alwways been problematic on dm since
it's a stacked device.
- Series from Ming cleaning up some of the bio accessor use, in
preparation for supporting multipage bvecs.
- Various fixes from Ming closing up holes around queue mapping and
quiescing.
- BSD partition fix from Richard Narron, fixing a problem where we
can't mount newer (10/11) FreeBSD partitions.
- Series from Tejun reworking blk-mq timeout handling. The previous
scheme relied on atomic bits, but it had races where we would think
a request had timed out if it to reused at the wrong time.
- null_blk now supports faking timeouts, to enable us to better
exercise and test that functionality separately. From me.
- Kill the separate atomic poll bit in the request struct. After
this, we don't use the atomic bits on blk-mq anymore at all. From
me.
- sgl_alloc/free helpers from Bart.
- Heavily contended tag case scalability improvement from me.
- Various little fixes and cleanups from Arnd, Bart, Corentin,
Douglas, Eryu, Goldwyn, and myself"
* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
block: remove smart1,2.h
nvme: add tracepoint for nvme_complete_rq
nvme: add tracepoint for nvme_setup_cmd
nvme-pci: introduce RECONNECTING state to mark initializing procedure
nvme-rdma: remove redundant boolean for inline_data
nvme: don't free uuid pointer before printing it
nvme-pci: Suspend queues after deleting them
bsg: use pr_debug instead of hand crafted macros
blk-mq-debugfs: don't allow write on attributes with seq_operations set
nvme-pci: Fix queue double allocations
block: Set BIO_TRACE_COMPLETION on new bio during split
blk-throttle: use queue_is_rq_based
block: Remove kblockd_schedule_delayed_work{,_on}()
blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
lib/scatterlist: Fix chaining support in sgl_alloc_order()
blk-throttle: track read and write request individually
block: add bdev_read_only() checks to common helpers
block: fail op_is_write() requests to read-only partitions
blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
...
Fix to return error code -ENOMEM from the mempool_create_kmalloc_pool()
error handling case instead of 0, as done elsewhere in this function.
Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Loading key via kernel keyring service erases the internal
key copy immediately after we pass it in crypto layer. This is
wrong because IV is initialized later and we use wrong key
for the initialization (instead of real key there's just zeroed
block).
The bug may cause data corruption if key is loaded via kernel keyring
service first and later same crypt device is reactivated using exactly
same key in hexbyte representation, or vice versa. The bug (and fix)
affects only ciphers using following IVs: essiv, lmk and tcw.
Fixes: c538f6ec9f ("dm crypt: add ability to use keys from the kernel key retention service")
Cc: stable@vger.kernel.org # 4.10+
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If dm-crypt uses authenticated mode with separate MAC, there are two
concatenated part of the key structure - key(s) for encryption and
authentication key.
Add a missing check for authenticated key length. If this key length is
smaller than actually provided key, dm-crypt now properly fails instead
of crashing.
Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Reported-by: Salah Coronya <salahx@yahoo.com>
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The bio is always freed after running crypt_free_buffer_pages(), so it
isn't necessary to clear bv->bv_page.
Cc: Mike Snitzer <snitzer@redhat.com>
Cc:dm-devel@redhat.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The BIOSET_NEED_RESCUER flag is only needed when a make_request_fn might
do two allocations from the one bioset, and the second one could block
until the first bio completes.
dm-crypt does allocate from this bioset inside the dm make_request_fn,
but does so using GFP_NOWAIT so that the allocation will not block.
So BIOSET_NEED_RESCUER is not needed.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
this unaligned memory for its buffers (if an unaligned buffer crosses a
page, XFS frees it and allocates a full page instead - see the function
xfs_buf_allocate_memory).
dm-crypt checks if bv_offset is aligned on page size and these checks
fail with slub_debug and XFS.
Fix this bug by removing the bv_offset checks. Switch to checking if
bv_len is aligned instead of bv_offset (this check should be sufficient
to prevent overruns if a bio with too small bv_len is received).
Fixes: 8f0009a225 ("dm crypt: optionally support larger encryption sector size")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Bruno Prémont <bonbons@sysophe.eu>
Tested-by: Bruno Prémont <bonbons@sysophe.eu>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If a crypt mapping uses optional sector_size feature, additional
restrictions to mapped device segment size must be applied in
constructor, otherwise the device activation will fail later.
Fixes: 8f0009a225 ("dm crypt: optionally support larger encryption sector size")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fix memory leak of cipher_api.
Fixes: 33d2f09fcb (dm crypt: introduce new format of cipher with "capi:" prefix)
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
- Constify a few variables in DM core and DM integrity
- Add bufio optimization and checksum failure accounting to DM integrity
- Fix DM integrity to avoid checking integrity of failed reads
- Fix DM integrity to use init_completion
- A couple DM log-writes target fixes
- Simplify DAX flushing by eliminating the unnecessary flush abstraction
that was stood up for DM's use.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJZuo8UAAoJEMUj8QotnQNa5BEIANO4mHh1nrzEbH72a4RCLgxV
H1Pk1zZx/W1bhOOmcRRhxCSM85dPgsCegc5EmpwLZEMavQrP9UZblHcYOUsyIx7W
S/lWa+soOq/5N2OveROc4WdoWVs50UFmc1+BcClc4YrEe+15XC3R0VMkjX2b/hUL
o2eYhPjpMlgaorMtRRU6MAooo2fBRQ9m05aPeVgd35fxibrE7PZm+EYW09wa0STi
9ufuDXJf8+TtFP/38BD41LbUEskuHUZTSDeAJ+3DBaTtfEZcZYxsst4P9JangsHx
jqqqI9aYzFD2a27fl9WLhCvm40YFiKp5nwzED0RZjzWxVa/jTShX7a49BdzTTfw=
=rkSB
-----END PGP SIGNATURE-----
Merge tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Some request-based DM core and DM multipath fixes and cleanups
- Constify a few variables in DM core and DM integrity
- Add bufio optimization and checksum failure accounting to DM
integrity
- Fix DM integrity to avoid checking integrity of failed reads
- Fix DM integrity to use init_completion
- A couple DM log-writes target fixes
- Simplify DAX flushing by eliminating the unnecessary flush
abstraction that was stood up for DM's use.
* tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dax: remove the pmem_dax_ops->flush abstraction
dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK
dm integrity: make blk_integrity_profile structure const
dm integrity: do not check integrity for failed read operations
dm log writes: fix >512b sectorsize support
dm log writes: don't use all the cpu while waiting to log blocks
dm ioctl: constify ioctl lookup table
dm: constify argument arrays
dm integrity: count and display checksum failures
dm integrity: optimize writing dm-bufio buffers that are partially changed
dm rq: do not update rq partially in each ending bio
dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior
dm mpath: complain about unsupported __multipath_map_bio() return values
dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through
Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:
- Fix for a registration race in loop, from Anton Volkov.
- Overflow complaint fix from Arnd for DAC960.
- Series of drbd changes from the usual suspects.
- Conversion of the stec/skd driver to blk-mq. From Bart.
- A few BFQ improvements/fixes from Paolo.
- CFQ improvement from Ritesh, allowing idling for group idle.
- A few fixes found by Dan's smatch, courtesy of Dan.
- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.
- A few nbd fixes from Josef.
- Support for cgroup info in blktrace, from Shaohua.
- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.
- Various corner cases and error handling fixes from Weiping Zhang.
- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.
- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.
- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"
* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...
The arrays of 'struct dm_arg' are never modified by the device-mapper
core, so constify them so that they are placed in .rodata.
(Exception: the args array in dm-raid cannot be constified because it is
allocated on the stack and modified.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).
For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.
Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This flag is never set right after calling bio_integrity_alloc,
so don't clear it and confuse the reader.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There are quite a number of occurrences in the kernel of the pattern
if (dst != src)
memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE);
or
crypto_xor(keystream, src, nbytes);
memcpy(dst, keystream, nbytes);
where crypto_xor() is preceded or followed by a memcpy() invocation
that is only there because crypto_xor() uses its output parameter as
one of the inputs. To avoid having to add new instances of this pattern
in the arm64 code, which will be refactored to implement non-SIMD
fallbacks, add an alternative implementation called crypto_xor_cpy(),
taking separate input and output arguments. This removes the need for
the separate memcpy().
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The big-endian IV (plain64be) is needed to map images from extracted
disks that are used in some external (on-chip FDE) disk encryption
drives, e.g.: data recovery from external USB/SATA drives that support
"internal" encryption.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This patch converts bioset_create() to not create a workqueue by
default, so alloctions will never trigger punt_bios_to_rescuer(). It
also introduces a new flag BIOSET_NEED_RESCUER which tells
bioset_create() to preserve the old behavior.
All callers of bioset_create() that are inside block device drivers,
are given the BIOSET_NEED_RESCUER flag.
biosets used by filesystems or other top-level users do not
need rescuing as the bio can never be queued behind other
bios. This includes fs_bio_set, blkdev_dio_pool,
btrfs_bioset, xfs_ioend_bioset, and one allocated by
target_core_iblock.c.
biosets used by md/raid do not need rescuing as
their usage was recently audited and revised to never
risk deadlock.
It is hoped that most, if not all, of the remaining biosets
can end up being the non-rescued version.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Credit-to: Ming Lei <ming.lei@redhat.com> (minor fixes)
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
"flags" arguments are often seen as good API design as they allow
easy extensibility.
bioset_create_nobvec() is implemented internally as a variation in
flags passed to __bioset_create().
To support future extension, make the internal structure part of the
API.
i.e. add a 'flags' argument to bioset_create() and discard
bioset_create_nobvec().
Note that the bio_split allocations in drivers/md/raid* do not need
the bvec mempool - they should have used bioset_create_nobvec().
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Instead use the special DM_MAPIO_KILL return value to return -EIO just
like we do for the request based path. Note that dm-log-writes returned
-ENOMEM in a few places, which now becomes -EIO instead. No consumer
treats -ENOMEM special so this shouldn't be an issue (and it should
use a mempool to start with to make guaranteed progress).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
whether blocks should migrate to/from the cache. The bio-prison-v2
interface supports this improvement by enabling direct dispatch of
work to workqueues rather than having to delay the actual work
dispatch to the DM cache core. So the dm-cache policies are much more
nimble by being able to drive IO as they see fit. One immediate
benefit from the improved latency is a cache that should be much more
adaptive to changing workloads.
- Add a new DM integrity target that emulates a block device that has
additional per-sector tags that can be used for storing integrity
information.
- Add a new authenticated encryption feature to the DM crypt target that
builds on the capabilities provided by the DM integrity target.
- Add MD interface for switching the raid4/5/6 journal mode and update
the DM raid target to use it to enable aid4/5/6 journal write-back
support.
- Switch the DM verity target over to using the asynchronous hash crypto
API (this helps work better with architectures that have access to
off-CPU algorithm providers, which should reduce CPU utilization).
- Various request-based DM and DM multipath fixes and improvements from
Bart and Christoph.
- A DM thinp target fix for a bio structure leak that occurs for each
discard IFF discard passdown is enabled.
- A fix for a possible deadlock in DM bufio and a fix to re-check the
new buffer allocation watermark in the face of competing admin changes
to the 'max_cache_size_bytes' tunable.
- A couple DM core cleanups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJZB6vtAAoJEMUj8QotnQNaoicIALuZTLElgAzxzA28cfk1+1Ea
Gd09CfJ3M6cvk/YGUU7WwiSYIwu16yOJALG4sLcYnEmUCzvKfFPcl/RpeSJHPpYM
0aVXa6NIJw7K2r3C17toiK2DRMHYw6QU843WeWI93vBW13lDJklNJL9fM7GBEOLH
NMSNw2mAq9ajtLlnJhM3ZfhloA7/u/jektvlBO1AA3RQ5Kx1cXVXFPqN7FdRfcqp
4RuEMe9faAadlXLsj3bia5IBmF/W0Qza6JilP+NLKLWB4fm7LZDjN/k+TsHWMa9e
cGR73TgUGLMBJX+sDJy8R3oeBG9JZkFVkD7I30eCjzyhSOs/54XNYQ23EkqHJU0=
=9Ryi
-----END PGP SIGNATURE-----
Merge tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- A major update for DM cache that reduces the latency for deciding
whether blocks should migrate to/from the cache. The bio-prison-v2
interface supports this improvement by enabling direct dispatch of
work to workqueues rather than having to delay the actual work
dispatch to the DM cache core. So the dm-cache policies are much more
nimble by being able to drive IO as they see fit. One immediate
benefit from the improved latency is a cache that should be much more
adaptive to changing workloads.
- Add a new DM integrity target that emulates a block device that has
additional per-sector tags that can be used for storing integrity
information.
- Add a new authenticated encryption feature to the DM crypt target
that builds on the capabilities provided by the DM integrity target.
- Add MD interface for switching the raid4/5/6 journal mode and update
the DM raid target to use it to enable aid4/5/6 journal write-back
support.
- Switch the DM verity target over to using the asynchronous hash
crypto API (this helps work better with architectures that have
access to off-CPU algorithm providers, which should reduce CPU
utilization).
- Various request-based DM and DM multipath fixes and improvements from
Bart and Christoph.
- A DM thinp target fix for a bio structure leak that occurs for each
discard IFF discard passdown is enabled.
- A fix for a possible deadlock in DM bufio and a fix to re-check the
new buffer allocation watermark in the face of competing admin
changes to the 'max_cache_size_bytes' tunable.
- A couple DM core cleanups.
* tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits)
dm bufio: check new buffer allocation watermark every 30 seconds
dm bufio: avoid a possible ABBA deadlock
dm mpath: make it easier to detect unintended I/O request flushes
dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit()
dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH
dm: introduce enum dm_queue_mode to cleanup related code
dm mpath: verify __pg_init_all_paths locking assumptions at runtime
dm: verify suspend_locking assumptions at runtime
dm block manager: remove an unused argument from dm_block_manager_create()
dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue()
dm mpath: delay requeuing while path initialization is in progress
dm mpath: avoid that path removal can trigger an infinite loop
dm mpath: split and rename activate_path() to prepare for its expanded use
dm ioctl: prevent stack leak in dm ioctl call
dm integrity: use previously calculated log2 of sectors_per_block
dm integrity: use hex2bin instead of open-coded variant
dm crypt: replace custom implementation of hex2bin()
dm crypt: remove obsolete references to per-CPU state
dm verity: switch to using asynchronous hash crypto API
dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
...
There is no need to have a duplication of the generic library, i.e. hex2bin().
Replace the open coded variant.
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
dm-crypt used to use separate crypto transforms for each CPU, but this
is no longer the case. To avoid confusion, fix up obsolete comments and
rename setup_essiv_cpu().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Running dm-crypt with workqueues at the standard priority results in IO
competing for CPU time with standard user apps, which can lead to
pipeline bubbles and seriously degraded performance. Move to using
WQ_HIGHPRI workqueues to protect against that.
Signed-off-by: Tim Murray <timmurray@google.com>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The message "key wipe" used to wipe real key stored in crypto layer by
rewriting it with zeroes. Since commit 28856a9 ("crypto: xts -
consolidate sanity check for keys") this no longer works in FIPS mode
for XTS.
While running in FIPS mode the crypto key part has to differ from the
tweak key.
Fixes: 28856a9 ("crypto: xts - consolidate sanity check for keys")
Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Previously, dm-crypt could use blocks composed of multiple 512b sectors
but it created integrity profile for each 512b sector (it padded it with
zeroes). Fix dm-crypt so that the integrity profile is sent for each
block not each sector.
The user must use the same block size in the DM crypt and integrity
targets.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
sector_div is very slow, so we introduce a variable sector_shift and
use shift instead of sector_div.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add optional "sector_size" parameter that specifies encryption sector
size (atomic unit of block device encryption).
Parameter can be in range 512 - 4096 bytes and must be power of two.
For compatibility reasons, the maximal IO must fit into the page limit,
so the limit is set to the minimal page size possible (4096 bytes).
NOTE: this device cannot yet be handled by cryptsetup if this parameter
is set.
IV for the sector is calculated from the 512 bytes sector offset unless
the iv_large_sectors option is used.
Test script using dmsetup:
DEV="/dev/sdb"
DEV_SIZE=$(blockdev --getsz $DEV)
KEY="9c1185a5c5e9fc54612808977ee8f548b2258d31ddadef707ba62c166051b9e3cd0294c27515f2bccee924e8823ca6e124b8fc3167ed478bca702babe4e130ac"
BLOCK_SIZE=4096
# dmsetup create test_crypt --table "0 $DEV_SIZE crypt aes-xts-plain64 $KEY 0 $DEV 0 1 sector_size:$BLOCK_SIZE"
# dmsetup table --showkeys test_crypt
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
For the new authenticated encryption we have to support generic composed
modes (combination of encryption algorithm and authenticator) because
this is how the kernel crypto API accesses such algorithms.
To simplify the interface, we accept an algorithm directly in crypto API
format. The new format is recognised by the "capi:" prefix. The
dmcrypt internal IV specification is the same as for the old format.
The crypto API cipher specifications format is:
capi:cipher_api_spec-ivmode[:ivopts]
Examples:
capi:cbc(aes)-essiv:sha256 (equivalent to old aes-cbc-essiv:sha256)
capi:xts(aes)-plain64 (equivalent to old aes-xts-plain64)
Examples of authenticated modes:
capi:gcm(aes)-random
capi:authenc(hmac(sha256),xts(aes))-random
capi:rfc7539(chacha20,poly1305)-random
Authenticated modes can only be configured using the new cipher format.
Note that this format allows user to specify arbitrary combinations that
can be insecure. (Policy decision is done in cryptsetup userspace.)
Authenticated encryption algorithms can be of two types, either native
modes (like GCM) that performs both encryption and authentication
internally, or composed modes where user can compose AEAD with separate
specification of encryption algorithm and authenticator.
For composed mode with HMAC (length-preserving encryption mode like an
XTS and HMAC as an authenticator) we have to calculate HMAC digest size
(the separate authentication key is the same size as the HMAC digest).
Introduce crypt_ctr_auth_cipher() to parse the crypto API string to get
HMAC algorithm and retrieve digest size from it.
Also, for HMAC composed mode we need to parse the crypto API string to
get the cipher mode nested in the specification. For native AEAD mode
(like GCM), we can use crypto_tfm_alg_name() API to get the cipher
specification.
Because the HMAC composed mode is not processed the same as the native
AEAD mode, the CRYPT_MODE_INTEGRITY_HMAC flag is no longer needed and
"hmac" specification for the table integrity argument is removed.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Allow the use of per-sector metadata, provided by the dm-integrity
module, for integrity protection and persistently stored per-sector
Initialization Vector (IV). The underlying device must support the
"DM-DIF-EXT-TAG" dm-integrity profile.
The per-bio integrity metadata is allocated by dm-crypt for every bio.
Example of low-level mapping table for various types of use:
DEV=/dev/sdb
SIZE=417792
# Additional HMAC with CBC-ESSIV, key is concatenated encryption key + HMAC key
SIZE_INT=389952
dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 32 J 0"
dmsetup create y --table "0 $SIZE_INT crypt aes-cbc-essiv:sha256 \
11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
0 /dev/mapper/x 0 1 integrity:32:hmac(sha256)"
# AEAD (Authenticated Encryption with Additional Data) - GCM with random IVs
# GCM in kernel uses 96bits IV and we store 128bits auth tag (so 28 bytes metadata space)
SIZE_INT=393024
dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 28 J 0"
dmsetup create y --table "0 $SIZE_INT crypt aes-gcm-random \
11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
0 /dev/mapper/x 0 1 integrity:28:aead"
# Random IV only for XTS mode (no integrity protection but provides atomic random sector change)
SIZE_INT=401272
dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 16 J 0"
dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
0 /dev/mapper/x 0 1 integrity:16:none"
# Random IV with XTS + HMAC integrity protection
SIZE_INT=377656
dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 48 J 0"
dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
0 /dev/mapper/x 0 1 integrity:48:hmac(sha256)"
Both AEAD and HMAC protection authenticates not only data but also
sector metadata.
HMAC protection is implemented through autenc wrapper (so it is
processed the same way as an authenticated mode).
In HMAC mode there are two keys (concatenated in dm-crypt mapping
table). First is the encryption key and the second is the key for
authentication (HMAC). (It is userspace decision if these keys are
independent or somehow derived.)
The sector request for AEAD/HMAC authenticated encryption looks like this:
|----- AAD -------|------ DATA -------|-- AUTH TAG --|
| (authenticated) | (auth+encryption) | |
| sector_LE | IV | sector in/out | tag in/out |
For writes, the integrity fields are calculated during AEAD encryption
of every sector and stored in bio integrity fields and sent to
underlying dm-integrity target for storage.
For reads, the integrity metadata is verified during AEAD decryption of
every sector (they are filled in by dm-integrity, but the integrity
fields are pre-allocated in dm-crypt).
There is also an experimental support in cryptsetup utility for more
friendly configuration (part of LUKS2 format).
Because the integrity fields are not valid on initial creation, the
device must be "formatted". This can be done by direct-io writes to the
device (e.g. dd in direct-io mode). For now, there is available trivial
tool to do this, see: https://github.com/mbroz/dm_int_tools
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Vashek Matyas <matyas@fi.muni.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
rcu_dereference_key() and user_key_payload() are currently being used in
two different, incompatible ways:
(1) As a wrapper to rcu_dereference() - when only the RCU read lock used
to protect the key.
(2) As a wrapper to rcu_dereference_protected() - when the key semaphor is
used to protect the key and the may be being modified.
Fix this by splitting both of the key wrappers to produce:
(1) RCU accessors for keys when caller has the key semaphore locked:
dereference_key_locked()
user_key_payload_locked()
(2) RCU accessors for keys when caller holds the RCU read lock:
dereference_key_rcu()
user_key_payload_rcu()
This should fix following warning in the NFS idmapper
===============================
[ INFO: suspicious RCU usage. ]
4.10.0 #1 Tainted: G W
-------------------------------
./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
1 lock held by mount.nfs/5987:
#0: (rcu_read_lock){......}, at: [<d000000002527abc>] nfs_idmap_get_key+0x15c/0x420 [nfsv4]
stack backtrace:
CPU: 1 PID: 5987 Comm: mount.nfs Tainted: G W 4.10.0 #1
Call Trace:
dump_stack+0xe8/0x154 (unreliable)
lockdep_rcu_suspicious+0x140/0x190
nfs_idmap_get_key+0x380/0x420 [nfsv4]
nfs_map_name_to_uid+0x2a0/0x3b0 [nfsv4]
decode_getfattr_attrs+0xfac/0x16b0 [nfsv4]
decode_getfattr_generic.constprop.106+0xbc/0x150 [nfsv4]
nfs4_xdr_dec_lookup_root+0xac/0xb0 [nfsv4]
rpcauth_unwrap_resp+0xe8/0x140 [sunrpc]
call_decode+0x29c/0x910 [sunrpc]
__rpc_execute+0x140/0x8f0 [sunrpc]
rpc_run_task+0x170/0x200 [sunrpc]
nfs4_call_sync_sequence+0x68/0xa0 [nfsv4]
_nfs4_lookup_root.isra.44+0xd0/0xf0 [nfsv4]
nfs4_lookup_root+0xe0/0x350 [nfsv4]
nfs4_lookup_root_sec+0x70/0xa0 [nfsv4]
nfs4_find_root_sec+0xc4/0x100 [nfsv4]
nfs4_proc_get_rootfh+0x5c/0xf0 [nfsv4]
nfs4_get_rootfh+0x6c/0x190 [nfsv4]
nfs4_server_common_setup+0xc4/0x260 [nfsv4]
nfs4_create_server+0x278/0x3c0 [nfsv4]
nfs4_remote_mount+0x50/0xb0 [nfsv4]
mount_fs+0x74/0x210
vfs_kern_mount+0x78/0x220
nfs_do_root_mount+0xb0/0x140 [nfsv4]
nfs4_try_mount+0x60/0x100 [nfsv4]
nfs_fs_mount+0x5ec/0xda0 [nfs]
mount_fs+0x74/0x210
vfs_kern_mount+0x78/0x220
do_mount+0x254/0xf70
SyS_mount+0x94/0x100
system_call+0x38/0xe0
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- Implement wraparound-safe refcount_t and kref_t types based on
generic atomic primitives (Peter Zijlstra)
- Improve and fix the ww_mutex code (Nicolai Hähnle)
- Add self-tests to the ww_mutex code (Chris Wilson)
- Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
Bueso)
- Micro-optimize the current-task logic all around the core kernel
(Davidlohr Bueso)
- Tidy up after recent optimizations: remove stale code and APIs,
clean up the code (Waiman Long)
- ... plus misc fixes, updates and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
fork: Fix task_struct alignment
locking/spinlock/debug: Remove spinlock lockup detection code
lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
lkdtm: Convert to refcount_t testing
kref: Implement 'struct kref' using refcount_t
refcount_t: Introduce a special purpose refcount type
sched/wake_q: Clarify queue reinit comment
sched/wait, rcuwait: Fix typo in comment
locking/mutex: Fix lockdep_assert_held() fail
locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
locking/rwsem: Reinit wake_q after use
locking/rwsem: Remove unnecessary atomic_long_t casts
jump_labels: Move header guard #endif down where it belongs
locking/atomic, kref: Implement kref_put_lock()
locking/ww_mutex: Turn off __must_check for now
locking/atomic, kref: Avoid more abuse
locking/atomic, kref: Use kref_get_unless_zero() more
locking/atomic, kref: Kill kref_sub()
locking/atomic, kref: Add kref_read()
locking/atomic, kref: Add KREF_INIT()
...
The lockdep splat below hints at a bug in RCU usage in dm-crypt that
was introduced with commit c538f6ec9f ("dm crypt: add ability to use
keys from the kernel key retention service"). The kernel keyring
function user_key_payload() is in fact a wrapper for
rcu_dereference_protected() which must not be called with only
rcu_read_lock() section mark.
Unfortunately the kernel keyring subsystem doesn't currently provide
an interface that allows the use of an RCU read-side section. So for
now we must drop RCU in favour of rwsem until a proper function is
made available in the kernel keyring subsystem.
===============================
[ INFO: suspicious RCU usage. ]
4.10.0-rc5 #2 Not tainted
-------------------------------
./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by cryptsetup/6464:
#0: (&md->type_lock){+.+.+.}, at: [<ffffffffa02472a2>] dm_lock_md_type+0x12/0x20 [dm_mod]
#1: (rcu_read_lock){......}, at: [<ffffffffa02822f8>] crypt_set_key+0x1d8/0x4b0 [dm_crypt]
stack backtrace:
CPU: 1 PID: 6464 Comm: cryptsetup Not tainted 4.10.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
Call Trace:
dump_stack+0x67/0x92
lockdep_rcu_suspicious+0xc5/0x100
crypt_set_key+0x351/0x4b0 [dm_crypt]
? crypt_set_key+0x1d8/0x4b0 [dm_crypt]
crypt_ctr+0x341/0xa53 [dm_crypt]
dm_table_add_target+0x147/0x330 [dm_mod]
table_load+0x111/0x350 [dm_mod]
? retrieve_status+0x1c0/0x1c0 [dm_mod]
ctl_ioctl+0x1f5/0x510 [dm_mod]
dm_ctl_ioctl+0xe/0x20 [dm_mod]
do_vfs_ioctl+0x8e/0x690
? ____fput+0x9/0x10
? task_work_run+0x7e/0xa0
? trace_hardirqs_on_caller+0x122/0x1b0
SyS_ioctl+0x3c/0x70
entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x7f392c9a4ec7
RSP: 002b:00007ffef6383378 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffef63830a0 RCX: 00007f392c9a4ec7
RDX: 000000000124fcc0 RSI: 00000000c138fd09 RDI: 0000000000000005
RBP: 00007ffef6383090 R08: 00000000ffffffff R09: 00000000012482b0
R10: 2a28205d34383336 R11: 0000000000000246 R12: 00007f392d803a08
R13: 00007ffef63831e0 R14: 0000000000000000 R15: 00007f392d803a0b
Fixes: c538f6ec9f ("dm crypt: add ability to use keys from the kernel key retention service")
Reported-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This is a nasty interface and setting the state of a foreign task must
not be done. As of the following commit:
be628be095 ("bcache: Make gc wakeup sane, remove set_task_state()")
... everyone in the kernel calls set_task_state() with current, allowing
the helper to be removed.
However, as the comment indicates, it is still around for those archs
where computing current is more expensive than using a pointer, at least
in theory. An important arch that is affected is arm64, however this has
been addressed now [1] and performance is up to par making no difference
with either calls.
Of all the callers, if any, it's the locking bits that would care most
about this -- ie: we end up passing a tsk pointer to a lot of the lock
slowpath, and setting ->state on that. The following numbers are based
on two tests: a custom ad-hoc microbenchmark that just measures
latencies (for ~65 million calls) between get_task_state() vs
get_current_state().
Secondly for a higher overview, an unlink microbenchmark was used,
which pounds on a single file with open, close,unlink combos with
increasing thread counts (up to 4x ncpus). While the workload is quite
unrealistic, it does contend a lot on the inode mutex or now rwsem.
[1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com
== 1. x86-64 ==
Avg runtime set_task_state(): 601 msecs
Avg runtime set_current_state(): 552 msecs
vanilla dirty
Hmean unlink1-processes-2 36089.26 ( 0.00%) 38977.33 ( 8.00%)
Hmean unlink1-processes-5 28555.01 ( 0.00%) 29832.55 ( 4.28%)
Hmean unlink1-processes-8 37323.75 ( 0.00%) 44974.57 ( 20.50%)
Hmean unlink1-processes-12 43571.88 ( 0.00%) 44283.01 ( 1.63%)
Hmean unlink1-processes-21 34431.52 ( 0.00%) 38284.45 ( 11.19%)
Hmean unlink1-processes-30 34813.26 ( 0.00%) 37975.17 ( 9.08%)
Hmean unlink1-processes-48 37048.90 ( 0.00%) 39862.78 ( 7.59%)
Hmean unlink1-processes-79 35630.01 ( 0.00%) 36855.30 ( 3.44%)
Hmean unlink1-processes-110 36115.85 ( 0.00%) 39843.91 ( 10.32%)
Hmean unlink1-processes-141 32546.96 ( 0.00%) 35418.52 ( 8.82%)
Hmean unlink1-processes-172 34674.79 ( 0.00%) 36899.21 ( 6.42%)
Hmean unlink1-processes-203 37303.11 ( 0.00%) 36393.04 ( -2.44%)
Hmean unlink1-processes-224 35712.13 ( 0.00%) 36685.96 ( 2.73%)
== 2. ppc64le ==
Avg runtime set_task_state(): 938 msecs
Avg runtime set_current_state: 940 msecs
vanilla dirty
Hmean unlink1-processes-2 19269.19 ( 0.00%) 30704.50 ( 59.35%)
Hmean unlink1-processes-5 20106.15 ( 0.00%) 21804.15 ( 8.45%)
Hmean unlink1-processes-8 17496.97 ( 0.00%) 17243.28 ( -1.45%)
Hmean unlink1-processes-12 14224.15 ( 0.00%) 17240.21 ( 21.20%)
Hmean unlink1-processes-21 14155.66 ( 0.00%) 15681.23 ( 10.78%)
Hmean unlink1-processes-30 14450.70 ( 0.00%) 15995.83 ( 10.69%)
Hmean unlink1-processes-48 16945.57 ( 0.00%) 16370.42 ( -3.39%)
Hmean unlink1-processes-79 15788.39 ( 0.00%) 14639.27 ( -7.28%)
Hmean unlink1-processes-110 14268.48 ( 0.00%) 14377.40 ( 0.76%)
Hmean unlink1-processes-141 14023.65 ( 0.00%) 16271.69 ( 16.03%)
Hmean unlink1-processes-172 13417.62 ( 0.00%) 16067.55 ( 19.75%)
Hmean unlink1-processes-203 15293.08 ( 0.00%) 15440.40 ( 0.96%)
Hmean unlink1-processes-234 13719.32 ( 0.00%) 16190.74 ( 18.01%)
Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%)
Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%)
Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%)
x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
and ppc64 (with paca) show similar improvements in the unlink microbenches.
The small delta for ppc64 (2ms), does not represent the gains on the unlink
runs. In the case of x86, there was a decent amount of variation in the
latency runs, but always within a 20 to 50ms increase), ppc was more constant.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: mark.rutland@arm.com
Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
. some locking improvements in DM bufio
. add Kconfig option to disable the DM block manager's extra locking
which mainly serves as a developer tool
. a few bug fixes to DM's persistent-data
. a couple changes to prepare for multipage biovec support in the block
layer
. various improvements and cleanups in the DM core, DM cache, DM raid
and DM crypt
. add ability to have DM crypt use keys from the kernel key retention
service
. add a new "error_writes" feature to the DM flakey target, reads are
left unchanged in this mode
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJYUW8zAAoJEMUj8QotnQNaAWEIAMRQ4aCXq5T7F9Hf4K/l6FwO
FoBr2TPS3Lf0vm/A5Tr819I47hk7q0oroa61ARbpS90iuGt/Au/Sk35cn1BwT0YW
llMvMGbh+w9ZBUJGkyexdXbyfm5ywPHuthMr4CK/UNASyjDl2QMAeBuUZ6FLSPn1
RUL/RYv0mG/7EXOPz0PURPb5rpjO15cAU0NjfNS0862UVR8x8dNS6iImOmScsioe
Flw90qPl3kMBxBHik8xSPJfhtW+lD7xSaOlWzHKtalnUZHRG2BNUtlAMKdiaynx2
yl9MhSsi8wlgd4h9WmlmaOr0VqkU5UYY9D9TDuuJwXnHUXGenVSJ/aGOohr+bm4=
=kOoK
-----END PGP SIGNATURE-----
Merge tag 'dm-4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- various fixes and improvements to request-based DM and DM multipath
- some locking improvements in DM bufio
- add Kconfig option to disable the DM block manager's extra locking
which mainly serves as a developer tool
- a few bug fixes to DM's persistent-data
- a couple changes to prepare for multipage biovec support in the block
layer
- various improvements and cleanups in the DM core, DM cache, DM raid
and DM crypt
- add ability to have DM crypt use keys from the kernel key retention
service
- add a new "error_writes" feature to the DM flakey target, reads are
left unchanged in this mode
* tag 'dm-4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (40 commits)
dm flakey: introduce "error_writes" feature
dm cache policy smq: use hash_32() instead of hash_32_generic()
dm crypt: reject key strings containing whitespace chars
dm space map: always set ev if sm_ll_mutate() succeeds
dm space map metadata: skip useless memcpy in metadata_ll_init_index()
dm space map metadata: fix 'struct sm_metadata' leak on failed create
Documentation: dm raid: define data_offset status field
dm raid: fix discard support regression
dm raid: don't allow "write behind" with raid4/5/6
dm mpath: use hw_handler_params if attached hw_handler is same as requested
dm crypt: add ability to use keys from the kernel key retention service
dm array: remove a dead assignment in populate_ablock_with_values()
dm ioctl: use offsetof() instead of open-coding it
dm rq: simplify use_blk_mq initialization
dm: use blk_set_queue_dying() in __dm_destroy()
dm bufio: drop the lock when doing GFP_NOIO allocation
dm bufio: don't take the lock in dm_bufio_shrink_count
dm bufio: avoid sleeping while holding the dm_bufio lock
dm table: simplify dm_table_determine_type()
dm table: an 'all_blk_mq' table must be loaded for a blk-mq DM device
...
Unfortunately key_string may theoretically contain whitespace even after
it's processed by dm_split_args(). The reason for this is DM core
supports escaping of almost all chars including any whitespace.
If userspace passes a key to the kernel in format ":32:logon:my_prefix:my\ key"
dm-crypt will look up key "my_prefix:my key" in kernel keyring service.
So far everything's fine.
Unfortunately if userspace later calls DM_TABLE_STATUS ioctl, it will not
receive back expected ":32:logon:my_prefix:my\ key" but the unescaped version
instead. Also userpace (most notably cryptsetup) is not ready to parse
single target argument containing (even escaped) whitespace chars and any
whitespace is simply taken as delimiter of another argument.
This effect is mitigated by the fact libdevmapper curently performs
double escaping of '\' char. Any user input in format "x\ x" is
transformed into "x\\ x" before being passed to the kernel. Nonetheless
dm-crypt may be used without libdevmapper. Therefore the near-term
solution to this is to reject any key string containing whitespace.
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>