Commit Graph

325 Commits

Author SHA1 Message Date
Linus Torvalds
6f2689a766 SCSI misc on 20220324
This series consists of the usual driver updates (qla2xxx, pm8001,
 libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
 and bug fixes.  The high blast radius core update is the removal of
 write same, which affects block and several non-SCSI devices.  The
 other big change, which is more local, is the removal of the SCSI
 pointer.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYjzDQyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQMYAQDEWUGV
 6U0+736AHVtOfiMNfiRN79B1HfXVoHvemnPcTwD/UlndwFfy/3GGOtoZmqEpc73J
 Ec1HDuUCE18H1H2QAh0=
 =/Ty9
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (qla2xxx, pm8001,
  libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
  and bug fixes.

  The high blast radius core update is the removal of write same, which
  affects block and several non-SCSI devices. The other big change,
  which is more local, is the removal of the SCSI pointer"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits)
  scsi: scsi_ioctl: Drop needless assignment in sg_io()
  scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn()
  scsi: lpfc: Copyright updates for 14.2.0.0 patches
  scsi: lpfc: Update lpfc version to 14.2.0.0
  scsi: lpfc: SLI path split: Refactor BSG paths
  scsi: lpfc: SLI path split: Refactor Abort paths
  scsi: lpfc: SLI path split: Refactor SCSI paths
  scsi: lpfc: SLI path split: Refactor CT paths
  scsi: lpfc: SLI path split: Refactor misc ELS paths
  scsi: lpfc: SLI path split: Refactor VMID paths
  scsi: lpfc: SLI path split: Refactor FDISC paths
  scsi: lpfc: SLI path split: Refactor LS_RJT paths
  scsi: lpfc: SLI path split: Refactor LS_ACC paths
  scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths
  scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths
  scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path
  scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe
  scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4
  scsi: lpfc: SLI path split: Refactor lpfc_iocbq
  scsi: lpfc: Use kcalloc()
  ...
2022-03-24 19:37:53 -07:00
Linus Torvalds
b1f8ccdaae - Significant refactoring and fixing of how DM core does bio-based IO
accounting with focus on fixing wildly inaccurate IO stats for
   dm-crypt (and other DM targets that defer bio submission in their
   own workqueues). End result is proper IO accounting, made possible
   by targets being updated to use the new dm_submit_bio_remap()
   interface.
 
 - Add hipri bio polling support (REQ_POLLED) to bio-based DM.
 
 - Reduce dm_io and dm_target_io structs so that a single dm_io (which
   contains dm_target_io and first clone bio) weighs in at 256 bytes.
   For reference the bio struct is 128 bytes.
 
 - Various other small cleanups, fixes or improvements in DM core and
   targets.
 
 - Update MAINTAINERS with my kernel.org email address to allow
   distinction between my "upstream" and "Red" Hats.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmI7ghIACgkQxSPxCi2d
 A1q04wgAzaRu186WjfCO8MK0uyv3S52Rw1EsgYealAqoPwQJ9KkW2icvjtwRL+fJ
 1+w6qE/Da6QdwXj9lGtp1XIXJFipNJSw3PSaE/tV2cXiBemZlzJ5vR6F6dfeYKmV
 /sGas46H2l+aD4Xr7unUmcN/AYrNIFtnucClY3+DlJFPesXQQc9a/XmL9RX9MrN4
 MS9wLkh/5QSG3zReEct/4GVmNSJAjFfLkkeFHtLN82jvvDmnszRT5+aJ06WkXeOz
 OZmQfOPnJv5MnFUz9DOaRb/fTCoyxzxLnNM5Lt3jyFPk9Jf8Qz9TJ2rgskxsE83u
 UsCD/Y/QAdDcrRVB5SS6+yx4AS6uSA==
 =cinj
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Significant refactoring and fixing of how DM core does bio-based IO
   accounting with focus on fixing wildly inaccurate IO stats for
   dm-crypt (and other DM targets that defer bio submission in their own
   workqueues). End result is proper IO accounting, made possible by
   targets being updated to use the new dm_submit_bio_remap() interface.

 - Add hipri bio polling support (REQ_POLLED) to bio-based DM.

 - Reduce dm_io and dm_target_io structs so that a single dm_io (which
   contains dm_target_io and first clone bio) weighs in at 256 bytes.
   For reference the bio struct is 128 bytes.

 - Various other small cleanups, fixes or improvements in DM core and
   targets.

 - Update MAINTAINERS with my kernel.org email address to allow
   distinction between my "upstream" and "Red" Hats.

* tag 'for-5.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (46 commits)
  dm: consolidate spinlocks in dm_io struct
  dm: reduce size of dm_io and dm_target_io structs
  dm: switch dm_target_io booleans over to proper flags
  dm: switch dm_io booleans over to proper flags
  dm: update email address in MAINTAINERS
  dm: return void from __send_empty_flush
  dm: factor out dm_io_complete
  dm cache: use dm_submit_bio_remap
  dm: simplify dm_sumbit_bio_remap interface
  dm thin: use dm_submit_bio_remap
  dm: add WARN_ON_ONCE to dm_submit_bio_remap
  dm: support bio polling
  block: add ->poll_bio to block_device_operations
  dm mpath: use DMINFO instead of printk with KERN_INFO
  dm: stop using bdevname
  dm-zoned: remove the ->name field in struct dmz_dev
  dm: remove unnecessary local variables in __bind
  dm: requeue IO if mapping table not yet available
  dm io: remove stale comment block for dm_io()
  dm thin metadata: remove unused dm_thin_remove_block and __remove
  ...
2022-03-24 19:25:24 -07:00
Mike Snitzer
b7f8dff098 dm: simplify dm_sumbit_bio_remap interface
Remove the from_wq argument from dm_sumbit_bio_remap(). Eliminates the
need for dm_sumbit_bio_remap() callers to know whether they are
calling for a workqueue or from the original dm_submit_bio().

Add map_task to dm_io struct, record the map_task in alloc_io and
clear it after all target ->map() calls have completed. Update
dm_sumbit_bio_remap to check if 'current' matches io->map_task rather
than rely on passed 'from_rq' argument.

This change really simplifies the chore of porting each DM target to
using dm_sumbit_bio_remap() because there is no longer the risk of
programming error by not completely knowing all the different contexts
a particular method that calls dm_sumbit_bio_remap() might be used in.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 13:44:56 -05:00
Christoph Hellwig
6667171965 dm-crypt: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig
a773187e37 scsi: dm: Remove WRITE_SAME support
There are no more end-users of REQ_OP_WRITE_SAME left, so we can start
deleting it.

Link: https://lore.kernel.org/r/20220209082828.2629273-7-hch@lst.de
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22 21:11:08 -05:00
Aashish Sharma
6fc5150438 dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS
Explicitly convert unsigned int in the right of the conditional
expression to int to match the left side operand and the return type,
fixing the following compiler warning:

drivers/md/dm-crypt.c:2593:43: warning: signed and unsigned
type in conditional expression [-Wsign-compare]

Fixes: c538f6ec9f ("dm crypt: add ability to use keys from the kernel key retention service")
Signed-off-by: Aashish Sharma <shraash@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-22 11:27:56 -05:00
Mike Snitzer
e5524e128f dm crypt: use dm_submit_bio_remap
Care was taken to support kcryptd_io_read being called from crypt_map
or workqueue.  Use of an intermediate CRYPT_MAP_READ_GFP gfp_t
(defined as GFP_NOWAIT) should protect from maintenance burden if that
flag were to change for some reason.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:34 -05:00
Christoph Hellwig
abfc426d1b block: pass a block_device to bio_clone_fast
Pass a block_device to bio_clone_fast and __bio_clone_fast and give
the functions more suitable names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04 07:43:18 -07:00
Christoph Hellwig
609be10667 block: pass a block_device and opf to bio_alloc_bioset
Pass the block_device and operation that we plan to use this bio for to
bio_alloc_bioset to optimize the assigment.  NULL/0 can be passed, both
for the passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.

Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Christoph Hellwig
3f868c09ea dm-crypt: remove clone_init
Just open code it next to the bio allocations, which saves a few lines
of code, prepares for future changes and allows to remove the duplicate
bi_opf assignment for the bio_clone_fast case in kcryptd_io_read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Christoph Hellwig
53db984e00 dm: bio_alloc can't fail if it is allowed to sleep
Remove handling of NULL returns from sleeping bio_alloc calls given that
those can't fail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Linus Torvalds
c183e1707a - Add DM core support for emitting audit events through the audit
subsystem. Also enhance both the integrity and crypt targets to emit
   events to via dm-audit.
 
 - Various other simple code improvements and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmGJlFkACgkQxSPxCi2d
 A1pqwwf/YZ6kNKRQaKF1mbkkHOxa/ULf7qIhi/R0epwJu4j1RGsCACS34EqzLc4c
 x15h6flCNj1IBVAqTvMUETYTjTLtyrcfD0yBRWYw2RL0ksHMHyMvd1r/7aE64+pj
 EeZk9Xzcx3Gsq9GOzKfYA2AX0PrypkKSjgHK7hgv+Jh5heqkFcnMXSl3l7BQ6vbr
 ue9joPSI7+6eVFMDn32KxyHzfm6zZo1nmKZ6tQBBHD1D9yBqWTAhXiyXhRA+BOYH
 Tg5wE1fvZ/htyZNEc1cMRArzLF6q9pEU4r8j472N6IcJbhIJzSu0V60zVvexNWG3
 fJSIWqlta1KFK8SQttmDmfFnJiFcyw==
 =t097
 -----END PGP SIGNATURE-----

Merge tag 'for-5.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Add DM core support for emitting audit events through the audit
   subsystem. Also enhance both the integrity and crypt targets to emit
   events to via dm-audit.

 - Various other simple code improvements and cleanups.

* tag 'for-5.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm table: log table creation error code
  dm: make workqueue names device-specific
  dm writecache: Make use of the helper macro kthread_run()
  dm crypt: Make use of the helper macro kthread_run()
  dm verity: use bvec_kmap_local in verity_for_bv_block
  dm log writes: use memcpy_from_bvec in log_writes_map
  dm integrity: use bvec_kmap_local in __journal_read_write
  dm integrity: use bvec_kmap_local in integrity_metadata
  dm: add add_disk() error handling
  dm: Remove redundant flush_workqueue() calls
  dm crypt: log aead integrity violations to audit subsystem
  dm integrity: log audit events for dm-integrity target
  dm: introduce audit event module for device mapper
2021-11-09 11:02:04 -08:00
Cai Huoqing
a5217c1105 dm crypt: Make use of the helper macro kthread_run()
Replace kthread_create/wake_up_process() with kthread_run()
to simplify the code.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01 13:28:49 -04:00
Michael Weiß
58d0f180bd dm crypt: log aead integrity violations to audit subsystem
Since dm-crypt target can be stacked on dm-integrity targets to
provide authenticated encryption, integrity violations are recognized
here during aead computation. We use the dm-audit submodule to
signal those events to user space, too.

The construction and destruction of crypt device mappings are also
logged as audit events.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-10-27 16:54:37 -04:00
Christoph Hellwig
fe45e630a1 block: move integrity handling out of <linux/blkdev.h>
Split the integrity/metadata handling definitions out into a new header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:02 -06:00
Changbin Du
d3703ef331 dm crypt: use in_hardirq() instead of deprecated in_irq()
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-08-20 16:25:07 -04:00
Arne Welzel
528b16bfc3 dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()
On systems with many cores using dm-crypt, heavy spinlock contention in
percpu_counter_compare() can be observed when the page allocation limit
for a given device is reached or close to be reached. This is due
to percpu_counter_compare() taking a spinlock to compute an exact
result on potentially many CPUs at the same time.

Switch to non-exact comparison of allocated and allowed pages by using
the value returned by percpu_counter_read_positive() to avoid taking
the percpu_counter spinlock.

This may over/under estimate the actual number of allocated pages by at
most (batch-1) * num_online_cpus().

Currently, batch is bounded by 32. The system on which this issue was
first observed has 256 CPUs and 512GB of RAM. With a 4k page size, this
change may over/under estimate by 31MB. With ~10G (2%) allowed dm-crypt
allocations, this seems an acceptable error. Certainly preferred over
running into the spinlock contention.

This behavior was reproduced on an EC2 c5.24xlarge instance with 96 CPUs
and 192GB RAM as follows, but can be provoked on systems with less CPUs
as well.

 * Disable swap
 * Tune vm settings to promote regular writeback
     $ echo 50 > /proc/sys/vm/dirty_expire_centisecs
     $ echo 25 > /proc/sys/vm/dirty_writeback_centisecs
     $ echo $((128 * 1024 * 1024)) > /proc/sys/vm/dirty_background_bytes

 * Create 8 dmcrypt devices based on files on a tmpfs
 * Create and mount an ext4 filesystem on each crypt devices
 * Run stress-ng --hdd 8 within one of above filesystems

Total %system usage collected from sysstat goes to ~35%. Write throughput
on the underlying loop device is ~2GB/s. perf profiling an individual
kworker kcryptd thread shows the following profile, indicating spinlock
contention in percpu_counter_compare():

    99.98%     0.00%  kworker/u193:46  [kernel.kallsyms]  [k] ret_from_fork
      |
      --ret_from_fork
        kthread
        worker_thread
        |
         --99.92%--process_one_work
            |
            |--80.52%--kcryptd_crypt
            |    |
            |    |--62.58%--mempool_alloc
            |    |  |
            |    |   --62.24%--crypt_page_alloc
            |    |     |
            |    |      --61.51%--__percpu_counter_compare
            |    |        |
            |    |         --61.34%--__percpu_counter_sum
            |    |           |
            |    |           |--58.68%--_raw_spin_lock_irqsave
            |    |           |  |
            |    |           |   --58.30%--native_queued_spin_lock_slowpath
            |    |           |
            |    |            --0.69%--cpumask_next
            |    |                |
            |    |                 --0.51%--_find_next_bit
            |    |
            |    |--10.61%--crypt_convert
            |    |          |
            |    |          |--6.05%--xts_crypt
            ...

After applying this patch and running the same test, %system usage is
lowered to ~7% and write throughput on the loop device increases
to ~2.7GB/s. perf report shows mempool_alloc() as ~8% rather than ~62%
in the profile and not hitting the percpu_counter() spinlock anymore.

    |--8.15%--mempool_alloc
    |    |
    |    |--3.93%--crypt_page_alloc
    |    |    |
    |    |     --3.75%--__alloc_pages
    |    |         |
    |    |          --3.62%--get_page_from_freelist
    |    |              |
    |    |               --3.22%--rmqueue_bulk
    |    |                   |
    |    |                    --2.59%--_raw_spin_lock
    |    |                      |
    |    |                       --2.57%--native_queued_spin_lock_slowpath
    |    |
    |     --3.05%--_raw_spin_lock_irqsave
    |               |
    |                --2.49%--native_queued_spin_lock_slowpath

Suggested-by: DJ Gregor <dj@corelight.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Arne Welzel <arne.welzel@corelight.com>
Fixes: 5059353df8 ("dm crypt: limit the number of allocated pages")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-08-18 02:10:48 -04:00
Tushar Sugandhi
8ec456629d dm: update target status functions to support IMA measurement
For device mapper targets to take advantage of IMA's measurement
capabilities, the status functions for the individual targets need to be
updated to handle the status_type_t case for value STATUSTYPE_IMA.

Update status functions for the following target types, to log their
respective attributes to be measured using IMA.
 01. cache
 02. crypt
 03. integrity
 04. linear
 05. mirror
 06. multipath
 07. raid
 08. snapshot
 09. striped
 10. verity

For rest of the targets, handle the STATUSTYPE_IMA case by setting the
measurement buffer to NULL.

For IMA to measure the data on a given system, the IMA policy on the
system needs to be updated to have the following line, and the system
needs to be restarted for the measurements to take effect.

/etc/ima/ima-policy
 measure func=CRITICAL_DATA label=device-mapper template=ima-buf

The measurements will be reflected in the IMA logs, which are located at:

/sys/kernel/security/integrity/ima/ascii_runtime_measurements
/sys/kernel/security/integrity/ima/binary_runtime_measurements

These IMA logs can later be consumed by various attestation clients
running on the system, and send them to external services for attesting
the system.

The DM target data measured by IMA subsystem can alternatively
be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with
DM_TABLE_STATUS_CMD.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-08-10 13:34:23 -04:00
Damien Le Moal
f34ee1dce6 dm crypt: Fix zoned block device support
Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
of the zone to be written instead of the actual sector location to
write. The write location is determined by the device and returned to
the host upon completion of the operation. This interface, while simple
and efficient for writing into sequential zones of a zoned block
device, is incompatible with the use of sector values to calculate a
cypher block IV. All data written in a zone end up using the same IV
values corresponding to the first sectors of the zone, but read
operation will specify any sector within the zone resulting in an IV
mismatch between encryption and decryption.

To solve this problem, report to DM core that zone append operations are
not supported. This result in the zone append operations being emulated
using regular write operations.

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-06-04 12:07:38 -04:00
Damien Le Moal
912e887505 dm: Introduce dm_report_zones()
To simplify the implementation of the report_zones operation of a zoned
target, introduce the function dm_report_zones() to set a target
mapping start sector in struct dm_report_zones_args and call
blkdev_report_zones(). This new function is exported and the report
zones callback function dm_report_zones_cb() is not.

dm-linear, dm-flakey and dm-crypt are modified to use dm_report_zones().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-06-04 12:07:32 -04:00
Christoph Hellwig
a8affc03a9 block: rename BIO_MAX_PAGES to BIO_MAX_VECS
Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been
horribly confusingly misnamed.  Rename it to BIO_MAX_VECS to stop
confusing users of the bio API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-11 07:47:48 -07:00
Mikulas Patocka
a666e5c05e dm: fix deadlock when swapping to encrypted device
The system would deadlock when swapping to a dm-crypt device. The reason
is that for each incoming write bio, dm-crypt allocates memory that holds
encrypted data. These excessive allocations exhaust all the memory and the
result is either deadlock or OOM trigger.

This patch limits the number of in-flight swap bios, so that the memory
consumed by dm-crypt is limited. The limit is enforced if the target set
the "limit_swap_bios" variable and if the bio has REQ_SWAP set.

Non-swap bios are not affected becuase taking the semaphore would cause
performance degradation.

This is similar to request-based drivers - they will also block when the
number of requests is over the limit.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-11 09:45:28 -05:00
Mike Snitzer
e3290b9491 dm: simplify target code conditional on CONFIG_BLK_DEV_ZONED
Allow removal of CONFIG_BLK_DEV_ZONED conditionals in target_type
definition of various targets.

Suggested-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-11 09:45:27 -05:00
Ahmad Fatoum
363880c4eb dm crypt: support using trusted keys
Commit 27f5411a71 ("dm crypt: support using encrypted keys") extended
dm-crypt to allow use of "encrypted" keys along with "user" and "logon".

Along the same lines, teach dm-crypt to support "trusted" keys as well.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:13:00 -05:00
Ahmad Fatoum
831475cc0b dm crypt: replaced #if defined with IS_ENABLED
IS_ENABLED(CONFIG_ENCRYPTED_KEYS) is true whether the option is built-in
or a module, so use it instead of #if defined checking for each
separately.

The other #if was to avoid a static function defined, but unused
warning. As we now always build the callsite when the function
is defined, we can remove that first #if guard.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:12:31 -05:00
Geert Uytterhoeven
74d1da3988 dm crypt: Spelling s/cihper/cipher/
Fix a misspelling of "cipher".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:10:05 -05:00
Ignat Korchagin
004b8ae9e2 dm crypt: fix copy and paste bug in crypt_alloc_req_aead
In commit d68b29584c ("dm crypt: use GFP_ATOMIC when allocating
crypto requests from softirq") code was incorrectly copy and pasted
from crypt_alloc_req_skcipher()'s crypto request allocation code to
crypt_alloc_req_aead(). It is OK from runtime perspective as both
simple encryption request pointer and AEAD request pointer are part of
a union, but may confuse code reviewers.

Fixes: d68b29584c ("dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-21 15:06:44 -05:00
Ignat Korchagin
c87a95dc28 dm crypt: defer decryption to a tasklet if interrupts disabled
On some specific hardware on early boot we occasionally get:

[ 1193.920255][    T0] BUG: sleeping function called from invalid context at mm/mempool.c:381
[ 1193.936616][    T0] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/69
[ 1193.953233][    T0] no locks held by swapper/69/0.
[ 1193.965871][    T0] irq event stamp: 575062
[ 1193.977724][    T0] hardirqs last  enabled at (575061): [<ffffffffab73f662>] tick_nohz_idle_exit+0xe2/0x3e0
[ 1194.002762][    T0] hardirqs last disabled at (575062): [<ffffffffab74e8af>] flush_smp_call_function_from_idle+0x4f/0x80
[ 1194.029035][    T0] softirqs last  enabled at (575050): [<ffffffffad600fd2>] asm_call_irq_on_stack+0x12/0x20
[ 1194.054227][    T0] softirqs last disabled at (575043): [<ffffffffad600fd2>] asm_call_irq_on_stack+0x12/0x20
[ 1194.079389][    T0] CPU: 69 PID: 0 Comm: swapper/69 Not tainted 5.10.6-cloudflare-kasan-2021.1.4-dev #1
[ 1194.104103][    T0] Hardware name: NULL R162-Z12-CD/MZ12-HD4-CD, BIOS R10 06/04/2020
[ 1194.119591][    T0] Call Trace:
[ 1194.130233][    T0]  dump_stack+0x9a/0xcc
[ 1194.141617][    T0]  ___might_sleep.cold+0x180/0x1b0
[ 1194.153825][    T0]  mempool_alloc+0x16b/0x300
[ 1194.165313][    T0]  ? remove_element+0x160/0x160
[ 1194.176961][    T0]  ? blk_mq_end_request+0x4b/0x490
[ 1194.188778][    T0]  crypt_convert+0x27f6/0x45f0 [dm_crypt]
[ 1194.201024][    T0]  ? rcu_read_lock_sched_held+0x3f/0x70
[ 1194.212906][    T0]  ? module_assert_mutex_or_preempt+0x3e/0x70
[ 1194.225318][    T0]  ? __module_address.part.0+0x1b/0x3a0
[ 1194.237212][    T0]  ? is_kernel_percpu_address+0x5b/0x190
[ 1194.249238][    T0]  ? crypt_iv_tcw_ctr+0x4a0/0x4a0 [dm_crypt]
[ 1194.261593][    T0]  ? is_module_address+0x25/0x40
[ 1194.272905][    T0]  ? static_obj+0x8a/0xc0
[ 1194.283582][    T0]  ? lockdep_init_map_waits+0x26a/0x700
[ 1194.295570][    T0]  ? __raw_spin_lock_init+0x39/0x110
[ 1194.307330][    T0]  kcryptd_crypt_read_convert+0x31c/0x560 [dm_crypt]
[ 1194.320496][    T0]  ? kcryptd_queue_crypt+0x1be/0x380 [dm_crypt]
[ 1194.333203][    T0]  blk_update_request+0x6d7/0x1500
[ 1194.344841][    T0]  ? blk_mq_trigger_softirq+0x190/0x190
[ 1194.356831][    T0]  blk_mq_end_request+0x4b/0x490
[ 1194.367994][    T0]  ? blk_mq_trigger_softirq+0x190/0x190
[ 1194.379693][    T0]  flush_smp_call_function_queue+0x24b/0x560
[ 1194.391847][    T0]  flush_smp_call_function_from_idle+0x59/0x80
[ 1194.403969][    T0]  do_idle+0x287/0x450
[ 1194.413891][    T0]  ? arch_cpu_idle_exit+0x40/0x40
[ 1194.424716][    T0]  ? lockdep_hardirqs_on_prepare+0x286/0x3f0
[ 1194.436399][    T0]  ? _raw_spin_unlock_irqrestore+0x39/0x40
[ 1194.447759][    T0]  cpu_startup_entry+0x19/0x20
[ 1194.458038][    T0]  secondary_startup_64_no_verify+0xb0/0xbb

IO completion can be queued to a different CPU by the block subsystem as a "call
single function/data". The CPU may run these routines from the idle task, but it
does so with interrupts disabled.

It is not a good idea to do decryption with irqs disabled even in an idle task
context, so just defer it to a tasklet (as is done with requests from hard irqs).

Fixes: 39d42fa96b ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-14 09:54:37 -05:00
Ignat Korchagin
8e14f61015 dm crypt: do not call bio_endio() from the dm-crypt tasklet
Sometimes, when dm-crypt executes decryption in a tasklet, we may get
"BUG: KASAN: use-after-free in tasklet_action_common.constprop..."
with a kasan-enabled kernel.

When the decryption fully completes in the tasklet, dm-crypt will call
bio_endio(), which in turn will call clone_endio() from dm.c core code. That
function frees the resources associated with the bio, including per bio private
structures. For dm-crypt it will free the current struct dm_crypt_io, which
contains our tasklet object, causing use-after-free, when the tasklet is being
dequeued by the kernel.

To avoid this, do not call bio_endio() from the current tasklet context, but
delay its execution to the dm-crypt IO workqueue.

Fixes: 39d42fa96b ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Cc: <stable@vger.kernel.org> # v5.9+
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-12 13:51:17 -05:00
Ignat Korchagin
d68b29584c dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq
Commit 39d42fa96b ("dm crypt: add flags to optionally bypass kcryptd
workqueues") made it possible for some code paths in dm-crypt to be
executed in softirq context, when the underlying driver processes IO
requests in interrupt/softirq context.

In this case sometimes when allocating a new crypto request we may get
a stacktrace like below:

[  210.103008][    C0] BUG: sleeping function called from invalid context at mm/mempool.c:381
[  210.104746][    C0] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2602, name: fio
[  210.106599][    C0] CPU: 0 PID: 2602 Comm: fio Tainted: G        W         5.10.0+ #50
[  210.108331][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[  210.110212][    C0] Call Trace:
[  210.110921][    C0]  <IRQ>
[  210.111527][    C0]  dump_stack+0x7d/0xa3
[  210.112411][    C0]  ___might_sleep.cold+0x122/0x151
[  210.113527][    C0]  mempool_alloc+0x16b/0x2f0
[  210.114524][    C0]  ? __queue_work+0x515/0xde0
[  210.115553][    C0]  ? mempool_resize+0x700/0x700
[  210.116586][    C0]  ? crypt_endio+0x91/0x180
[  210.117479][    C0]  ? blk_update_request+0x757/0x1150
[  210.118513][    C0]  ? blk_mq_end_request+0x4b/0x480
[  210.119572][    C0]  ? blk_done_softirq+0x21d/0x340
[  210.120628][    C0]  ? __do_softirq+0x190/0x611
[  210.121626][    C0]  crypt_convert+0x29f9/0x4c00
[  210.122668][    C0]  ? _raw_spin_lock_irqsave+0x87/0xe0
[  210.123824][    C0]  ? kasan_set_track+0x1c/0x30
[  210.124858][    C0]  ? crypt_iv_tcw_ctr+0x4a0/0x4a0
[  210.125930][    C0]  ? kmem_cache_free+0x104/0x470
[  210.126973][    C0]  ? crypt_endio+0x91/0x180
[  210.127947][    C0]  kcryptd_crypt_read_convert+0x30e/0x420
[  210.129165][    C0]  blk_update_request+0x757/0x1150
[  210.130231][    C0]  blk_mq_end_request+0x4b/0x480
[  210.131294][    C0]  blk_done_softirq+0x21d/0x340
[  210.132332][    C0]  ? _raw_spin_lock+0x81/0xd0
[  210.133289][    C0]  ? blk_mq_stop_hw_queue+0x30/0x30
[  210.134399][    C0]  ? _raw_read_lock_irq+0x40/0x40
[  210.135458][    C0]  __do_softirq+0x190/0x611
[  210.136409][    C0]  ? handle_edge_irq+0x221/0xb60
[  210.137447][    C0]  asm_call_irq_on_stack+0x12/0x20
[  210.138507][    C0]  </IRQ>
[  210.139118][    C0]  do_softirq_own_stack+0x37/0x40
[  210.140191][    C0]  irq_exit_rcu+0x110/0x1b0
[  210.141151][    C0]  common_interrupt+0x74/0x120
[  210.142171][    C0]  asm_common_interrupt+0x1e/0x40

Fix this by allocating crypto requests with GFP_ATOMIC mask in
interrupt context.

Fixes: 39d42fa96b ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-04 15:02:33 -05:00
Ignat Korchagin
8abec36d12 dm crypt: do not wait for backlogged crypto request completion in softirq
Commit 39d42fa96b ("dm crypt: add flags to optionally bypass kcryptd
workqueues") made it possible for some code paths in dm-crypt to be
executed in softirq context, when the underlying driver processes IO
requests in interrupt/softirq context.

When Crypto API backlogs a crypto request, dm-crypt uses
wait_for_completion to avoid sending further requests to an already
overloaded crypto driver. However, if the code is executing in softirq
context, we might get the following stacktrace:

[  210.235213][    C0] BUG: scheduling while atomic: fio/2602/0x00000102
[  210.236701][    C0] Modules linked in:
[  210.237566][    C0] CPU: 0 PID: 2602 Comm: fio Tainted: G        W         5.10.0+ #50
[  210.239292][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[  210.241233][    C0] Call Trace:
[  210.241946][    C0]  <IRQ>
[  210.242561][    C0]  dump_stack+0x7d/0xa3
[  210.243466][    C0]  __schedule_bug.cold+0xb3/0xc2
[  210.244539][    C0]  __schedule+0x156f/0x20d0
[  210.245518][    C0]  ? io_schedule_timeout+0x140/0x140
[  210.246660][    C0]  schedule+0xd0/0x270
[  210.247541][    C0]  schedule_timeout+0x1fb/0x280
[  210.248586][    C0]  ? usleep_range+0x150/0x150
[  210.249624][    C0]  ? unpoison_range+0x3a/0x60
[  210.250632][    C0]  ? ____kasan_kmalloc.constprop.0+0x82/0xa0
[  210.251949][    C0]  ? unpoison_range+0x3a/0x60
[  210.252958][    C0]  ? __prepare_to_swait+0xa7/0x190
[  210.254067][    C0]  do_wait_for_common+0x2ab/0x370
[  210.255158][    C0]  ? usleep_range+0x150/0x150
[  210.256192][    C0]  ? bit_wait_io_timeout+0x160/0x160
[  210.257358][    C0]  ? blk_update_request+0x757/0x1150
[  210.258582][    C0]  ? _raw_spin_lock_irq+0x82/0xd0
[  210.259674][    C0]  ? _raw_read_unlock_irqrestore+0x30/0x30
[  210.260917][    C0]  wait_for_completion+0x4c/0x90
[  210.261971][    C0]  crypt_convert+0x19a6/0x4c00
[  210.263033][    C0]  ? _raw_spin_lock_irqsave+0x87/0xe0
[  210.264193][    C0]  ? kasan_set_track+0x1c/0x30
[  210.265191][    C0]  ? crypt_iv_tcw_ctr+0x4a0/0x4a0
[  210.266283][    C0]  ? kmem_cache_free+0x104/0x470
[  210.267363][    C0]  ? crypt_endio+0x91/0x180
[  210.268327][    C0]  kcryptd_crypt_read_convert+0x30e/0x420
[  210.269565][    C0]  blk_update_request+0x757/0x1150
[  210.270563][    C0]  blk_mq_end_request+0x4b/0x480
[  210.271680][    C0]  blk_done_softirq+0x21d/0x340
[  210.272775][    C0]  ? _raw_spin_lock+0x81/0xd0
[  210.273847][    C0]  ? blk_mq_stop_hw_queue+0x30/0x30
[  210.275031][    C0]  ? _raw_read_lock_irq+0x40/0x40
[  210.276182][    C0]  __do_softirq+0x190/0x611
[  210.277203][    C0]  ? handle_edge_irq+0x221/0xb60
[  210.278340][    C0]  asm_call_irq_on_stack+0x12/0x20
[  210.279514][    C0]  </IRQ>
[  210.280164][    C0]  do_softirq_own_stack+0x37/0x40
[  210.281281][    C0]  irq_exit_rcu+0x110/0x1b0
[  210.282286][    C0]  common_interrupt+0x74/0x120
[  210.283376][    C0]  asm_common_interrupt+0x1e/0x40
[  210.284496][    C0] RIP: 0010:_aesni_enc1+0x65/0xb0

Fix this by making crypt_convert function reentrant from the point of
a single bio and make dm-crypt defer further bio processing to a
workqueue, if Crypto API backlogs a request in interrupt context.

Fixes: 39d42fa96b ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-04 15:02:32 -05:00
Mike Snitzer
48b0777cd9 Revert "dm crypt: export sysfs of kcryptd workqueue"
This reverts commit a2b8b2d975.

WQ_SYSFS breaks the ability to reload a DM table due to sysfs kobject
collision (due to active and inactive table). Given lack of
demonstrated need for exposing this workqueue via sysfs: revert
exposing it.

Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-28 16:13:52 -05:00
Jeffle Xu
a2b8b2d975 dm crypt: export sysfs of kcryptd workqueue
It should be helpful to export sysfs of "kcryptd" workqueue in some
cases, such as setting specific CPU affinity of the workqueue.

Besides, also tweak the name format a little. The slash inside a
directory name will be translate into exclamation mark, such as
/sys/devices/virtual/workqueue/'kcryptd!253:0'.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:36 -05:00
Rikard Falkeborn
e8dc79d1bd dm crypt: Constify static crypt_iv_operations
The only usage of these structs is to assign their address to the
iv_gen_ops field in the crypt config struct, which is a pointer to
const. Make them const like the rest of the static crypt_iv_operations
structs. This allows the compiler to put them in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:35 -05:00
Mikulas Patocka
cd74693870 dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
Don't use crypto drivers that have the flag CRYPTO_ALG_ALLOCATES_MEMORY
set. These drivers allocate memory and thus they are unsuitable for block
I/O processing.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29 16:33:08 -04:00
Damien Le Moal
7785a9e4c2 dm crypt: Initialize crypto wait structures
Use the DECLARE_CRYPTO_WAIT() macro to properly initialize the crypto
wait structures declared on stack before their use with
crypto_wait_req().

Fixes: 39d13a1ac4 ("dm crypt: reuse eboiv skcipher for IV generation")
Fixes: bbb1658461 ("dm crypt: Implement Elephant diffuser for Bitlocker compatibility")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-01 16:41:57 -04:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Linus Torvalds
2f12d44085 - DM multipath locking fixes around m->flags tests and improvements to
bio-based code so that it follows patterns established by
   request-based code.
 
 - Request-based DM core improvement to eliminate unnecessary call to
   blk_mq_queue_stopped().
 
 - Add "panic_on_corruption" error handling mode to DM verity target.
 
 - DM bufio fix to to perform buffer cleanup from a workqueue rather
   than wait for IO in reclaim context from shrinker.
 
 - DM crypt improvement to optionally avoid async processing via
   workqueues for reads and/or writes -- via "no_read_workqueue" and
   "no_write_workqueue" features.  This more direct IO processing
   improves latency and throughput with faster storage.  Avoiding
   workqueue IO submission for writes (DM_CRYPT_NO_WRITE_WORKQUEUE) is
   a requirement for adding zoned block device support to DM crypt.
 
 - Add zoned block device support to DM crypt.  Makes use of
   DM_CRYPT_NO_WRITE_WORKQUEUE and a new optional feature
   (DM_CRYPT_WRITE_INLINE) that allows write completion to wait for
   encryption to complete.  This allows write ordering to be preserved,
   which is needed for zoned block devices.
 
 - Fix DM ebs target's check for REQ_OP_FLUSH.
 
 - Fix DM core's report zones support to not report more zones than
   were requested.
 
 - A few small compiler warning fixes.
 
 - DM dust improvements to return output directly to the user rather
   than require they scrape the system log for output.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl8tdOQTHHNuaXR6ZXJA
 cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWvDlB/sF8svagDqeqs27xTxCiUPykD29cMmS
 OGPr0Mp/BntZOBpSaTPM9s5XucP3WJhPsxet5qeoyM3OViSFx+O55PqPjn8C65y0
 eGMa4zknd9eO1933+ijmyQu6VNr4sf/6nusX4xSGqv00UR22dJ+3pHtfN9ANDXYX
 AAYA0Ve6UuOwAbGUCnRGI/2780aYY0B8Ok+cF21CskqryF+RpmbZ6BsR07+Hk4cy
 LX5EaHUqezW12cibLq2f0l7TLLJ86OscvqyU9lGVIxiV57e2i5c2S1HvhKZu+Wn3
 6CUmlOhGI0viCKgM1ArekZ+zOw9ROIaAKKPzC5mspqx9yuuCqdY8k8xV
 =X3tt
 -----END PGP SIGNATURE-----

Merge tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - DM multipath locking fixes around m->flags tests and improvements to
   bio-based code so that it follows patterns established by
   request-based code.

 - Request-based DM core improvement to eliminate unnecessary call to
   blk_mq_queue_stopped().

 - Add "panic_on_corruption" error handling mode to DM verity target.

 - DM bufio fix to to perform buffer cleanup from a workqueue rather
   than wait for IO in reclaim context from shrinker.

 - DM crypt improvement to optionally avoid async processing via
   workqueues for reads and/or writes -- via "no_read_workqueue" and
   "no_write_workqueue" features. This more direct IO processing
   improves latency and throughput with faster storage. Avoiding
   workqueue IO submission for writes (DM_CRYPT_NO_WRITE_WORKQUEUE) is a
   requirement for adding zoned block device support to DM crypt.

 - Add zoned block device support to DM crypt. Makes use of
   DM_CRYPT_NO_WRITE_WORKQUEUE and a new optional feature
   (DM_CRYPT_WRITE_INLINE) that allows write completion to wait for
   encryption to complete. This allows write ordering to be preserved,
   which is needed for zoned block devices.

 - Fix DM ebs target's check for REQ_OP_FLUSH.

 - Fix DM core's report zones support to not report more zones than were
   requested.

 - A few small compiler warning fixes.

 - DM dust improvements to return output directly to the user rather
   than require they scrape the system log for output.

* tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: don't call report zones for more than the user requested
  dm ebs: Fix incorrect checking for REQ_OP_FLUSH
  dm init: Set file local variable static
  dm ioctl: Fix compilation warning
  dm raid: Remove empty if statement
  dm verity: Fix compilation warning
  dm crypt: Enable zoned block device support
  dm crypt: add flags to optionally bypass kcryptd workqueues
  dm bufio: do buffer cleanup from a workqueue
  dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue()
  dm dust: add interface to list all badblocks
  dm dust: report some message results directly back to user
  dm verity: add "panic_on_corruption" error handling mode
  dm mpath: use double checked locking in fast path
  dm mpath: rename current_pgpath to pgpath in multipath_prepare_ioctl
  dm mpath: rework __map_bio()
  dm mpath: factor out multipath_queue_bio
  dm mpath: push locking down to must_push_back_rq()
  dm mpath: take m->lock spinlock when testing QUEUE_IF_NO_PATH
  dm mpath: changes from initial m->flags locking audit
2020-08-07 13:08:09 -07:00
Waiman Long
453431a549 mm, treewide: rename kzfree() to kfree_sensitive()
As said by Linus:

  A symmetric naming is only helpful if it implies symmetries in use.
  Otherwise it's actively misleading.

  In "kzalloc()", the z is meaningful and an important part of what the
  caller wants.

  In "kzfree()", the z is actively detrimental, because maybe in the
  future we really _might_ want to use that "memfill(0xdeadbeef)" or
  something. The "zero" part of the interface isn't even _relevant_.

The main reason that kzfree() exists is to clear sensitive information
that should not be leaked to other future users of the same memory
objects.

Rename kzfree() to kfree_sensitive() to follow the example of the recently
added kvfree_sensitive() and make the intention of the API more explicit.
In addition, memzero_explicit() is used to clear the memory to make sure
that it won't get optimized away by the compiler.

The renaming is done by using the command sequence:

  git grep -w --name-only kzfree |\
  xargs sed -i 's/kzfree/kfree_sensitive/'

followed by some editing of the kfree_sensitive() kerneldoc and adding
a kzfree backward compatibility macro in slab.h.

[akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
[akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Joe Perches <joe@perches.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:22 -07:00
Linus Torvalds
2324d50d05 It's been a busy cycle for documentation - hopefully the busiest for a
while to come.  Changes include:
 
  - Some new Chinese translations
 
  - Progress on the battle against double words words and non-HTTPS URLs
 
  - Some block-mq documentation
 
  - More RST conversions from Mauro.  At this point, that task is
    essentially complete, so we shouldn't see this kind of churn again for a
    while.  Unless we decide to switch to asciidoc or something...:)
 
  - Lots of typo fixes, warning fixes, and more.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR
 cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY
 BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO
 7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH
 9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY
 0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU=
 =JQLZ
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.9' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It's been a busy cycle for documentation - hopefully the busiest for a
  while to come. Changes include:

   - Some new Chinese translations

   - Progress on the battle against double words words and non-HTTPS
     URLs

   - Some block-mq documentation

   - More RST conversions from Mauro. At this point, that task is
     essentially complete, so we shouldn't see this kind of churn again
     for a while. Unless we decide to switch to asciidoc or
     something...:)

   - Lots of typo fixes, warning fixes, and more"

* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
  scripts/kernel-doc: optionally treat warnings as errors
  docs: ia64: correct typo
  mailmap: add entry for <alobakin@marvell.com>
  doc/zh_CN: add cpu-load Chinese version
  Documentation/admin-guide: tainted-kernels: fix spelling mistake
  MAINTAINERS: adjust kprobes.rst entry to new location
  devices.txt: document rfkill allocation
  PCI: correct flag name
  docs: filesystems: vfs: correct flag name
  docs: filesystems: vfs: correct sync_mode flag names
  docs: path-lookup: markup fixes for emphasis
  docs: path-lookup: more markup fixes
  docs: path-lookup: fix HTML entity mojibake
  CREDITS: Replace HTTP links with HTTPS ones
  docs: process: Add an example for creating a fixes tag
  doc/zh_CN: add Chinese translation prefer section
  doc/zh_CN: add clearing-warn-once Chinese version
  doc/zh_CN: add admin-guide index
  doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
  futex: MAINTAINERS: Re-add selftests directory
  ...
2020-08-04 22:47:54 -07:00
Damien Le Moal
8e225f04d2 dm crypt: Enable zoned block device support
Enable support for zoned block devices. This is done by:
1) implementing the target report_zones method.
2) adding the DM_TARGET_ZONED_HM flag to the target features.
3) setting DM_CRYPT_NO_WRITE_WORKQUEUE flag to avoid IO
   processing via workqueue.
4) Introducing inline write encryption completion to preserve write
   ordering.

The last point is implemented by introducing the internal flag
DM_CRYPT_WRITE_INLINE. When set, kcryptd_crypt_write_convert() always
waits inline for the completion of a write request encryption if the
request is not already completed once crypt_convert() returns.
Completion of write request encryption is signaled using the
restart completion by kcryptd_async_done(). This mechanism allows
using ciphers that have an asynchronous implementation, isolating
dm-crypt from any potential request completion reordering for these
ciphers.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-07-20 11:17:45 -04:00
Ignat Korchagin
39d42fa96b dm crypt: add flags to optionally bypass kcryptd workqueues
This is a follow up to [1] that detailed latency problems associated
with dm-crypt's use of workqueues when processing IO.

Current dm-crypt implementation creates a significant IO performance
overhead (at least on small IO block sizes) for both latency and
throughput. We suspect offloading IO request processing into
workqueues and async threads is more harmful these days with the
modern fast storage. I also did some digging into the dm-crypt git
history and much of this async processing is not needed anymore,
because the reasons it was added are mostly gone from the kernel. More
details can be found in [2] (see "Git archeology" section).

This change adds DM_CRYPT_NO_READ_WORKQUEUE and
DM_CRYPT_NO_WRITE_WORKQUEUE flags for read and write BIOs, which
direct dm-crypt to not offload crypto operations into kcryptd
workqueues.  In addition, writes are not buffered to be sorted in the
dm-crypt red-black tree, but dispatched immediately. For cases, where
crypto operations cannot happen (hard interrupt context, for example
the read path of some NVME drivers), we offload the work to a tasklet
rather than a workqueue.

These flags only ensure no async BIO processing in the dm-crypt
module. It is worth noting that some Crypto API implementations may
offload encryption into their own workqueues, which are independent of
the dm-crypt and its configuration. However upon enabling these new
flags dm-crypt will instruct Crypto API not to backlog crypto
requests.

To give an idea of the performance gains for certain workloads,
consider the script, and results when tested against various
devices, detailed here:
https://www.redhat.com/archives/dm-devel/2020-July/msg00138.html

[1]: https://www.spinics.net/lists/dm-crypt/msg07516.html
[2]: https://blog.cloudflare.com/speeding-up-linux-disk-encryption/

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-07-20 11:17:44 -04:00
Alexander A. Klimov
6f3bc22bf5 Replace HTTP links with HTTPS ones: LVM
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If both the HTTP and HTTPS versions
          return 200 OK and serve the same content:
            Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Link: https://lore.kernel.org/r/20200627103138.71885-1-grandmaster@al2klimov.de
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-07-05 14:28:27 -06:00
Christoph Hellwig
ed00aabd5e block: rename generic_make_request to submit_bio_noacct
generic_make_request has always been very confusingly misnamed, so rename
it to submit_bio_noacct to make it clear that it is submit_bio minus
accounting and a few checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-01 07:27:24 -06:00
Eric Biggers
64611a15ca dm crypt: avoid truncating the logical block size
queue_limits::logical_block_size got changed from unsigned short to
unsigned int, but it was forgotten to update crypt_io_hints() to use the
new type.  Fix it.

Fixes: ad6bf88a6c ("block: fix an integer overflow in logical block size")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05 14:59:59 -04:00
Gustavo A. R. Silva
b18ae8dd9d dm: replace zero-length array with flexible-array
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20 17:09:44 -04:00
Dmitry Baryshkov
27f5411a71 dm crypt: support using encrypted keys
Allow one to use "encrypted" in addition to "user" and "logon" key
types for device encryption.

Signed-off-by: Dmitry Baryshkov <dmitry_baryshkov@mentor.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15 10:29:34 -04:00
Yang Yingliang
3fd53533a8 dm crypt: use crypt_integrity_aead() helper
Replace test_bit(CRYPT_MODE_INTEGRITY_AEAD, XXX) with
crypt_integrity_aead().

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24 11:17:33 -04:00
Milan Broz
4ea9471fbd dm crypt: fix benbi IV constructor crash if used in authenticated mode
If benbi IV is used in AEAD construction, for example:
  cryptsetup luksFormat <device> --cipher twofish-xts-benbi --key-size 512 --integrity=hmac-sha256
the constructor uses wrong skcipher function and crashes:

 BUG: kernel NULL pointer dereference, address: 00000014
 ...
 EIP: crypt_iv_benbi_ctr+0x15/0x70 [dm_crypt]
 Call Trace:
  ? crypt_subkey_size+0x20/0x20 [dm_crypt]
  crypt_ctr+0x567/0xfc0 [dm_crypt]
  dm_table_add_target+0x15f/0x340 [dm_mod]

Fix this by properly using crypt_aead_blocksize() in this case.

Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # v4.12+
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941051
Reported-by: Jerad Simpson <jbsimpson@gmail.com>
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-01-14 20:22:47 -05:00
Milan Broz
bbb1658461 dm crypt: Implement Elephant diffuser for Bitlocker compatibility
Add experimental support for BitLocker encryption with CBC mode and
additional Elephant diffuser.

The mode was used in older Windows systems and it is provided mainly
for compatibility reasons. The userspace support to activate these
devices is being added to cryptsetup utility.

Read-write activation of such a device is very simple, for example:
  echo <password> | cryptsetup bitlkOpen bitlk_image.img test

The Elephant diffuser uses two rotations in opposite direction for
data (Diffuser A and B) and also XOR operation with Sector key over
the sector data; Sector key is derived from additional key data. The
original public documentation is available here:
  http://download.microsoft.com/download/0/2/3/0238acaf-d3bf-4a6d-b3d6-0a0be4bbb36e/bitlockercipher200608.pdf

The dm-crypt implementation is embedded to "elephant" IV (similar to
tcw IV construction).

Because we cannot modify original bio data for write (before
encryption), an additional internal flag to pre-process data is
added.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-01-14 20:22:46 -05:00