korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-22 18:44:44 +08:00

Author	SHA1	Message	Date
Jeff Layton	d55207717d	ceph: add encryption support to writepage and writepages Allow writepage to issue encrypted writes. Extend out the requested size and offset to cover complete blocks, and then encrypt and write them to the OSDs. Add the appropriate machinery to write back dirty data with encryption. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:36 +02:00
Jeff Layton	33a5f1709a	ceph: add read/modify/write to ceph_sync_write When doing a synchronous write on an encrypted inode, we have no guarantee that the caller is writing crypto block-aligned data. When that happens, we must do a read/modify/write cycle. First, expand the range to cover complete blocks. If we had to change the original pos or length, issue a read to fill the first and/or last pages, and fetch the version of the object from the result. We then copy data into the pages as usual, encrypt the result and issue a write prefixed by an assertion that the version hasn't changed. If it has changed then we restart the whole thing again. If there is no object at that position in the file (-ENOENT), we prefix the write on an exclusive create of the object instead. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:36 +02:00
Jeff Layton	b294fa295f	ceph: align data in pages in ceph_sync_write Encrypted files will need to be dealt with in block-sized chunks and once we do that, the way that ceph_sync_write aligns the data in the bounce buffer won't be acceptable. Change it to align the data the same way it would be aligned in the pagecache. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:36 +02:00
Jeff Layton	8cff8f5374	ceph: don't use special DIO path for encrypted inodes Eventually I want to merge the synchronous and direct read codepaths, possibly via new netfs infrastructure. For now, the direct path is not crypto-enabled, so use the sync read/write paths instead. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:36 +02:00
Xiubo Li	5c64737d25	ceph: add truncate size handling support for fscrypt This will transfer the encrypted last block contents to the MDS along with the truncate request only when the new size is smaller and not aligned to the fscrypt BLOCK size. When the last block is located in the file hole, the truncate request will only contain the header. The MDS could fail to do the truncate if another client or process has already updated the RADOS object which contains the last block, and will return -EAGAIN, in which case the kclient needs to retry it. The RMW will take around 50ms, and the kclient will retry 20 times for now. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:35 +02:00
Xiubo Li	d4d5188715	ceph: add object version support for sync read Turn the guts of ceph_sync_read into a new helper that takes an inode and an offset instead of a kiocb struct, and make ceph_sync_read call the helper as a wrapper. Make the new helper always return the last object's version. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:35 +02:00
Jeff Layton	4e8c4c2355	libceph: allow ceph_osdc_new_request to accept a multi-op read Currently we have some special-casing for multi-op writes, but in the case of a read, we can't really handle it. All of the current multi-op callers call it with CEPH_OSD_FLAG_WRITE set. Have ceph_osdc_new_request check for CEPH_OSD_FLAG_READ and if it's set, allocate multiple reply ops instead of multiple request ops. If neither flag is set, return -EINVAL. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:35 +02:00
Jeff Layton	69dd3b3930	libceph: add CEPH_OSD_OP_ASSERT_VER support ...and record the user_version in the reply in a new field in ceph_osd_request, so we can populate the assert_ver appropriately. Shuffle the fields a bit too so that the new field fits in an existing hole on x86_64. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:35 +02:00
Jeff Layton	77cdb7e17e	ceph: add infrastructure for file encryption and decryption ...and allow test_dummy_encryption to bypass content encryption if mounted with test_dummy_encryption=clear. [ xiubli: remove test_dummy_encryption=clear support per Ilya ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:35 +02:00
Jeff Layton	0d91f0ad6a	ceph: handle fscrypt fields in cap messages from MDS Handle the new fscrypt_file and fscrypt_auth fields in cap messages. Use them to populate new fields in cap_extra_info and update the inode with those values. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:35 +02:00
Jeff Layton	16be62fc8a	ceph: size handling in MClientRequest, cap updates and inode traces For encrypted inodes, transmit a rounded-up size to the MDS as the normal file size and send the real inode size in fscrypt_file field. Also, fix up creates and truncates to also transmit fscrypt_file. When we get an inode trace from the MDS, grab the fscrypt_file field if the inode is encrypted, and use it to populate the i_size field instead of the regular inode size field. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:35 +02:00
Luís Henriques	14e034a61c	ceph: mark directory as non-complete after loading key When setting a directory's crypt context, ceph_dir_clear_complete() needs to be called otherwise if it was complete before, any existing (old) dentry will still be valid. This patch adds a wrapper around __fscrypt_prepare_readdir() which will ensure a directory is marked as non-complete if key status changes. [ xiubli: revise commit title per Milind ] Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:35 +02:00
Luís Henriques	e127e03009	ceph: allow encrypting a directory while not having Ax caps If a client doesn't have Fx caps on a directory, it will get errors while trying to encrypt it: ceph: handle_cap_grant: cap grant attempt to change fscrypt_auth on non-I_NEW inode (old len 0 new len 48) fscrypt (ceph, inode 1099511627812): Error -105 getting encryption context A simple way to reproduce this is to use two clients: client1 # mkdir /mnt/mydir client2 # ls /mnt/mydir client1 # fscrypt encrypt /mnt/mydir client1 # echo hello > /mnt/mydir/world This happens because, in __ceph_setattr(), we only initialize ci->fscrypt_auth if we have Ax and ceph_fill_inode() won't use the fscrypt_auth received if the inode state isn't I_NEW. Fix it by allowing ceph_fill_inode() to also set ci->fscrypt_auth if the inode doesn't have it set already. Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:35 +02:00
Jeff Layton	94af047092	ceph: add some fscrypt guardrails Add the appropriate calls into fscrypt for various actions, including link, rename, setattr, and the open codepaths. Disable fallocate for encrypted inodes -- hopefully, just for now. If we have an encrypted inode, then the client will need to re-encrypt the contents of the new object. Disable copy offload to or from encrypted inodes. Set i_blkbits to crypto block size for encrypted inodes -- some of the underlying infrastructure for fscrypt relies on i_blkbits being aligned to crypto blocksize. Report STATX_ATTR_ENCRYPTED on encrypted inodes. [ lhenriques: forbid encryption with striped layouts ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:35 +02:00
Jeff Layton	79f2f6ad87	ceph: create symlinks with encrypted and base64-encoded targets When creating symlinks in encrypted directories, encrypt and base64-encode the target with the new inode's key before sending to the MDS. When filling a symlinked inode, base64-decode it into a buffer that we'll keep in ci->i_symlink. When get_link is called, decrypt the buffer into a new one that will hang off i_link. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:35 +02:00
Xiubo Li	af9ffa6df7	ceph: add support to readdir for encrypted names To make it simpler to decrypt names in a readdir reply (i.e. before we have a dentry), add a new ceph_encode_encrypted_fname()-like helper that takes a qstr pointer instead of a dentry pointer. Once we've decrypted the names in a readdir reply, we no longer need the crypttext, so overwrite them in ceph_mds_reply_dir_entry with the unencrypted names. Then in both ceph_readdir_prepopulate() and ceph_readdir() we will use the decrypted name directly. [ jlayton: convert some BUG_ONs into error returns ] Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:34 +02:00
Xiubo Li	3859af9eba	ceph: pass the request to parse_reply_info_readdir() Instead of passing just the r_reply_info to the readdir reply parser, pass the request pointer directly instead. This will facilitate implementing readdir on fscrypted directories. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:34 +02:00
Jeff Layton	855290962c	ceph: make ceph_fill_trace and ceph_get_name decrypt names When we get a dentry in a trace, decrypt the name so we can properly instantiate the dentry or fill out ceph_get_name() buffer. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:34 +02:00
Jeff Layton	457117f077	ceph: add helpers for converting names for userland presentation Define a new ceph_fname struct that we can use to carry information about encrypted dentry names. Add helpers for working with these objects, including ceph_fname_to_usr which formats an encrypted filename for userland presentation. [ xiubli: fix resulting name length check -- neither name_len nor ctext_len should exceed NAME_MAX ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:34 +02:00
Jeff Layton	c526760181	ceph: make d_revalidate call fscrypt revalidator for encrypted dentries If we have a dentry which represents a no-key name, then we need to test whether the parent directory's encryption key has since been added. Do that before we test anything else about the dentry. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:34 +02:00
Jeff Layton	cb3524a8bd	ceph: set DCACHE_NOKEY_NAME flag in ceph_lookup/atomic_open() This is required so that we know to invalidate these dentries when the directory is unlocked. Atomic open can act as a lookup if handed a dentry that is negative on the MDS. Ensure that we set DCACHE_NOKEY_NAME on the dentry in atomic_open, if we don't have the key for the parent. Otherwise, we can end up validating the dentry inappropriately if someone later adds a key. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:34 +02:00
Jeff Layton	4ac4c23eaa	ceph: decode alternate_name in lease info Ceph is a bit different from local filesystems, in that we don't want to store filenames as raw binary data, since we may also be dealing with clients that don't support fscrypt. We could just base64-encode the encrypted filenames, but that could leave us with filenames longer than NAME_MAX. It turns out that the MDS doesn't care much about filename length, but the clients do. To manage this, we've added a new "alternate name" field that can be optionally added to any dentry that we'll use to store the binary crypttext of the filename if its base64-encoded value will be longer than NAME_MAX. When a dentry has one of these names attached, the MDS will send it along in the lease info, which we can then store for later usage. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:34 +02:00
Jeff Layton	24865e75c1	ceph: send alternate_name in MClientRequest In the event that we have a filename longer than CEPH_NOHASH_NAME_MAX, we'll need to hash the tail of the filename. The client however will still need to know the full name of the file if it has a key. To support this, MClientRequest has grown a new alternate_name field that we populate with the full (binary) crypttext of the filename. This is then transmitted to the clients in readdir or traces as part of the dentry lease. Add support for populating this field when the filenames are very long. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:24:34 +02:00
Jeff Layton	3fd945a79e	ceph: encode encrypted name in ceph_mdsc_build_path and dentry release Allow ceph_mdsc_build_path to encrypt and base64 encode the filename when the parent is encrypted and we're sending the path to the MDS. In a similar fashion, encode encrypted dentry names if including a dentry release in a request. In most cases, we just encrypt the filenames and base64 encode them, but when the name is longer than CEPH_NOHASH_NAME_MAX, we use a similar scheme to fscrypt proper, and hash the remaining bits with sha256. When doing this, we then send along the full crypttext of the name in the new alternate_name field of the MClientRequest. The MDS can then send that along in readdir responses and traces. [ idryomov: drop duplicate include reported by Abaci Robot ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-24 11:22:37 +02:00
Luís Henriques	64e86f632b	ceph: add base64 endcoding routines for encrypted names The base64url encoding used by fscrypt includes the '_' character, which may cause problems in snapshot names (if the name starts with '_'). Thus, use the base64 encoding defined for IMAP mailbox names (RFC 3501), which uses '+' and ',' instead of '-' and '_'. Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:48 +02:00
Xiubo Li	b7b53361c8	ceph: make ioctl cmds more readable in debug log ioctl file 0000000004e6b054 cmd 2148296211 arg 824635143532 The numerical cmd value in the ioctl debug log message is too hard to understand even when you look at it in the code. Make it more readable. [ idryomov: add missing _ in ceph_ioctl_cmd_name() ] Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:48 +02:00
Jeff Layton	f061feda6c	ceph: add fscrypt ioctls and ceph.fscrypt.auth vxattr We gate most of the ioctls on MDS feature support. The exception is the key removal and status functions that we still want to work if the MDS's were to (inexplicably) lose the feature. For the set_policy ioctl, we take Fs caps to ensure that nothing can create files in the directory while the ioctl is running. That should be enough to ensure that the "empty_dir" check is reliable. The vxattr is read-only, added mostly for future debugging purposes. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:48 +02:00
Jeff Layton	6b5717bd30	ceph: implement -o test_dummy_encryption mount option Add support for the test_dummy_encryption mount option. This allows us to test the encrypted codepaths in ceph without having to manually set keys, etc. [ lhenriques: fix potential fsc->fsc_dummy_enc_policy memory leak in ceph_real_mount() ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:48 +02:00
Jeff Layton	2d332d5bc4	ceph: fscrypt_auth handling for ceph Most fscrypt-enabled filesystems store the crypto context in an xattr, but that's problematic for ceph as xattrs are governed by the XATTR cap, but we really want the crypto context as part of the AUTH cap. Because of this, the MDS has added two new inode metadata fields: fscrypt_auth and fscrypt_file. The former is used to hold the crypto context, and the latter is used to track the real file size. Parse new fscrypt_auth and fscrypt_file fields in inode traces. For now, we don't use fscrypt_file, but fscrypt_auth is used to hold the fscrypt context. Allow the client to use a setattr request for setting the fscrypt_auth field. Since this is not a standard setattr request from the VFS, we add a new field to __ceph_setattr that carries ceph-specific inode attrs. Have the set_context op do a setattr that sets the fscrypt_auth value, and get_context just return the contents of that field (since it should always be available). Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:48 +02:00
Jeff Layton	4de77f25fd	ceph: use osd_req_op_extent_osd_iter for netfs reads The netfs layer has already pinned the pages involved before calling issue_op, so we can just pass down the iter directly instead of calling iov_iter_get_pages_alloc. Instead of having to allocate a page array, use CEPH_MSG_DATA_ITER and pass it the iov_iter directly to clone. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:48 +02:00
Jeff Layton	dee0c5f834	libceph: add new iov_iter-based ceph_msg_data_type and ceph_osd_data_type Add an iov_iter to the unions in ceph_msg_data and ceph_msg_data_cursor. Instead of requiring a list of pages or bvecs, we can just use an iov_iter directly, and avoid extra allocations. We assume that the pages represented by the iter are pinned such that they shouldn't incur page faults, which is the case for the iov_iters created by netfs. While working on this, Al Viro informed me that he was going to change iov_iter_get_pages to auto-advance the iterator as that pattern is more or less required for ITER_PIPE anyway. We emulate that here for now by advancing in the _next op and tracking that amount in the "lastlen" field. In the event that _next is called twice without an intervening _advance, we revert the iov_iter by the remaining lastlen before calling iov_iter_get_pages. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:48 +02:00
Jeff Layton	4c793d4c58	ceph: make ceph_mdsc_build_path use ref-walk Encryption potentially requires allocation, at which point we'll need to be in a non-atomic context. Convert ceph_mdsc_build_path to take dentry spinlocks and references instead of using rcu_read_lock to walk the path. This is slightly less efficient, and we may want to eventually allow using RCU when the leaf dentry isn't encrypted. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:48 +02:00
Jeff Layton	ec9595c080	ceph: preallocate inode for ops that may create one When creating a new inode, we need to determine the crypto context before we can transmit the RPC. The fscrypt API has a routine for getting a crypto context before a create occurs, but it requires an inode. Change the ceph code to preallocate an inode in advance of a create of any sort (open(), mknod(), symlink(), etc). Move the existing code that generates the ACL and SELinux blobs into this routine since that's mostly common across all the different codepaths. In most cases, we just want to allow ceph_fill_trace to use that inode after the reply comes in, so add a new field to the MDS request for it (r_new_inode). The async create codepath is a bit different though. In that case, we want to hash the inode in advance of the RPC so that it can be used before the reply comes in. If the call subsequently fails with -EJUKEBOX, then just put the references and clean up the as_ctx. Note that with this change, we now need to regenerate the as_ctx when this occurs, but it's quite rare for it to happen. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:47 +02:00
Jeff Layton	03bc06c7b0	ceph: add new mount option to enable sparse reads Add a new mount option that has the client issue sparse reads instead of normal ones. The callers now preallocate a sparse extent buffer that the libceph receive code can populate and hand back after the operation completes. After a successful sparse read, we can't use the req->r_result value to determine the amount of data "read", so instead we set the received length to be from the end of the last extent in the buffer. Any interstitial holes will have been filled by the receive code. [ xiubli: fix a double free on req reported by Ilya ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:47 +02:00
Jeff Layton	f628d79997	libceph: add sparse read support to OSD client Have get_reply check for the presence of sparse read ops in the request and set the sparse_read boolean in the msg. That will cue the messenger layer to use the sparse read codepath instead of the normal data receive. Add a new sparse_read operation for the OSD client, driven by its own state machine. The messenger will repeatedly call the sparse_read operation, and it will pass back the necessary info to set up to read the next extent of data, while zero-filling the sparse regions. The state machine will stop at the end of the last extent, and will attach the extent map buffer to the ceph_osd_req_op so that the caller can use it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:47 +02:00
Jeff Layton	d396f89db3	libceph: add sparse read support to msgr1 Add 2 new fields to ceph_connection_v1_info to track the necessary info in sparse reads. Skip initializing the cursor for a sparse read. Break out read_partial_message_section into a wrapper around a new read_partial_message_chunk function that doesn't zero out the crc first. Add new helper functions to drive receiving into the destinations provided by the sparse_read state machine. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:47 +02:00
Jeff Layton	f36217e35c	libceph: support sparse reads on msgr2 secure codepath Add a new init_sgs_pages helper that populates the scatterlist from an arbitrary point in an array of pages. Change setup_message_sgs to take an optional pointer to an array of pages. If that's set, then the scatterlist will be set using that array instead of the cursor. When given a sparse read on a secure connection, decrypt the data in-place rather than into the final destination, by passing it the in_enc_pages array. After decrypting, run the sparse_read state machine in a loop, copying data from the decrypted pages until it's complete. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:47 +02:00
Jeff Layton	ec3bc567ea	libceph: new sparse_read op, support sparse reads on msgr2 crc codepath Add support for a new sparse_read ceph_connection operation. The idea is that the client driver can define this operation and use it to do special handling for incoming reads. The alloc_msg routine will look at the request and determine whether the reply is expected to be sparse. If it is, then we'll dispatch to a different set of state machine states that will repeatedly call the driver's sparse_read op to get length and placement info for reading the extent map, and the extents themselves. This necessitates adding some new fields to some other structs: - The msg gets a new bool to track whether it's a sparse_read request. - A new field is added to the cursor to track the amount remaining in the current extent. This is used to cap the read from the socket into the msg_data. - Handling a revoke with all of this is particularly difficult, so I've added a new data_len_remain field to the v2 connection info, and then use that to skip that much on a revoke. We may want to expand the use of that to the normal read path as well, just for consistency's sake. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:47 +02:00
Jeff Layton	a679e50f72	libceph: define struct ceph_sparse_extent and add some helpers When the OSD sends back a sparse read reply, it contains an array of these structures. Define the structure and add a couple of helpers for dealing with them. Also add a place in struct ceph_osd_req_op to store the extent buffer, and code to free it if it's populated when the req is torn down. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:47 +02:00
Jeff Layton	08b8a0440e	libceph: add spinlock around osd->o_requests In a later patch, we're going to need to search for a request in the rbtree, but taking the o_mutex is inconvenient as we already hold the con mutex at the point where we need it. Add a new spinlock that we take when inserting and erasing entries from the o_requests tree. Search of the rbtree can be done with either the mutex or the spinlock, but insertion and removal require both. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2023-08-22 09:01:47 +02:00
Linus Torvalds	706a741595	Linux 6.5-rc7	2023-08-20 15:02:52 +02:00
Linus Torvalds	b320441c04	TTY/Serial fixes for 6.5-rc7 Here are some small tty and serial core fixes for 6.5-rc7 that resolve a lot of reported issues. Primarily in here are the fixes for the serial bus code from Tony that came in -rc1, as it hit wider testing with the huge number of different types of systems and serial ports. All of the reported issues with duplicate names and other issues with this code are now resolved. Other than that, included in here are: - n_gsm fix for a previous fix - 8250 lockdep annotation fix - fsl_lpuart serial driver fix - TIOCSTI documentation update for previous CAP_SYS_ADMIN change All of these have been in linux-next for a while with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Merge tag 'tty-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH. * tag 'tty-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: core: Fix serial core port id, including multiport devices; serial: 8250: drop lockdep annotation from serial8250_clear_IER(); tty: n_gsm: fix the UAF caused by race condition in gsm_cleanup_mux; serial: core: Revert port_id use; TIOCSTI: Document CAP_SYS_ADMIN behaviour in Kconfig; serial: 8250: Fix oops for port->pm on uart_change_pm(); serial: 8250: Reinit port_id when adding back serial8250_isa_devs; serial: core: Fix kmemleak issue for serial core device remove; MAINTAINERS: Merge TTY layer and serial drivers; serial: core: Fix serial_base_match() after fixing controller port name; serial: core: Fix serial core controller port name to show controller id; serial: core: Fix serial core port id to not use port->line; serial: core: Controller id cannot be negative; tty: serial: fsl_lpuart: Clear the error flags by writing 1 for lpuart32 platforms	2023-08-20 08:26:51 +02:00
Linus Torvalds	ec27a636d7	Merge tag 'rust-fixes-6.5-rc7' of https://github.com/Rust-for-Linux/linux Pull rust fix from Miguel Ojeda: - Macros: fix 'HAS_' redefinition by the '#[vtable]' macro under conditional compilation * tag 'rust-fixes-6.5-rc7' of https://github.com/Rust-for-Linux/linux: rust: macros: vtable: fix `HAS_*` redefinition (`gen_const_name`)	2023-08-20 08:18:58 +02:00
Linus Torvalds	9e6c269de4	Merge tag 'i2c-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Usual set of driver fixes. A bit more than usual because I was unavailable for a while" * tag 'i2c-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: bcm-iproc: Fix bcm_iproc_i2c_isr deadlock issue i2c: Update documentation to use .probe() again i2c: sun6i-p2wi: Fix an error message in probe() i2c: hisi: Only handle the interrupt of the driver's transfer i2c: tegra: Fix i2c-tegra DMA config option processing i2c: tegra: Fix failure during probe deferral cleanup i2c: designware: Handle invalid SMBus block data response length value i2c: designware: Correct length byte validation logic i2c: imx-lpi2c: return -EINVAL when i2c peripheral clk doesn't work	2023-08-19 19:22:41 +02:00
Linus Torvalds	12e6ccedb3	Merge tag 'for-6.5-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix infinite loop in readdir(), could happen in a big directory when files get renamed during enumeration - fix extent map handling of skipped pinned ranges - fix a corner case when handling ordered extent length - fix a potential crash when balance cancel races with pause - verify correct uuid when starting scrub or device replace * tag 'for-6.5-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix incorrect splitting in btrfs_drop_extent_map_range btrfs: fix BUG_ON condition in btrfs_cancel_balance btrfs: only subtract from len_to_oe_boundary when it is tracking an extent btrfs: fix replace/scrub failure with metadata_uuid btrfs: fix infinite directory reads	2023-08-19 17:57:07 +02:00
Linus Torvalds	b5cab28be6	Merge tag 'fbdev-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev Pull fbdev fixes and cleanups from Helge Deller: - various code cleanups in amifb, atmel_lcdfb, ssd1307fb, kyro and goldfishfb * tag 'fbdev-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbdev: goldfishfb: Do not check 0 for platform_get_irq() fbdev: atmel_lcdfb: Remove redundant of_match_ptr() fbdev: kyro: Remove unused declarations fbdev: ssd1307fb: Print the PWM's label instead of its number fbdev: mmp: fix value check in mmphw_probe() fbdev: amifb: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper	2023-08-19 17:43:55 +02:00
Linus Torvalds	2383ffc41a	Merge tag 'block-6.5-2023-08-19' of git://git.kernel.dk/linux Pull block fixes from Jens Axboe: "Main thing here is the fix for the regression in flush handling which caused IO hangs/stalls for a few reporters. Hopefully that should all be sorted out now. Outside of that, just a few minor fixes for issues that were introduced in this cycle" * tag 'block-6.5-2023-08-19' of git://git.kernel.dk/linux: blk-mq: release scheduler resource when request completes blk-crypto: dynamically allocate fallback profile blk-cgroup: hold queue_lock when removing blkg->q_node drivers/rnbd: restore sysfs interface to rnbd-client	2023-08-19 17:31:46 +02:00
Chengming Zhou	e5c0ca1365	blk-mq: release scheduler resource when request completes Chuck reported [1] an IO hang problem on NFS exports that reside on SATA devices and bisected to commit `615939a2ae` ("blk-mq: defer to the normal submission path for post-flush requests"). We analysed the IO hang problem, found there are two postflush requests waiting for each other. The first postflush request completed the REQ_FSEQ_DATA sequence, so go to the REQ_FSEQ_POSTFLUSH sequence and added in the flush pending list, but failed to blk_kick_flush() because of the second postflush request which is inflight waiting in scheduler queue. The second postflush waiting in scheduler queue can't be dispatched because the first postflush hasn't released scheduler resource even though it has completed by itself. Fix it by releasing scheduler resource when the first postflush request completed, so the second postflush can be dispatched and completed, then make blk_kick_flush() succeed. While at it, remove the check for e->ops.finish_request, as all schedulers set that. Reaffirm this requirement by adding a WARN_ON_ONCE() at scheduler registration time, just like we do for insert_requests and dispatch_request. [1] https://lore.kernel.org/all/7A57C7AE-A51A-4254-888B-FE15CA21F9E9@oracle.com/ Link: https://lore.kernel.org/linux-block/20230819031206.2744005-1-chengming.zhou@linux.dev/ Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202308172100.8ce4b853-oliver.sang@intel.com Fixes: `615939a2ae` ("blk-mq: defer to the normal submission path for post-flush requests") Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20230813152325.3017343-1-chengming.zhou@linux.dev [axboe: folded in incremental fix and added tags] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-19 07:47:17 -06:00
Linus Torvalds	aa9ea98cca	media fixes for v6.5-rc7 Merge tag 'media/v6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "Three driver fixes" * tag 'media/v6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: imx: imx7-media-csi: Fix applying format constraints media: uvcvideo: Fix menu count handling for userspace XU mappings media: mtk-jpeg: Set platform driver data earlier	2023-08-19 13:13:55 +02:00
Linus Torvalds	bf98bae3d8	Merge tag 'x86_urgent_for_v6.5_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "Extraordinary embargoed times call for extraordinary measures. That's why this week's x86/urgent branch is larger than usual, containing all the known fallout fixes after the SRSO mitigation got merged. I know, it is a bit late in the game but everyone who has reported a bug stemming from the SRSO pile, has tested that branch and has confirmed that it fixes their bug. Also, I've run it on every possible hardware I have and it is looking good. It is running on this very machine while I'm typing, for 2 days now without an issue. Famous last words... - Use LEA ...%rsp instead of ADD %rsp in the Zen1/2 SRSO return sequence as latter clobbers flags which interferes with fastop emulation in KVM, leading to guests freezing during boot - A fix for the DIV(0) quotient data leak on Zen1 to clear the divider buffers at the right time - Disable the SRSO mitigation on unaffected configurations as it got enabled there unnecessarily - Change .text section name to fix CONFIG_LTO_CLANG builds - Improve the optprobe indirect jmp check so that certain configurations can still be able to use optprobes at all - A serious and good scrubbing of the untraining routines by PeterZ: - Add proper speculation stopping traps so that objtool is happy - Adjust objtool to handle the new thunks - Make the thunk pointer assignable to the different untraining sequences at runtime, thus avoiding the alternative at the return thunk. It simplifies the code a bit too. - Add a entry_untrain_ret() main entry point which selects the respective untraining sequence - Rename things so that they're more clear - Fix stack validation with FRAME_POINTER=y builds - Fix static call patching to handle when a JMP to the return thunk is the last insn on the very last module memory page - Add more documentation about what each untraining routine does and why" * tag 'x86_urgent_for_v6.5_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/srso: Correct the mitigation status when SMT is disabled x86/static_call: Fix __static_call_fixup() objtool/x86: Fixup frame-pointer vs rethunk x86/srso: Explain the untraining sequences a bit more x86/cpu/kvm: Provide UNTRAIN_RET_VM x86/cpu: Cleanup the untrain mess x86/cpu: Rename srso_(.*)_alias to srso_alias_\1 x86/cpu: Rename original retbleed methods x86/cpu: Clean up SRSO return thunk mess x86/alternative: Make custom return thunk unconditional objtool/x86: Fix SRSO mess x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk() x86/cpu: Fix __x86_return_thunk symbol type x86/retpoline,kprobes: Skip optprobe check for indirect jumps with retpolines and IBT x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG x86/srso: Disable the mitigation on unaffected configurations x86/CPU/AMD: Fix the DIV(0) initial fix attempt x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()	2023-08-19 10:46:02 +02:00

1201805 Commits