mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2024-11-11 21:38:32 +08:00
8834147f95
74362 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
8834147f95 |
fscache rewrite
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmHeBGsACgkQ+7dXa6fL C2tyLw/8C2Gs/XvOZvRO7KPetKI9BbQSFoCe7uvGbiPq5CEmgcjWzQxvQGklBiZD qYa6pMNye1iGpsHOY3Yu210b7vMQiRLnnxvVle0UrjpZR7CcxYS0gGV+6yRdbDGy W1X6GFiX06qiNsgBH4msYp0SmbhhfkTyAx1BeBZAEtX8iFgaPfOldPY2nLMcTDD6 6FT1nTzRcMHx9IUQZJtpeatzc70Qg8+fOr2UAY2nOIypXh6+vAMBO80xtUjGVU+1 pWD1E+8cXSLfwEEzquFWoWTsTX7hNfsesEN10FmBf1bVCH9ZDFE01MOl6B8+CkFl +xfkvDNFC3yyUwAMVAV4+A4Be+cVLSqN2R91QIKJnAj9w1OjxASrwZJ1YeZp6KP4 h0XKuPs3sRwwbNPVL/nP0UPNexoJnOUAaHesl4uKkRrExmxz9xGOIqIri2+tUIO+ HkGyNns1huymj1K1ja4AQbDiZZX39GgYVleyg9g3uuy1FS4k+/myJcXo/CqWn3ON 4oeNwxwLvlcqIQnPrESvwev50lFZYB4pfwvez6T2C5dL/Wk/xdeJK9iG81RWgx7y 5XcDeoGDE08gMCGWVPjuhOCXypeiRGHhRNlcxTtq5kLwBZGkcYg/wFFnWn+6hzc4 kyXw2kS5WZq4Q/FPh7BdY0eHp6xv0EpAOZwceneLB9lhNINdxcQ= =ISJ6 -----END PGP SIGNATURE----- Merge tag 'fscache-rewrite-20220111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache rewrite from David Howells: "This is a set of patches that rewrites the fscache driver and the cachefiles driver, significantly simplifying the code compared to what's upstream, removing the complex operation scheduling and object state machine in favour of something much smaller and simpler. The series is structured such that the first few patches disable fscache use by the network filesystems using it, remove the cachefiles driver entirely and as much of the fscache driver as can be got away with without causing build failures in the network filesystems. The patches after that recreate fscache and then cachefiles, attempting to add the pieces in a logical order. Finally, the filesystems are reenabled and then the very last patch changes the documentation. [!] Note: I have dropped the cifs patch for the moment, leaving local caching in cifs disabled. I've been having trouble getting that working. I think I have it done, but it needs more testing (there seem to be some test failures occurring with v5.16 also from xfstests), so I propose deferring that patch to the end of the merge window. WHY REWRITE? ============ Fscache's operation scheduling API was intended to handle sequencing of cache operations, which were all required (where possible) to run asynchronously in parallel with the operations being done by the network filesystem, whilst allowing the cache to be brought online and offline and to interrupt service for invalidation. With the advent of the tmpfile capacity in the VFS, however, an opportunity arises to do invalidation much more simply, without having to wait for I/O that's actually in progress: Cachefiles can simply create a tmpfile, cut over the file pointer for the backing object attached to a cookie and abandon the in-progress I/O, dismissing it upon completion. Future work here would involve using Omar Sandoval's vfs_link() with AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new hard link from a tmpfile as currently I have to unlink the old file first. These patches can also simplify the object state handling as I/O operations to the cache don't all have to be brought to a stop in order to invalidate a file. To that end, and with an eye on to writing a new backing cache model in the future, I've taken the opportunity to simplify the indexing structure. I've separated the index cookie concept from the file cookie concept by C type now. The former is now called a "volume cookie" (struct fscache_volume) and there is a container of file cookies. There are then just the two levels. All the index cookie levels are collapsed into a single volume cookie, and this has a single printable string as a key. For instance, an AFS volume would have a key of something like "afs,example.com,1000555", combining the filesystem name, cell name and volume ID. This is freeform, but must not have '/' chars in it. I've also eliminated all pointers back from fscache into the network filesystem. This required the duplication of a little bit of data in the cookie (cookie key, coherency data and file size), but it's not actually that much. This gets rid of problems with making sure we keep netfs data structures around so that the cache can access them. These patches mean that most of the code that was in the drivers before is simply gone and those drivers are now almost entirely new code. That being the case, there doesn't seem any particular reason to try and maintain bisectability across it. Further, there has to be a point in the middle where things are cut over as there's a single point everything has to go through (ie. /dev/cachefiles) and it can't be in use by two drivers at once. ISSUES YET OUTSTANDING ====================== There are some issues still outstanding, unaddressed by this patchset, that will need fixing in future patchsets, but that don't stop this series from being usable: (1) The cachefiles driver needs to stop using the backing filesystem's metadata to store information about what parts of the cache are populated. This is not reliable with modern extent-based filesystems. Fixing this is deferred to a separate patchset as it involves negotiation with the network filesystem and the VM as to how much data to download to fulfil a read - which brings me on to (2)... (2) NFS (and CIFS with the dropped patch) do not take account of how the cache would like I/O to be structured to meet its granularity requirements. Previously, the cache used page granularity, which was fine as the network filesystems also dealt in page granularity, and the backing filesystem (ext4, xfs or whatever) did whatever it did out of sight. However, we now have folios to deal with and the cache will now have to store its own metadata to track its contents. The change I'm looking at making for cachefiles is to store content bitmaps in one or more xattrs and making a bit in the map correspond to something like a 256KiB block. However, the size of an xattr and the fact that they have to be read/updated in one go means that I'm looking at covering 1GiB of data per 512-byte map and storing each map in an xattr. Cachefiles has the potential to grow into a fully fledged filesystem of its very own if I'm not careful. However, I'm also looking at changing things even more radically and going to a different model of how the cache is arranged and managed - one that's more akin to the way, say, openafs does things - which brings me on to (3)... (3) The way cachefilesd does culling is very inefficient for large caches and it would be better to move it into the kernel if I can as cachefilesd has to keep asking the kernel if it can cull a file. Changing the way the backend works would allow this to be addressed. BITS THAT MAY BE CONTROVERSIAL ============================== There are some bits I've added that may be controversial: (1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check if a files is already being used by some other kernel service (e.g. a duplicate cachefiles cache in the same directory) and reject it if it is. This isn't entirely necessary, but it helps prevent accidental data corruption. I don't want to use S_SWAPFILE as that has other effects, but quite possibly swapon() should set S_KERNEL_FILE too. Note that it doesn't prevent userspace from interfering, though perhaps it should. (I have made it prevent a marked directory from being rmdir-able). (2) Cachefiles wants to keep the backing file for a cookie open whilst we might need to write to it from network filesystem writeback. The problem is that the network filesystem unuses its cookie when its file is closed, and so we have nothing pinning the cachefiles file open and it will get closed automatically after a short time to avoid EMFILE/ENFILE problems. Reopening the cache file, however, is a problem if this is being done due to writeback triggered by exit(). Some filesystems will oops if we try to open a file in that context because they want to access current->fs or suchlike. To get around this, I added the following: (A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network filesystem inode to indicate that we have a usage count on the cookie caching that inode. (B) A flag in struct writeback_control, unpinned_fscache_wb, that is set when __writeback_single_inode() clears the last dirty page from i_pages - at which point it clears I_PINNING_FSCACHE_WB and sets this flag. This has to be done here so that clearing I_PINNING_FSCACHE_WB can be done atomically with the check of PAGECACHE_TAG_DIRTY that clears I_DIRTY_PAGES. (C) A function, fscache_set_page_dirty(), which if it is not set, sets I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to pin the cache resources. (D) A function, fscache_unpin_writeback(), to be called by ->write_inode() to unuse the cookie. (E) A function, fscache_clear_inode_writeback(), to be called when the inode is evicted, before clear_inode() is called. This cleans up any lingering I_PINNING_FSCACHE_WB. The network filesystem can then use these tools to make sure that fscache_write_to_cache() can write locally modified data to the cache as well as to the server. For the future, I'm working on write helpers for netfs lib that should allow this facility to be removed by keeping track of the dirty regions separately - but that's incomplete at the moment and is also going to be affected by folios, one way or another, since it deals with pages" Link: https://lore.kernel.org/all/510611.1641942444@warthog.procyon.org.uk/ Tested-by: Dominique Martinet <asmadeus@codewreck.org> # 9p Tested-by: kafs-testing@auristor.com # afs Tested-by: Jeff Layton <jlayton@kernel.org> # ceph Tested-by: Dave Wysochanski <dwysocha@redhat.com> # nfs Tested-by: Daire Byrne <daire@dneg.com> # nfs * tag 'fscache-rewrite-20220111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (67 commits) 9p, afs, ceph, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking() fscache: Add a tracepoint for cookie use/unuse fscache: Rewrite documentation ceph: add fscache writeback support ceph: conversion to new fscache API nfs: Implement cache I/O by accessing the cache directly nfs: Convert to new fscache volume/cookie API 9p: Copy local writes to the cache when writing to the server 9p: Use fscache indexing rewrite and reenable caching afs: Skip truncation on the server of data we haven't written yet afs: Copy local writes to the cache when writing to the server afs: Convert afs to use the new fscache API fscache, cachefiles: Display stat of culling events fscache, cachefiles: Display stats of no-space events cachefiles: Allow cachefiles to actually function fscache, cachefiles: Store the volume coherency data cachefiles: Implement the I/O routines cachefiles: Implement cookie resize for truncate cachefiles: Implement begin and end I/O operation cachefiles: Implement backing file wrangling ... |
||
Linus Torvalds
|
8975f89748 |
fuse update for 5.17
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYdw8xgAKCRDh3BK/laaZ PFy3AQCHSltzy6f234CcsFk3mtJn0im0tDbRoEYFD731JOR1YAD9HQKtJRn/sMCF r0PnZnOWJ35RWB3o8uEqptsgrZXJkQo= =HosR -----END PGP SIGNATURE----- Merge tag 'fuse-update-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Fix a regression introduced in 5.15 - Extend the size of the FUSE_INIT request to accommodate for more flags. There's a slight possibility of a regression for obscure fuse servers; if this happens, then more complexity will need to be added to the protocol - Allow the DAX property to be controlled by the server on a per-inode basis in virtiofs - Allow sending security context to the server when creating a file or directory * tag 'fuse-update-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: Documentation/filesystem/dax: DAX on virtiofs fuse: mark inode DONT_CACHE when per inode DAX hint changes fuse: negotiate per inode DAX in FUSE_INIT fuse: enable per inode DAX fuse: support per inode DAX in fuse protocol fuse: make DAX mount option a tri-state fuse: add fuse_should_enable_dax() helper fuse: Pass correct lend value to filemap_write_and_wait_range() fuse: send security context of inode on file fuse: extend init flags |
||
Linus Torvalds
|
1fb38c934c |
\n
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmHek9YACgkQnJ2qBz9k QNmgUQf+NpBuhoBI83dWI7NvwMZ1srtOiFdtVctq4jT8rS5m3KnwaTbq9HZ2y+WL 6+Tp8B2Qs/C2X6DbX87N1RBaP6mpMQCY4ADfbYNV6KOhUQqGBo/zxyM4nAk0WOBR wXFWNhd/c3Xr3eb5Ggaus11UdQxXBFtZ72Azm2eUXnIKuSeAH1KWbyElQGzLqMDE NLS68XFOcxLLScWA0lTExtouV6ZLUHm6EyYRH+TmIjAwb01JDwceXmdqzXNO7uiV cKalWV8GcUH2OOLdX/KXvAwDVwuJwq4PCGCajGPEI8Ebct5Eh0YZuHVcvmfReOde dnyZNiMOH8GwYVDHUDCNI8zD/BpyJQ== =4gnz -----END PGP SIGNATURE----- Merge tag 'fs_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF / reiserfs updates from Jan Kara: "One UDF fix and one reiserfs cleanup" * tag 'fs_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Fix error handling in udf_new_inode() reiserfs: don't use congestion_wait() |
||
Linus Torvalds
|
3d3d673306 |
\n
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmHektkACgkQnJ2qBz9k QNk73Qf9G/8C29GC7lTZvW8ZOqM2onvP16/wpMU98jkw6/xpsdGLMmQlP4JCdB6Y mTttvPxwMMVrdRmt+MNIgvWPOK2LKHdkbNbj0DRw4JxLEIm+zXddAXAcvX2vLeUn N24idiVQxIKWS3zJhQ6H7sFjSrZ7ztjkeHWff70LIzURIupYO0+rRmKDgOqo6D5g cKAdb8x9ZQLr0CTQoHKQxA8IeCSirSblVaDMdes2KrJH+i05s3aqS9jBvY16nrLo Xvm4Vv7ecQXcmKDswuxVcmBVlVVsWxoYv1w8+v9qRGJnbgpQkjjgCGjCh54M18Do /OovTVlqGWRfmf2Eo1FCXXz90ppGlQ== =gvMh -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify updates from Jan Kara: "Support for new FAN_RENAME fanotify event and support for reporting child info in directory fanotify events (FAN_REPORT_TARGET_FID)" * tag 'fsnotify_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: wire up FAN_RENAME event fanotify: report old and/or new parent+name in FAN_RENAME event fanotify: record either old name new name or both for FAN_RENAME fanotify: record old and new parent and name in FAN_RENAME event fanotify: support secondary dir fh and name in fanotify_info fanotify: use helpers to parcel fanotify_info buffer fanotify: use macros to get the offset to fanotify_info buffer fsnotify: generate FS_RENAME event with rich information fanotify: introduce group flag FAN_REPORT_TARGET_FID fsnotify: separate mark iterator type from object type enum fsnotify: clarify object type argument |
||
Linus Torvalds
|
f079ab01b5 |
Convert xfs/iomap to use folios
This should be all that is needed for XFS to use large folios. There is no code in this pull request to create large folios, but no additional changes should be needed to XFS or iomap once they are created. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmHcpaUACgkQDpNsjXcp gj4MUAf+ItcKfgFo1QCMT+6Y0mohVqPme/vdyOCNv6yOOfZZqN5ZQc+2hmxXrRz9 XPOPwZKL0TttlHSYEJmrm8mqwN8UXl0kqMu4kQqOXMziiD9qpVlaLXOZ7iLdkQxu z/xe1iACcGfJUaQCsaMP6BZqp6iETA4qP72dBE4jc6PC4H3OI0pN/900gEbAcLxD Yn0a5NhrdS/EySU2aHLB6OcwhqnSiHBVjUbFiuXxuvOYyzLaERIh00Kx3jLdj4DR 82K4TF8h2IZpALfIDSt0JG+gHLCc+EfF7Yd/xkeEv0md3ncyi+jWvFCFPNJbyFjm cYoDTSunfbxwszA2n01R4JM8/KkGwA== =IeFX -----END PGP SIGNATURE----- Merge tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux Pull iomap updates from Matthew Wilcox: "Convert xfs/iomap to use folios. This should be all that is needed for XFS to use large folios. There is no code in this pull request to create large folios, but no additional changes should be needed to XFS or iomap once they are created. Usually this would have come from Darrick, and we had intended that it would come that route. Between the holidays and various things which Darrick needed to work on, he asked if I could send things directly. There weren't any other iomap patches pending for this release, which probably also played a role" * tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux: (26 commits) iomap: Inline __iomap_zero_iter into its caller xfs: Support large folios iomap: Support large folios in invalidatepage iomap: Convert iomap_migrate_page() to use folios iomap: Convert iomap_add_to_ioend() to take a folio iomap: Simplify iomap_do_writepage() iomap: Simplify iomap_writepage_map() iomap,xfs: Convert ->discard_page to ->discard_folio iomap: Convert iomap_write_end_inline to take a folio iomap: Convert iomap_write_begin() and iomap_write_end() to folios iomap: Convert __iomap_zero_iter to use a folio iomap: Allow iomap_write_begin() to be called with the full length iomap: Convert iomap_page_mkwrite to use a folio iomap: Convert readahead and readpage to use a folio iomap: Convert iomap_read_inline_data to take a folio iomap: Use folio offsets instead of page offsets iomap: Convert bio completions to use folios iomap: Pass the iomap_page into iomap_set_range_uptodate iomap: Add iomap_invalidate_folio iomap: Convert iomap_releasepage to use a folio ... |
||
Linus Torvalds
|
6020c204be |
Convert much of the page cache to use folios
This patchset stops just short of actually enabling large folios. It converts everything that I noticed needs to be converted, but there may still be places I've overlooked which still have page size assumptions. The big change here is using large entries in the page cache XArray instead of many small entries. That only affects shmem for now, but it's a pretty big change for shmem since it changes where memory needs to be allocated (at split time instead of insertion). -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmHcraoACgkQDpNsjXcp gj7C3wgAl0cjtdVzTpkLmbnInsicW1m3thnbkSXYbpqRccFjpu2kEBGj31PT+oGz dzgXP7SNZ/VkFT+qWtmHSRF/J41B6f9bFojO81B2aQdpRiziU+5QbSbXbfUjwVhE GJF0WGSJtVqySKynXP/iYTEt2zj6BiVperAwIqzhZpPY7gNoyDgeRD34Xy5bQqdD ey6/Uwkh7oFHLEDcgxsEnyF0tUR3q+gpe5XZW1fb79p3crWw44xATc3UvKv8qCLC Rd4oHmKkOj4MvdiUxJEfXI+XxgrkQ8XRO70B+p6ZljhDaoDZYw7ullxA0gvlSpNX 6pnjSQlKA1VQXsi6PMSt+9vf26XxaQ== =KeYZ -----END PGP SIGNATURE----- Merge tag 'folio-5.17' of git://git.infradead.org/users/willy/pagecache Pull folio conversion updates from Matthew Wilcox: "Convert much of the page cache to use folios This stops just short of actually enabling large folios. It converts everything that I noticed needs to be converted, but there may still be places I've overlooked which still have page size assumptions. The big change here is using large entries in the page cache XArray instead of many small entries. That only affects shmem for now, but it's a pretty big change for shmem since it changes where memory needs to be allocated (at split time instead of insertion)" * tag 'folio-5.17' of git://git.infradead.org/users/willy/pagecache: (49 commits) mm: Use multi-index entries in the page cache XArray: Add xas_advance() truncate,shmem: Handle truncates that split large folios truncate: Convert invalidate_inode_pages2_range to folios fs: Convert vfs_dedupe_file_range_compare to folios mm: Remove pagevec_remove_exceptionals() mm: Convert find_lock_entries() to use a folio_batch filemap: Return only folios from find_get_entries() filemap: Convert filemap_get_read_batch() to use a folio_batch filemap: Convert filemap_read() to use a folio truncate: Add invalidate_complete_folio2() truncate: Convert invalidate_inode_pages2_range() to use a folio truncate: Skip known-truncated indices truncate,shmem: Add truncate_inode_folio() shmem: Convert part of shmem_undo_range() to use a folio mm: Add unmap_mapping_folio() truncate: Add truncate_cleanup_folio() filemap: Add filemap_release_folio() filemap: Use a folio in filemap_page_mkwrite filemap: Use a folio in filemap_map_pages ... |
||
Linus Torvalds
|
6dc69d3d0d |
driver core changes for 5.17-rc1
Here is the set of changes for the driver core for 5.17-rc1. Lots of little things here, including: - kobj_type cleanups - auxiliary_bus documentation updates - auxiliary_device conversions for some drivers (relevant subsystems all have provided acks for these) - kernfs lock contention reduction for some workloads - other tiny cleanups and changes. All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYd7deA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ym8ngCgw0ANwrRPE5b1dthEmfU2f8Knk5kAn0pHQv6R VRZJypgNfU/Pt0ykstZD =CO9J -----END PGP SIGNATURE----- Merge tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of changes for the driver core for 5.17-rc1. Lots of little things here, including: - kobj_type cleanups - auxiliary_bus documentation updates - auxiliary_device conversions for some drivers (relevant subsystems all have provided acks for these) - kernfs lock contention reduction for some workloads - other tiny cleanups and changes. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (43 commits) kobject documentation: remove default_attrs information drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb debugfs: lockdown: Allow reading debugfs files that are not world readable driver core: Make bus notifiers in right order in really_probe() driver core: Move driver_sysfs_remove() after driver_sysfs_add() firmware: edd: remove empty default_attrs array firmware: dmi-sysfs: use default_groups in kobj_type qemu_fw_cfg: use default_groups in kobj_type firmware: memmap: use default_groups in kobj_type sh: sq: use default_groups in kobj_type headers/uninline: Uninline single-use function: kobject_has_children() devtmpfs: mount with noexec and nosuid driver core: Simplify async probe test code by using ktime_ms_delta() nilfs2: use default_groups in kobj_type kobject: remove kset from struct kset_uevent_ops callbacks driver core: make kobj_type constant. driver core: platform: document registration-failure requirement vdpa/mlx5: Use auxiliary_device driver data helpers net/mlx5e: Use auxiliary_device driver data helpers soundwire: intel: Use auxiliary_device driver data helpers ... |
||
Linus Torvalds
|
d3c8108035 |
for-5.17/block-2022-01-11
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHd8DAQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpnhRD/wMAjsNO65PCA+o/bPpVi4ulx9EejAzrJnB 5vHFvREAoOOGKvRpYGe4w3TcKyW+zPb+GtlXFjPfK+wuVzWhrQtW/+vkjKlBt8wK o7rzeMwTKJ9ZGvYaaQpp1yC0WURBB3qnCRQhb8dOQzhJgEXinhIOznZsut4mniLv fTqcDmKAb/+G6K6CQCCqnH0I/+OJZyUeSFo1kk2i4ZqCBepQpBkOL6H2rBOtGxUg bt1jiGHbbhCRYEE3u2kV0HP10qAChNaMQC705jV4Qpf4+3EntSxs+6nSb74dvMkX 3+Wmp8Ctq6lpPnDL1nrAFGz3jZnB0Y+GdgOclQn3ViQd1FCXZzuYWQ3fTaBfURCZ /RE5nc047SqpwCFLOynM++OkaeQZ1zSxeyoFTtzDaPF4tLuaX3JHswvTzNGPw8SN BnexseNnNBCjJliZSEE7fOkjJDcev2dvRxPtI8/wkF4lHUgETc5IW563C53xo/Tx 32yFjZwCVIpNWk21su/0H3iEq80wZ7PnriiN/E3JA6XbnevlRPu0NPMb0D258GCm yCcdPVDNZsQCB8hluqZcu0g6LSgZRo90Yg1oqKqEpAllJJMBaEAPPPuUIJh998mo iKGxZzgr7d9jrbGJTInp0F8b3B3/oV/hxgzy0Hu/mHP3AsnaAk9o/oEQZ7rX4Khr 6biloqkIMA== =RWnJ -----END PGP SIGNATURE----- Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: - Unify where the struct request handling code is located in the blk-mq code (Christoph) - Header cleanups (Christoph) - Clean up the io_context handling code (Christoph, me) - Get rid of ->rq_disk in struct request (Christoph) - Error handling fix for add_disk() (Christoph) - request allocation cleanusp (Christoph) - Documentation updates (Eric, Matthew) - Remove trivial crypto unregister helper (Eric) - Reduce shared tag overhead (John) - Reduce poll_stats memory overhead (me) - Known indirect function call for dio (me) - Use atomic references for struct request (me) - Support request list issue for block and NVMe (me) - Improve queue dispatch pinning (Ming) - Improve the direct list issue code (Keith) - BFQ improvements (Jan) - Direct completion helper and use it in mmc block (Sebastian) - Use raw spinlock for the blktrace code (Wander) - fsync error handling fix (Ye) - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me) * tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits) MAINTAINERS: add entries for block layer documentation docs: block: remove queue-sysfs.rst docs: sysfs-block: document virt_boundary_mask docs: sysfs-block: document stable_writes docs: sysfs-block: fill in missing documentation from queue-sysfs.rst docs: sysfs-block: add contact for nomerges docs: sysfs-block: sort alphabetically docs: sysfs-block: move to stable directory block: don't protect submit_bio_checks by q_usage_counter block: fix old-style declaration nvme-pci: fix queue_rqs list splitting block: introduce rq_list_move block: introduce rq_list_for_each_safe macro block: move rq_list macros to blk-mq.h block: drop needless assignment in set_task_ioprio() block: remove unnecessary trailing '\' bio.h: fix kernel-doc warnings block: check minor range in device_add_disk() block: use "unsigned long" for blk_validate_block_size(). block: fix error unwinding in device_add_disk ... |
||
Linus Torvalds
|
42a7b4ed45 |
for-5.17/io_uring-2022-01-11
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHd8BkQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpkgED/4vyoNlKfuTqRzuHAZzq+PBYlMpVbn54ebm z2AJ2StLrZ8rflz/9ERNVYxbSS6ELf/8H2Mdjnb5uCoRza/QkMf4JnGtgT0pBavV uoruKTiYgQAQJmGuhxAo0TQwyW7F6qadp1eZH8CqxB2Cwj5wgSb3SB/yIlNkj6+2 fSzVGCh0CljhrkgFG10z507VGJYmgBqE9kl2tnfYkAA4blqyowZN4boTMx3l9fm5 2GWt0xg80lgMCzHb5m/sbqdOrMJV77TkuiAuVc+FjIfdGiVWo7XCn3q3e0qqJhRM A3x9mmUyF/xNrRLMgTyYxHzKLrytBibr+VxvDLr/LmLtV8viVCsgJiP5EIiAWIAt BCBks31QUxHaB9fyiQ43/nsRZLHmhXErLEK4D0loAXa50p4Xl4ZdaegD6txSH3FI mGprv4Si1kMNqSxmzrOUfFk8Xdv/vC08RWL4UbRa8xkgwWUAIhpRaYmAtmw913kb MgwFQuGpE9b7ae7HNdgWzyI424srCwY5UawgpzI25ZwGAVXTXDQ2qw1Lmyyo6swT bsYVyH/vJLvCS1tjmBtrKfJQ0Mokm1sJpaMeTs/SfSyHmAUXEsEFdXVu6bRnVkYF 9vjHZKOl5jA5nx6/JDpW993GnHV3FGkCDfENXs3wUYY7Hu0DDYZt0CGiqGLjC7Oq Ow4q3aEJQA== =syIB -----END PGP SIGNATURE----- Merge tag 'for-5.17/io_uring-2022-01-11' of git://git.kernel.dk/linux-block Pull io_uring updates from Jens Axboe: - Support for prioritized work completions (Hao) - Simplification of reissue (Pavel) - Add support for CQE skip (Pavel) - Memory leak fix going to 5.15-stable (Pavel) - Re-write of internal poll. This both cleans up that code, and gets us ready to fix the POLLFREE issue (Pavel) - Various cleanups (GuoYong, Pavel, Hao) * tag 'for-5.17/io_uring-2022-01-11' of git://git.kernel.dk/linux-block: (31 commits) io_uring: fix not released cached task refs io_uring: remove redundant tab space io_uring: remove unused function parameter io_uring: use completion batching for poll rem/upd io_uring: single shot poll removal optimisation io_uring: poll rework io_uring: kill poll linking optimisation io_uring: move common poll bits io_uring: refactor poll update io_uring: remove double poll on poll update io_uring: code clean for some ctx usage io_uring: batch completion in prior_task_list io_uring: split io_req_complete_post() and add a helper io_uring: add helper for task work execution code io_uring: add a priority tw list for irq completion work io-wq: add helper to merge two wq_lists io_uring: reuse io_req_task_complete for timeouts io_uring: tweak iopoll CQE_SKIP event counting io_uring: simplify selected buf handling io_uring: move up io_put_kbuf() and io_put_rw_kbuf() ... |
||
Linus Torvalds
|
f692121142 |
This pull request contains the following changes for UML:
- set_fs removal - Devicetree support - Many cleanups from Al - Various virtio and build related fixes -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmHbPpwWHHJpY2hhcmRA c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wXkcD/9UfDRFqvgtZIlacQ/0IvN24xeq +f4aXoXEVyVsCd02jv9pUk3IAezIQyJf3MGtNJ4D/UFXtYfjEYjK5kJpPDP6umaZ ZDnpTzn29HW2aGlgOxW9gU7a3Yze629QasIRP6x7Ht+Hk5eXrvRYrgcmKtw1mm04 SA5v5ZqP3P5r623fpsFiw4Dvl7l6MhDyFeyA2tabNnmv93HgB76PHDtV2Z+SWrC+ ubjlfBQc87QGHW+eTvce+0qw9APMoJpNFjNN4H8P/9VcDTvw+KL2JqQ02HSMWh4z HeHKsv6hbty+GskBhbaWDW7867fPJ3e08TFAAAjeEiBP/CDBwjOTSr3eOw1eHgzU xdAqC2Bz0e5G3shClmVEzzvcP6R2cgNZjeBze5m3wQ1NKHEddk6N9t5K+4NrOpgp gbNN5Q4FAVOBKeQsZWG81bJKGcu7SbShgiKjlxcaRpMyp6LwyD4naauGjmCzYsbf Pd4ilLO1Yocf7nFs2C4vWxE4iAZ6hfQtukerIxCQfb/Y2BaWT3bcWWYHFRFy6Lq+ hTFGnjf+Ro65QCoa1idaLaUdhwAGi6U9sjjL6G/JdQCCE3ftcXLVA9TJz9CNdMb5 98IGznxhczOZc7rHNXOF4km5+OUrU6N+C0WRp3yOoUWcI+Ms4PXHzqIwC5cde/V7 O/o9O1BAoBP6LE1pPg== =5J6E -----END PGP SIGNATURE----- Merge tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - set_fs removal - Devicetree support - Many cleanups from Al - Various virtio and build related fixes * tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (31 commits) um: virtio_uml: Allow probing from devicetree um: Add devicetree support um: Extract load file helper from initrd.c um: remove set_fs hostfs: Fix writeback of dirty pages um: Use swap() to make code cleaner um: header debriding - sigio.h um: header debriding - os.h um: header debriding - net_*.h um: header debriding - mem_user.h um: header debriding - activate_ipi() um: common-offsets.h debriding... um, x86: bury crypto_tfm_ctx_offset um: unexport handle_page_fault() um: remove a dangling extern of syscall_trace() um: kill unused cpu() uml/i386: missing include in barrier.h um: stop polluting the namespace with registers.h contents logic_io instance of iounmap() needs volatile on argument um: move amd64 variant of mmap(2) to arch/x86/um/syscalls_64.c ... |
||
Linus Torvalds
|
5672cdfba4 |
This pull request contains the following changes for JFFS2, UBI and UBIFS:
JFFS2: - Fix for a deadlock in jffs2_write_begin() UBI: - Fixes in comments UBIFS: - Expose error counters in sysfs - Many bugfixes found by Hulk Robot and others -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmHbQUIWHHJpY2hhcmRA c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wQRhD/9dMymYBFc0pku9SPgQrbieCXTf O73rtO/MQTgIZR7y0VueosoUE+6P5672VKMLKt8Dp67tqoqI/LI8YiC0eYGQwklm U7Ui3ze3ClNoRoPWflFKpm7c9/5UthWpUdVIRQ0Q3U7qvUPHswlPMoEKzTdMZqXa zaOKGuUGO+4USBod4Jooo5I8KyTUezvjVQRii2xcjIy9LOgKNXImuGIX4QnLkH8Q GbzgSMtYsHvdo8pqE68sZJ8D7vgEpOHEKTqcVroqO3dO8cmnk41pI5hNMnYherN3 +2yPefRza0ukxowTquM+eJB7D+/Req1ZjpthRMGVtStmmhXxYwp+wX4JLWta6Tr2 A6mgd3j8IxU0BlhP+AAXwIcIPvYyPtGYcfY8tqjorErHHOESvk8Hhifmq49pB8qy UpIPMGSohZDrIIN4DIvb5tahVI0lYq7ozgPz9MJbvVfTI5Z63lntbfwGydsLCJLy VLuVl3tZaMg/Pwr0qIGERQbgAqTJ0nzeH68CumSTmQxOFR9J51H2jQZLJMkZOohn b7B6XkJ4H8xv6U2cFU9JjaaNJCJK6A3DcdCJKyzIYKwoWKPXjEGPmYEg2SgKFIhY jzAUYUlN+ZBI4UTZH/ua627pdgzjR/ho1Y6tqd1OetlGzZq1a6UnTmXifud8h/mX B3kRcodjNzbEm9dx2w== =rpfR -----END PGP SIGNATURE----- Merge tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull JFFS2, UBI and UBIFS updates from Richard Weinberger: "JFFS2: - Fix for a deadlock in jffs2_write_begin() UBI: - Fixes in comments UBIFS: - Expose error counters in sysfs - Many bugfixes found by Hulk Robot and others" * tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: jffs2: GC deadlock reading a page that is used in jffs2_write_begin() ubifs: read-only if LEB may always be taken in ubifs_garbage_collect ubifs: fix double return leb in ubifs_garbage_collect ubifs: fix slab-out-of-bounds in ubifs_change_lp ubifs: fix snprintf() length check ubifs: Document sysfs nodes ubifs: Export filesystem error counters ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers ubifs: Make use of the helper macro kthread_run() ubi: Fix a mistake in comment ubifs: Fix spelling mistakes |
||
Linus Torvalds
|
3f67eaed57 |
dlm for 5.17
This set includes the normal collection of minor fixes and cleanups, new kmem caches for network messaging structs, a start on some basic tracepoints, and some new debugfs files for inserting test messages. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJh3J8BAAoJEDgbc8f8gGmqC0IP/0xCyLKDLmv1KcR/SbUxiP9Y ZmiuF1FdxqRLPN5HfU1GU4734EbWEiIZFGXDk36b4kE3UpbeddUMA6dDf9sDOJ73 H5BYgNfSgzybIq1D3wZe4C3nH4YTSrqvZhAI2bD3mcNnUDx20LzUMaGqLkzJcsOa iFDVDlkNrXee/RPSuZDv/5U+L5YEC7WlKEHJ2u/INAFitakb0i1ChJGAgrJZADCw giznPhfkmWyesgfOxOQ7JcxcVQSedQ+7vtHYhZ5tZ5v1h2yM9LPYJ0bNpNKfHUiA 0gf0DIpOjt4q9ClYxCsaLeK0t32qWOjiNoZaAm+lrLvZWE75CC+WP6KrDR+2x3p5 JlgBZ24h3v7t/ldYmEo7SSZ/1lfb4+0fyk7UjPzko9ErhRFrAopAfl924bRLN3ES 9iSgDslBT5apivNwrByDKI9flhDpMnfWtSufnYeMzFAbHzJCWJuUS8bbi9M+K0mK ENjmXQu1OVOmht8WM7HOA91tIUcLhr/YD2fKtYNx054TErKKRjlRb/V+NHffyYJc SqqyaLsAamhMBcLrAk3a88hj6bp8N4NYwrMQMus0xv3aZ23OaGCvtrj7fXuYMlcu sjLOTQfFFDPN1B1WvI18TDdh/TIgP/8AC+KbHf4Y+hbntO0BgEzL1/CKqT3WqN8y hAnK+Avj6CILRvO8ardm =Cst4 -----END PGP SIGNATURE----- Merge tag 'dlm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set includes the normal collection of minor fixes and cleanups, new kmem caches for network messaging structs, a start on some basic tracepoints, and some new debugfs files for inserting test messages" * tag 'dlm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: (32 commits) fs: dlm: print cluster addr if non-cluster node connects fs: dlm: memory cache for lowcomms hotpath fs: dlm: memory cache for writequeue_entry fs: dlm: memory cache for midcomms hotpath fs: dlm: remove wq_alloc mutex fs: dlm: use event based wait for pending remove fs: dlm: check for pending users filling buffers fs: dlm: use list_empty() to check last iteration fs: dlm: fix build with CONFIG_IPV6 disabled fs: dlm: replace use of socket sk_callback_lock with sock_lock fs: dlm: don't call kernel_getpeername() in error_report() fs: dlm: fix potential buffer overflow fs: dlm:Remove unneeded semicolon fs: dlm: remove double list_first_entry call fs: dlm: filter user dlm messages for kernel locks fs: dlm: add lkb waiters debugfs functionality fs: dlm: add lkb debugfs functionality fs: dlm: allow create lkb with specific id range fs: dlm: add debugfs rawmsg send functionality fs: dlm: let handle callback data as void ... |
||
Linus Torvalds
|
8481c323e4 |
Various minor gfs2 cleanups and fixes
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmHdubYUHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpngxAAg+N74AZhPeUoPsvbVf4LKJ8eJRTY 4329sUjORwpy2g2wN1I53vOnpSjFQ0uWxaOwsuBTmTzM2lr9+J3w/a7Chx5q6M7D o8dHh4BAwKGmXoCqMxvSiBtFiD/cht7YjjGGb+Q6+qa8viLky3Oksps9SsaZqMCf RD+33fflp3eppPZTVgo9n+zuxbMz4wT9GpS/GNZy8GhKhPM+v5cPCw1pV0Wy1b7T 2FXsKBtk+odTzbfj88gQ+b2hoRMMQ2FLqIsqyJKajMuBwiBpuPzrm03J2B9N1OJ0 R0Ir0mXV627SoQ4N7v12MfmSF3+XMUTEMrdxMGbjsOz35TtspfOc5+mmveVF4DBD T4zsK9ju6N/4OtsetOB3MF1AV5Hj0zOvuFZ1y71Bk9RBAJvLrzbHc17KtMEYbVgQ XUQYY6IGYiVBr8n8x9t3LaPhOzYXwvgFwAmetZE8vYkLZQEQlgRVYJymFR8iluLo oIPy1ciZgyMRDCa1/OHAIV604qwTChRLzO6WnDhdwWPwF/Sx4cHnhNVyR/5cp1ub GG6E4o/jopeblftQHO5j+GIZAAwo8abU4ahJXUJaPQVdyY3VjOPhd+ooTrNbEEug qijL1fIojcalDyCG2cIsXlFtXSb9Txr3L1fxOkluwmwVdR9LHJSEeDxYvIN20L/+ yqnHctyg9+HNsIc= =icLi -----END PGP SIGNATURE----- Merge tag 'gfs2-v5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: "Various minor gfs2 cleanups and fixes" * tag 'gfs2-v5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: dump inode object for iopen glocks gfs2: Fix gfs2_instantiate description gfs2: Remove redundant check for GLF_INSTANTIATE_NEEDED gfs2: remove redundant set of INSTANTIATE_NEEDED gfs2: Fix __gfs2_holder_init function name in kernel-doc comment |
||
Linus Torvalds
|
1dbfae0113 |
Convert ext4 to use the new mount API, and add support for the
FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls. In addition the usual large number of clean ups and bug fixes, in particular for the fast_commit feature. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmHcr8sACgkQ8vlZVpUN gaOuMwgAgunXEni8yED1QXAp+F1nigxx8K6O6fpjzVjiUFlH/cgc3M5PEZiV7Pgj rIsNliCLQjDQPA7XxInvMuDXEvGRdleaqGS5MCKBR9898acaCTn0eBCK8ZKfhlIs tJus706kOjy8IXRgEJkCrxEdrsJ57rgqHk/bstipNVg4lnfYYSUKH8k0ob4aaoiW irufs1dSo7KmpwL0c/z9yIei3ST4DA2Dc/eBS0hsdgKtr/zMmwSanxqbQJTXz8ra FTZ23HAu/zxWZ9GM4p3DhuVaMWaulm6eEqJKO+loeU9xOeItn0WmSJW260KR0cmj 0nPAflMmpQVBhtO9gjXcSiORbd6/zg== =9HNa -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Convert ext4 to use the new mount API, and add support for the FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls. In addition the usual large number of clean ups and bug fixes, in particular for the fast_commit feature" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (48 commits) ext4: don't use the orphan list when migrating an inode ext4: use BUG_ON instead of if condition followed by BUG ext4: fix a copy and paste typo ext4: set csum seed in tmp inode while migrating to extents ext4: remove unnecessary 'offset' assignment ext4: remove redundant o_start statement ext4: drop an always true check ext4: remove unused assignments ext4: remove redundant statement ext4: remove useless resetting io_end_size in mpage_process_page() ext4: allow to change s_last_trim_minblks via sysfs ext4: change s_last_trim_minblks type to unsigned long ext4: implement support for get/set fs label ext4: only set EXT4_MOUNT_QUOTA when journalled quota file is specified ext4: don't use kfree() on rcu protected pointer sbi->s_qf_names ext4: avoid trim error on fs with small groups ext4: fix an use-after-free issue about data=journal writeback mode ext4: fix null-ptr-deref in '__ext4_journal_ensure_credits' ext4: initialize err_blk before calling __ext4_get_inode_loc ext4: fix a possible ABBA deadlock due to busy PA ... |
||
Linus Torvalds
|
11fc88c2e4 |
New code for 5.17:
- Fix log recovery with da btree buffers when metauuid is in use. - Fix type coercion problems in xattr buffer size validation. - Fix a bug in online scrub dir leaf bestcount checking. - Only run COW recovery when recovering the log. - Fix symlink target buffer UAF problems and symlink locking problems by not exposing xfs innards to the VFS. - Fix incorrect quotaoff lock usage. - Don't let transactions cancel cleanly if they have deferred work items attached. - Fix a UAF when we're deciding if we need to relog an intent item. - Reduce kvmalloc overhead for log shadow buffers. - Clean up sysfs attr group usage. - Fix a bug where scrub's bmap/rmap checking could race with a quota file block allocation due to insufficient locking. - Teach scrub to complain about invalid project ids. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHXOL0ACgkQ+H93GTRK tOulDQ/+PwvFyROQePQo+5Z7fDb92VAKX6UxFYzWZGXj3zhRmTX7fvNSniPxKqHO K0GMnZ9rJxvuo/MRuSEPILl+2rKh59Efo9IPcs2TR+xO+vI/EKnch2/gPLBN1y7Y A4peQuK6CvUjYTyMazR2PcK8y76ZW8iXyuQMxnpmYldZBCZ+v7myVoVlPCRrZ68W EoEQ/cDf4fmsycSG4Eh9WTddbqd9bfl1+jXEz/pzlbhJSvQOaqynR+21ahxhEQ/f CzjSM4kXQ9AWTfHOPnEXs+Hf02NfcKRMuguzO69gJZhooEXJlgeYAZXUKqXpdmbC 2Tc+x/rJnQv9PtzEw/WoWMem4rhwjBX7cVi2E/SJHB46MLI5rjhQilIdWzecWd8q nOrjIZlReI0vxYSqe1ifejxiCsV6gv3XdyLNacjxf7ISkU4XF0caO8f+3/ocIScV E7k+XN13pArUiRZXOt8UL571Hoa6nhtHjyAUg6i62AFI2R9XYUEjEXKMwX7ovRVO 8eGdeQqioHkmOkF8HGyyurB9+itccuW8dtl7YOqEQJMGXeyk7rsvpcPahKKKoiDr 8w3bpZlNiOfZEnRcNjO9B/Egu6LLMudti2ptwlUvfvWsfa8VTO8v0h5pqVhwuwrk lwQThWJ5THe/MptWybdZ3kSjhMvzqi4GO75T30Hj6XhecC3dFfo= =RYEb -----END PGP SIGNATURE----- Merge tag 'xfs-5.17-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Darrick Wong: "The big new feature here is that the mount code now only bothers to try to free stale COW staging extents if the fs unmounted uncleanly. This should reduce mount times, particularly on filesystems supporting reflink and containing a large number of allocation groups. Everything else this cycle are bugfixes, as the iomap folios conversion should be plenty enough excitement for anyone. That and I ran out of brain bandwidth after Thanksgiving last year. Summary: - Fix log recovery with da btree buffers when metauuid is in use. - Fix type coercion problems in xattr buffer size validation. - Fix a bug in online scrub dir leaf bestcount checking. - Only run COW recovery when recovering the log. - Fix symlink target buffer UAF problems and symlink locking problems by not exposing xfs innards to the VFS. - Fix incorrect quotaoff lock usage. - Don't let transactions cancel cleanly if they have deferred work items attached. - Fix a UAF when we're deciding if we need to relog an intent item. - Reduce kvmalloc overhead for log shadow buffers. - Clean up sysfs attr group usage. - Fix a bug where scrub's bmap/rmap checking could race with a quota file block allocation due to insufficient locking. - Teach scrub to complain about invalid project ids" * tag 'xfs-5.17-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: warn about inodes with project id of -1 xfs: hold quota inode ILOCK_EXCL until the end of dqalloc xfs: Remove redundant assignment of mp xfs: reduce kvmalloc overhead for CIL shadow buffers xfs: sysfs: use default_groups in kobj_type xfs: prevent UAF in xfs_log_item_in_current_chkpt xfs: prevent a WARN_ONCE() in xfs_ioc_attr_list() xfs: Fix comments mentioning xfs_ialloc xfs: check sb_meta_uuid for dabuf buffer recovery xfs: fix a bug in the online fsck directory leaf1 bestcount check xfs: only run COW extent recovery when there are no live extents xfs: don't expose internal symlink metadata buffers to the vfs xfs: fix quotaoff mutex usage now that we don't support disabling it xfs: shut down filesystem if we xfs_trans_cancel with deferred work items |
||
Linus Torvalds
|
d601e58c5f |
for-5.17-tag
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmHcX64ACgkQxWXV+ddt WDsIyA//UGMG0waxz40RksQL+4AnTIJ+apmmxq5VGEE8y6aUPAGKm++5Cs9cC+Os 5xuZhNnD8hjB+eWH+tRCx8ll+T4g4skGKDCyD0fWNtZAlW8pnsIbztdO0nNYx9C0 V++vu/hQR6M8E7ORlayEKBWy2/UnBG5p/XVLPG4RJ4vMETJPl2RLWVDpu3dj09kf YnD3AY0vmKEyCu/b9NtSzfZMO2/lXT2U41ezLJJfmPAXcMJ0EeSAazACVDQyq24p wnnr6xmdo7ZR0oGFLUmBmfxbKwd3l0JIUsi/XRysXe+8y7raIE0gKCg7NpCS276T BZrKmefxHhdMCA1HLlH6AkrKmQUgIaceLkXTanTYv4cnVzb6XoeV6R4IO9/JQ0yv YsdCL7eZ4Vl3ToPlQkYWdwUNP5UjVMg5qMwxchigbwq7jViLJtiu3WKy0O0TitzB n4o0cCdlv3lHRp8FS6cFmbCrsavFT8/q3vz/aRdkrojOKE0jEWVJiz0bf39j5BVO IiJ3RF3kbpG7TZz0+eNUbgebME8zwaWfyd6t2L7Ztvx9F4elzSW95iwGc9//TZvl ciNFI9LQvnZRymUItxg0HNXfbJMXJAE9ImTxA8HLkfQYpaBlwbt4avkQK56LhpN+ nFCd5cxJy5HLFIvkRDfqpF+C247p5FICzd/hs59/cXlCynulglg= =9lg+ -----END PGP SIGNATURE----- Merge tag 'for-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "This end of the year branch is intentionally not that exciting. Most of the changes are under the hood, but there are some minor user visible improvements and several performance improvements too. Features: - make send work with concurrent block group relocation. We're not allowed to prevent send failing or silently producing some bad stream but with more fine grained locking and checks it's possible. The send vs deduplication exclusion could reuse the same logic in the future. - new exclusive operation 'balance paused' to allow adding a device to filesystem with paused balance - new sysfs file for fsid stored in the per-device directory to help distinguish devices when seeding is enabled, the fsid may differ from the one reported by the filesystem Performance improvements: - less metadata needed for directory logging, directory deletion is 20-40% faster - in zoned mode, cache zone information during mount to speed up repeated queries (about 50% speedup) - free space tree entries get indexed and searched by size (latency -30%, search run time -30%) - less contention in tree node locking when inserting a key and no splits are needed (files/sec in fsmark improves by 1-20%) Fixes: - fix ENOSPC failure when attempting direct IO write into NOCOW range - fix deadlock between quota enable and other quota operations - global reserve minimum calculations fixed to account for free space tree - in zoned mode, fix condition for chunk allocation that may not find the right zone for reuse and could lead to early ENOSPC Core: - global reserve stealing got simplified and cleaned up in evict - remove async transaction commit based on manual transaction refs, reuse existing kthread and mechanisms to let it commit transaction before timeout - preparatory work for extent tree v2, add wrappers for global tree roots, truncation path cleanups - remove readahead framework, it's a bit overengineered and used only for scrub, and yet it does not cover all its needs, there is another readahead built in the b-tree search that is now used, performance drop on HDD is about 5% which is acceptable and scrub is often throttled anyway, on SSDs there's no reported drop but slight improvement - self tests report extent tree state when error occurs - replace assert with debugging information when an uncommitted transaction is found at unmount time Other: - error handling improvements - other cleanups and refactoring" * tag 'for-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (115 commits) btrfs: output more debug messages for uncommitted transaction btrfs: respect the max size in the header when activating swap file btrfs: fix argument list that the kdoc format and script verified btrfs: remove unnecessary parameter type from compression_decompress_bio btrfs: selftests: dump extent io tree if extent-io-tree test failed btrfs: scrub: cleanup the argument list of scrub_stripe() btrfs: scrub: cleanup the argument list of scrub_chunk() btrfs: remove reada infrastructure btrfs: scrub: use btrfs_path::reada for extent tree readahead btrfs: scrub: remove the unnecessary path parameter for scrub_raid56_parity() btrfs: refactor unlock_up btrfs: skip transaction commit after failure to create subvolume btrfs: zoned: fix chunk allocation condition for zoned allocator btrfs: add extent allocator hook to decide to allocate chunk or not btrfs: zoned: unset dedicated block group on allocation failure btrfs: zoned: drop redundant check for REQ_OP_ZONE_APPEND and btrfs_is_zoned btrfs: zoned: sink zone check into btrfs_repair_one_zone btrfs: zoned: simplify btrfs_check_meta_write_pointer btrfs: zoned: encapsulate inode locking for zoned relocation btrfs: sysfs: add devinfo/fsid to retrieve actual fsid from the device ... |
||
Linus Torvalds
|
9149fe8ba7 |
Changes since last update:
- add sysfs interface and a sysfs node to control sync decompression; - add tail-packing inline support for compressed files; - get rid of erofs_get_meta_page(). -----BEGIN PGP SIGNATURE----- iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYduH/BEceGlhbmdAa2Vy bmVsLm9yZwAKCRA5NzHcH7XmBNPdAP9hKomD1hRiFeCWlLA1nDXYkqGbbt6+D3HT cm4G7DgVBAD+O+RWv6JVYg1zAAFlKmxqEKEfoDLKI65wAjH1V/h/dQE= =cEam -----END PGP SIGNATURE----- Merge tag 'erofs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "In this cycle, tail-packing data inline for compressed files is now supported so that tail pcluster can be stored and read together with inode metadata in order to save data I/O and storage space. In addition to that, to prepare for the upcoming subpage, folio and fscache features, we also introduce meta buffers to get rid of erofs_get_meta_page() since it was too close to the page itself. In addition, in order to show supported kernel features and control sync decompression strategy, new sysfs nodes are introduced in this cycle as well. Summary: - add sysfs interface and a sysfs node to control sync decompression - add tail-packing inline support for compressed files - get rid of erofs_get_meta_page()" * tag 'erofs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: use meta buffers for zmap operations erofs: use meta buffers for xattr operations erofs: use meta buffers for super operations erofs: use meta buffers for inode operations erofs: introduce meta buffer operations erofs: add on-disk compressed tail-packing inline support erofs: support inline data decompression erofs: support unaligned data decompression erofs: introduce z_erofs_fixup_insize erofs: tidy up z_erofs_lz4_decompress erofs: clean up erofs_map_blocks tracepoints erofs: Replace zero-length array with flexible-array member erofs: add sysfs node to control sync decompression strategy erofs: add sysfs interface erofs: rename lz4_0pading to zero_padding |
||
David Howells
|
d7bdba1c81 |
9p, afs, ceph, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()
In 9p, afs ceph, and nfs, gfpflags_allow_blocking() (which wraps a test for __GFP_DIRECT_RECLAIM being set) is used to determine if ->releasepage() should wait for the completion of a DIO write to fscache with something like: if (folio_test_fscache(folio)) { if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS)) return false; folio_wait_fscache(folio); } Instead, current_is_kswapd() should be used instead. Note that this is based on a patch originally by Zhaoyang Huang[1]. In addition to extending it to the other network filesystems and putting it on top of my fscache rewrite, it also needs to include linux/swap.h in a bunch of places. Can current_is_kswapd() be moved to linux/mm.h? Changes ======= ver #5: - Dropping the changes for cifs. Originally-signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: Steve French <smfrench@gmail.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: linux-cachefs@redhat.com cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/1638952658-20285-1-git-send-email-huangzhaoyang@gmail.com/ [1] Link: https://lore.kernel.org/r/164021590773.640689.16777975200823659231.stgit@warthog.procyon.org.uk/ # v4 |
||
Linus Torvalds
|
5dfbfe71e3 |
fs.idmapped.v5.17
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYdRCkgAKCRCRxhvAZXjc olrvAQCdp8LWkT8TauJSl8wmUm3mZhNy+5+fXuCUSwe3PyUtTQEAq4fxm41JpG8u WCZTrrxVhaXwgUY3aWzzeQnLCZjtEQw= =woqV -----END PGP SIGNATURE----- Merge tag 'fs.idmapped.v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull fs idmapping updates from Christian Brauner: "This contains the work to enable the idmapping infrastructure to support idmapped mounts of filesystems mounted with an idmapping. In addition this contains various cleanups that avoid repeated open-coding of the same functionality and simplify the code in quite a few places. We also finish the renaming of the mapping helpers we started a few kernel releases back and move them to a dedicated header to not continue polluting the fs header needlessly with low-level idmapping helpers. With this series the fs header only contains idmapping helpers that interact with fs objects. Currently we only support idmapped mounts for filesystems mounted without an idmapping themselves. This was a conscious decision mentioned in multiple places (cf. [1]). As explained at length in [3] it is perfectly fine to extend support for idmapped mounts to filesystem's mounted with an idmapping should the need arise. The need has been there for some time now (cf. [2]). Before we can port any filesystem that is mountable with an idmapping to support idmapped mounts in the coming cycles, we need to first extend the mapping helpers to account for the filesystem's idmapping. This again, is explained at length in our documentation at [3] and also in the individual commit messages so here's an overview. Currently, the low-level mapping helpers implement the remapping algorithms described in [3] in a simplified manner as we could rely on the fact that all filesystems supporting idmapped mounts are mounted without an idmapping. In contrast, filesystems mounted with an idmapping are very likely to not use an identity mapping and will instead use a non-identity mapping. So the translation step from or into the filesystem's idmapping in the remapping algorithm cannot be skipped for such filesystems. Non-idmapped filesystems and filesystems not supporting idmapped mounts are unaffected by this change as the remapping algorithms can take the same shortcut as before. If the low-level helpers detect that they are dealing with an idmapped mount but the underlying filesystem is mounted without an idmapping we can rely on the previous shortcut and can continue to skip the translation step from or into the filesystem's idmapping. And of course, if the low-level helpers detect that they are not dealing with an idmapped mount they can simply return the relevant id unchanged; no remapping needs to be performed at all. These checks guarantee that only the minimal amount of work is performed. As before, if idmapped mounts aren't used the low-level helpers are idempotent and no work is performed at all" Link: |
||
David Howells
|
e6435f1e02 |
fscache: Add a tracepoint for cookie use/unuse
Add a tracepoint to track fscache_use/unuse_cookie(). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164021588628.640689.12942919367404043608.stgit@warthog.procyon.org.uk/ # v4 |
||
Jeff Layton
|
1702e79734 |
ceph: add fscache writeback support
When updating the backing store from the pagecache (a'la writepage or writepages), write to the cache first. This allows us to keep caching files even when they are being written, as long as we have appropriate caps. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20211129162907.149445-3-jlayton@kernel.org/ # v1 Link: https://lore.kernel.org/r/20211207134451.66296-3-jlayton@kernel.org/ # v2 Link: https://lore.kernel.org/r/163906985808.143852.1383891557313186623.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967190257.1823006.16713609520911954804.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021585020.640689.6765214932458435472.stgit@warthog.procyon.org.uk/ # v4 |
||
Jeff Layton
|
400e1286c0 |
ceph: conversion to new fscache API
Now that the fscache API has been reworked and simplified, change ceph over to use it. With the old API, we would only instantiate a cookie when the file was open for reads. Change it to instantiate the cookie when the inode is instantiated and call use/unuse when the file is opened/closed. Also, ensure we resize the cached data on truncates, and invalidate the cache in response to the appropriate events. This will allow us to plumb in write support later. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20211129162907.149445-2-jlayton@kernel.org/ # v1 Link: https://lore.kernel.org/r/20211207134451.66296-2-jlayton@kernel.org/ # v2 Link: https://lore.kernel.org/r/163906984277.143852.14697110691303589000.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967188351.1823006.5065634844099079351.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021581427.640689.14128682147127509264.stgit@warthog.procyon.org.uk/ # v4 |
||
Jan Kara
|
68514dacf2 |
select: Fix indefinitely sleeping task in poll_schedule_timeout()
A task can end up indefinitely sleeping in do_select() -> poll_schedule_timeout() when the following race happens: TASK1 (thread1) TASK2 TASK1 (thread2) do_select() setup poll_wqueues table with 'fd' write data to 'fd' pollwake() table->triggered = 1 closes 'fd' thread1 is waiting for poll_schedule_timeout() - sees table->triggered table->triggered = 0 return -EINTR loop back in do_select() But at this point when TASK1 loops back, the fdget() in the setup of poll_wqueues fails. So now so we never find 'fd' is ready for reading and sleep in poll_schedule_timeout() indefinitely. Treat an fd that got closed as a fd on which some event happened. This makes sure cannot block indefinitely in do_select(). Another option would be to return -EBADF in this case but that has a potential of subtly breaking applications that excercise this behavior and it happens to work for them. So returning fd as active seems like a safer choice. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Bob Peterson
|
74382e277a |
gfs2: dump inode object for iopen glocks
Before this patch, glock dumps would not dump the gl_object for iopen glocks. This information can help us debug problems related to eviction: when AN iopen glock is blocked we can see the status of its underlying inode and its flags, etc. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> |
||
Linus Torvalds
|
8efd0d9c31 |
Networking changes for 5.17.
Core ---- - Defer freeing TCP skbs to the BH handler, whenever possible, or at least perform the freeing outside of the socket lock section to decrease cross-CPU allocator work and improve latency. - Add netdevice refcount tracking to locate sources of netdevice and net namespace refcount leaks. - Make Tx watchdog less intrusive - avoid pausing Tx and restarting all queues from a single CPU removing latency spikes. - Various small optimizations throughout the stack from Eric Dumazet. - Make netdev->dev_addr[] constant, force modifications to go via appropriate helpers to allow us to keep addresses in ordered data structures. - Replace unix_table_lock with per-hash locks, improving performance of bind() calls. - Extend skb drop tracepoint with a drop reason. - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW. BPF --- - New helpers: - bpf_find_vma(), find and inspect VMAs for profiling use cases - bpf_loop(), runtime-bounded loop helper trading some execution time for much faster (if at all converging) verification - bpf_strncmp(), improve performance, avoid compiler flakiness - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt() for tracing programs, all inlined by the verifier - Support BPF relocations (CO-RE) in the kernel loader. - Further the support for BTF_TYPE_TAG annotations. - Allow access to local storage in sleepable helpers. - Convert verifier argument types to a composable form with different attributes which can be shared across types (ro, maybe-null). - Prepare libbpf for upcoming v1.0 release by cleaning up APIs, creating new, extensible ones where missing and deprecating those to be removed. Protocols --------- - WiFi (mac80211/cfg80211): - notify user space about long "come back in N" AP responses, allow it to react to such temporary rejections - allow non-standard VHT MCS 10/11 rates - use coarse time in airtime fairness code to save CPU cycles - Bluetooth: - rework of HCI command execution serialization to use a common queue and work struct, and improve handling errors reported in the middle of a batch of commands - rework HCI event handling to use skb_pull_data, avoiding packet parsing pitfalls - support AOSP Bluetooth Quality Report - SMC: - support net namespaces, following the RDMA model - improve connection establishment latency by pre-clearing buffers - introduce TCP ULP for automatic redirection to SMC - Multi-Path TCP: - support ioctls: SIOCINQ, OUTQ, and OUTQNSD - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT, IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY - support cmsgs: TCP_INQ - improvements in the data scheduler (assigning data to subflows) - support fastclose option (quick shutdown of the full MPTCP connection, similar to TCP RST in regular TCP) - MCTP (Management Component Transport) over serial, as defined by DMTF spec DSP0253 - "MCTP Serial Transport Binding". Driver API ---------- - Support timestamping on bond interfaces in active/passive mode. - Introduce generic phylink link mode validation for drivers which don't have any quirks and where MAC capability bits fully express what's supported. Allow PCS layer to participate in the validation. Convert a number of drivers. - Add support to set/get size of buffers on the Rx rings and size of the tx copybreak buffer via ethtool. - Support offloading TC actions as first-class citizens rather than only as attributes of filters, improve sharing and device resource utilization. - WiFi (mac80211/cfg80211): - support forwarding offload (ndo_fill_forward_path) - support for background radar detection hardware - SA Query Procedures offload on the AP side New hardware / drivers ---------------------- - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with real-time requirements for isochronous communication with protocols like OPC UA Pub/Sub. - Qualcomm BAM-DMUX WWAN - driver for data channels of modems integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974 (qcom_bam_dmux). - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver with support for bridging, VLANs and multicast forwarding (lan966x). - iwlmei driver for co-operating between Intel's WiFi driver and Intel's Active Management Technology (AMT) devices. - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips - Bluetooth: - MediaTek MT7921 SDIO devices - Foxconn MT7922A - Realtek RTL8852AE Drivers ------- - Significantly improve performance in the datapaths of: lan78xx, ax88179_178a, lantiq_xrx200, bnxt. - Intel Ethernet NICs: - igb: support PTP/time PEROUT and EXTTS SDP functions on 82580/i354/i350 adapters - ixgbevf: new PF -> VF mailbox API which avoids the risk of mailbox corruption with ESXi - iavf: support configuration of VLAN features of finer granularity, stacked tags and filtering - ice: PTP support for new E822 devices with sub-ns precision - ice: support firmware activation without reboot - Mellanox Ethernet NICs (mlx5): - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool - support TC forwarding when tunnel encap and decap happen between two ports of the same NIC - dynamically size and allow disabling various features to save resources for running in embedded / SmartNIC scenarios - Broadcom Ethernet NICs (bnxt): - use page frag allocator to improve Rx performance - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool - Other Ethernet NICs: - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support - Microsoft cloud/virtual NIC (mana): - add XDP support (PASS, DROP, TX) - Mellanox Ethernet switches (mlxsw): - initial support for Spectrum-4 ASICs - VxLAN with IPv6 underlay - Marvell Ethernet switches (prestera): - support flower flow templates - add basic IP forwarding support - NXP embedded Ethernet switches (ocelot & felix): - support Per-Stream Filtering and Policing (PSFP) - enable cut-through forwarding between ports by default - support FDMA to improve packet Rx/Tx to CPU - Other embedded switches: - hellcreek: improve trapping management (STP and PTP) packets - qca8k: support link aggregation and port mirroring - Qualcomm 802.11ax WiFi (ath11k): - qca6390, wcn6855: enable 802.11 power save mode in station mode - BSS color change support - WCN6855 hw2.1 support - 11d scan offload support - scan MAC address randomization support - full monitor mode, only supported on QCN9074 - qca6390/wcn6855: report signal and tx bitrate - qca6390: rfkill support - qca6390/wcn6855: regdb.bin support - Intel WiFi (iwlwifi): - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS) in cooperation with the BIOS - support for Optimized Connectivity Experience (OCE) scan - support firmware API version 68 - lots of preparatory work for the upcoming Bz device family - MediaTek WiFi (mt76): - Specific Absorption Rate (SAR) support - mt7921: 160 MHz channel support - RealTek WiFi (rtw88): - Specific Absorption Rate (SAR) support - scan offload - Other WiFi NICs - ath10k: support fetching (pre-)calibration data from nvmem - brcmfmac: configure keep-alive packet on suspend - wcn36xx: beacon filter support Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHbkZAACgkQMUZtbf5S IruYkQ//XX7BggcwBfukPK83j0dONolClijqKcKR08g4vB5L8GXvv6OErKIWrh4k h8JanCH352ZkbCSw3MvFdm825UYQv8vPMd6Qks/LJ4aSKqCuy4MIlAo+yOw4Km3O i7++lRfma6DqHHI59wvLjWoxZSPu8lL+rI8UsZ5qMOlnNlGAOXsNrzRjaqQ3FddY AMxZeBUtrPqUCCQZFq3U8apkYzUp7CA/3XR9zRcja3uPbrtOV2G+4whRF90qGNWz Tm/QvJ9F/Ab292cbhxR4KuaQ3hUhaCQyDjbZk3+FZzZpAVhYTVqcNjny6+yXmbiP NXRtwemnl1NlWKMnJM8lEeY48u626tRIkxA/Wtd61uoO5uKUSxfGP+UpUi+DfXbF yIw50VQ7L2bpxXP/HjtmhVgZDaWKYyh22Zw4Hp/muMJz0hgUB0KODY3tf2jUWbjJ 0oEgocWyzhhwMQKqupTDCIaRgIs2ewYr4ZrFDhI3HnHC/vv1VjoPRUPIyxwppD2N cXvZb3B1sWK8iX5gCbISGzyU4bB7I0rvJSTU42ueti7n6NqRFZ79qHQpYnnY+JdO z1qOwY/d/yWfBoXVKRtRg2qz6CdEt5BQklwAgVEBgrFpf58gp694EwGMb1htY14J r/k9bVpmyIFpUnBH2CPMRfBVA3tUTqzyzzFV4AMw40NYLKmhLdo= =KLm3 -----END PGP SIGNATURE----- Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core ---- - Defer freeing TCP skbs to the BH handler, whenever possible, or at least perform the freeing outside of the socket lock section to decrease cross-CPU allocator work and improve latency. - Add netdevice refcount tracking to locate sources of netdevice and net namespace refcount leaks. - Make Tx watchdog less intrusive - avoid pausing Tx and restarting all queues from a single CPU removing latency spikes. - Various small optimizations throughout the stack from Eric Dumazet. - Make netdev->dev_addr[] constant, force modifications to go via appropriate helpers to allow us to keep addresses in ordered data structures. - Replace unix_table_lock with per-hash locks, improving performance of bind() calls. - Extend skb drop tracepoint with a drop reason. - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW. BPF --- - New helpers: - bpf_find_vma(), find and inspect VMAs for profiling use cases - bpf_loop(), runtime-bounded loop helper trading some execution time for much faster (if at all converging) verification - bpf_strncmp(), improve performance, avoid compiler flakiness - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt() for tracing programs, all inlined by the verifier - Support BPF relocations (CO-RE) in the kernel loader. - Further the support for BTF_TYPE_TAG annotations. - Allow access to local storage in sleepable helpers. - Convert verifier argument types to a composable form with different attributes which can be shared across types (ro, maybe-null). - Prepare libbpf for upcoming v1.0 release by cleaning up APIs, creating new, extensible ones where missing and deprecating those to be removed. Protocols --------- - WiFi (mac80211/cfg80211): - notify user space about long "come back in N" AP responses, allow it to react to such temporary rejections - allow non-standard VHT MCS 10/11 rates - use coarse time in airtime fairness code to save CPU cycles - Bluetooth: - rework of HCI command execution serialization to use a common queue and work struct, and improve handling errors reported in the middle of a batch of commands - rework HCI event handling to use skb_pull_data, avoiding packet parsing pitfalls - support AOSP Bluetooth Quality Report - SMC: - support net namespaces, following the RDMA model - improve connection establishment latency by pre-clearing buffers - introduce TCP ULP for automatic redirection to SMC - Multi-Path TCP: - support ioctls: SIOCINQ, OUTQ, and OUTQNSD - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT, IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY - support cmsgs: TCP_INQ - improvements in the data scheduler (assigning data to subflows) - support fastclose option (quick shutdown of the full MPTCP connection, similar to TCP RST in regular TCP) - MCTP (Management Component Transport) over serial, as defined by DMTF spec DSP0253 - "MCTP Serial Transport Binding". Driver API ---------- - Support timestamping on bond interfaces in active/passive mode. - Introduce generic phylink link mode validation for drivers which don't have any quirks and where MAC capability bits fully express what's supported. Allow PCS layer to participate in the validation. Convert a number of drivers. - Add support to set/get size of buffers on the Rx rings and size of the tx copybreak buffer via ethtool. - Support offloading TC actions as first-class citizens rather than only as attributes of filters, improve sharing and device resource utilization. - WiFi (mac80211/cfg80211): - support forwarding offload (ndo_fill_forward_path) - support for background radar detection hardware - SA Query Procedures offload on the AP side New hardware / drivers ---------------------- - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with real-time requirements for isochronous communication with protocols like OPC UA Pub/Sub. - Qualcomm BAM-DMUX WWAN - driver for data channels of modems integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974 (qcom_bam_dmux). - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver with support for bridging, VLANs and multicast forwarding (lan966x). - iwlmei driver for co-operating between Intel's WiFi driver and Intel's Active Management Technology (AMT) devices. - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips - Bluetooth: - MediaTek MT7921 SDIO devices - Foxconn MT7922A - Realtek RTL8852AE Drivers ------- - Significantly improve performance in the datapaths of: lan78xx, ax88179_178a, lantiq_xrx200, bnxt. - Intel Ethernet NICs: - igb: support PTP/time PEROUT and EXTTS SDP functions on 82580/i354/i350 adapters - ixgbevf: new PF -> VF mailbox API which avoids the risk of mailbox corruption with ESXi - iavf: support configuration of VLAN features of finer granularity, stacked tags and filtering - ice: PTP support for new E822 devices with sub-ns precision - ice: support firmware activation without reboot - Mellanox Ethernet NICs (mlx5): - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool - support TC forwarding when tunnel encap and decap happen between two ports of the same NIC - dynamically size and allow disabling various features to save resources for running in embedded / SmartNIC scenarios - Broadcom Ethernet NICs (bnxt): - use page frag allocator to improve Rx performance - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool - Other Ethernet NICs: - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support - Microsoft cloud/virtual NIC (mana): - add XDP support (PASS, DROP, TX) - Mellanox Ethernet switches (mlxsw): - initial support for Spectrum-4 ASICs - VxLAN with IPv6 underlay - Marvell Ethernet switches (prestera): - support flower flow templates - add basic IP forwarding support - NXP embedded Ethernet switches (ocelot & felix): - support Per-Stream Filtering and Policing (PSFP) - enable cut-through forwarding between ports by default - support FDMA to improve packet Rx/Tx to CPU - Other embedded switches: - hellcreek: improve trapping management (STP and PTP) packets - qca8k: support link aggregation and port mirroring - Qualcomm 802.11ax WiFi (ath11k): - qca6390, wcn6855: enable 802.11 power save mode in station mode - BSS color change support - WCN6855 hw2.1 support - 11d scan offload support - scan MAC address randomization support - full monitor mode, only supported on QCN9074 - qca6390/wcn6855: report signal and tx bitrate - qca6390: rfkill support - qca6390/wcn6855: regdb.bin support - Intel WiFi (iwlwifi): - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS) in cooperation with the BIOS - support for Optimized Connectivity Experience (OCE) scan - support firmware API version 68 - lots of preparatory work for the upcoming Bz device family - MediaTek WiFi (mt76): - Specific Absorption Rate (SAR) support - mt7921: 160 MHz channel support - RealTek WiFi (rtw88): - Specific Absorption Rate (SAR) support - scan offload - Other WiFi NICs - ath10k: support fetching (pre-)calibration data from nvmem - brcmfmac: configure keep-alive packet on suspend - wcn36xx: beacon filter support" * tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits) tcp: tcp_send_challenge_ack delete useless param `skb` net/qla3xxx: Remove useless DMA-32 fallback configuration rocker: Remove useless DMA-32 fallback configuration hinic: Remove useless DMA-32 fallback configuration lan743x: Remove useless DMA-32 fallback configuration net: enetc: Remove useless DMA-32 fallback configuration cxgb4vf: Remove useless DMA-32 fallback configuration cxgb4: Remove useless DMA-32 fallback configuration cxgb3: Remove useless DMA-32 fallback configuration bnx2x: Remove useless DMA-32 fallback configuration et131x: Remove useless DMA-32 fallback configuration be2net: Remove useless DMA-32 fallback configuration vmxnet3: Remove useless DMA-32 fallback configuration bna: Simplify DMA setting net: alteon: Simplify DMA setting myri10ge: Simplify DMA setting qlcnic: Simplify DMA setting net: allwinner: Fix print format page_pool: remove spinlock in page_pool_refill_alloc_cache() amt: fix wrong return type of amt_send_membership_update() ... |
||
Linus Torvalds
|
404dbad382 |
pstore update for v5.17-rc1
- Add boot param for early ftrace recording in pstore (Uwe Kleine-König) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmHV0TIWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJveID/0a/qRLfwNyjjLZ2MNtZLfL8ejs Qc/3wXJy7IoMe6OagbwS+boJ7PTOa3EMYh0h7GUGc8EjxFai+N2mGqEDMX/JZFRy zk1eWfNDpmCBrzo3VlOXrMM0Zb5cj3y9bfsLK31FcLN7gu+ivSXrA6BlPA513E+s SISPfK04tXJsLMBg12kKDkp8OcQ9ybKYDxUnScCFg+ef50pFdgzU6K79fUMGac8P M0n/ZDZRadUK8fCI2CChlOghWe09gWjWZIxstfn4qY+3C6R97ueB7IVCVgn/fbzX JIhsVmxjubQhd6AP/t5Ib5yzRFQ1O+p60LBGuwG8uA25DvnH3MfYhNRqpOB9R0a8 3IygXuMcCGDl5LCBDteKrktdIKKCA8gLt0IydWD1xcVt8z7ux52/KTPyQZYYCPn4 LvzACEL6O1qoPEod5k//BbF8p4A8JRBUPUCy55xtdQbOw/kg5H1fWg6xjGYW20Vs GpmuKKtDrAKX4OxIdLnaA5p3eBaF98OEo/d/H9PEUkYVc6xoPIGqwvepbt+N5Zpc vQ8FRqLTG+KvS3uUyXFlIdR5Rt/qwF0OoFxrMEEp0X/5aKwJHjW9v5qSkKehfQDI Ly4QsAaIZJx7w8FckGCBQzvTxfteMp5yivHW0ccOmnqxDLK04keoCxNSFBfP4FLQ 4puMlfhXcL2ZO640bA== =Ux14 -----END PGP SIGNATURE----- Merge tag 'pstore-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore update from Kees Cook: - Add boot param for early ftrace recording in pstore (Uwe Kleine-König) * tag 'pstore-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ftrace: Allow immediate recording |
||
Theodore Ts'o
|
6eeaf88fd5 |
ext4: don't use the orphan list when migrating an inode
We probably want to remove the indirect block to extents migration feature after a deprecation window, but until then, let's fix a potential data loss problem caused by the fact that we put the tmp_inode on the orphan list. In the unlikely case where we crash and do a journal recovery, the data blocks belonging to the inode being migrated are also represented in the tmp_inode on the orphan list --- and so its data blocks will get marked unallocated, and available for reuse. Instead, stop putting the tmp_inode on the oprhan list. So in the case where we crash while migrating the inode, we'll leak an inode, which is not a disaster. It will be easily fixed the next time we run fsck, and it's better than potentially having blocks getting claimed by two different files, and losing data as a result. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@kernel.org |
||
xu xin
|
a2e3965df4 |
ext4: use BUG_ON instead of if condition followed by BUG
BUG_ON would be better. This issue was detected with the help of Coccinelle. Reported-by: Zeal robot <zealci@zte.com.cn> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Link: https://lore.kernel.org/r/20211228073252.580296-1-xu.xin16@zte.com.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
Dan Carpenter
|
da9e480212 |
ext4: fix a copy and paste typo
This was obviously supposed to be an ext4 struct, not xfs. GCC
doesn't care either way so it doesn't affect the build or runtime.
Fixes:
|
||
Luís Henriques
|
e81c9302a6 |
ext4: set csum seed in tmp inode while migrating to extents
When migrating to extents, the temporary inode will have it's own checksum seed. This means that, when swapping the inodes data, the inode checksums will be incorrect. This can be fixed by recalculating the extents checksums again. Or simply by copying the seed into the temporary inode. Link: https://bugzilla.kernel.org/show_bug.cgi?id=213357 Reported-by: Jeroen van Wolffelaar <jeroen@wolffelaar.nl> Signed-off-by: Luís Henriques <lhenriques@suse.de> Link: https://lore.kernel.org/r/20211214175058.19511-1-lhenriques@suse.de Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org |
||
luo penghao
|
ae6ec194b5 |
ext4: remove unnecessary 'offset' assignment
Although it is in the loop, offset is reassigned at the beginning of the while loop. And after the loop, the value will not be used The clang_analyzer complains as follows: fs/ext4/dir.c:306:3 warning: Value stored to 'offset' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Link: https://lore.kernel.org/r/20211208075307.404703-1-luo.penghao@zte.com.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
luo penghao
|
a6dbc76c4d |
ext4: remove redundant o_start statement
The if will goto out of the loop, and until the end of the function execution, o_start will not be used again. The clang_analyzer complains as follows: fs/ext4/move_extent.c:635:5 warning: Value stored to 'o_start' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Link: https://lore.kernel.org/r/20211208075157.404535-1-luo.penghao@zte.com.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
Adam Borowski
|
037e7c525d |
ext4: drop an always true check
EXT_FIRST_INDEX(ptr) is ptr+12, which can't possibly be null; gcc-12 warns about this. Signed-off-by: Adam Borowski <kilobyte@angband.pl> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20211115172020.57853-1-kilobyte@angband.pl Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
luo penghao
|
fac888b2be |
ext4: remove unused assignments
The eh assignment in these two places is meaningless, because the function will goto to merge, which will not use eh. The clang_analyzer complains as follows: fs/ext4/extents.c:1988:4 warning: fs/ext4/extents.c:2016:4 warning: Value stored to 'eh' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Link: https://lore.kernel.org/r/20211104064007.2919-1-luo.penghao@zte.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
luo penghao
|
a660be97eb |
ext4: remove redundant statement
The local variable assignment at the end of the function is meaningless. The clang_analyzer complains as follows: fs/ext4/fast_commit.c:779:2 warning: Value stored to 'dst' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Link: https://lore.kernel.org/r/20211104063406.2747-1-luo.penghao@zte.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
Nghia Le
|
effc5b3b0d |
ext4: remove useless resetting io_end_size in mpage_process_page()
The command "make clang-analyzer" detects dead stores in mpage_process_page() function. Do not reset io_end_size to 0 in the current paths, as the function exits on those paths without further using io_end_size. Signed-off-by: Nghia Le <nghialm78@gmail.com> Link: https://lore.kernel.org/r/20211025221803.3326-1-nghialm78@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
Lukas Czerner
|
4a69aecbfb |
ext4: allow to change s_last_trim_minblks via sysfs
Ext4 has an optimization mechanism for batched disacrd (FITRIM) that should help speed up subsequent calls of FITRIM ioctl by skipping the groups that were previously trimmed. However because the FITRIM allows to set the minimum size of an extent to trim, ext4 stores the last minimum extent size and only avoids trimming the group if it was previously trimmed with minimum extent size equal to, or smaller than the current call. There is currently no way to bypass the optimization without umount/mount cycle. This becomes a problem when the file system is live migrated to a different storage, because the optimization will prevent possibly useful discard calls to the storage. Fix it by exporting the s_last_trim_minblks via sysfs interface which will allow us to set the minimum size to the number of blocks larger than subsequent FITRIM call, effectively bypassing the optimization. By setting the s_last_trim_minblks to ULONG_MAX the optimization will be effectively cleared regardless of the previous state, or file system configuration. For example: getconf ULONG_MAX > /sys/fs/ext4/dm-1/last_trim_minblks Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Laurent GUERBY <laurent@guerby.net> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20211103145122.17338-2-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
Lukas Czerner
|
2327fb2e23 |
ext4: change s_last_trim_minblks type to unsigned long
There is no good reason for the s_last_trim_minblks to be atomic. There is no data integrity needed and there is no real danger in setting and reading it in a racy manner. Change it to be unsigned long, the same type as s_clusters_per_group which is the maximum that's allowed. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Suggested-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20211103145122.17338-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
Lukas Czerner
|
bbc605cdb1 |
ext4: implement support for get/set fs label
Implement support for FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls for online reading and setting of file system label. ext4_ioctl_getlabel() is simple, just get the label from the primary superblock. This might not be the first sb on the file system if 'sb=' mount option is used. In ext4_ioctl_setlabel() we update what ext4 currently views as a primary superblock and then proceed to update backup superblocks. There are two caveats: - the primary superblock might not be the first superblock and so it might not be the one used by userspace tools if read directly off the disk. - because the primary superblock might not be the first superblock we potentialy have to update it as part of backup superblock update. However the first sb location is a bit more complicated than the rest so we have to account for that. The superblock modification is created generic enough so the infrastructure can be used for other potential superblock modification operations, such as chaning UUID. Tested with generic/492 with various configurations. I also checked the behavior with 'sb=' mount options, including very large file systems with and without sparse_super/sparse_super2. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20211213135618.43303-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
Lukas Czerner
|
4c1bd5a90c |
ext4: only set EXT4_MOUNT_QUOTA when journalled quota file is specified
Only set EXT4_MOUNT_QUOTA when journalled quota file is specified,
otherwise simply disabling specific quota type (usrjquota=) will also
set the EXT4_MOUNT_QUOTA super block option.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Fixes:
|
||
Lukas Czerner
|
13b215a9e6 |
ext4: don't use kfree() on rcu protected pointer sbi->s_qf_names
During ext4 mount api rework the commit |
||
Jan Kara
|
173b6e383d |
ext4: avoid trim error on fs with small groups
A user reported FITRIM ioctl failing for him on ext4 on some devices
without apparent reason. After some debugging we've found out that
these devices (being LVM volumes) report rather large discard
granularity of 42MB and the filesystem had 1k blocksize and thus group
size of 8MB. Because ext4 FITRIM implementation puts discard
granularity into minlen, ext4_trim_fs() declared the trim request as
invalid. However just silently doing nothing seems to be a more
appropriate reaction to such combination of parameters since user did
not specify anything wrong.
CC: Lukas Czerner <lczerner@redhat.com>
Fixes:
|
||
Zhang Yi
|
5c48a7df91 |
ext4: fix an use-after-free issue about data=journal writeback mode
Our syzkaller report an use-after-free issue that accessing the freed buffer_head on the writeback page in __ext4_journalled_writepage(). The problem is that if there was a truncate racing with the data=journalled writeback procedure, the writeback length could become zero and bget_one() refuse to get buffer_head's refcount, then the truncate procedure release buffer once we drop page lock, finally, the last ext4_walk_page_buffers() trigger the use-after-free problem. sync truncate ext4_sync_file() file_write_and_wait_range() ext4_setattr(0) inode->i_size = 0 ext4_writepage() len = 0 __ext4_journalled_writepage() page_bufs = page_buffers(page) ext4_walk_page_buffers(bget_one) <- does not get refcount do_invalidatepage() free_buffer_head() ext4_walk_page_buffers(page_bufs) <- trigger use-after-free After commit |
||
Ye Bin
|
298b5c5217 |
ext4: fix null-ptr-deref in '__ext4_journal_ensure_credits'
We got issue as follows when run syzkaller test: [ 1901.130043] EXT4-fs error (device vda): ext4_remount:5624: comm syz-executor.5: Abort forced by user [ 1901.130901] Aborting journal on device vda-8. [ 1901.131437] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.16: Detected aborted journal [ 1901.131566] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.11: Detected aborted journal [ 1901.132586] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.18: Detected aborted journal [ 1901.132751] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.9: Detected aborted journal [ 1901.136149] EXT4-fs error (device vda) in ext4_reserve_inode_write:6035: Journal has aborted [ 1901.136837] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-fuzzer: Detected aborted journal [ 1901.136915] ================================================================== [ 1901.138175] BUG: KASAN: null-ptr-deref in __ext4_journal_ensure_credits+0x74/0x140 [ext4] [ 1901.138343] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.13: Detected aborted journal [ 1901.138398] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.1: Detected aborted journal [ 1901.138808] Read of size 8 at addr 0000000000000000 by task syz-executor.17/968 [ 1901.138817] [ 1901.138852] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.30: Detected aborted journal [ 1901.144779] CPU: 1 PID: 968 Comm: syz-executor.17 Not tainted 4.19.90-vhulk2111.1.0.h893.eulerosv2r10.aarch64+ #1 [ 1901.146479] Hardware name: linux,dummy-virt (DT) [ 1901.147317] Call trace: [ 1901.147552] dump_backtrace+0x0/0x2d8 [ 1901.147898] show_stack+0x28/0x38 [ 1901.148215] dump_stack+0xec/0x15c [ 1901.148746] kasan_report+0x108/0x338 [ 1901.149207] __asan_load8+0x58/0xb0 [ 1901.149753] __ext4_journal_ensure_credits+0x74/0x140 [ext4] [ 1901.150579] ext4_xattr_delete_inode+0xe4/0x700 [ext4] [ 1901.151316] ext4_evict_inode+0x524/0xba8 [ext4] [ 1901.151985] evict+0x1a4/0x378 [ 1901.152353] iput+0x310/0x428 [ 1901.152733] do_unlinkat+0x260/0x428 [ 1901.153056] __arm64_sys_unlinkat+0x6c/0xc0 [ 1901.153455] el0_svc_common+0xc8/0x320 [ 1901.153799] el0_svc_handler+0xf8/0x160 [ 1901.154265] el0_svc+0x10/0x218 [ 1901.154682] ================================================================== This issue may happens like this: Process1 Process2 ext4_evict_inode ext4_journal_start ext4_truncate ext4_ind_truncate ext4_free_branches ext4_ind_truncate_ensure_credits ext4_journal_ensure_credits_fn ext4_journal_restart handle->h_transaction = NULL; mount -o remount,abort /mnt -> trigger JBD abort start_this_handle -> will return failed ext4_xattr_delete_inode ext4_journal_ensure_credits ext4_journal_ensure_credits_fn __ext4_journal_ensure_credits jbd2_handle_buffer_credits journal = handle->h_transaction->t_journal; ->null-ptr-deref Now, indirect truncate process didn't handle error. To solve this issue maybe simply add check handle is abort in '__ext4_journal_ensure_credits' is enough, and i also think this is necessary. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20211224100341.3299128-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
Harshad Shirwadkar
|
c27c29c6af |
ext4: initialize err_blk before calling __ext4_get_inode_loc
It is not guaranteed that __ext4_get_inode_loc will definitely set err_blk pointer when it returns EIO. To avoid using uninitialized variables, let's first set err_blk to 0. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211201163421.2631661-1-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org |
||
Chunguang Xu
|
8c80fb312d |
ext4: fix a possible ABBA deadlock due to busy PA
We found on older kernel (3.10) that in the scenario of insufficient disk space, system may trigger an ABBA deadlock problem, it seems that this problem still exists in latest kernel, try to fix it here. The main process triggered by this problem is that task A occupies the PA and waits for the jbd2 transaction finish, the jbd2 transaction waits for the completion of task B's IO (plug_list), but task B waits for the release of PA by task A to finish discard, which indirectly forms an ABBA deadlock. The related calltrace is as follows: Task A vfs_write ext4_mb_new_blocks() ext4_mb_mark_diskspace_used() JBD2 jbd2_journal_get_write_access() -> jbd2_journal_commit_transaction() ->schedule() filemap_fdatawait() | | | Task B | | do_unlinkat() | | ext4_evict_inode() | | jbd2_journal_begin_ordered_truncate() | | filemap_fdatawrite_range() | | ext4_mb_new_blocks() | -ext4_mb_discard_group_preallocations() <----- Here, try to cancel ext4_mb_discard_group_preallocations() internal retry due to PA busy, and do a limited number of retries inside ext4_mb_discard_preallocations(), which can circumvent the above problems, but also has some advantages: 1. Since the PA is in a busy state, if other groups have free PAs, keeping the current PA may help to reduce fragmentation. 2. Continue to traverse forward instead of waiting for the current group PA to be released. In most scenarios, the PA discard time can be reduced. However, in the case of smaller free space, if only a few groups have space, then due to multiple traversals of the group, it may increase CPU overhead. But in contrast, I feel that the overall benefit is better than the cost. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1637630277-23496-1-git-send-email-brookxu.cn@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org |
||
Qing Wang
|
dfac1a1670 |
ext4: replace snprintf in show functions with sysfs_emit
coccicheck complains about the use of snprintf() in sysfs show functions. Fix the coccicheck warning: WARNING: use scnprintf or sprintf. Use sysfs_emit instead of scnprintf or sprintf makes more sense. Signed-off-by: Qing Wang <wangqing@vivo.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1634095731-4528-1-git-send-email-wangqing@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
Jan Kara
|
4013d47a53 |
ext4: make sure to reset inode lockdep class when quota enabling fails
When we succeed in enabling some quota type but fail to enable another one with quota feature, we correctly disable all enabled quota types. However we forget to reset i_data_sem lockdep class. When the inode gets freed and reused, it will inherit this lockdep class (i_data_sem is initialized only when a slab is created) and thus eventually lockdep barfs about possible deadlocks. Reported-and-tested-by: syzbot+3b6f9218b1301ddda3e2@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20211007155336.12493-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
Jan Kara
|
15fc69bbbb |
ext4: make sure quota gets properly shutdown on error
When we hit an error when enabling quotas and setting inode flags, we do not properly shutdown quota subsystem despite returning error from Q_QUOTAON quotactl. This can lead to some odd situations like kernel using quota file while it is still writeable for userspace. Make sure we properly cleanup the quota subsystem in case of error. Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20211007155336.12493-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
Ye Bin
|
380a0091ca |
ext4: Fix BUG_ON in ext4_bread when write quota data
We got issue as follows when run syzkaller: [ 167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user [ 167.938306] EXT4-fs (loop0): Remounting filesystem read-only [ 167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0' [ 167.983601] ------------[ cut here ]------------ [ 167.984245] kernel BUG at fs/ext4/inode.c:847! [ 167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G B 5.16.0-rc5-next-20211217+ #123 [ 167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 167.988590] RIP: 0010:ext4_getblk+0x17e/0x504 [ 167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244 [ 167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282 [ 167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000 [ 167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66 [ 167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09 [ 167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8 [ 167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001 [ 167.997158] FS: 00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000 [ 167.998238] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0 [ 167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 168.001899] Call Trace: [ 168.002235] <TASK> [ 168.007167] ext4_bread+0xd/0x53 [ 168.007612] ext4_quota_write+0x20c/0x5c0 [ 168.010457] write_blk+0x100/0x220 [ 168.010944] remove_free_dqentry+0x1c6/0x440 [ 168.011525] free_dqentry.isra.0+0x565/0x830 [ 168.012133] remove_tree+0x318/0x6d0 [ 168.014744] remove_tree+0x1eb/0x6d0 [ 168.017346] remove_tree+0x1eb/0x6d0 [ 168.019969] remove_tree+0x1eb/0x6d0 [ 168.022128] qtree_release_dquot+0x291/0x340 [ 168.023297] v2_release_dquot+0xce/0x120 [ 168.023847] dquot_release+0x197/0x3e0 [ 168.024358] ext4_release_dquot+0x22a/0x2d0 [ 168.024932] dqput.part.0+0x1c9/0x900 [ 168.025430] __dquot_drop+0x120/0x190 [ 168.025942] ext4_clear_inode+0x86/0x220 [ 168.026472] ext4_evict_inode+0x9e8/0xa22 [ 168.028200] evict+0x29e/0x4f0 [ 168.028625] dispose_list+0x102/0x1f0 [ 168.029148] evict_inodes+0x2c1/0x3e0 [ 168.030188] generic_shutdown_super+0xa4/0x3b0 [ 168.030817] kill_block_super+0x95/0xd0 [ 168.031360] deactivate_locked_super+0x85/0xd0 [ 168.031977] cleanup_mnt+0x2bc/0x480 [ 168.033062] task_work_run+0xd1/0x170 [ 168.033565] do_exit+0xa4f/0x2b50 [ 168.037155] do_group_exit+0xef/0x2d0 [ 168.037666] __x64_sys_exit_group+0x3a/0x50 [ 168.038237] do_syscall_64+0x3b/0x90 [ 168.038751] entry_SYSCALL_64_after_hwframe+0x44/0xae In order to reproduce this problem, the following conditions need to be met: 1. Ext4 filesystem with no journal; 2. Filesystem image with incorrect quota data; 3. Abort filesystem forced by user; 4. umount filesystem; As in ext4_quota_write: ... if (EXT4_SB(sb)->s_journal && !handle) { ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)" " cancelled because transaction is not started", (unsigned long long)off, (unsigned long long)len); return -EIO; } ... We only check handle if NULL when filesystem has journal. There is need check handle if NULL even when filesystem has no journal. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org |