linux/fs
Kairui Song 1f49c1476d nilfs2: drop usage of page_index
Patch series "mm/swap: clean up and optimize swap cache index", v6.

Currently we use one swap_address_space for every 64M chunk to reduce lock
contention, this is like having a set of smaller files inside a swap
device.  But when doing swap cache look up or insert, we are still using
the offset of the whole large swap device.  This is OK for correctness, as
the offset (key) is unique.

But Xarray is specially optimized for small indexes, it creates the redix
tree levels lazily to be just enough to fit the largest key stored in one
Xarray.  So we are wasting tree nodes unnecessarily.

For 64M chunk it should only take at most 3 level to contain everything. 
But if we are using the offset from the whole swap device, the offset
(key) value will be way beyond 64M, and so will the tree level.

Optimize this by reduce the swap cache search space into 64M scope.

Test with `time memhog 128G` inside a 8G memcg using 128G swap (ramdisk
with SWP_SYNCHRONOUS_IO dropped, tested 3 times, results are stable.  The
test result is similar but the improvement is smaller if
SWP_SYNCHRONOUS_IO is enabled, as swap out path can never skip swap
cache):

Before:
6.07user 250.74system 4:17.26elapsed 99%CPU (0avgtext+0avgdata 8373376maxresident)k
0inputs+0outputs (55major+33555018minor)pagefaults 0swaps

After (+1.8% faster):
6.08user 246.09system 4:12.58elapsed 99%CPU (0avgtext+0avgdata 8373248maxresident)k
0inputs+0outputs (54major+33555027minor)pagefaults 0swaps

Similar result with MySQL and sysbench using swap:
Before:
94055.61 qps

After (+0.8% faster):
94834.91 qps

There is alse a very slight drop of radix tree node slab usage:
Before: 303952K
After:  302224K

For this series:

There are multiple places that expect mixed type of pages (page cache or
swap cache), eg. migration, huge memory split; There are four helpers
for that:

- page_index
- page_file_offset
- folio_index
- folio_file_pos

To keep the code clean and compatible, this series first cleaned up usage
of them.

page_file_offset and folio_file_pos are historical helpes that can be
simply dropped after clean up.  And page_index can be all converted to
folio_index or folio->index.

Then introduce two new helpers swap_cache_index and swap_dev_pos for swap.
Replace swp_offset with swap_cache_index when used to retrieve folio from
swap cache, and use swap_dev_pos when needed to retrieve the device
position of a swap entry.  This way, swap_cache_index can return the
optimized value with no compatibility issue.

The result is better performance and reduced LOC.

Idealy, in the future, we may want to reduce SWAP_ADDRESS_SPACE_SHIFT from
14 to 12: Default Xarray chunk offset is 6, so we have 3 level trees
instead of 2 level trees just for 2 extra bits.  But swap cache is based
on address_space struct, with 4 times more metadata sparsely distributed
in memory it waste more cacheline, the performance gain from this series
is almost canceled according to my test.  So first, just have a cleaner
seperation of offsets and smaller search space.


This patch (of 10):

page_index is only for mixed usage of page cache and swap cache, for pure
page cache usage, the caller can just use page->index instead.

It can't be a swap cache page here (being part of buffer head), so just
drop it.  And while we are at it, optimize the code by retrieving the
offset of the buffer head within the folio directly using bh_offset, and
get rid of the loop and usage of page helpers.

Link: https://lkml.kernel.org/r/20240521175854.96038-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20240521175854.96038-3-ryncsn@gmail.com
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:54 -07:00
..
9p Two fixes headed to stable trees: 2024-05-29 09:25:15 -07:00
adfs mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
affs affs: remove SLAB_MEM_SPREAD flag usage 2024-02-26 11:36:28 +01:00
afs netfs, 9p: Fix race between umount and async request completion 2024-05-27 13:12:13 +02:00
autofs dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
bcachefs bcachefs: Fix kmalloc bug in __snapshot_t_mut 2024-06-25 20:51:14 -04:00
befs mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
bfs mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
btrfs for-6.10-rc5-tag 2024-06-27 10:26:16 -07:00
cachefiles cachefiles: remove unneeded include of <linux/fdtable.h> 2024-06-03 15:39:17 +02:00
ceph We have a series from Xiubo that adds support for additional access 2024-05-25 14:23:58 -07:00
coda mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
configfs
cramfs use ->bd_mapping instead of ->bd_inode->i_mapping 2024-05-03 02:36:51 -04:00
crypto The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
debugfs debugfs: continue to ignore unknown mount options 2024-05-28 14:32:42 +02:00
devpts fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
dlm dlm: return -ENOMEM if ls_recover_buf fails 2024-04-23 16:08:55 -05:00
ecryptfs hardening updates for 6.10-rc1 2024-05-13 14:14:05 -07:00
efivarfs efi: Clear up misconceptions about a maximum variable name size 2024-04-13 10:33:02 +02:00
efs efs: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:33 +01:00
erofs Changes since last update: 2024-05-24 09:31:50 -07:00
exfat exfat: zero the reserved fields of file and stream extension dentries 2024-04-25 21:59:59 +09:00
exportfs fs: Create a generic is_dot_dotdot() utility 2024-01-23 10:58:56 -05:00
ext2 ext2: Remove LEGACY_DIRECT_IO dependency 2024-05-03 11:50:28 +02:00
ext4 bd_inode series 2024-05-21 09:51:42 -07:00
f2fs f2fs update for 6.10-rc1 2024-05-20 13:23:43 -07:00
fat fs: add kernel-doc comments to fat_parse_long() 2024-04-25 21:07:02 -07:00
freevxfs freevxfs: Convert freevxfs to the new mount API. 2024-03-26 09:04:53 +01:00
fuse virtio: features, fixes, cleanups 2024-05-23 12:04:36 -07:00
gfs2 bd_inode series 2024-05-21 09:51:42 -07:00
hfs hfs: really remove hfs_writepage 2023-12-29 11:58:34 -08:00
hfsplus hfsplus: refactor copy_name to not use strncpy 2024-04-24 16:55:28 -07:00
hostfs hostfs: use d_splice_alias() calling conventions to simplify failure exits 2023-12-21 12:51:00 -05:00
hpfs mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
hugetlbfs The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
iomap iomap: Fix iomap_adjust_read_range for plen calculation 2024-06-05 17:27:03 +02:00
isofs isofs: Use *-y instead of *-objs in Makefile 2024-05-09 18:09:57 +02:00
jbd2 bd_inode series 2024-05-21 09:51:42 -07:00
jffs2 This pull request contains the following changes for JFFS2: 2024-05-25 13:23:42 -07:00
jfs jfs: xattr: fix buffer overflow for invalid xattr 2024-06-04 18:09:03 +02:00
kernfs kernfs: mount: Remove unnecessary ‘NULL’ values from knparent 2024-05-04 19:02:39 +02:00
lockd lockd: host: Remove unnecessary statements'host = NULL;' 2024-05-06 09:07:20 -04:00
minix minix: convert minix to use the new mount api 2024-03-26 09:04:55 +01:00
netfs vfs-6.10-rc2.fixes 2024-05-27 08:09:12 -07:00
nfs nfs: drop the incorrect assertion in nfs_swap_rw() 2024-06-24 20:52:11 -07:00
nfs_common
nfsd nfsd-6.10 fixes: 2024-06-28 09:32:33 -07:00
nilfs2 nilfs2: drop usage of page_index 2024-07-03 19:29:54 -07:00
nls
notify Revert "fanotify: remove unneeded sub-zero check for unsigned value" 2024-05-20 12:43:58 -07:00
ntfs3 driver ntfs3 for linux 6.10 2024-05-25 14:19:01 -07:00
ocfs2 ocfs2: fix DIO failure due to insufficient transaction credits 2024-06-24 20:52:10 -07:00
omfs
openpromfs openpromfs: finish conversion to the new mount API 2024-03-26 09:04:54 +01:00
orangefs orangefs: fix out-of-bounds fsid access 2024-05-14 17:44:14 -07:00
overlayfs ovl: fix encoding fid for lower only root 2024-06-14 10:30:40 +02:00
proc /proc/pid/smaps: add mseal info for vma 2024-06-24 20:52:09 -07:00
pstore pstore/zone: Don't clear memory twice 2024-03-09 12:33:22 -08:00
qnx4 mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
qnx6 qnx6: convert qnx6 to use the new mount api 2024-03-26 09:04:53 +01:00
quota quota: fix to propagate error of mark_dquot_dirty() to caller 2024-04-12 14:52:29 +02:00
ramfs mm: switch mm->get_unmapped_area() to a flag 2024-04-25 20:56:25 -07:00
reiserfs getting rid of bogus set_blocksize() uses, switching it 2024-05-21 08:34:51 -07:00
romfs fs,block: yield devices early 2024-03-27 13:17:15 +01:00
smb cifs: Move the 'pid' from the subreq to the req 2024-06-20 15:25:08 -05:00
squashfs Mainly singleton patches, documented in their respective changelogs. 2024-05-19 14:02:03 -07:00
sysfs Merge 6.9-rc5 into driver-core-next 2024-04-23 13:27:43 +02:00
sysv sysv: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:31 +01:00
tracefs eventfs: Do not use attributes for events directory 2024-05-23 09:31:50 -04:00
ubifs This pull request contains updates for UBI and UBIFS: 2024-03-21 15:09:29 -07:00
udf udf: Use a folio in udf_write_end() 2024-04-23 15:37:02 +02:00
ufs mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
unicode kbuild: use $(src) instead of $(srctree)/$(src) for source directory 2024-05-10 04:34:52 +09:00
vboxsf vboxsf: explicitly deny setlease attempts 2024-04-03 16:06:39 +02:00
verity fsverity: use register_sysctl_init() to avoid kmemleak warning 2024-05-03 08:30:58 -07:00
xfs xfs: honor init_xattrs in xfs_init_new_inode for !ATTR fs 2024-06-26 14:29:25 +05:30
zonefs zonefs: Use str_plural() to fix Coccinelle warning 2024-04-10 07:23:47 +09:00
aio.c Assorted commits that had missed the last merge window... 2024-05-21 13:11:44 -07:00
anon_inodes.c fs: Create anon_inode_getfile_fmode() 2024-04-26 10:33:05 +02:00
attr.c lsm/stable-6.9 PR 20240312 2024-03-12 20:03:34 -07:00
backing-file.c ovl: implement tmpfile 2024-05-02 20:35:57 +02:00
bad_inode.c
binfmt_elf_fdpic.c binfmt_elf_fdpic: fix /proc/<pid>/auxv 2024-04-24 15:55:28 -07:00
binfmt_elf_test.c
binfmt_elf.c Mainly singleton patches, documented in their respective changelogs. 2024-05-19 14:02:03 -07:00
binfmt_flat.c
binfmt_misc.c execve updates for v6.7-rc1 2023-10-30 19:28:19 -10:00
binfmt_script.c
buffer.c bd_inode series 2024-05-21 09:51:42 -07:00
char_dev.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
compat_binfmt_elf.c
coredump.c virtio: features, fixes, cleanups 2024-05-23 12:04:36 -07:00
d_path.c
dax.c dax: use huge_zero_folio 2024-04-25 20:56:20 -07:00
dcache.c Revert "vfs: Delete the associated dentry when deleting a file" 2024-05-29 09:39:34 -07:00
direct-io.c fs/direct-io: remove redundant assignment to variable retval 2024-04-11 10:21:24 +02:00
drop_caches.c
eventfd.c eventfd: strictly check the count parameter of eventfd_write to avoid inputting illegal strings 2024-02-08 10:12:26 +01:00
eventpoll.c epoll: be better about file lifetimes 2024-05-05 14:00:48 -07:00
exec.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
fcntl.c fcntl: add F_DUPFD_QUERY fcntl() 2024-05-10 08:26:31 +02:00
fhandle.c fs: Annotate struct file_handle with __counted_by() and use struct_size() 2024-04-05 15:53:47 +02:00
file_table.c lsm/stable-6.9 PR 20240312 2024-03-12 20:03:34 -07:00
file.c fs/file: fix the check in find_next_fd() 2024-05-30 09:11:47 +02:00
filesystems.c
fs_context.c
fs_parser.c __fs_parse: Correct a documentation comment 2024-02-02 13:11:50 +01:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c fs/writeback: remove unnecessary return in writeback_inodes_sb 2024-04-05 15:53:45 +02:00
fsopen.c
init.c
inode.c bcachefs updates for 6.9 2024-03-15 09:00:09 -07:00
internal.h ovl: implement tmpfile 2024-05-02 20:35:57 +02:00
ioctl.c fs/ioctl: Add a comment to keep the logic in sync with LSM policies 2024-05-13 06:58:35 +02:00
Kconfig - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames 2024-03-14 17:43:30 -07:00
Kconfig.binfmt
kernel_read_file.c
libfs.c shmem: Fix shmem_rename2() 2024-04-17 13:49:44 +02:00
locks.c filelock: fix deadlock detection in POSIX locking 2024-02-20 09:53:33 +01:00
Makefile vfs-6.9.pidfd 2024-03-11 10:21:06 -07:00
mbcache.c vfs: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:31 +01:00
mnt_idmapping.c fs/mnt_idmapping.c: Return -EINVAL when no map is written 2024-02-08 10:12:37 +01:00
mount.h mounts: keep list of mounts in an rbtree 2023-11-18 14:56:16 +01:00
mpage.c block, fs: Restore the per-bio/request data lifetime fields 2024-02-06 14:31:05 +01:00
namei.c overlayfs update for 6.10 2024-05-22 09:23:18 -07:00
namespace.c fs: relax mount_setattr() permission checks 2024-02-07 21:16:29 +01:00
nsfs.c pidfs: remove config option 2024-03-13 12:53:53 -07:00
open.c ftruncate: pass a signed offset 2024-06-24 18:29:20 +02:00
pidfs.c fs/pidfs: make 'lsof' happy with our inode changes 2024-05-21 08:08:00 -07:00
pipe.c fs/pipe: Convert to lockdep_cmp_fn 2024-02-02 13:11:49 +01:00
pnode.c mounts: keep list of mounts in an rbtree 2023-11-18 14:56:16 +01:00
pnode.h
posix_acl.c lsm/stable-6.9 PR 20240312 2024-03-12 20:03:34 -07:00
proc_namespace.c namespace: extract show_path() helper 2023-11-18 14:56:16 +01:00
read_write.c Assorted commits that had missed the last merge window... 2024-05-21 13:11:44 -07:00
readdir.c fsnotify: optionally pass access range in file permission hooks 2023-12-12 16:20:02 +01:00
remap_range.c vfs: export remap and write check helpers 2024-04-15 14:54:13 -07:00
select.c fs/select: rework stack allocation hack for clang 2024-02-20 09:23:52 +01:00
seq_file.c seq_file: Simplify __seq_puts() 2024-05-02 16:28:20 +02:00
signalfd.c signalfd: drop an obsolete comment 2024-05-24 13:34:07 +02:00
splice.c remove call_{read,write}_iter() functions 2024-04-15 16:03:25 -04:00
stack.c
stat.c statx: stx_subvol 2024-03-26 09:01:18 +01:00
statfs.c
super.c \n 2024-05-20 12:31:43 -07:00
sync.c
sysctls.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
timerfd.c timerfd: convert to ->read_iter() 2024-04-10 16:23:02 -06:00
userfaultfd.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
utimes.c
xattr.c evm: Move to LSM infrastructure 2024-02-15 23:43:47 -05:00