linux/fs/nilfs2
Kairui Song 1f49c1476d nilfs2: drop usage of page_index
Patch series "mm/swap: clean up and optimize swap cache index", v6.

Currently we use one swap_address_space for every 64M chunk to reduce lock
contention; this is like having a set of smaller files inside a swap
device.  But when doing a swap cache lookup or insert, we are still using
the offset of the whole large swap device.  This is OK for correctness, as
the offset (key) is unique.

But Xarray is specially optimized for small indexes: it creates the radix
tree levels lazily to be just enough to fit the largest key stored in one
Xarray.  So we are wasting tree nodes unnecessarily.

For a 64M chunk it should take at most 3 levels to contain everything.
But if we are using the offset from the whole swap device, the offset
(key) value will be way beyond 64M, and so will the tree level.
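
The level arithmetic above can be sketched in plain C.  This is a userspace
illustration, not kernel code: `CHUNK_SHIFT` and `levels_for_bits` are
hypothetical names, the value 6 is the default chunk size mentioned later in
this message, and 4K pages are assumed.

```c
#include <assert.h>

/* Assumption: each tree node resolves 6 bits of the key (64 slots). */
#define CHUNK_SHIFT 6

/*
 * Number of tree levels needed to index every key in [0, 2^key_bits).
 * Hypothetical helper for illustration; rounds key_bits / CHUNK_SHIFT up.
 */
static unsigned int levels_for_bits(unsigned int key_bits)
{
	assert(key_bits > 0);
	return (key_bits + CHUNK_SHIFT - 1) / CHUNK_SHIFT;
}
```

With 4K pages a 64M chunk holds 2^14 entries, so `levels_for_bits(14)` gives
3 levels, while a 128G device holds 2^25 entries and `levels_for_bits(25)`
gives 5.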

Optimize this by reducing the swap cache search space to the 64M scope.

Test with `time memhog 128G` inside an 8G memcg using 128G swap (ramdisk
with SWP_SYNCHRONOUS_IO dropped, tested 3 times, results are stable.  The
test result is similar but the improvement is smaller if
SWP_SYNCHRONOUS_IO is enabled, as the swap-out path can never skip the swap
cache):

Before:
6.07user 250.74system 4:17.26elapsed 99%CPU (0avgtext+0avgdata 8373376maxresident)k
0inputs+0outputs (55major+33555018minor)pagefaults 0swaps

After (+1.8% faster):
6.08user 246.09system 4:12.58elapsed 99%CPU (0avgtext+0avgdata 8373248maxresident)k
0inputs+0outputs (54major+33555027minor)pagefaults 0swaps

Similar result with MySQL and sysbench using swap:
Before:
94055.61 qps

After (+0.8% faster):
94834.91 qps

There is also a very slight drop in radix tree node slab usage:
Before: 303952K
After:  302224K

For this series:

There are multiple places that expect mixed types of pages (page cache or
swap cache), e.g. migration and huge memory split.  There are four helpers
for that:

- page_index
- page_file_offset
- folio_index
- folio_file_pos

To keep the code clean and compatible, this series first cleans up their
usage.

page_file_offset and folio_file_pos are historical helpers that can
simply be dropped after the cleanup, and all page_index users can be
converted to folio_index or folio->index.

Then introduce two new helpers, swap_cache_index and swap_dev_pos, for swap.
Replace swp_offset with swap_cache_index when used to retrieve folio from
swap cache, and use swap_dev_pos when needed to retrieve the device
position of a swap entry.  This way, swap_cache_index can return the
optimized value with no compatibility issue.
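
The split between the two helpers can be sketched in userspace C as follows.
This is a simplified model under stated assumptions, not the kernel
implementation: the functions take a raw swap offset rather than a
swp_entry_t, 4K pages are assumed (`PAGE_SHIFT` of 12), and the mask macro is
derived here from the SWAP_ADDRESS_SPACE_SHIFT value of 14 given below.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                 /* assumption: 4K pages */
#define SWAP_ADDRESS_SPACE_SHIFT 14   /* 2^14 pages of 4K = 64M per chunk */
#define SWAP_ADDRESS_SPACE_MASK \
	((UINT64_C(1) << SWAP_ADDRESS_SPACE_SHIFT) - 1)

/* Key used inside one 64M swap address space: only the low 14 bits. */
static inline uint64_t swap_cache_index(uint64_t swp_offset)
{
	return swp_offset & SWAP_ADDRESS_SPACE_MASK;
}

/* Byte position on the whole swap device: the full offset is kept. */
static inline uint64_t swap_dev_pos(uint64_t swp_offset)
{
	/* Guard against shifting significant bits out of the result. */
	assert(swp_offset <= (UINT64_MAX >> PAGE_SHIFT));
	return swp_offset << PAGE_SHIFT;
}
```

For example, offset 0x4001 lies in the second 64M chunk, so its cache index
is 1 while its device position is still 0x4001000 bytes.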

The result is better performance and reduced LOC.

Ideally, in the future, we may want to reduce SWAP_ADDRESS_SPACE_SHIFT from
14 to 12: the default Xarray chunk shift is 6, so we have 3-level trees
instead of 2-level trees just for 2 extra bits.  But swap cache is based
on the address_space struct, and with 4 times more metadata sparsely
distributed in memory it wastes more cachelines; the performance gain from
this series is almost canceled according to my test.  So first, just have a
cleaner separation of offsets and a smaller search space.


This patch (of 10):

page_index is only for mixed usage of page cache and swap cache; for pure
page cache usage, the caller can just use page->index instead.

It can't be a swap cache page here (being part of buffer head), so just
drop it.  And while we are at it, optimize the code by retrieving the
offset of the buffer head within the folio directly using bh_offset, and
get rid of the loop and usage of page helpers.
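
The loop-free computation described above can be modeled in userspace C.
Everything in this sketch is a hypothetical stand-in (`struct folio_model`,
`struct bh_model`, `bh_offset_model`, `data_block_key`), not the kernel
structures or the actual nilfs2 code, and 4K pages are assumed.  It only
shows the idea: the byte position of the folio plus the buffer's offset
inside it, shifted by the block size bits, gives the block key directly.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4K pages */

/* Hypothetical userspace stand-ins for a folio and a buffer head. */
struct folio_model {
	uint64_t index;  /* page cache index of the folio */
	char *data;      /* start of the folio's memory */
};

struct bh_model {
	struct folio_model *folio;
	char *b_data;    /* start of this buffer inside the folio */
};

/* Offset of the buffer within its folio, in bytes. */
static uint64_t bh_offset_model(const struct bh_model *bh)
{
	assert(bh->b_data >= bh->folio->data);
	return (uint64_t)(bh->b_data - bh->folio->data);
}

/* Block key: file byte position of the buffer divided by block size. */
static uint64_t data_block_key(const struct bh_model *bh, unsigned int blkbits)
{
	uint64_t pos = (bh->folio->index << PAGE_SHIFT) + bh_offset_model(bh);

	return pos >> blkbits;
}
```

No walk over the folio's buffer list is needed, since the offset is read
from the buffer itself.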

Link: https://lkml.kernel.org/r/20240521175854.96038-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20240521175854.96038-3-ryncsn@gmail.com
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:29:54 -07:00
..
alloc.c nilfs2: convert persistent object allocator to use kmap_local 2024-02-22 15:38:53 -08:00
alloc.h nilfs2: remove filenames from file comments 2021-11-09 10:02:52 -08:00
bmap.c nilfs2: drop usage of page_index 2024-07-03 19:29:54 -07:00
bmap.h nilfs2: remove filenames from file comments 2021-11-09 10:02:52 -08:00
btnode.c nilfs2: convert nilfs_page_bug() to nilfs_folio_bug() 2023-12-10 17:21:48 -08:00
btnode.h fs/nilfs2: Use the enum req_op and blk_opf_t types 2022-07-14 12:14:33 -06:00
btree.c nilfs2: add kernel-doc comments to nilfs_btree_convert_and_insert() 2024-04-25 21:07:08 -07:00
btree.h nilfs2: remove filenames from file comments 2021-11-09 10:02:52 -08:00
cpfile.c nilfs2: use div64_ul() instead of do_div() 2024-03-12 13:09:23 -07:00
cpfile.h nilfs2: remove nilfs_cpfile_{get,put}_checkpoint() 2024-02-22 15:38:53 -08:00
dat.c nilfs2: use div64_ul() instead of do_div() 2024-03-12 13:09:23 -07:00
dat.h nilfs2: remove filenames from file comments 2021-11-09 10:02:52 -08:00
dir.c nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors 2024-06-05 19:19:27 -07:00
direct.c nilfs2: fix failure to detect DAT corruption in btree and direct mappings 2024-03-14 09:17:29 -07:00
direct.h nilfs2: remove filenames from file comments 2021-11-09 10:02:52 -08:00
export.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
file.c nilfs2: fix hang in nilfs_lookup_dirty_data_buffers() 2024-02-07 21:20:36 -08:00
gcinode.c nilfs2: add kernel-doc comments to nilfs_remove_all_gcinodes() 2024-04-25 21:07:08 -07:00
ifile.c nilfs2: localize highmem mapping for checkpoint reading within cpfile 2024-02-22 15:38:53 -08:00
ifile.h nilfs2: localize highmem mapping for checkpoint reading within cpfile 2024-02-22 15:38:53 -08:00
inode.c nilfs2: prevent kernel bug at submit_bh_wbc() 2024-03-14 09:17:30 -07:00
ioctl.c nilfs2: fix out-of-range warning 2024-04-09 10:52:12 +02:00
Kconfig fs: add CONFIG_BUFFER_HEAD 2023-08-02 09:13:09 -06:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mdt.c nilfs2: convert metadata file common code to use kmap_local 2024-02-22 15:38:53 -08:00
mdt.h nilfs2: fix lockdep warnings during disk space reclamation 2022-04-01 11:46:09 -07:00
namei.c misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
nilfs.h nilfs2: convert to use the new mount API 2024-05-08 08:41:27 -07:00
page.c nilfs2: convert nilfs_copy_buffer() to use kmap_local 2024-02-22 15:38:53 -08:00
page.h nilfs2: convert nilfs_page_bug() to nilfs_folio_bug() 2023-12-10 17:21:48 -08:00
recovery.c nilfs2: make block erasure safe in nilfs_finish_roll_forward() 2024-05-19 14:36:21 -07:00
segbuf.c nilfs2: convert segment buffer to use kmap_local 2024-02-22 15:38:53 -08:00
segbuf.h nilfs2: remove filenames from file comments 2021-11-09 10:02:52 -08:00
segment.c nilfs2: fix potential kernel bug due to lack of writeback flag waiting 2024-06-05 19:19:24 -07:00
segment.h nilfs2: remove filenames from file comments 2021-11-09 10:02:52 -08:00
sufile.c nilfs2: use div64_ul() instead of do_div() 2024-03-12 13:09:23 -07:00
sufile.h nilfs2: remove filenames from file comments 2021-11-09 10:02:52 -08:00
super.c nilfs2: convert to use the new mount API 2024-05-08 08:41:27 -07:00
sysfs.c nilfs2: use default_groups in kobj_type 2021-12-29 10:53:48 +01:00
sysfs.h nilfs2: remove filenames from file comments 2021-11-09 10:02:52 -08:00
the_nilfs.c nilfs2: make superblock data array index computation sparse friendly 2024-05-08 08:41:28 -07:00
the_nilfs.h nilfs2: convert to use the new mount API 2024-05-08 08:41:27 -07:00