2018-09-12 09:16:07 +08:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
* fs/f2fs/gc.c
|
|
|
|
*
|
|
|
|
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
|
|
|
|
* http://www.samsung.com/
|
|
|
|
*/
|
|
|
|
#include <linux/fs.h>
|
|
|
|
#include <linux/module.h>
|
|
|
|
#include <linux/backing-dev.h>
|
|
|
|
#include <linux/init.h>
|
|
|
|
#include <linux/f2fs_fs.h>
|
|
|
|
#include <linux/kthread.h>
|
|
|
|
#include <linux/delay.h>
|
|
|
|
#include <linux/freezer.h>
|
2020-04-01 02:43:07 +08:00
|
|
|
#include <linux/sched/signal.h>
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
|
|
|
#include "f2fs.h"
|
|
|
|
#include "node.h"
|
|
|
|
#include "segment.h"
|
|
|
|
#include "gc.h"
|
2013-04-23 15:42:53 +08:00
|
|
|
#include <trace/events/f2fs.h>
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
static struct kmem_cache *victim_entry_slab;
|
|
|
|
|
2020-06-18 12:37:10 +08:00
|
|
|
static unsigned int count_bits(const unsigned long *addr,
|
|
|
|
unsigned int offset, unsigned int len);
|
|
|
|
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
static int gc_thread_func(void *data)
|
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = data;
|
2013-08-04 22:09:40 +08:00
|
|
|
struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
|
2017-08-07 23:12:46 +08:00
|
|
|
unsigned int wait_ms;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2013-08-04 22:09:40 +08:00
|
|
|
wait_ms = gc_th->min_sleep_time;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2017-05-18 01:36:58 +08:00
|
|
|
set_freezable();
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
do {
|
2020-02-14 17:44:13 +08:00
|
|
|
bool sync_mode;
|
|
|
|
|
2017-05-18 01:36:58 +08:00
|
|
|
wait_event_interruptible_timeout(*wq,
|
2017-08-07 13:09:00 +08:00
|
|
|
kthread_should_stop() || freezing(current) ||
|
|
|
|
gc_th->gc_wake,
|
2017-05-18 01:36:58 +08:00
|
|
|
msecs_to_jiffies(wait_ms));
|
|
|
|
|
2017-08-07 13:09:00 +08:00
|
|
|
/* give it a try one time */
|
|
|
|
if (gc_th->gc_wake)
|
|
|
|
gc_th->gc_wake = 0;
|
|
|
|
|
2018-09-29 18:31:28 +08:00
|
|
|
if (try_to_freeze()) {
|
|
|
|
stat_other_skip_bggc_count(sbi);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
continue;
|
2018-09-29 18:31:28 +08:00
|
|
|
}
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
if (kthread_should_stop())
|
|
|
|
break;
|
|
|
|
|
2013-01-29 17:30:07 +08:00
|
|
|
if (sbi->sb->s_writers.frozen >= SB_FREEZE_WRITE) {
|
2015-01-26 20:24:21 +08:00
|
|
|
increase_sleep_time(gc_th, &wait_ms);
|
2018-09-29 18:31:28 +08:00
|
|
|
stat_other_skip_bggc_count(sbi);
|
2013-01-29 17:30:07 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2017-02-25 11:08:28 +08:00
|
|
|
if (time_to_inject(sbi, FAULT_CHECKPOINT)) {
|
2019-11-01 17:53:23 +08:00
|
|
|
f2fs_show_injection_info(sbi, FAULT_CHECKPOINT);
|
2016-09-26 19:45:55 +08:00
|
|
|
f2fs_stop_checkpoint(sbi, false);
|
2017-02-25 11:08:28 +08:00
|
|
|
}
|
2016-09-26 19:45:55 +08:00
|
|
|
|
2018-09-29 18:31:28 +08:00
|
|
|
if (!sb_start_write_trylock(sbi->sb)) {
|
|
|
|
stat_other_skip_bggc_count(sbi);
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs in below condition:
GC Thread Thread B
- sb_start_intwrite
- f2fs_sync_file
- f2fs_sync_fs
- mutex_lock(&sbi->gc_mutex)
- write_checkpoint
- block_operations
- f2fs_sync_inode_meta
- iput
- sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
|
|
|
continue;
|
2018-09-29 18:31:28 +08:00
|
|
|
}
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs in below condition:
GC Thread Thread B
- sb_start_intwrite
- f2fs_sync_file
- f2fs_sync_fs
- mutex_lock(&sbi->gc_mutex)
- write_checkpoint
- block_operations
- f2fs_sync_inode_meta
- iput
- sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
|
|
|
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
/*
|
|
|
|
* [GC triggering condition]
|
|
|
|
* 0. GC is not conducted currently.
|
|
|
|
* 1. There are enough dirty segments.
|
|
|
|
* 2. IO subsystem is idle by checking the # of writeback pages.
|
|
|
|
* 3. IO subsystem is idle by checking the # of requests in
|
|
|
|
* bdev's request list.
|
|
|
|
*
|
2014-08-06 22:22:50 +08:00
|
|
|
* Note) We have to avoid triggering GCs frequently.
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
* Because it is possible that some segments can be
|
|
|
|
* invalidated soon after by user update or deletion.
|
|
|
|
* So, I'd like to wait some time to collect dirty segments.
|
|
|
|
*/
|
2020-07-02 12:14:14 +08:00
|
|
|
if (sbi->gc_mode == GC_URGENT_HIGH) {
|
2017-08-07 13:09:00 +08:00
|
|
|
wait_ms = gc_th->urgent_sleep_time;
|
2020-01-14 19:36:50 +08:00
|
|
|
down_write(&sbi->gc_lock);
|
2017-08-07 13:09:00 +08:00
|
|
|
goto do_gc;
|
|
|
|
}
|
|
|
|
|
2020-01-14 19:36:50 +08:00
|
|
|
if (!down_write_trylock(&sbi->gc_lock)) {
|
2018-09-29 18:31:28 +08:00
|
|
|
stat_other_skip_bggc_count(sbi);
|
2018-02-27 01:19:47 +08:00
|
|
|
goto next;
|
2018-09-29 18:31:28 +08:00
|
|
|
}
|
2018-02-27 01:19:47 +08:00
|
|
|
|
2018-09-19 16:48:47 +08:00
|
|
|
if (!is_idle(sbi, GC_TIME)) {
|
2015-01-26 20:24:21 +08:00
|
|
|
increase_sleep_time(gc_th, &wait_ms);
|
2020-01-14 19:36:50 +08:00
|
|
|
up_write(&sbi->gc_lock);
|
2018-09-29 18:31:28 +08:00
|
|
|
stat_io_skip_bggc_count(sbi);
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs in below condition:
GC Thread Thread B
- sb_start_intwrite
- f2fs_sync_file
- f2fs_sync_fs
- mutex_lock(&sbi->gc_mutex)
- write_checkpoint
- block_operations
- f2fs_sync_inode_meta
- iput
- sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
|
|
|
goto next;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (has_enough_invalid_blocks(sbi))
|
2015-01-26 20:24:21 +08:00
|
|
|
decrease_sleep_time(gc_th, &wait_ms);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
else
|
2015-01-26 20:24:21 +08:00
|
|
|
increase_sleep_time(gc_th, &wait_ms);
|
2017-08-07 13:09:00 +08:00
|
|
|
do_gc:
|
2020-01-23 02:51:16 +08:00
|
|
|
stat_inc_bggc_count(sbi->stat_info);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2020-02-14 17:44:13 +08:00
|
|
|
sync_mode = F2FS_OPTION(sbi).bggc_mode == BGGC_MODE_SYNC;
|
|
|
|
|
2013-02-04 14:11:17 +08:00
|
|
|
/* if return value is not zero, no victim was selected */
|
2020-02-14 17:44:13 +08:00
|
|
|
if (f2fs_gc(sbi, sync_mode, true, NULL_SEGNO))
|
2013-08-04 22:09:40 +08:00
|
|
|
wait_ms = gc_th->no_gc_sleep_time;
|
2013-10-24 12:31:34 +08:00
|
|
|
|
2015-10-14 01:00:53 +08:00
|
|
|
trace_f2fs_background_gc(sbi->sb, wait_ms,
|
|
|
|
prefree_segments(sbi), free_segments(sbi));
|
|
|
|
|
2013-10-24 13:19:18 +08:00
|
|
|
/* balancing f2fs's metadata periodically */
|
2020-03-19 19:57:58 +08:00
|
|
|
f2fs_balance_fs_bg(sbi, true);
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs in below condition:
GC Thread Thread B
- sb_start_intwrite
- f2fs_sync_file
- f2fs_sync_fs
- mutex_lock(&sbi->gc_mutex)
- write_checkpoint
- block_operations
- f2fs_sync_inode_meta
- iput
- sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
|
|
|
next:
|
|
|
|
sb_end_write(sbi->sb);
|
2013-10-24 12:31:34 +08:00
|
|
|
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
} while (!kthread_should_stop());
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_start_gc_thread(struct f2fs_sb_info *sbi)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
2012-12-01 09:56:13 +08:00
|
|
|
struct f2fs_gc_kthread *gc_th;
|
2013-02-02 22:52:28 +08:00
|
|
|
dev_t dev = sbi->sb->s_bdev->bd_dev;
|
2013-05-26 10:05:32 +08:00
|
|
|
int err = 0;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2016-09-23 21:30:09 +08:00
|
|
|
gc_th = f2fs_kmalloc(sbi, sizeof(struct f2fs_gc_kthread), GFP_KERNEL);
|
2013-05-26 10:05:32 +08:00
|
|
|
if (!gc_th) {
|
|
|
|
err = -ENOMEM;
|
|
|
|
goto out;
|
|
|
|
}
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2017-08-07 13:09:00 +08:00
|
|
|
gc_th->urgent_sleep_time = DEF_GC_THREAD_URGENT_SLEEP_TIME;
|
2013-08-04 22:09:40 +08:00
|
|
|
gc_th->min_sleep_time = DEF_GC_THREAD_MIN_SLEEP_TIME;
|
|
|
|
gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
|
|
|
|
gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
|
|
|
|
|
2017-08-07 13:09:00 +08:00
|
|
|
gc_th->gc_wake= 0;
|
2013-08-04 22:10:15 +08:00
|
|
|
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
sbi->gc_thread = gc_th;
|
|
|
|
init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
|
|
|
|
sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
|
2013-02-02 22:52:28 +08:00
|
|
|
"f2fs_gc-%u:%u", MAJOR(dev), MINOR(dev));
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
if (IS_ERR(gc_th->f2fs_gc_task)) {
|
2013-05-26 10:05:32 +08:00
|
|
|
err = PTR_ERR(gc_th->f2fs_gc_task);
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(gc_th);
|
2013-02-02 22:52:42 +08:00
|
|
|
sbi->gc_thread = NULL;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
2013-05-26 10:05:32 +08:00
|
|
|
out:
|
|
|
|
return err;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_stop_gc_thread(struct f2fs_sb_info *sbi)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
|
|
|
struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
|
|
|
|
if (!gc_th)
|
|
|
|
return;
|
|
|
|
kthread_stop(gc_th->f2fs_gc_task);
|
2020-09-14 16:47:00 +08:00
|
|
|
kfree(gc_th);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
sbi->gc_thread = NULL;
|
|
|
|
}
|
|
|
|
|
2018-05-08 05:22:40 +08:00
|
|
|
static int select_gc_type(struct f2fs_sb_info *sbi, int gc_type)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
int gc_mode;
|
|
|
|
|
|
|
|
if (gc_type == BG_GC) {
|
|
|
|
if (sbi->am.atgc_enabled)
|
|
|
|
gc_mode = GC_AT;
|
|
|
|
else
|
|
|
|
gc_mode = GC_CB;
|
|
|
|
} else {
|
|
|
|
gc_mode = GC_GREEDY;
|
|
|
|
}
|
2013-08-04 22:10:15 +08:00
|
|
|
|
2018-05-08 05:22:40 +08:00
|
|
|
switch (sbi->gc_mode) {
|
|
|
|
case GC_IDLE_CB:
|
|
|
|
gc_mode = GC_CB;
|
|
|
|
break;
|
|
|
|
case GC_IDLE_GREEDY:
|
2020-07-02 12:14:14 +08:00
|
|
|
case GC_URGENT_HIGH:
|
2018-02-27 07:40:30 +08:00
|
|
|
gc_mode = GC_GREEDY;
|
2018-05-08 05:22:40 +08:00
|
|
|
break;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
case GC_IDLE_AT:
|
|
|
|
gc_mode = GC_AT;
|
|
|
|
break;
|
2018-05-08 05:22:40 +08:00
|
|
|
}
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
|
2013-08-04 22:10:15 +08:00
|
|
|
return gc_mode;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
|
|
|
|
int type, struct victim_sel_policy *p)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
|
2013-03-31 12:49:18 +08:00
|
|
|
if (p->alloc_mode == SSR) {
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
p->gc_mode = GC_GREEDY;
|
2020-06-18 12:37:10 +08:00
|
|
|
p->dirty_bitmap = dirty_i->dirty_segmap[type];
|
f2fs: optimize gc for better performance
This patch improves the gc efficiency by optimizing the victim
selection policy. With this optimization, the random re-write
performance could increase up to 20%.
For f2fs, when disk is in shortage of free spaces, gc will selects
dirty segments and moves valid blocks around for making more space
available. The gc cost of a segment is determined by the valid blocks
in the segment. The less the valid blocks, the higher the efficiency.
The ideal victim segment is the one that has the most garbage blocks.
Currently, it searches up to 20 dirty segments for a victim segment.
The selected victim is not likely the best victim for gc when there
are much more dirty segments. Why not searching more dirty segments
for a better victim? The cost of searching dirty segments is
negligible in comparison to moving blocks.
In this patch, it enlarges the MAX_VICTIM_SEARCH to 4096 to make
the search more aggressively for a possible better victim. Since
it also applies to victim selection for SSR, it will likely improve
the SSR efficiency as well.
The test case is simple. It creates as many files until the disk full.
The size for each file is 32KB. Then it writes as many as 100000
records of 4KB size to random offsets of random files in sync mode.
The testing was done on a 2GB partition of a SDHC card. Let's see the
test result of f2fs without and with the patch.
---------------------------------------
2GB partition, SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---------------------------------------
| file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
[no patch] 341 4227 1174 174840
[patched] 324 2958 645 106682
It's obvious that, with the patch, f2fs finishes the test in 20+% less
time than without the patch. And internally it does much less gc with
higher efficiency than before.
Since the performance improvement is related to gc, it might not be so
obvious for other tests that do not trigger gc as often as this one (
This is because f2fs selects dirty segments for SSR use most of the
time when free space is in shortage). The well-known iozone test tool
was not used for benchmarking the patch becuase it seems do not have
a test case that performs random re-write on a full disk.
This patch is the revised version based on the suggestion from
Jaegeuk Kim.
Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
[Jaegeuk Kim: suggested simpler solution]
Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-05 12:45:26 +08:00
|
|
|
p->max_search = dirty_i->nr_dirty[type];
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
p->ofs_unit = 1;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
} else if (p->alloc_mode == AT_SSR) {
|
|
|
|
p->gc_mode = GC_GREEDY;
|
|
|
|
p->dirty_bitmap = dirty_i->dirty_segmap[type];
|
|
|
|
p->max_search = dirty_i->nr_dirty[type];
|
|
|
|
p->ofs_unit = 1;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
} else {
|
2018-05-08 05:22:40 +08:00
|
|
|
p->gc_mode = select_gc_type(sbi, gc_type);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
p->ofs_unit = sbi->segs_per_sec;
|
2020-06-18 12:37:10 +08:00
|
|
|
if (__is_large_section(sbi)) {
|
|
|
|
p->dirty_bitmap = dirty_i->dirty_secmap;
|
|
|
|
p->max_search = count_bits(p->dirty_bitmap,
|
|
|
|
0, MAIN_SECS(sbi));
|
|
|
|
} else {
|
|
|
|
p->dirty_bitmap = dirty_i->dirty_segmap[DIRTY];
|
|
|
|
p->max_search = dirty_i->nr_dirty[DIRTY];
|
|
|
|
}
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
f2fs: optimize gc for better performance
This patch improves the gc efficiency by optimizing the victim
selection policy. With this optimization, the random re-write
performance could increase up to 20%.
For f2fs, when disk is in shortage of free spaces, gc will selects
dirty segments and moves valid blocks around for making more space
available. The gc cost of a segment is determined by the valid blocks
in the segment. The less the valid blocks, the higher the efficiency.
The ideal victim segment is the one that has the most garbage blocks.
Currently, it searches up to 20 dirty segments for a victim segment.
The selected victim is not likely the best victim for gc when there
are much more dirty segments. Why not searching more dirty segments
for a better victim? The cost of searching dirty segments is
negligible in comparison to moving blocks.
In this patch, it enlarges the MAX_VICTIM_SEARCH to 4096 to make
the search more aggressively for a possible better victim. Since
it also applies to victim selection for SSR, it will likely improve
the SSR efficiency as well.
The test case is simple. It creates as many files until the disk full.
The size for each file is 32KB. Then it writes as many as 100000
records of 4KB size to random offsets of random files in sync mode.
The testing was done on a 2GB partition of a SDHC card. Let's see the
test result of f2fs without and with the patch.
---------------------------------------
2GB partition, SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---------------------------------------
| file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
[no patch] 341 4227 1174 174840
[patched] 324 2958 645 106682
It's obvious that, with the patch, f2fs finishes the test in 20+% less
time than without the patch. And internally it does much less gc with
higher efficiency than before.
Since the performance improvement is related to gc, it might not be so
obvious for other tests that do not trigger gc as often as this one (
This is because f2fs selects dirty segments for SSR use most of the
time when free space is in shortage). The well-known iozone test tool
was not used for benchmarking the patch becuase it seems do not have
a test case that performs random re-write on a full disk.
This patch is the revised version based on the suggestion from
Jaegeuk Kim.
Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
[Jaegeuk Kim: suggested simpler solution]
Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-05 12:45:26 +08:00
|
|
|
|
2020-02-27 19:30:05 +08:00
|
|
|
/*
|
|
|
|
* adjust candidates range, should select all dirty segments for
|
|
|
|
* foreground GC and urgent GC cases.
|
|
|
|
*/
|
2018-02-27 07:40:30 +08:00
|
|
|
if (gc_type != FG_GC &&
|
2020-07-02 12:14:14 +08:00
|
|
|
(sbi->gc_mode != GC_URGENT_HIGH) &&
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
(p->gc_mode != GC_AT && p->alloc_mode != AT_SSR) &&
|
2018-02-27 07:40:30 +08:00
|
|
|
p->max_search > sbi->max_victim_search)
|
2014-01-08 12:45:08 +08:00
|
|
|
p->max_search = sbi->max_victim_search;
|
f2fs: optimize gc for better performance
This patch improves the gc efficiency by optimizing the victim
selection policy. With this optimization, the random re-write
performance could increase up to 20%.
For f2fs, when disk is in shortage of free spaces, gc will selects
dirty segments and moves valid blocks around for making more space
available. The gc cost of a segment is determined by the valid blocks
in the segment. The less the valid blocks, the higher the efficiency.
The ideal victim segment is the one that has the most garbage blocks.
Currently, it searches up to 20 dirty segments for a victim segment.
The selected victim is not likely the best victim for gc when there
are much more dirty segments. Why not searching more dirty segments
for a better victim? The cost of searching dirty segments is
negligible in comparison to moving blocks.
In this patch, it enlarges the MAX_VICTIM_SEARCH to 4096 to make
the search more aggressively for a possible better victim. Since
it also applies to victim selection for SSR, it will likely improve
the SSR efficiency as well.
The test case is simple. It creates as many files until the disk full.
The size for each file is 32KB. Then it writes as many as 100000
records of 4KB size to random offsets of random files in sync mode.
The testing was done on a 2GB partition of a SDHC card. Let's see the
test result of f2fs without and with the patch.
---------------------------------------
2GB partition, SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---------------------------------------
| file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
[no patch] 341 4227 1174 174840
[patched] 324 2958 645 106682
It's obvious that, with the patch, f2fs finishes the test in 20+% less
time than without the patch. And internally it does much less gc with
higher efficiency than before.
Since the performance improvement is related to gc, it might not be so
obvious for other tests that do not trigger gc as often as this one (
This is because f2fs selects dirty segments for SSR use most of the
time when free space is in shortage). The well-known iozone test tool
was not used for benchmarking the patch becuase it seems do not have
a test case that performs random re-write on a full disk.
This patch is the revised version based on the suggestion from
Jaegeuk Kim.
Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
[Jaegeuk Kim: suggested simpler solution]
Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-05 12:45:26 +08:00
|
|
|
|
2018-01-29 11:37:45 +08:00
|
|
|
/* let's select beginning hot/small space first in no_heap mode*/
|
|
|
|
if (test_opt(sbi, NOHEAP) &&
|
|
|
|
(type == CURSEG_HOT_DATA || IS_NODESEG(type)))
|
2017-03-25 08:41:45 +08:00
|
|
|
p->offset = 0;
|
|
|
|
else
|
2017-04-14 06:17:00 +08:00
|
|
|
p->offset = SIT_I(sbi)->last_victim[p->gc_mode];
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static unsigned int get_max_cost(struct f2fs_sb_info *sbi,
|
|
|
|
struct victim_sel_policy *p)
|
|
|
|
{
|
2013-02-05 12:19:28 +08:00
|
|
|
/* SSR allocates in a segment unit */
|
|
|
|
if (p->alloc_mode == SSR)
|
2015-12-01 11:56:52 +08:00
|
|
|
return sbi->blocks_per_seg;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
else if (p->alloc_mode == AT_SSR)
|
|
|
|
return UINT_MAX;
|
|
|
|
|
|
|
|
/* LFS */
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
if (p->gc_mode == GC_GREEDY)
|
2017-03-25 15:03:02 +08:00
|
|
|
return 2 * sbi->blocks_per_seg * p->ofs_unit;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
else if (p->gc_mode == GC_CB)
|
|
|
|
return UINT_MAX;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
else if (p->gc_mode == GC_AT)
|
|
|
|
return UINT_MAX;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
else /* No other gc_mode */
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static unsigned int check_bg_victims(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2013-03-31 12:26:03 +08:00
|
|
|
unsigned int secno;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If the gc_type is FG_GC, we can select victim segments
|
|
|
|
* selected by background GC before.
|
|
|
|
* Those segments guarantee they have small valid blocks.
|
|
|
|
*/
|
2014-09-24 02:23:01 +08:00
|
|
|
for_each_set_bit(secno, dirty_i->victim_secmap, MAIN_SECS(sbi)) {
|
2013-03-31 12:26:03 +08:00
|
|
|
if (sec_usage_check(sbi, secno))
|
2014-08-04 10:10:07 +08:00
|
|
|
continue;
|
2013-03-31 12:26:03 +08:00
|
|
|
clear_bit(secno, dirty_i->victim_secmap);
|
2017-04-08 06:08:17 +08:00
|
|
|
return GET_SEG_FROM_SEC(sbi, secno);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
return NULL_SEGNO;
|
|
|
|
}
|
|
|
|
|
|
|
|
static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
2017-04-08 06:08:17 +08:00
|
|
|
unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
|
|
|
|
unsigned int start = GET_SEG_FROM_SEC(sbi, secno);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
unsigned long long mtime = 0;
|
|
|
|
unsigned int vblocks;
|
|
|
|
unsigned char age = 0;
|
|
|
|
unsigned char u;
|
|
|
|
unsigned int i;
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
for (i = 0; i < usable_segs_per_sec; i++)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
mtime += get_seg_entry(sbi, start + i)->mtime;
|
2017-04-08 05:33:22 +08:00
|
|
|
vblocks = get_valid_blocks(sbi, segno, true);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
mtime = div_u64(mtime, usable_segs_per_sec);
|
|
|
|
vblocks = div_u64(vblocks, usable_segs_per_sec);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
|
|
|
u = (vblocks * 100) >> sbi->log_blocks_per_seg;
|
|
|
|
|
2014-08-06 22:22:50 +08:00
|
|
|
/* Handle if the system time has changed by the user */
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
if (mtime < sit_i->min_mtime)
|
|
|
|
sit_i->min_mtime = mtime;
|
|
|
|
if (mtime > sit_i->max_mtime)
|
|
|
|
sit_i->max_mtime = mtime;
|
|
|
|
if (sit_i->max_mtime != sit_i->min_mtime)
|
|
|
|
age = 100 - div64_u64(100 * (mtime - sit_i->min_mtime),
|
|
|
|
sit_i->max_mtime - sit_i->min_mtime);
|
|
|
|
|
|
|
|
return UINT_MAX - ((100 * (100 - u) * age) / (100 + u));
|
|
|
|
}
|
|
|
|
|
2013-09-13 08:38:54 +08:00
|
|
|
static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno, struct victim_sel_policy *p)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
|
|
|
if (p->alloc_mode == SSR)
|
2017-09-04 11:10:18 +08:00
|
|
|
return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
|
|
|
/* alloc_mode == LFS */
|
|
|
|
if (p->gc_mode == GC_GREEDY)
|
2017-09-23 17:02:18 +08:00
|
|
|
return get_valid_blocks(sbi, segno, true);
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
else if (p->gc_mode == GC_CB)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
return get_cb_cost(sbi, segno);
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
|
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
return 0;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
2016-02-03 16:21:57 +08:00
|
|
|
static unsigned int count_bits(const unsigned long *addr,
|
|
|
|
unsigned int offset, unsigned int len)
|
|
|
|
{
|
|
|
|
unsigned int end = offset + len, sum = 0;
|
|
|
|
|
|
|
|
while (offset < end) {
|
|
|
|
if (test_bit(offset++, addr))
|
|
|
|
++sum;
|
|
|
|
}
|
|
|
|
return sum;
|
|
|
|
}
|
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
static struct victim_entry *attach_victim_entry(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned long long mtime, unsigned int segno,
|
|
|
|
struct rb_node *parent, struct rb_node **p,
|
|
|
|
bool left_most)
|
|
|
|
{
|
|
|
|
struct atgc_management *am = &sbi->am;
|
|
|
|
struct victim_entry *ve;
|
|
|
|
|
|
|
|
ve = f2fs_kmem_cache_alloc(victim_entry_slab, GFP_NOFS);
|
|
|
|
|
|
|
|
ve->mtime = mtime;
|
|
|
|
ve->segno = segno;
|
|
|
|
|
|
|
|
rb_link_node(&ve->rb_node, parent, p);
|
|
|
|
rb_insert_color_cached(&ve->rb_node, &am->root, left_most);
|
|
|
|
|
|
|
|
list_add_tail(&ve->list, &am->victim_list);
|
|
|
|
|
|
|
|
am->victim_count++;
|
|
|
|
|
|
|
|
return ve;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void insert_victim_entry(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned long long mtime, unsigned int segno)
|
|
|
|
{
|
|
|
|
struct atgc_management *am = &sbi->am;
|
|
|
|
struct rb_node **p;
|
|
|
|
struct rb_node *parent = NULL;
|
|
|
|
bool left_most = true;
|
|
|
|
|
|
|
|
p = f2fs_lookup_rb_tree_ext(sbi, &am->root, &parent, mtime, &left_most);
|
|
|
|
attach_victim_entry(sbi, mtime, segno, parent, p, left_most);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void add_victim_entry(struct f2fs_sb_info *sbi,
|
|
|
|
struct victim_sel_policy *p, unsigned int segno)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
|
|
|
|
unsigned int start = GET_SEG_FROM_SEC(sbi, secno);
|
|
|
|
unsigned long long mtime = 0;
|
|
|
|
unsigned int i;
|
|
|
|
|
|
|
|
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
|
|
|
|
if (p->gc_mode == GC_AT &&
|
|
|
|
get_valid_blocks(sbi, segno, true) == 0)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (p->alloc_mode == AT_SSR &&
|
|
|
|
get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0)
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i = 0; i < sbi->segs_per_sec; i++)
|
|
|
|
mtime += get_seg_entry(sbi, start + i)->mtime;
|
|
|
|
mtime = div_u64(mtime, sbi->segs_per_sec);
|
|
|
|
|
|
|
|
/* Handle if the system time has changed by the user */
|
|
|
|
if (mtime < sit_i->min_mtime)
|
|
|
|
sit_i->min_mtime = mtime;
|
|
|
|
if (mtime > sit_i->max_mtime)
|
|
|
|
sit_i->max_mtime = mtime;
|
|
|
|
if (mtime < sit_i->dirty_min_mtime)
|
|
|
|
sit_i->dirty_min_mtime = mtime;
|
|
|
|
if (mtime > sit_i->dirty_max_mtime)
|
|
|
|
sit_i->dirty_max_mtime = mtime;
|
|
|
|
|
|
|
|
/* don't choose young section as candidate */
|
|
|
|
if (sit_i->dirty_max_mtime - mtime < p->age_threshold)
|
|
|
|
return;
|
|
|
|
|
|
|
|
insert_victim_entry(sbi, mtime, segno);
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct rb_node *lookup_central_victim(struct f2fs_sb_info *sbi,
|
|
|
|
struct victim_sel_policy *p)
|
|
|
|
{
|
|
|
|
struct atgc_management *am = &sbi->am;
|
|
|
|
struct rb_node *parent = NULL;
|
|
|
|
bool left_most;
|
|
|
|
|
|
|
|
f2fs_lookup_rb_tree_ext(sbi, &am->root, &parent, p->age, &left_most);
|
|
|
|
|
|
|
|
return parent;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
|
|
|
|
struct victim_sel_policy *p)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
struct atgc_management *am = &sbi->am;
|
|
|
|
struct rb_root_cached *root = &am->root;
|
|
|
|
struct rb_node *node;
|
|
|
|
struct rb_entry *re;
|
|
|
|
struct victim_entry *ve;
|
|
|
|
unsigned long long total_time;
|
|
|
|
unsigned long long age, u, accu;
|
|
|
|
unsigned long long max_mtime = sit_i->dirty_max_mtime;
|
|
|
|
unsigned long long min_mtime = sit_i->dirty_min_mtime;
|
|
|
|
unsigned int sec_blocks = BLKS_PER_SEC(sbi);
|
|
|
|
unsigned int vblocks;
|
|
|
|
unsigned int dirty_threshold = max(am->max_candidate_count,
|
|
|
|
am->candidate_ratio *
|
|
|
|
am->victim_count / 100);
|
|
|
|
unsigned int age_weight = am->age_weight;
|
|
|
|
unsigned int cost;
|
|
|
|
unsigned int iter = 0;
|
|
|
|
|
|
|
|
if (max_mtime < min_mtime)
|
|
|
|
return;
|
|
|
|
|
|
|
|
max_mtime += 1;
|
|
|
|
total_time = max_mtime - min_mtime;
|
|
|
|
|
|
|
|
accu = div64_u64(ULLONG_MAX, total_time);
|
|
|
|
accu = min_t(unsigned long long, div_u64(accu, 100),
|
|
|
|
DEFAULT_ACCURACY_CLASS);
|
|
|
|
|
|
|
|
node = rb_first_cached(root);
|
|
|
|
next:
|
|
|
|
re = rb_entry_safe(node, struct rb_entry, rb_node);
|
|
|
|
if (!re)
|
|
|
|
return;
|
|
|
|
|
|
|
|
ve = (struct victim_entry *)re;
|
|
|
|
|
|
|
|
if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
|
|
|
|
goto skip;
|
|
|
|
|
|
|
|
/* age = 10000 * x% * 60 */
|
|
|
|
age = div64_u64(accu * (max_mtime - ve->mtime), total_time) *
|
|
|
|
age_weight;
|
|
|
|
|
|
|
|
vblocks = get_valid_blocks(sbi, ve->segno, true);
|
|
|
|
f2fs_bug_on(sbi, !vblocks || vblocks == sec_blocks);
|
|
|
|
|
|
|
|
/* u = 10000 * x% * 40 */
|
|
|
|
u = div64_u64(accu * (sec_blocks - vblocks), sec_blocks) *
|
|
|
|
(100 - age_weight);
|
|
|
|
|
|
|
|
f2fs_bug_on(sbi, age + u >= UINT_MAX);
|
|
|
|
|
|
|
|
cost = UINT_MAX - (age + u);
|
|
|
|
iter++;
|
|
|
|
|
|
|
|
if (cost < p->min_cost ||
|
|
|
|
(cost == p->min_cost && age > p->oldest_age)) {
|
|
|
|
p->min_cost = cost;
|
|
|
|
p->oldest_age = age;
|
|
|
|
p->min_segno = ve->segno;
|
|
|
|
}
|
|
|
|
skip:
|
|
|
|
if (iter < dirty_threshold) {
|
|
|
|
node = rb_next(node);
|
|
|
|
goto next;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* select candidates around source section in range of
|
|
|
|
* [target - dirty_threshold, target + dirty_threshold]
|
|
|
|
*/
|
|
|
|
static void atssr_lookup_victim(struct f2fs_sb_info *sbi,
|
|
|
|
struct victim_sel_policy *p)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
struct atgc_management *am = &sbi->am;
|
|
|
|
struct rb_node *node;
|
|
|
|
struct rb_entry *re;
|
|
|
|
struct victim_entry *ve;
|
|
|
|
unsigned long long age;
|
|
|
|
unsigned long long max_mtime = sit_i->dirty_max_mtime;
|
|
|
|
unsigned long long min_mtime = sit_i->dirty_min_mtime;
|
|
|
|
unsigned int seg_blocks = sbi->blocks_per_seg;
|
|
|
|
unsigned int vblocks;
|
|
|
|
unsigned int dirty_threshold = max(am->max_candidate_count,
|
|
|
|
am->candidate_ratio *
|
|
|
|
am->victim_count / 100);
|
|
|
|
unsigned int cost;
|
|
|
|
unsigned int iter = 0;
|
|
|
|
int stage = 0;
|
|
|
|
|
|
|
|
if (max_mtime < min_mtime)
|
|
|
|
return;
|
|
|
|
max_mtime += 1;
|
|
|
|
next_stage:
|
|
|
|
node = lookup_central_victim(sbi, p);
|
|
|
|
next_node:
|
|
|
|
re = rb_entry_safe(node, struct rb_entry, rb_node);
|
|
|
|
if (!re) {
|
|
|
|
if (stage == 0)
|
|
|
|
goto skip_stage;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
ve = (struct victim_entry *)re;
|
|
|
|
|
|
|
|
if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
|
|
|
|
goto skip_node;
|
|
|
|
|
|
|
|
age = max_mtime - ve->mtime;
|
|
|
|
|
|
|
|
vblocks = get_seg_entry(sbi, ve->segno)->ckpt_valid_blocks;
|
|
|
|
f2fs_bug_on(sbi, !vblocks);
|
|
|
|
|
|
|
|
/* rare case */
|
|
|
|
if (vblocks == seg_blocks)
|
|
|
|
goto skip_node;
|
|
|
|
|
|
|
|
iter++;
|
|
|
|
|
|
|
|
age = max_mtime - abs(p->age - age);
|
|
|
|
cost = UINT_MAX - vblocks;
|
|
|
|
|
|
|
|
if (cost < p->min_cost ||
|
|
|
|
(cost == p->min_cost && age > p->oldest_age)) {
|
|
|
|
p->min_cost = cost;
|
|
|
|
p->oldest_age = age;
|
|
|
|
p->min_segno = ve->segno;
|
|
|
|
}
|
|
|
|
skip_node:
|
|
|
|
if (iter < dirty_threshold) {
|
|
|
|
if (stage == 0)
|
|
|
|
node = rb_prev(node);
|
|
|
|
else if (stage == 1)
|
|
|
|
node = rb_next(node);
|
|
|
|
goto next_node;
|
|
|
|
}
|
|
|
|
skip_stage:
|
|
|
|
if (stage < 1) {
|
|
|
|
stage++;
|
|
|
|
iter = 0;
|
|
|
|
goto next_stage;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
static void lookup_victim_by_age(struct f2fs_sb_info *sbi,
|
|
|
|
struct victim_sel_policy *p)
|
|
|
|
{
|
|
|
|
f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi,
|
|
|
|
&sbi->am.root, true));
|
|
|
|
|
|
|
|
if (p->gc_mode == GC_AT)
|
|
|
|
atgc_lookup_victim(sbi, p);
|
|
|
|
else if (p->alloc_mode == AT_SSR)
|
|
|
|
atssr_lookup_victim(sbi, p);
|
|
|
|
else
|
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void release_victim_entry(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct atgc_management *am = &sbi->am;
|
|
|
|
struct victim_entry *ve, *tmp;
|
|
|
|
|
|
|
|
list_for_each_entry_safe(ve, tmp, &am->victim_list, list) {
|
|
|
|
list_del(&ve->list);
|
|
|
|
kmem_cache_free(victim_entry_slab, ve);
|
|
|
|
am->victim_count--;
|
|
|
|
}
|
|
|
|
|
|
|
|
am->root = RB_ROOT_CACHED;
|
|
|
|
|
|
|
|
f2fs_bug_on(sbi, am->victim_count);
|
|
|
|
f2fs_bug_on(sbi, !list_empty(&am->victim_list));
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
2013-03-19 07:03:35 +08:00
|
|
|
* This function is called from two paths.
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
* One is garbage collection and the other is SSR segment selection.
|
|
|
|
* When it is called during GC, it just gets a victim segment
|
|
|
|
* and it does not remove it from dirty seglist.
|
|
|
|
* When it is called from SSR segment selection, it finds a segment
|
|
|
|
* which has minimum valid blocks and removes it from dirty seglist.
|
|
|
|
*/
|
|
|
|
static int get_victim_by_default(struct f2fs_sb_info *sbi,
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
unsigned int *result, int gc_type, int type,
|
|
|
|
char alloc_mode, unsigned long long age)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2017-04-14 06:17:00 +08:00
|
|
|
struct sit_info *sm = SIT_I(sbi);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
struct victim_sel_policy p;
|
2016-09-29 18:37:31 +08:00
|
|
|
unsigned int secno, last_victim;
|
2019-06-05 11:33:25 +08:00
|
|
|
unsigned int last_segment;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
unsigned int nsearched;
|
|
|
|
bool is_atgc;
|
2020-06-28 19:23:03 +08:00
|
|
|
int ret = 0;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2014-09-15 18:05:44 +08:00
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
2019-06-05 11:33:25 +08:00
|
|
|
last_segment = MAIN_SECS(sbi) * sbi->segs_per_sec;
|
2014-09-15 18:05:44 +08:00
|
|
|
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
p.alloc_mode = alloc_mode;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
p.age = age;
|
|
|
|
p.age_threshold = sbi->am.age_threshold;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
retry:
|
|
|
|
select_policy(sbi, gc_type, type, &p);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
p.min_segno = NULL_SEGNO;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
p.oldest_age = 0;
|
2016-09-29 18:37:31 +08:00
|
|
|
p.min_cost = get_max_cost(sbi, &p);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
is_atgc = (p.gc_mode == GC_AT || p.alloc_mode == AT_SSR);
|
|
|
|
nsearched = 0;
|
|
|
|
|
|
|
|
if (is_atgc)
|
|
|
|
SIT_I(sbi)->dirty_min_mtime = ULLONG_MAX;
|
|
|
|
|
2017-04-14 06:17:00 +08:00
|
|
|
if (*result != NULL_SEGNO) {
|
2020-06-28 19:23:03 +08:00
|
|
|
if (!get_valid_blocks(sbi, *result, false)) {
|
|
|
|
ret = -ENODATA;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (sec_usage_check(sbi, GET_SEC_FROM_SEG(sbi, *result)))
|
|
|
|
ret = -EBUSY;
|
|
|
|
else
|
2017-04-14 06:17:00 +08:00
|
|
|
p.min_segno = *result;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2020-06-28 19:23:03 +08:00
|
|
|
ret = -ENODATA;
|
2015-10-05 22:20:40 +08:00
|
|
|
if (p.max_search == 0)
|
|
|
|
goto out;
|
|
|
|
|
f2fs: support subsectional garbage collection
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.
Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.
So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-24 18:37:27 +08:00
|
|
|
if (__is_large_section(sbi) && p.alloc_mode == LFS) {
|
|
|
|
if (sbi->next_victim_seg[BG_GC] != NULL_SEGNO) {
|
|
|
|
p.min_segno = sbi->next_victim_seg[BG_GC];
|
|
|
|
*result = p.min_segno;
|
|
|
|
sbi->next_victim_seg[BG_GC] = NULL_SEGNO;
|
|
|
|
goto got_result;
|
|
|
|
}
|
|
|
|
if (gc_type == FG_GC &&
|
|
|
|
sbi->next_victim_seg[FG_GC] != NULL_SEGNO) {
|
|
|
|
p.min_segno = sbi->next_victim_seg[FG_GC];
|
|
|
|
*result = p.min_segno;
|
|
|
|
sbi->next_victim_seg[FG_GC] = NULL_SEGNO;
|
|
|
|
goto got_result;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-04-14 06:17:00 +08:00
|
|
|
last_victim = sm->last_victim[p.gc_mode];
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
if (p.alloc_mode == LFS && gc_type == FG_GC) {
|
|
|
|
p.min_segno = check_bg_victims(sbi);
|
|
|
|
if (p.min_segno != NULL_SEGNO)
|
|
|
|
goto got_it;
|
|
|
|
}
|
|
|
|
|
|
|
|
while (1) {
|
2020-06-18 12:37:10 +08:00
|
|
|
unsigned long cost, *dirty_bitmap;
|
|
|
|
unsigned int unit_no, segno;
|
|
|
|
|
|
|
|
dirty_bitmap = p.dirty_bitmap;
|
|
|
|
unit_no = find_next_bit(dirty_bitmap,
|
|
|
|
last_segment / p.ofs_unit,
|
|
|
|
p.offset / p.ofs_unit);
|
|
|
|
segno = unit_no * p.ofs_unit;
|
2015-10-05 22:19:24 +08:00
|
|
|
if (segno >= last_segment) {
|
2017-04-14 06:17:00 +08:00
|
|
|
if (sm->last_victim[p.gc_mode]) {
|
|
|
|
last_segment =
|
|
|
|
sm->last_victim[p.gc_mode];
|
|
|
|
sm->last_victim[p.gc_mode] = 0;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
p.offset = 0;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
}
|
2013-09-13 08:38:54 +08:00
|
|
|
|
|
|
|
p.offset = segno + p.ofs_unit;
|
2020-06-18 12:37:10 +08:00
|
|
|
nsearched++;
|
2016-02-03 16:21:57 +08:00
|
|
|
|
2019-08-07 21:40:32 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
/*
|
|
|
|
* skip selecting the invalid segno (that is failed due to block
|
|
|
|
* validity check failure during GC) to avoid endless GC loop in
|
|
|
|
* such cases.
|
|
|
|
*/
|
|
|
|
if (test_bit(segno, sm->invalid_segmap))
|
|
|
|
goto next;
|
|
|
|
#endif
|
|
|
|
|
2017-04-08 06:08:17 +08:00
|
|
|
secno = GET_SEC_FROM_SEG(sbi, segno);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2013-03-31 12:26:03 +08:00
|
|
|
if (sec_usage_check(sbi, secno))
|
2016-02-03 16:21:57 +08:00
|
|
|
goto next;
|
2018-08-21 10:21:43 +08:00
|
|
|
/* Don't touch checkpointed data */
|
|
|
|
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) &&
|
2019-05-21 06:54:49 +08:00
|
|
|
get_ckpt_valid_blocks(sbi, segno) &&
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
p.alloc_mode == LFS))
|
2018-08-21 10:21:43 +08:00
|
|
|
goto next;
|
2013-03-31 12:26:03 +08:00
|
|
|
if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap))
|
2016-02-03 16:21:57 +08:00
|
|
|
goto next;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
if (is_atgc) {
|
|
|
|
add_victim_entry(sbi, &p, segno);
|
|
|
|
goto next;
|
|
|
|
}
|
|
|
|
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
cost = get_gc_cost(sbi, segno, &p);
|
|
|
|
|
|
|
|
if (p.min_cost > cost) {
|
|
|
|
p.min_segno = segno;
|
|
|
|
p.min_cost = cost;
|
2013-09-13 08:38:54 +08:00
|
|
|
}
|
2016-02-03 16:21:57 +08:00
|
|
|
next:
|
|
|
|
if (nsearched >= p.max_search) {
|
2017-04-14 06:17:00 +08:00
|
|
|
if (!sm->last_victim[p.gc_mode] && segno <= last_victim)
|
2020-06-18 12:37:10 +08:00
|
|
|
sm->last_victim[p.gc_mode] =
|
|
|
|
last_victim + p.ofs_unit;
|
2016-02-19 08:34:38 +08:00
|
|
|
else
|
2020-06-18 12:37:10 +08:00
|
|
|
sm->last_victim[p.gc_mode] = segno + p.ofs_unit;
|
2019-06-05 11:33:25 +08:00
|
|
|
sm->last_victim[p.gc_mode] %=
|
|
|
|
(MAIN_SECS(sbi) * sbi->segs_per_sec);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
|
|
|
|
/* get victim for GC_AT/AT_SSR */
|
|
|
|
if (is_atgc) {
|
|
|
|
lookup_victim_by_age(sbi, &p);
|
|
|
|
release_victim_entry(sbi);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (is_atgc && p.min_segno == NULL_SEGNO &&
|
|
|
|
sm->elapsed_time < p.age_threshold) {
|
|
|
|
p.age_threshold = 0;
|
|
|
|
goto retry;
|
|
|
|
}
|
|
|
|
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
if (p.min_segno != NULL_SEGNO) {
|
2013-06-01 15:20:26 +08:00
|
|
|
got_it:
|
f2fs: support subsectional garbage collection
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.
Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.
So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-24 18:37:27 +08:00
|
|
|
*result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
|
|
|
|
got_result:
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
if (p.alloc_mode == LFS) {
|
2017-04-08 06:08:17 +08:00
|
|
|
secno = GET_SEC_FROM_SEG(sbi, p.min_segno);
|
2013-03-31 12:26:03 +08:00
|
|
|
if (gc_type == FG_GC)
|
|
|
|
sbi->cur_victim_sec = secno;
|
|
|
|
else
|
|
|
|
set_bit(secno, dirty_i->victim_secmap);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
2020-06-28 19:23:03 +08:00
|
|
|
ret = 0;
|
2013-04-23 15:42:53 +08:00
|
|
|
|
2018-11-26 16:01:42 +08:00
|
|
|
}
|
|
|
|
out:
|
|
|
|
if (p.min_segno != NULL_SEGNO)
|
2013-04-23 15:42:53 +08:00
|
|
|
trace_f2fs_get_victim(sbi->sb, type, gc_type, &p,
|
|
|
|
sbi->cur_victim_sec,
|
|
|
|
prefree_segments(sbi), free_segments(sbi));
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
|
2020-06-28 19:23:03 +08:00
|
|
|
return ret;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static const struct victim_selection default_v_ops = {
|
|
|
|
.get_victim = get_victim_by_default,
|
|
|
|
};
|
|
|
|
|
2014-11-28 23:49:40 +08:00
|
|
|
static struct inode *find_gc_inode(struct gc_inode_list *gc_list, nid_t ino)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
|
|
|
struct inode_entry *ie;
|
|
|
|
|
2014-11-28 23:49:40 +08:00
|
|
|
ie = radix_tree_lookup(&gc_list->iroot, ino);
|
|
|
|
if (ie)
|
|
|
|
return ie->inode;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2014-11-28 23:49:40 +08:00
|
|
|
static void add_gc_inode(struct gc_inode_list *gc_list, struct inode *inode)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
2013-06-20 17:52:39 +08:00
|
|
|
struct inode_entry *new_ie;
|
|
|
|
|
2014-11-28 23:49:40 +08:00
|
|
|
if (inode == find_gc_inode(gc_list, inode->i_ino)) {
|
2013-06-20 17:52:39 +08:00
|
|
|
iput(inode);
|
|
|
|
return;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
new_ie = f2fs_kmem_cache_alloc(f2fs_inode_entry_slab, GFP_NOFS);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
new_ie->inode = inode;
|
2015-01-23 20:37:53 +08:00
|
|
|
|
|
|
|
f2fs_radix_tree_insert(&gc_list->iroot, inode->i_ino, new_ie);
|
2014-11-28 23:49:40 +08:00
|
|
|
list_add_tail(&new_ie->list, &gc_list->ilist);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
2014-11-28 23:49:40 +08:00
|
|
|
static void put_gc_inode(struct gc_inode_list *gc_list)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
|
|
|
struct inode_entry *ie, *next_ie;
|
2014-11-28 23:49:40 +08:00
|
|
|
list_for_each_entry_safe(ie, next_ie, &gc_list->ilist, list) {
|
|
|
|
radix_tree_delete(&gc_list->iroot, ie->inode->i_ino);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
iput(ie->inode);
|
|
|
|
list_del(&ie->list);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
kmem_cache_free(f2fs_inode_entry_slab, ie);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static int check_valid_map(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno, int offset)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
struct seg_entry *sentry;
|
|
|
|
int ret;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_read(&sit_i->sentry_lock);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
sentry = get_seg_entry(sbi, segno);
|
|
|
|
ret = f2fs_test_bit(offset, sentry->cur_valid_map);
|
2017-10-30 17:49:53 +08:00
|
|
|
up_read(&sit_i->sentry_lock);
|
2013-02-04 14:11:17 +08:00
|
|
|
return ret;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
* This function compares node address got in summary with that in NAT.
|
|
|
|
* On validity, copy that node with cold status, otherwise (invalid node)
|
|
|
|
* ignore that.
|
|
|
|
*/
|
2018-09-13 07:40:53 +08:00
|
|
|
static int gc_node_segment(struct f2fs_sb_info *sbi,
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
struct f2fs_summary *sum, unsigned int segno, int gc_type)
|
|
|
|
{
|
|
|
|
struct f2fs_summary *entry;
|
2015-08-15 05:37:50 +08:00
|
|
|
block_t start_addr;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
int off;
|
f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-27 00:14:31 +08:00
|
|
|
int phase = 0;
|
2018-06-04 23:20:36 +08:00
|
|
|
bool fggc = (gc_type == FG_GC);
|
2018-09-13 07:40:53 +08:00
|
|
|
int submitted = 0;
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
unsigned int usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2015-08-15 05:37:50 +08:00
|
|
|
start_addr = START_BLOCK(sbi, segno);
|
|
|
|
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
next_step:
|
|
|
|
entry = sum;
|
f2fs: give a chance to merge IOs by IO scheduler
Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.
...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...
However, by the fact that many IOs are sequential, we can give a chance to merge
the IOs by IO scheduler.
In order to do that, let's use blk_plug.
...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...
Note that this issue should be addressed in checkpoint, and some readahead
flows too.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-24 12:19:56 +08:00
|
|
|
|
2018-06-04 23:20:36 +08:00
|
|
|
if (fggc && phase == 2)
|
|
|
|
atomic_inc(&sbi->wb_sync_req[NODE]);
|
|
|
|
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
for (off = 0; off < usable_blks_in_seg; off++, entry++) {
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
nid_t nid = le32_to_cpu(entry->nid);
|
|
|
|
struct page *node_page;
|
2015-08-15 05:37:50 +08:00
|
|
|
struct node_info ni;
|
2018-09-13 07:40:53 +08:00
|
|
|
int err;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2013-02-04 14:11:17 +08:00
|
|
|
/* stop BG_GC if there is not enough free sections. */
|
2016-09-02 03:02:51 +08:00
|
|
|
if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0))
|
2018-09-13 07:40:53 +08:00
|
|
|
return submitted;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2013-02-04 14:11:17 +08:00
|
|
|
if (check_valid_map(sbi, segno, off) == 0)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
continue;
|
|
|
|
|
f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-27 00:14:31 +08:00
|
|
|
if (phase == 0) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1,
|
f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-27 00:14:31 +08:00
|
|
|
META_NAT, true);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (phase == 1) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_ra_node_page(sbi, nid);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
continue;
|
|
|
|
}
|
f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-27 00:14:31 +08:00
|
|
|
|
|
|
|
/* phase == 2 */
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
node_page = f2fs_get_node_page(sbi, nid);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
if (IS_ERR(node_page))
|
|
|
|
continue;
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
/* block may become invalid during f2fs_get_node_page */
|
2014-09-07 11:05:20 +08:00
|
|
|
if (check_valid_map(sbi, segno, off) == 0) {
|
|
|
|
f2fs_put_page(node_page, 1);
|
|
|
|
continue;
|
2015-08-15 05:37:50 +08:00
|
|
|
}
|
|
|
|
|
2018-07-17 00:02:17 +08:00
|
|
|
if (f2fs_get_node_info(sbi, nid, &ni)) {
|
|
|
|
f2fs_put_page(node_page, 1);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2015-08-15 05:37:50 +08:00
|
|
|
if (ni.blk_addr != start_addr + off) {
|
|
|
|
f2fs_put_page(node_page, 1);
|
|
|
|
continue;
|
2014-09-07 11:05:20 +08:00
|
|
|
}
|
|
|
|
|
2018-09-13 07:40:53 +08:00
|
|
|
err = f2fs_move_node_page(node_page, gc_type);
|
|
|
|
if (!err && gc_type == FG_GC)
|
|
|
|
submitted++;
|
2014-12-23 07:37:39 +08:00
|
|
|
stat_inc_node_blk_count(sbi, 1, gc_type);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
f2fs: give a chance to merge IOs by IO scheduler
Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.
...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...
However, by the fact that many IOs are sequential, we can give a chance to merge
the IOs by IO scheduler.
In order to do that, let's use blk_plug.
...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...
Note that this issue should be addressed in checkpoint, and some readahead
flows too.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-24 12:19:56 +08:00
|
|
|
|
f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-27 00:14:31 +08:00
|
|
|
if (++phase < 3)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
goto next_step;
|
2018-06-04 23:20:36 +08:00
|
|
|
|
|
|
|
if (fggc)
|
|
|
|
atomic_dec(&sbi->wb_sync_req[NODE]);
|
2018-09-13 07:40:53 +08:00
|
|
|
return submitted;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
2013-01-21 16:34:21 +08:00
|
|
|
* Calculate start block index indicating the given node offset.
|
|
|
|
* Be careful, caller should give this node offset only indicating direct node
|
|
|
|
* blocks. If any node offsets, which point the other types of node blocks such
|
|
|
|
* as indirect or double indirect node blocks, are given, it must be a caller's
|
|
|
|
* bug.
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
*/
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
block_t f2fs_start_bidx_of_node(unsigned int node_ofs, struct inode *inode)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
2012-12-26 11:03:22 +08:00
|
|
|
unsigned int indirect_blks = 2 * NIDS_PER_BLOCK + 4;
|
|
|
|
unsigned int bidx;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2012-12-26 11:03:22 +08:00
|
|
|
if (node_ofs == 0)
|
|
|
|
return 0;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2012-12-26 11:03:22 +08:00
|
|
|
if (node_ofs <= 2) {
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
bidx = node_ofs - 1;
|
|
|
|
} else if (node_ofs <= indirect_blks) {
|
2012-12-26 11:03:22 +08:00
|
|
|
int dec = (node_ofs - 4) / (NIDS_PER_BLOCK + 1);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
bidx = node_ofs - 2 - dec;
|
|
|
|
} else {
|
2012-12-26 11:03:22 +08:00
|
|
|
int dec = (node_ofs - indirect_blks - 3) / (NIDS_PER_BLOCK + 1);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
bidx = node_ofs - 5 - dec;
|
|
|
|
}
|
2019-03-25 21:08:19 +08:00
|
|
|
return bidx * ADDRS_PER_BLOCK(inode) + ADDRS_PER_INODE(inode);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
2015-07-01 09:37:21 +08:00
|
|
|
static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
struct node_info *dni, block_t blkaddr, unsigned int *nofs)
|
|
|
|
{
|
|
|
|
struct page *node_page;
|
|
|
|
nid_t nid;
|
|
|
|
unsigned int ofs_in_node;
|
|
|
|
block_t source_blkaddr;
|
|
|
|
|
|
|
|
nid = le32_to_cpu(sum->nid);
|
|
|
|
ofs_in_node = le16_to_cpu(sum->ofs_in_node);
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
node_page = f2fs_get_node_page(sbi, nid);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
if (IS_ERR(node_page))
|
2015-07-01 09:37:21 +08:00
|
|
|
return false;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2018-07-17 00:02:17 +08:00
|
|
|
if (f2fs_get_node_info(sbi, nid, dni)) {
|
|
|
|
f2fs_put_page(node_page, 1);
|
|
|
|
return false;
|
|
|
|
}
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
|
|
|
if (sum->version != dni->version) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_warn(sbi, "%s: valid data with mismatched node version.",
|
|
|
|
__func__);
|
f2fs: relax node version check for victim data in gc
- has_not_enough_free_secs
node_secs: 0 dent_secs: 0 freed:0 free_segments:103 reserved:104
- f2fs_gc
- get_victim_by_default
alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1
- do_garbage_collect
start_segno 3976, end_segno 3977 type 0
- is_alive
nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0
- gc_data_segment 766, segno 3976, block 512/426 not alive
So, this patch fixes subtle corrupted case where node version does not match
to summary version which results in infinite loop by gc.
Reported-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-03-21 22:59:50 +08:00
|
|
|
set_sbi_flag(sbi, SBI_NEED_FSCK);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
*nofs = ofs_of_node(node_page);
|
2020-02-14 17:44:10 +08:00
|
|
|
source_blkaddr = data_blkaddr(NULL, node_page, ofs_in_node);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
f2fs_put_page(node_page, 1);
|
|
|
|
|
2019-08-07 21:40:32 +08:00
|
|
|
if (source_blkaddr != blkaddr) {
|
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
unsigned int segno = GET_SEGNO(sbi, blkaddr);
|
|
|
|
unsigned long offset = GET_BLKOFF_FROM_SEG0(sbi, blkaddr);
|
|
|
|
|
|
|
|
if (unlikely(check_valid_map(sbi, segno, offset))) {
|
|
|
|
if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
|
|
|
|
f2fs_err(sbi, "mismatched blkaddr %u (source_blkaddr %u) in seg %u\n",
|
|
|
|
blkaddr, source_blkaddr, segno);
|
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif
|
2015-07-01 09:37:21 +08:00
|
|
|
return false;
|
2019-08-07 21:40:32 +08:00
|
|
|
}
|
2015-07-01 09:37:21 +08:00
|
|
|
return true;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
static int ra_data_block(struct inode *inode, pgoff_t index)
|
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
|
|
|
struct address_space *mapping = inode->i_mapping;
|
|
|
|
struct dnode_of_data dn;
|
|
|
|
struct page *page;
|
|
|
|
struct extent_info ei = {0, 0, 0};
|
|
|
|
struct f2fs_io_info fio = {
|
|
|
|
.sbi = sbi,
|
|
|
|
.ino = inode->i_ino,
|
|
|
|
.type = DATA,
|
|
|
|
.temp = COLD,
|
|
|
|
.op = REQ_OP_READ,
|
|
|
|
.op_flags = 0,
|
|
|
|
.encrypted_page = NULL,
|
|
|
|
.in_list = false,
|
|
|
|
.retry = false,
|
|
|
|
};
|
|
|
|
int err;
|
|
|
|
|
|
|
|
page = f2fs_grab_cache_page(mapping, index, true);
|
|
|
|
if (!page)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
if (f2fs_lookup_extent_cache(inode, index, &ei)) {
|
|
|
|
dn.data_blkaddr = ei.blk + index - ei.fofs;
|
f2fs: introduce DATA_GENERIC_ENHANCE
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.
That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.
So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.
This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()
This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241
= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-15 15:26:32 +08:00
|
|
|
if (unlikely(!f2fs_is_valid_blkaddr(sbi, dn.data_blkaddr,
|
|
|
|
DATA_GENERIC_ENHANCE_READ))) {
|
2019-06-20 11:36:14 +08:00
|
|
|
err = -EFSCORRUPTED;
|
f2fs: introduce DATA_GENERIC_ENHANCE
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.
That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.
So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.
This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()
This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241
= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-15 15:26:32 +08:00
|
|
|
goto put_page;
|
|
|
|
}
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
goto got_it;
|
|
|
|
}
|
|
|
|
|
|
|
|
set_new_dnode(&dn, inode, NULL, NULL, 0);
|
|
|
|
err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE);
|
|
|
|
if (err)
|
|
|
|
goto put_page;
|
|
|
|
f2fs_put_dnode(&dn);
|
|
|
|
|
f2fs: introduce DATA_GENERIC_ENHANCE
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.
That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.
So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.
This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()
This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241
= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-15 15:26:32 +08:00
|
|
|
if (!__is_valid_data_blkaddr(dn.data_blkaddr)) {
|
|
|
|
err = -ENOENT;
|
|
|
|
goto put_page;
|
|
|
|
}
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
if (unlikely(!f2fs_is_valid_blkaddr(sbi, dn.data_blkaddr,
|
f2fs: introduce DATA_GENERIC_ENHANCE
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.
That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.
So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.
This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()
This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241
= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-15 15:26:32 +08:00
|
|
|
DATA_GENERIC_ENHANCE))) {
|
2019-06-20 11:36:14 +08:00
|
|
|
err = -EFSCORRUPTED;
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
goto put_page;
|
|
|
|
}
|
|
|
|
got_it:
|
|
|
|
/* read page */
|
|
|
|
fio.page = page;
|
|
|
|
fio.new_blkaddr = fio.old_blkaddr = dn.data_blkaddr;
|
|
|
|
|
2018-09-18 20:39:53 +08:00
|
|
|
/*
|
|
|
|
* don't cache encrypted data into meta inode until previous dirty
|
|
|
|
* data were writebacked to avoid racing between GC and flush.
|
|
|
|
*/
|
2018-12-25 17:43:42 +08:00
|
|
|
f2fs_wait_on_page_writeback(page, DATA, true, true);
|
2018-09-18 20:39:53 +08:00
|
|
|
|
|
|
|
f2fs_wait_on_block_writeback(inode, dn.data_blkaddr);
|
|
|
|
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
fio.encrypted_page = f2fs_pagecache_get_page(META_MAPPING(sbi),
|
|
|
|
dn.data_blkaddr,
|
|
|
|
FGP_LOCK | FGP_CREAT, GFP_NOFS);
|
|
|
|
if (!fio.encrypted_page) {
|
|
|
|
err = -ENOMEM;
|
|
|
|
goto put_page;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = f2fs_submit_page_bio(&fio);
|
|
|
|
if (err)
|
|
|
|
goto put_encrypted_page;
|
|
|
|
f2fs_put_page(fio.encrypted_page, 0);
|
|
|
|
f2fs_put_page(page, 1);
|
2020-04-16 18:16:56 +08:00
|
|
|
|
|
|
|
f2fs_update_iostat(sbi, FS_DATA_READ_IO, F2FS_BLKSIZE);
|
2020-04-23 18:03:06 +08:00
|
|
|
f2fs_update_iostat(sbi, FS_GDATA_READ_IO, F2FS_BLKSIZE);
|
2020-04-16 18:16:56 +08:00
|
|
|
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
return 0;
|
|
|
|
put_encrypted_page:
|
|
|
|
f2fs_put_page(fio.encrypted_page, 1);
|
|
|
|
put_page:
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2017-09-06 08:04:35 +08:00
|
|
|
/*
|
|
|
|
* Move data block via META_MAPPING while keeping locked data page.
|
|
|
|
* This can be used to move blocks, aka LBAs, directly on disk.
|
|
|
|
*/
|
2018-09-13 07:40:53 +08:00
|
|
|
static int move_data_block(struct inode *inode, block_t bidx,
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
int gc_type, unsigned int segno, int off)
|
2015-04-24 03:04:33 +08:00
|
|
|
{
|
|
|
|
struct f2fs_io_info fio = {
|
|
|
|
.sbi = F2FS_I_SB(inode),
|
2017-09-29 13:59:38 +08:00
|
|
|
.ino = inode->i_ino,
|
2015-04-24 03:04:33 +08:00
|
|
|
.type = DATA,
|
2017-05-11 02:18:25 +08:00
|
|
|
.temp = COLD,
|
2016-06-06 03:31:55 +08:00
|
|
|
.op = REQ_OP_READ,
|
2016-11-01 21:40:10 +08:00
|
|
|
.op_flags = 0,
|
2015-04-24 03:04:33 +08:00
|
|
|
.encrypted_page = NULL,
|
2017-05-19 23:37:01 +08:00
|
|
|
.in_list = false,
|
2018-05-28 23:47:18 +08:00
|
|
|
.retry = false,
|
2015-04-24 03:04:33 +08:00
|
|
|
};
|
|
|
|
struct dnode_of_data dn;
|
|
|
|
struct f2fs_summary sum;
|
|
|
|
struct node_info ni;
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
struct page *page, *mpage;
|
2016-02-23 17:52:43 +08:00
|
|
|
block_t newaddr;
|
2018-09-13 07:40:53 +08:00
|
|
|
int err = 0;
|
2020-02-14 17:44:12 +08:00
|
|
|
bool lfs_mode = f2fs_lfs_mode(fio.sbi);
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
int type = fio.sbi->am.atgc_enabled ?
|
|
|
|
CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
|
2015-04-24 03:04:33 +08:00
|
|
|
|
|
|
|
/* do not read out */
|
2015-10-10 06:11:38 +08:00
|
|
|
page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
|
2015-04-24 03:04:33 +08:00
|
|
|
if (!page)
|
2018-09-13 07:40:53 +08:00
|
|
|
return -ENOMEM;
|
2015-04-24 03:04:33 +08:00
|
|
|
|
2018-09-13 07:40:53 +08:00
|
|
|
if (!check_valid_map(F2FS_I_SB(inode), segno, off)) {
|
|
|
|
err = -ENOENT;
|
2016-11-07 21:22:31 +08:00
|
|
|
goto out;
|
2018-09-13 07:40:53 +08:00
|
|
|
}
|
2016-11-07 21:22:31 +08:00
|
|
|
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
if (f2fs_is_atomic_file(inode)) {
|
|
|
|
F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++;
|
|
|
|
F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++;
|
2018-09-13 07:40:53 +08:00
|
|
|
err = -EAGAIN;
|
2017-01-07 18:50:26 +08:00
|
|
|
goto out;
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
}
|
2017-01-07 18:50:26 +08:00
|
|
|
|
2017-12-08 08:25:39 +08:00
|
|
|
if (f2fs_is_pinned_file(inode)) {
|
|
|
|
f2fs_pin_file_control(inode, true);
|
2018-09-13 07:40:53 +08:00
|
|
|
err = -EAGAIN;
|
2017-12-08 08:25:39 +08:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2015-04-24 03:04:33 +08:00
|
|
|
set_new_dnode(&dn, inode, NULL, NULL, 0);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
err = f2fs_get_dnode_of_data(&dn, bidx, LOOKUP_NODE);
|
2015-04-24 03:04:33 +08:00
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
2015-10-08 13:27:34 +08:00
|
|
|
if (unlikely(dn.data_blkaddr == NULL_ADDR)) {
|
|
|
|
ClearPageUptodate(page);
|
2018-09-13 07:40:53 +08:00
|
|
|
err = -ENOENT;
|
2015-04-24 03:04:33 +08:00
|
|
|
goto put_out;
|
2015-10-08 13:27:34 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* don't cache encrypted data into meta inode until previous dirty
|
|
|
|
* data were writebacked to avoid racing between GC and flush.
|
|
|
|
*/
|
2018-12-25 17:43:42 +08:00
|
|
|
f2fs_wait_on_page_writeback(page, DATA, true, true);
|
2015-04-24 03:04:33 +08:00
|
|
|
|
2018-09-18 20:39:53 +08:00
|
|
|
f2fs_wait_on_block_writeback(inode, dn.data_blkaddr);
|
|
|
|
|
2018-07-17 00:02:17 +08:00
|
|
|
err = f2fs_get_node_info(fio.sbi, dn.nid, &ni);
|
|
|
|
if (err)
|
|
|
|
goto put_out;
|
|
|
|
|
2015-04-24 03:04:33 +08:00
|
|
|
set_summary(&sum, dn.nid, dn.ofs_in_node, ni.version);
|
|
|
|
|
|
|
|
/* read page */
|
|
|
|
fio.page = page;
|
f2fs: trace old block address for CoWed page
This patch enables to trace old block address of CoWed page for better
debugging.
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 18:36:38 +08:00
|
|
|
fio.new_blkaddr = fio.old_blkaddr = dn.data_blkaddr;
|
2015-04-24 03:04:33 +08:00
|
|
|
|
2018-05-26 09:00:13 +08:00
|
|
|
if (lfs_mode)
|
|
|
|
down_write(&fio.sbi->io_order_lock);
|
|
|
|
|
2019-07-18 09:31:53 +08:00
|
|
|
mpage = f2fs_grab_cache_page(META_MAPPING(fio.sbi),
|
|
|
|
fio.old_blkaddr, false);
|
2020-07-01 10:27:09 +08:00
|
|
|
if (!mpage) {
|
|
|
|
err = -ENOMEM;
|
2019-07-18 09:31:53 +08:00
|
|
|
goto up_out;
|
2020-07-01 10:27:09 +08:00
|
|
|
}
|
2019-07-18 09:31:53 +08:00
|
|
|
|
|
|
|
fio.encrypted_page = mpage;
|
|
|
|
|
|
|
|
/* read source block in mpage */
|
|
|
|
if (!PageUptodate(mpage)) {
|
|
|
|
err = f2fs_submit_page_bio(&fio);
|
|
|
|
if (err) {
|
|
|
|
f2fs_put_page(mpage, 1);
|
|
|
|
goto up_out;
|
|
|
|
}
|
2020-04-16 18:16:56 +08:00
|
|
|
|
|
|
|
f2fs_update_iostat(fio.sbi, FS_DATA_READ_IO, F2FS_BLKSIZE);
|
2020-04-23 18:03:06 +08:00
|
|
|
f2fs_update_iostat(fio.sbi, FS_GDATA_READ_IO, F2FS_BLKSIZE);
|
2020-04-16 18:16:56 +08:00
|
|
|
|
2019-07-18 09:31:53 +08:00
|
|
|
lock_page(mpage);
|
|
|
|
if (unlikely(mpage->mapping != META_MAPPING(fio.sbi) ||
|
|
|
|
!PageUptodate(mpage))) {
|
|
|
|
err = -EIO;
|
|
|
|
f2fs_put_page(mpage, 1);
|
|
|
|
goto up_out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_allocate_data_block(fio.sbi, NULL, fio.old_blkaddr, &newaddr,
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
&sum, type, NULL);
|
2016-02-23 17:52:43 +08:00
|
|
|
|
2017-10-28 16:52:30 +08:00
|
|
|
fio.encrypted_page = f2fs_pagecache_get_page(META_MAPPING(fio.sbi),
|
|
|
|
newaddr, FGP_LOCK | FGP_CREAT, GFP_NOFS);
|
2016-02-23 17:52:43 +08:00
|
|
|
if (!fio.encrypted_page) {
|
|
|
|
err = -ENOMEM;
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
f2fs_put_page(mpage, 1);
|
2019-07-18 09:31:53 +08:00
|
|
|
goto recover_block;
|
2016-02-23 17:52:43 +08:00
|
|
|
}
|
2015-07-14 08:44:14 +08:00
|
|
|
|
2019-07-18 09:31:53 +08:00
|
|
|
/* write target block */
|
2018-12-25 17:43:42 +08:00
|
|
|
f2fs_wait_on_page_writeback(fio.encrypted_page, DATA, true, true);
|
2019-07-18 09:31:53 +08:00
|
|
|
memcpy(page_address(fio.encrypted_page),
|
|
|
|
page_address(mpage), PAGE_SIZE);
|
|
|
|
f2fs_put_page(mpage, 1);
|
|
|
|
invalidate_mapping_pages(META_MAPPING(fio.sbi),
|
|
|
|
fio.old_blkaddr, fio.old_blkaddr);
|
|
|
|
|
2018-12-12 18:12:30 +08:00
|
|
|
set_page_dirty(fio.encrypted_page);
|
2015-07-25 15:29:17 +08:00
|
|
|
if (clear_page_dirty_for_io(fio.encrypted_page))
|
|
|
|
dec_page_count(fio.sbi, F2FS_DIRTY_META);
|
|
|
|
|
2015-07-14 08:44:14 +08:00
|
|
|
set_page_writeback(fio.encrypted_page);
|
2018-04-12 14:09:04 +08:00
|
|
|
ClearPageError(page);
|
2015-04-24 03:04:33 +08:00
|
|
|
|
|
|
|
/* allocate block address */
|
2018-12-25 17:43:42 +08:00
|
|
|
f2fs_wait_on_page_writeback(dn.node_page, NODE, true, true);
|
2016-02-23 17:52:43 +08:00
|
|
|
|
2016-06-06 03:31:55 +08:00
|
|
|
fio.op = REQ_OP_WRITE;
|
2016-11-01 21:40:10 +08:00
|
|
|
fio.op_flags = REQ_SYNC;
|
2016-02-23 17:52:43 +08:00
|
|
|
fio.new_blkaddr = newaddr;
|
2018-05-28 23:47:18 +08:00
|
|
|
f2fs_submit_page_write(&fio);
|
|
|
|
if (fio.retry) {
|
2018-09-13 07:40:53 +08:00
|
|
|
err = -EAGAIN;
|
2018-01-17 12:11:31 +08:00
|
|
|
if (PageWriteback(fio.encrypted_page))
|
|
|
|
end_page_writeback(fio.encrypted_page);
|
|
|
|
goto put_page_out;
|
|
|
|
}
|
2015-04-24 03:04:33 +08:00
|
|
|
|
2017-08-02 23:21:48 +08:00
|
|
|
f2fs_update_iostat(fio.sbi, FS_GC_DATA_IO, F2FS_BLKSIZE);
|
|
|
|
|
2016-02-24 17:16:47 +08:00
|
|
|
f2fs_update_data_blkaddr(&dn, newaddr);
|
2016-05-21 01:13:22 +08:00
|
|
|
set_inode_flag(inode, FI_APPEND_WRITE);
|
2015-04-24 03:04:33 +08:00
|
|
|
if (page->index == 0)
|
2016-05-21 01:13:22 +08:00
|
|
|
set_inode_flag(inode, FI_FIRST_BLOCK_WRITTEN);
|
2015-07-14 08:44:14 +08:00
|
|
|
put_page_out:
|
2015-04-24 03:04:33 +08:00
|
|
|
f2fs_put_page(fio.encrypted_page, 1);
|
2016-02-23 17:52:43 +08:00
|
|
|
recover_block:
|
|
|
|
if (err)
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_do_replace_block(fio.sbi, &sum, newaddr, fio.old_blkaddr,
|
2020-08-04 21:14:47 +08:00
|
|
|
true, true, true);
|
2019-07-18 09:31:53 +08:00
|
|
|
up_out:
|
|
|
|
if (lfs_mode)
|
|
|
|
up_write(&fio.sbi->io_order_lock);
|
2015-04-24 03:04:33 +08:00
|
|
|
put_out:
|
|
|
|
f2fs_put_dnode(&dn);
|
|
|
|
out:
|
|
|
|
f2fs_put_page(page, 1);
|
2018-09-13 07:40:53 +08:00
|
|
|
return err;
|
2015-04-24 03:04:33 +08:00
|
|
|
}
|
|
|
|
|
2018-09-13 07:40:53 +08:00
|
|
|
static int move_data_page(struct inode *inode, block_t bidx, int gc_type,
|
2016-11-07 21:22:31 +08:00
|
|
|
unsigned int segno, int off)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
2015-04-25 05:34:30 +08:00
|
|
|
struct page *page;
|
2018-09-13 07:40:53 +08:00
|
|
|
int err = 0;
|
2015-04-25 05:34:30 +08:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
page = f2fs_get_lock_data_page(inode, bidx, true);
|
2015-04-25 05:34:30 +08:00
|
|
|
if (IS_ERR(page))
|
2018-09-13 07:40:53 +08:00
|
|
|
return PTR_ERR(page);
|
2013-12-09 16:09:00 +08:00
|
|
|
|
2018-09-13 07:40:53 +08:00
|
|
|
if (!check_valid_map(F2FS_I_SB(inode), segno, off)) {
|
|
|
|
err = -ENOENT;
|
2016-11-07 21:22:31 +08:00
|
|
|
goto out;
|
2018-09-13 07:40:53 +08:00
|
|
|
}
|
2016-11-07 21:22:31 +08:00
|
|
|
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
if (f2fs_is_atomic_file(inode)) {
|
|
|
|
F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++;
|
|
|
|
F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++;
|
2018-09-13 07:40:53 +08:00
|
|
|
err = -EAGAIN;
|
2017-01-07 18:50:26 +08:00
|
|
|
goto out;
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
}
|
2017-12-08 08:25:39 +08:00
|
|
|
if (f2fs_is_pinned_file(inode)) {
|
|
|
|
if (gc_type == FG_GC)
|
|
|
|
f2fs_pin_file_control(inode, true);
|
2018-09-13 07:40:53 +08:00
|
|
|
err = -EAGAIN;
|
2017-12-08 08:25:39 +08:00
|
|
|
goto out;
|
|
|
|
}
|
2017-01-07 18:50:26 +08:00
|
|
|
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
if (gc_type == BG_GC) {
|
2018-09-13 07:40:53 +08:00
|
|
|
if (PageWriteback(page)) {
|
|
|
|
err = -EAGAIN;
|
2013-03-31 12:49:18 +08:00
|
|
|
goto out;
|
2018-09-13 07:40:53 +08:00
|
|
|
}
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
set_page_dirty(page);
|
|
|
|
set_cold_data(page);
|
|
|
|
} else {
|
2015-04-25 05:34:30 +08:00
|
|
|
struct f2fs_io_info fio = {
|
|
|
|
.sbi = F2FS_I_SB(inode),
|
2017-09-29 13:59:38 +08:00
|
|
|
.ino = inode->i_ino,
|
2015-04-25 05:34:30 +08:00
|
|
|
.type = DATA,
|
2017-05-11 02:18:25 +08:00
|
|
|
.temp = COLD,
|
2016-06-06 03:31:55 +08:00
|
|
|
.op = REQ_OP_WRITE,
|
2016-11-01 21:40:10 +08:00
|
|
|
.op_flags = REQ_SYNC,
|
2017-04-25 20:45:13 +08:00
|
|
|
.old_blkaddr = NULL_ADDR,
|
2015-04-25 05:34:30 +08:00
|
|
|
.page = page,
|
2015-04-24 03:04:33 +08:00
|
|
|
.encrypted_page = NULL,
|
2017-05-13 04:51:34 +08:00
|
|
|
.need_lock = LOCK_REQ,
|
2017-08-02 23:21:48 +08:00
|
|
|
.io_type = FS_GC_DATA_IO,
|
2015-04-25 05:34:30 +08:00
|
|
|
};
|
2016-07-03 22:05:13 +08:00
|
|
|
bool is_dirty = PageDirty(page);
|
|
|
|
|
|
|
|
retry:
|
2018-12-25 17:43:42 +08:00
|
|
|
f2fs_wait_on_page_writeback(page, DATA, true, true);
|
2018-12-12 18:12:30 +08:00
|
|
|
|
|
|
|
set_page_dirty(page);
|
2016-10-11 22:57:01 +08:00
|
|
|
if (clear_page_dirty_for_io(page)) {
|
2014-09-13 06:53:45 +08:00
|
|
|
inode_dec_dirty_pages(inode);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_remove_dirty_inode(inode);
|
2016-10-11 22:57:01 +08:00
|
|
|
}
|
2016-07-03 22:05:13 +08:00
|
|
|
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
set_cold_data(page);
|
2016-07-03 22:05:13 +08:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
err = f2fs_do_write_data_page(&fio);
|
2018-05-28 16:59:27 +08:00
|
|
|
if (err) {
|
|
|
|
clear_cold_data(page);
|
|
|
|
if (err == -ENOMEM) {
|
2020-02-17 17:45:44 +08:00
|
|
|
congestion_wait(BLK_RW_ASYNC,
|
|
|
|
DEFAULT_IO_TIMEOUT);
|
2018-05-28 16:59:27 +08:00
|
|
|
goto retry;
|
|
|
|
}
|
|
|
|
if (is_dirty)
|
|
|
|
set_page_dirty(page);
|
2016-07-03 22:05:13 +08:00
|
|
|
}
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
out:
|
|
|
|
f2fs_put_page(page, 1);
|
2018-09-13 07:40:53 +08:00
|
|
|
return err;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
* This function tries to get parent node of victim data block, and identifies
|
|
|
|
* data block validity. If the block is valid, copy that with cold status and
|
|
|
|
* modify parent node.
|
|
|
|
* If the parent node is not valid or the data block address is different,
|
|
|
|
* the victim data block is ignored.
|
|
|
|
*/
|
2018-09-13 07:40:53 +08:00
|
|
|
static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
|
2014-11-28 23:49:40 +08:00
|
|
|
struct gc_inode_list *gc_list, unsigned int segno, int gc_type)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
|
|
|
struct super_block *sb = sbi->sb;
|
|
|
|
struct f2fs_summary *entry;
|
|
|
|
block_t start_addr;
|
2013-02-04 14:11:17 +08:00
|
|
|
int off;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
int phase = 0;
|
2018-09-13 07:40:53 +08:00
|
|
|
int submitted = 0;
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
unsigned int usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
|
|
|
start_addr = START_BLOCK(sbi, segno);
|
|
|
|
|
|
|
|
next_step:
|
|
|
|
entry = sum;
|
f2fs: give a chance to merge IOs by IO scheduler
Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.
...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...
However, by the fact that many IOs are sequential, we can give a chance to merge
the IOs by IO scheduler.
In order to do that, let's use blk_plug.
...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...
Note that this issue should be addressed in checkpoint, and some readahead
flows too.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-24 12:19:56 +08:00
|
|
|
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
for (off = 0; off < usable_blks_in_seg; off++, entry++) {
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
struct page *data_page;
|
|
|
|
struct inode *inode;
|
|
|
|
struct node_info dni; /* dnode info for the data */
|
|
|
|
unsigned int ofs_in_node, nofs;
|
|
|
|
block_t start_bidx;
|
f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-27 00:14:31 +08:00
|
|
|
nid_t nid = le32_to_cpu(entry->nid);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2019-11-23 04:02:06 +08:00
|
|
|
/*
|
|
|
|
* stop BG_GC if there is not enough free sections.
|
|
|
|
* Or, stop GC if the segment becomes fully valid caused by
|
|
|
|
* race condition along with SSR block allocation.
|
|
|
|
*/
|
|
|
|
if ((gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) ||
|
2020-02-10 05:23:28 +08:00
|
|
|
get_valid_blocks(sbi, segno, true) ==
|
|
|
|
BLKS_PER_SEC(sbi))
|
2018-09-13 07:40:53 +08:00
|
|
|
return submitted;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2013-02-04 14:11:17 +08:00
|
|
|
if (check_valid_map(sbi, segno, off) == 0)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
if (phase == 0) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1,
|
f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-27 00:14:31 +08:00
|
|
|
META_NAT, true);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (phase == 1) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_ra_node_page(sbi, nid);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Get an inode by ino with checking validity */
|
2015-07-01 09:37:21 +08:00
|
|
|
if (!is_alive(sbi, entry, &dni, start_addr + off, &nofs))
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
continue;
|
|
|
|
|
f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-27 00:14:31 +08:00
|
|
|
if (phase == 2) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_ra_node_page(sbi, dni.ino);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
ofs_in_node = le16_to_cpu(entry->ofs_in_node);
|
|
|
|
|
f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-27 00:14:31 +08:00
|
|
|
if (phase == 3) {
|
f2fs: avoid balanc_fs during evict_inode
1. Background
Previously, if f2fs tries to move data blocks of an *evicting* inode during the
cleaning process, it stops the process incompletely and then restarts the whole
process, since it needs a locked inode to grab victim data pages in its address
space. In order to get a locked inode, iget_locked() by f2fs_iget() is normally
used, but, it waits if the inode is on freeing.
So, here is a deadlock scenario.
1. f2fs_evict_inode() <- inode "A"
2. f2fs_balance_fs()
3. f2fs_gc()
4. gc_data_segment()
5. f2fs_iget() <- inode "A" too!
If step #1 and #5 treat a same inode "A", step #5 would fall into deadlock since
the inode "A" is on freeing. In order to resolve this, f2fs_iget_nowait() which
skips __wait_on_freeing_inode() was introduced in step #5, and stops f2fs_gc()
to complete f2fs_evict_inode().
1. f2fs_evict_inode() <- inode "A"
2. f2fs_balance_fs()
3. f2fs_gc()
4. gc_data_segment()
5. f2fs_iget_nowait() <- inode "A", then stop f2fs_gc() w/ -ENOENT
2. Problem and Solution
In the above scenario, however, f2fs cannot finish f2fs_evict_inode() only if:
o there are not enough free sections, and
o f2fs_gc() tries to move data blocks of the *evicting* inode repeatedly.
So, the final solution is to use f2fs_iget() and remove f2fs_balance_fs() in
f2fs_evict_inode().
The f2fs_evict_inode() actually truncates all the data and node blocks, which
means that it doesn't produce any dirty node pages accordingly.
So, we don't need to do f2fs_balance_fs() in practical.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-31 14:36:04 +08:00
|
|
|
inode = f2fs_iget(sb, dni.ino);
|
2019-12-21 09:20:05 +08:00
|
|
|
if (IS_ERR(inode) || is_bad_inode(inode)) {
|
|
|
|
set_sbi_flag(sbi, SBI_NEED_FSCK);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
continue;
|
2019-12-21 09:20:05 +08:00
|
|
|
}
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2017-11-03 10:21:05 +08:00
|
|
|
if (!down_write_trylock(
|
2018-04-24 10:55:28 +08:00
|
|
|
&F2FS_I(inode)->i_gc_rwsem[WRITE])) {
|
2017-11-03 10:21:05 +08:00
|
|
|
iput(inode);
|
2018-07-25 11:11:56 +08:00
|
|
|
sbi->skipped_gc_rwsem++;
|
2017-11-03 10:21:05 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
start_bidx = f2fs_start_bidx_of_node(nofs, inode) +
|
|
|
|
ofs_in_node;
|
|
|
|
|
|
|
|
if (f2fs_post_read_required(inode)) {
|
|
|
|
int err = ra_data_block(inode, start_bidx);
|
|
|
|
|
|
|
|
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
|
|
|
|
if (err) {
|
|
|
|
iput(inode);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
add_gc_inode(gc_list, inode);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
data_page = f2fs_get_read_data_page(inode,
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
start_bidx, REQ_RAHEAD, true);
|
2018-04-24 10:55:28 +08:00
|
|
|
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
|
2014-11-27 15:03:08 +08:00
|
|
|
if (IS_ERR(data_page)) {
|
|
|
|
iput(inode);
|
|
|
|
continue;
|
|
|
|
}
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
|
|
|
f2fs_put_page(data_page, 0);
|
2014-11-28 23:49:40 +08:00
|
|
|
add_gc_inode(gc_list, inode);
|
2014-11-27 15:03:08 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-27 00:14:31 +08:00
|
|
|
/* phase 4 */
|
2014-11-28 23:49:40 +08:00
|
|
|
inode = find_gc_inode(gc_list, dni.ino);
|
2014-11-27 15:03:08 +08:00
|
|
|
if (inode) {
|
2016-07-13 09:18:29 +08:00
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
|
|
|
bool locked = false;
|
2018-09-13 07:40:53 +08:00
|
|
|
int err;
|
2016-07-13 09:18:29 +08:00
|
|
|
|
|
|
|
if (S_ISREG(inode->i_mode)) {
|
2018-04-24 10:55:28 +08:00
|
|
|
if (!down_write_trylock(&fi->i_gc_rwsem[READ]))
|
2016-07-13 09:18:29 +08:00
|
|
|
continue;
|
|
|
|
if (!down_write_trylock(
|
2018-04-24 10:55:28 +08:00
|
|
|
&fi->i_gc_rwsem[WRITE])) {
|
2018-07-25 11:11:56 +08:00
|
|
|
sbi->skipped_gc_rwsem++;
|
2018-04-24 10:55:28 +08:00
|
|
|
up_write(&fi->i_gc_rwsem[READ]);
|
2016-07-13 09:18:29 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
locked = true;
|
2017-08-23 18:23:24 +08:00
|
|
|
|
|
|
|
/* wait for all inflight aio data */
|
|
|
|
inode_dio_wait(inode);
|
2016-07-13 09:18:29 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
start_bidx = f2fs_start_bidx_of_node(nofs, inode)
|
2015-04-25 05:34:30 +08:00
|
|
|
+ ofs_in_node;
|
f2fs: refactor read path to allow multiple postprocessing steps
Currently f2fs's ->readpage() and ->readpages() assume that either the
data undergoes no postprocessing, or decryption only. But with
fs-verity, there will be an additional authenticity verification step,
and it may be needed either by itself, or combined with decryption.
To support this, store a 'struct bio_post_read_ctx' in ->bi_private
which contains a work struct, a bitmask of postprocessing steps that are
enabled, and an indicator of the current step. The bio completion
routine, if there was no I/O error, enqueues the first postprocessing
step. When that completes, it continues to the next step. Pages that
fail any postprocessing step have PageError set. Once all steps have
completed, pages without PageError set are set Uptodate, and all pages
are unlocked.
Also replace f2fs_encrypted_file() with a new function
f2fs_post_read_required() in places like direct I/O and garbage
collection that really should be testing whether the file needs special
I/O processing, not whether it is encrypted specifically.
This may also be useful for other future f2fs features such as
compression.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-19 02:09:48 +08:00
|
|
|
if (f2fs_post_read_required(inode))
|
2018-09-13 07:40:53 +08:00
|
|
|
err = move_data_block(inode, start_bidx,
|
|
|
|
gc_type, segno, off);
|
2015-04-24 03:04:33 +08:00
|
|
|
else
|
2018-09-13 07:40:53 +08:00
|
|
|
err = move_data_page(inode, start_bidx, gc_type,
|
2017-09-06 08:04:35 +08:00
|
|
|
segno, off);
|
2016-07-13 09:18:29 +08:00
|
|
|
|
2018-09-13 07:40:53 +08:00
|
|
|
if (!err && (gc_type == FG_GC ||
|
|
|
|
f2fs_post_read_required(inode)))
|
|
|
|
submitted++;
|
|
|
|
|
2016-07-13 09:18:29 +08:00
|
|
|
if (locked) {
|
2018-04-24 10:55:28 +08:00
|
|
|
up_write(&fi->i_gc_rwsem[WRITE]);
|
|
|
|
up_write(&fi->i_gc_rwsem[READ]);
|
2016-07-13 09:18:29 +08:00
|
|
|
}
|
|
|
|
|
2014-12-23 07:37:39 +08:00
|
|
|
stat_inc_data_blk_count(sbi, 1, gc_type);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
}
|
f2fs: give a chance to merge IOs by IO scheduler
Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.
...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...
However, by the fact that many IOs are sequential, we can give a chance to merge
the IOs by IO scheduler.
In order to do that, let's use blk_plug.
...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...
Note that this issue should be addressed in checkpoint, and some readahead
flows too.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-24 12:19:56 +08:00
|
|
|
|
f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE
This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-27 00:14:31 +08:00
|
|
|
if (++phase < 5)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
goto next_step;
|
2018-09-13 07:40:53 +08:00
|
|
|
|
|
|
|
return submitted;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int __get_victim(struct f2fs_sb_info *sbi, unsigned int *victim,
|
2014-10-20 17:45:48 +08:00
|
|
|
int gc_type)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
int ret;
|
2014-10-20 17:45:48 +08:00
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
2014-10-20 17:45:48 +08:00
|
|
|
ret = DIRTY_I(sbi)->v_ops->get_victim(sbi, victim, gc_type,
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
NO_CHECK_TYPE, LFS, 0);
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
static int do_garbage_collect(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int start_segno,
|
2014-11-28 23:49:40 +08:00
|
|
|
struct gc_inode_list *gc_list, int gc_type)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
|
|
|
struct page *sum_page;
|
|
|
|
struct f2fs_summary_block *sum;
|
f2fs: give a chance to merge IOs by IO scheduler
Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.
...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...
However, by the fact that many IOs are sequential, we can give a chance to merge
the IOs by IO scheduler.
In order to do that, let's use blk_plug.
...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...
Note that this issue should be addressed in checkpoint, and some readahead
flows too.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-24 12:19:56 +08:00
|
|
|
struct blk_plug plug;
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
unsigned int segno = start_segno;
|
|
|
|
unsigned int end_segno = start_segno + sbi->segs_per_sec;
|
f2fs: support subsectional garbage collection
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.
Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.
So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-24 18:37:27 +08:00
|
|
|
int seg_freed = 0, migrated = 0;
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
unsigned char type = IS_DATASEG(get_seg_entry(sbi, segno)->type) ?
|
|
|
|
SUM_TYPE_DATA : SUM_TYPE_NODE;
|
2018-09-13 07:40:53 +08:00
|
|
|
int submitted = 0;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
f2fs: support subsectional garbage collection
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.
Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.
So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-24 18:37:27 +08:00
|
|
|
if (__is_large_section(sbi))
|
|
|
|
end_segno = rounddown(end_segno, sbi->segs_per_sec);
|
|
|
|
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
/*
|
|
|
|
* zone-capacity can be less than zone-size in zoned devices,
|
|
|
|
* resulting in less than expected usable segments in the zone,
|
|
|
|
* calculate the end segno in the zone which can be garbage collected
|
|
|
|
*/
|
|
|
|
if (f2fs_sb_has_blkzoned(sbi))
|
|
|
|
end_segno -= sbi->segs_per_sec -
|
|
|
|
f2fs_usable_segs_in_sec(sbi, segno);
|
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
sanity_check_seg_type(sbi, get_seg_entry(sbi, segno)->type);
|
|
|
|
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
/* readahead multi ssa blocks those have contiguous address */
|
2018-10-24 18:37:26 +08:00
|
|
|
if (__is_large_section(sbi))
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_ra_meta_pages(sbi, GET_SUM_BLOCK(sbi, segno),
|
f2fs: support subsectional garbage collection
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.
Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.
So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-24 18:37:27 +08:00
|
|
|
end_segno - segno, META_SSA, true);
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
|
|
|
|
/* reference all summary page */
|
|
|
|
while (segno < end_segno) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
sum_page = f2fs_get_sum_page(sbi, segno++);
|
2018-09-18 08:36:06 +08:00
|
|
|
if (IS_ERR(sum_page)) {
|
|
|
|
int err = PTR_ERR(sum_page);
|
|
|
|
|
|
|
|
end_segno = segno - 1;
|
|
|
|
for (segno = start_segno; segno < end_segno; segno++) {
|
|
|
|
sum_page = find_get_page(META_MAPPING(sbi),
|
|
|
|
GET_SUM_BLOCK(sbi, segno));
|
|
|
|
f2fs_put_page(sum_page, 0);
|
|
|
|
f2fs_put_page(sum_page, 0);
|
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
unlock_page(sum_page);
|
|
|
|
}
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
f2fs: give a chance to merge IOs by IO scheduler
Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.
...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...
However, by the fact that many IOs are sequential, we can give a chance to merge
the IOs by IO scheduler.
In order to do that, let's use blk_plug.
...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...
Note that this issue should be addressed in checkpoint, and some readahead
flows too.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-24 12:19:56 +08:00
|
|
|
blk_start_plug(&plug);
|
|
|
|
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
for (segno = start_segno; segno < end_segno; segno++) {
|
2016-06-07 09:49:54 +08:00
|
|
|
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
/* find segment summary of victim */
|
|
|
|
sum_page = find_get_page(META_MAPPING(sbi),
|
|
|
|
GET_SUM_BLOCK(sbi, segno));
|
|
|
|
f2fs_put_page(sum_page, 0);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2018-10-24 16:08:30 +08:00
|
|
|
if (get_valid_blocks(sbi, segno, false) == 0)
|
|
|
|
goto freed;
|
2020-02-10 05:28:45 +08:00
|
|
|
if (gc_type == BG_GC && __is_large_section(sbi) &&
|
f2fs: support subsectional garbage collection
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.
Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.
So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-24 18:37:27 +08:00
|
|
|
migrated >= sbi->migration_granularity)
|
|
|
|
goto skip;
|
2018-10-24 16:08:30 +08:00
|
|
|
if (!PageUptodate(sum_page) || unlikely(f2fs_cp_error(sbi)))
|
f2fs: support subsectional garbage collection
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.
Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.
So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-24 18:37:27 +08:00
|
|
|
goto skip;
|
2016-10-13 04:38:41 +08:00
|
|
|
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
sum = page_address(sum_page);
|
2018-07-04 21:20:05 +08:00
|
|
|
if (type != GET_SUM_TYPE((&sum->footer))) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "Inconsistent segment (%u) type [%d, %d] in SSA and SIT",
|
|
|
|
segno, type, GET_SUM_TYPE((&sum->footer)));
|
2018-07-04 21:20:05 +08:00
|
|
|
set_sbi_flag(sbi, SBI_NEED_FSCK);
|
f2fs: fix to avoid deadloop in foreground GC
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203211
- Overview
When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating.
The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on.
- Reproduces
mkdir test
mount -t f2fs tmp.img test
cd test
touch t
- Messages
[ 58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
... (repeating)
During GC, if segment type stored in SSA and SIT is inconsistent, we just
skip migrating current segment directly, since we need to know the exact
type to decide the migration function we use.
So in foreground GC, we will easily run into a infinite loop as we may
select the same victim segment which has inconsistent type due to greedy
policy. In order to end up this, we choose to shutdown filesystem. For
backgrond GC, we need to do that as well, so that we can avoid latter
potential infinite looped foreground GC.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-10 18:45:50 +08:00
|
|
|
f2fs_stop_checkpoint(sbi, false);
|
f2fs: support subsectional garbage collection
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.
Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.
So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-24 18:37:27 +08:00
|
|
|
goto skip;
|
2018-07-04 21:20:05 +08:00
|
|
|
}
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* this is to avoid deadlock:
|
|
|
|
* - lock_page(sum_page) - f2fs_replace_block
|
2017-10-30 17:49:53 +08:00
|
|
|
* - check_valid_map() - down_write(sentry_lock)
|
|
|
|
* - down_read(sentry_lock) - change_curseg()
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
* - lock_page(sum_page)
|
|
|
|
*/
|
|
|
|
if (type == SUM_TYPE_NODE)
|
2018-09-13 07:40:53 +08:00
|
|
|
submitted += gc_node_segment(sbi, sum->entries, segno,
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
gc_type);
|
2018-09-13 07:40:53 +08:00
|
|
|
else
|
|
|
|
submitted += gc_data_segment(sbi, sum->entries, gc_list,
|
|
|
|
segno, gc_type);
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
|
|
|
|
stat_inc_seg_count(sbi, type, gc_type);
|
2020-02-10 05:27:09 +08:00
|
|
|
migrated++;
|
2017-08-11 18:00:15 +08:00
|
|
|
|
2018-10-24 16:08:30 +08:00
|
|
|
freed:
|
2017-08-11 18:00:15 +08:00
|
|
|
if (gc_type == FG_GC &&
|
|
|
|
get_valid_blocks(sbi, segno, false) == 0)
|
|
|
|
seg_freed++;
|
f2fs: support subsectional garbage collection
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.
Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.
So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-24 18:37:27 +08:00
|
|
|
|
|
|
|
if (__is_large_section(sbi) && segno + 1 < end_segno)
|
|
|
|
sbi->next_victim_seg[gc_type] = segno + 1;
|
|
|
|
skip:
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
f2fs_put_page(sum_page, 0);
|
|
|
|
}
|
|
|
|
|
2018-09-13 07:40:53 +08:00
|
|
|
if (submitted)
|
2017-05-11 02:28:38 +08:00
|
|
|
f2fs_submit_merged_write(sbi,
|
|
|
|
(type == SUM_TYPE_NODE) ? NODE : DATA);
|
f2fs: give a chance to merge IOs by IO scheduler
Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.
...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...
However, by the fact that many IOs are sequential, we can give a chance to merge
the IOs by IO scheduler.
In order to do that, let's use blk_plug.
...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...
Note that this issue should be addressed in checkpoint, and some readahead
flows too.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-24 12:19:56 +08:00
|
|
|
|
f2fs: enhance foreground GC
If we configure section consist of multiple segments, foreground GC will
do the garbage collection with following approach:
for each segment in victim section
blk_start_plug
for each valid block in segment
write out by OPU method
submit bio cache <---
blk_finish_plug <---
There are two issue:
1) for most of the time, 'submit bio cache' will break the merging in
current bio buffer from writes of next segments, making a smaller bio
submitting.
2) block plug only cover IO submitting in one segment, which reduce
opportunity of merging IOs in plug with multiple segments.
So refactor the code as below structure to strive for biggest
opportunity of merging IOs:
blk_start_plug
for each segment in victim section
for each valid block in segment
write out by OPU method
submit bio cache
blk_finish_plug
Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl
Before patch, there are totoally 40 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times
After patch, there are totally 35 bios submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-23 16:23:55 +08:00
|
|
|
blk_finish_plug(&plug);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2016-02-22 18:32:13 +08:00
|
|
|
stat_inc_call_count(sbi->stat_info);
|
|
|
|
|
2017-08-11 18:00:15 +08:00
|
|
|
return seg_freed;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
2017-04-14 06:17:00 +08:00
|
|
|
int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
|
|
|
|
bool background, unsigned int segno)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
2015-10-05 22:22:44 +08:00
|
|
|
int gc_type = sync ? FG_GC : BG_GC;
|
2017-08-11 18:00:15 +08:00
|
|
|
int sec_freed = 0, seg_freed = 0, total_freed = 0;
|
|
|
|
int ret = 0;
|
2014-10-31 13:47:03 +08:00
|
|
|
struct cp_control cpc;
|
2017-04-14 06:17:00 +08:00
|
|
|
unsigned int init_segno = segno;
|
2014-11-28 23:49:40 +08:00
|
|
|
struct gc_inode_list gc_list = {
|
|
|
|
.ilist = LIST_HEAD_INIT(gc_list.ilist),
|
2018-04-11 07:36:52 +08:00
|
|
|
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
|
2014-11-28 23:49:40 +08:00
|
|
|
};
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
unsigned long long last_skipped = sbi->skipped_atomic_files[FG_GC];
|
2018-07-25 11:11:56 +08:00
|
|
|
unsigned long long first_skipped;
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
unsigned int skipped_round = 0, round = 0;
|
2014-10-31 13:47:03 +08:00
|
|
|
|
2017-08-11 18:00:15 +08:00
|
|
|
trace_f2fs_gc_begin(sbi->sb, sync, background,
|
|
|
|
get_pages(sbi, F2FS_DIRTY_NODES),
|
|
|
|
get_pages(sbi, F2FS_DIRTY_DENTS),
|
|
|
|
get_pages(sbi, F2FS_DIRTY_IMETA),
|
|
|
|
free_sections(sbi),
|
|
|
|
free_segments(sbi),
|
|
|
|
reserved_segments(sbi),
|
|
|
|
prefree_segments(sbi));
|
|
|
|
|
2015-01-30 03:45:33 +08:00
|
|
|
cpc.reason = __get_cp_reason(sbi);
|
2018-07-25 11:11:56 +08:00
|
|
|
sbi->skipped_gc_rwsem = 0;
|
|
|
|
first_skipped = last_skipped;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
gc_more:
|
2017-11-28 05:05:09 +08:00
|
|
|
if (unlikely(!(sbi->sb->s_flags & SB_ACTIVE))) {
|
2017-05-11 04:28:00 +08:00
|
|
|
ret = -EINVAL;
|
2013-01-03 16:55:52 +08:00
|
|
|
goto stop;
|
2017-05-11 04:28:00 +08:00
|
|
|
}
|
2015-12-24 18:04:56 +08:00
|
|
|
if (unlikely(f2fs_cp_error(sbi))) {
|
|
|
|
ret = -EIO;
|
2014-02-05 12:03:57 +08:00
|
|
|
goto stop;
|
2015-12-24 18:04:56 +08:00
|
|
|
}
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2017-02-25 11:57:38 +08:00
|
|
|
if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) {
|
2016-01-23 22:00:57 +08:00
|
|
|
/*
|
2017-02-25 11:57:38 +08:00
|
|
|
* For example, if there are many prefree_segments below given
|
|
|
|
* threshold, we can make them free by checkpoint. Then, we
|
|
|
|
* secure free segments which doesn't need fggc any more.
|
2016-01-23 22:00:57 +08:00
|
|
|
*/
|
2018-08-21 10:21:43 +08:00
|
|
|
if (prefree_segments(sbi) &&
|
|
|
|
!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
ret = f2fs_write_checkpoint(sbi, &cpc);
|
2017-04-08 08:25:54 +08:00
|
|
|
if (ret)
|
|
|
|
goto stop;
|
|
|
|
}
|
2017-02-25 11:57:38 +08:00
|
|
|
if (has_not_enough_free_secs(sbi, 0, 0))
|
|
|
|
gc_type = FG_GC;
|
2013-04-08 15:01:00 +08:00
|
|
|
}
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2017-02-25 11:57:38 +08:00
|
|
|
/* f2fs_balance_fs doesn't need to do BG_GC in critical path. */
|
2017-08-11 18:00:15 +08:00
|
|
|
if (gc_type == BG_GC && !background) {
|
|
|
|
ret = -EINVAL;
|
2017-02-25 11:57:38 +08:00
|
|
|
goto stop;
|
2017-08-11 18:00:15 +08:00
|
|
|
}
|
2020-06-28 19:23:03 +08:00
|
|
|
ret = __get_victim(sbi, &segno, gc_type);
|
|
|
|
if (ret)
|
2013-01-03 16:55:52 +08:00
|
|
|
goto stop;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2017-08-11 18:00:15 +08:00
|
|
|
seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type);
|
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-16 20:56:56 +08:00
|
|
|
if (gc_type == FG_GC &&
|
|
|
|
seg_freed == f2fs_usable_segs_in_sec(sbi, segno))
|
2015-09-28 17:42:24 +08:00
|
|
|
sec_freed++;
|
2017-08-11 18:00:15 +08:00
|
|
|
total_freed += seg_freed;
|
2013-02-04 14:11:17 +08:00
|
|
|
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
if (gc_type == FG_GC) {
|
2018-07-25 11:11:56 +08:00
|
|
|
if (sbi->skipped_atomic_files[FG_GC] > last_skipped ||
|
|
|
|
sbi->skipped_gc_rwsem)
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
skipped_round++;
|
|
|
|
last_skipped = sbi->skipped_atomic_files[FG_GC];
|
|
|
|
round++;
|
|
|
|
}
|
|
|
|
|
2019-07-29 13:20:26 +08:00
|
|
|
if (gc_type == FG_GC && seg_freed)
|
2013-03-31 12:26:03 +08:00
|
|
|
sbi->cur_victim_sec = NULL_SEGNO;
|
2013-02-04 14:11:17 +08:00
|
|
|
|
2018-07-25 11:11:56 +08:00
|
|
|
if (sync)
|
|
|
|
goto stop;
|
|
|
|
|
|
|
|
if (has_not_enough_free_secs(sbi, sec_freed, 0)) {
|
|
|
|
if (skipped_round <= MAX_SKIP_GC_COUNT ||
|
|
|
|
skipped_round * 2 < round) {
|
2017-04-14 06:17:00 +08:00
|
|
|
segno = NULL_SEGNO;
|
2015-10-05 22:22:44 +08:00
|
|
|
goto gc_more;
|
2017-04-14 06:17:00 +08:00
|
|
|
}
|
2013-02-04 14:11:17 +08:00
|
|
|
|
2018-07-25 11:11:56 +08:00
|
|
|
if (first_skipped < last_skipped &&
|
|
|
|
(last_skipped - first_skipped) >
|
|
|
|
sbi->skipped_gc_rwsem) {
|
|
|
|
f2fs_drop_inmem_pages_all(sbi, true);
|
|
|
|
segno = NULL_SEGNO;
|
|
|
|
goto gc_more;
|
|
|
|
}
|
2018-08-21 10:21:43 +08:00
|
|
|
if (gc_type == FG_GC && !is_sbi_flag_set(sbi, SBI_CP_DISABLED))
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
ret = f2fs_write_checkpoint(sbi, &cpc);
|
2015-10-05 22:22:44 +08:00
|
|
|
}
|
2013-01-03 16:55:52 +08:00
|
|
|
stop:
|
2017-04-14 06:17:00 +08:00
|
|
|
SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
|
|
|
|
SIT_I(sbi)->last_victim[FLUSH_DEVICE] = init_segno;
|
2017-08-11 18:00:15 +08:00
|
|
|
|
|
|
|
trace_f2fs_gc_end(sbi->sb, ret, total_freed, sec_freed,
|
|
|
|
get_pages(sbi, F2FS_DIRTY_NODES),
|
|
|
|
get_pages(sbi, F2FS_DIRTY_DENTS),
|
|
|
|
get_pages(sbi, F2FS_DIRTY_IMETA),
|
|
|
|
free_sections(sbi),
|
|
|
|
free_segments(sbi),
|
|
|
|
reserved_segments(sbi),
|
|
|
|
prefree_segments(sbi));
|
|
|
|
|
2020-01-14 19:36:50 +08:00
|
|
|
up_write(&sbi->gc_lock);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
|
2014-11-28 23:49:40 +08:00
|
|
|
put_gc_inode(&gc_list);
|
2015-10-05 22:22:44 +08:00
|
|
|
|
2018-09-26 06:25:21 +08:00
|
|
|
if (sync && !ret)
|
2015-10-05 22:22:44 +08:00
|
|
|
ret = sec_freed ? 0 : -EAGAIN;
|
2013-02-04 14:11:17 +08:00
|
|
|
return ret;
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
|
|
|
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
int __init f2fs_create_garbage_collection_cache(void)
|
|
|
|
{
|
|
|
|
victim_entry_slab = f2fs_kmem_cache_create("f2fs_victim_entry",
|
|
|
|
sizeof(struct victim_entry));
|
|
|
|
if (!victim_entry_slab)
|
|
|
|
return -ENOMEM;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
void f2fs_destroy_garbage_collection_cache(void)
|
|
|
|
{
|
|
|
|
kmem_cache_destroy(victim_entry_slab);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void init_atgc_management(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct atgc_management *am = &sbi->am;
|
|
|
|
|
|
|
|
if (test_opt(sbi, ATGC) &&
|
|
|
|
SIT_I(sbi)->elapsed_time >= DEF_GC_THREAD_AGE_THRESHOLD)
|
|
|
|
am->atgc_enabled = true;
|
|
|
|
|
|
|
|
am->root = RB_ROOT_CACHED;
|
|
|
|
INIT_LIST_HEAD(&am->victim_list);
|
|
|
|
am->victim_count = 0;
|
|
|
|
|
|
|
|
am->candidate_ratio = DEF_GC_THREAD_CANDIDATE_RATIO;
|
|
|
|
am->max_candidate_count = DEF_GC_THREAD_MAX_CANDIDATE_COUNT;
|
|
|
|
am->age_weight = DEF_GC_THREAD_AGE_WEIGHT;
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
{
|
|
|
|
DIRTY_I(sbi)->v_ops = &default_v_ops;
|
f2fs: add ovp valid_blocks check for bg gc victim to fg_gc
For foreground gc, greedy algorithm should be adapted, which makes
this formula work well:
(2 * (100 / config.overprovision + 1) + 6)
But currently, we fg_gc have a prior to select bg_gc victim segments to gc
first, these victims are selected by cost-benefit algorithm, we can't guarantee
such segments have the small valid blocks, which may destroy the f2fs rule, on
the worstest case, would consume all the free segments.
This patch fix this by add a filter in check_bg_victims, if segment's has # of
valid blocks over overprovision ratio, skip such segments.
Cc: <stable@vger.kernel.org>
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-16 20:34:31 +08:00
|
|
|
|
2017-12-08 08:25:39 +08:00
|
|
|
sbi->gc_pin_file_threshold = DEF_GC_FAILED_PINNED_FILES;
|
2017-04-19 06:03:15 +08:00
|
|
|
|
|
|
|
/* give warm/cold data area from slower device */
|
2019-03-16 08:13:06 +08:00
|
|
|
if (f2fs_is_multi_device(sbi) && !__is_large_section(sbi))
|
2017-04-19 06:03:15 +08:00
|
|
|
SIT_I(sbi)->last_victim[ALLOC_NEXT] =
|
|
|
|
GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
|
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.
This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:
1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;
2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.
3. merge valid blocks from source victim into target victim with
SSR alloctor.
Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)
GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments
- After:
- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)
GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)
IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:49 +08:00
|
|
|
|
|
|
|
init_atgc_management(sbi);
|
f2fs: add garbage collection functions
This adds on-demand and background cleaning functions.
- The basic background cleaning policy is trying to do cleaning jobs as much as
possible whenever the system is idle. Once the background cleaning is done,
the cleaner sleeps an amount of time not to interfere with VFS calls. The time
is dynamically adjusted according to the status of whole segments, which is
decreased when the following conditions are satisfied.
. GC is not conducted currently, and
. IO subsystem is idle by checking the number of requets in bdev's request
list, and
. There are enough dirty segments.
Otherwise, the time is increased incrementally until to the maximum time.
Note that, min and max times are 10 secs and 30 secs by default.
- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.
- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:13:01 +08:00
|
|
|
}
|
2019-06-05 11:33:25 +08:00
|
|
|
|
2020-04-01 02:43:07 +08:00
|
|
|
static int free_segment_range(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int secs, bool gc_only)
|
2019-06-05 11:33:25 +08:00
|
|
|
{
|
2020-04-01 02:43:07 +08:00
|
|
|
unsigned int segno, next_inuse, start, end;
|
|
|
|
struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
|
|
|
|
int gc_mode, gc_type;
|
2019-06-05 11:33:25 +08:00
|
|
|
int err = 0;
|
2020-04-01 02:43:07 +08:00
|
|
|
int type;
|
|
|
|
|
|
|
|
/* Force block allocation for GC */
|
|
|
|
MAIN_SECS(sbi) -= secs;
|
|
|
|
start = MAIN_SECS(sbi) * sbi->segs_per_sec;
|
|
|
|
end = MAIN_SEGS(sbi) - 1;
|
|
|
|
|
|
|
|
mutex_lock(&DIRTY_I(sbi)->seglist_lock);
|
|
|
|
for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
|
|
|
|
if (SIT_I(sbi)->last_victim[gc_mode] >= start)
|
|
|
|
SIT_I(sbi)->last_victim[gc_mode] = 0;
|
|
|
|
|
|
|
|
for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
|
|
|
|
if (sbi->next_victim_seg[gc_type] >= start)
|
|
|
|
sbi->next_victim_seg[gc_type] = NULL_SEGNO;
|
|
|
|
mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
|
2019-06-05 11:33:25 +08:00
|
|
|
|
|
|
|
/* Move out cursegs from the target range */
|
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-04 21:14:45 +08:00
|
|
|
for (type = CURSEG_HOT_DATA; type < NR_CURSEG_PERSIST_TYPE; type++)
|
2020-06-18 14:36:22 +08:00
|
|
|
f2fs_allocate_segment_for_resize(sbi, type, start, end);
|
2019-06-05 11:33:25 +08:00
|
|
|
|
|
|
|
/* do GC to move out valid blocks in the range */
|
|
|
|
for (segno = start; segno <= end; segno += sbi->segs_per_sec) {
|
|
|
|
struct gc_inode_list gc_list = {
|
|
|
|
.ilist = LIST_HEAD_INIT(gc_list.ilist),
|
|
|
|
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
|
|
|
|
};
|
|
|
|
|
|
|
|
do_garbage_collect(sbi, segno, &gc_list, FG_GC);
|
|
|
|
put_gc_inode(&gc_list);
|
|
|
|
|
2020-04-01 02:43:07 +08:00
|
|
|
if (!gc_only && get_valid_blocks(sbi, segno, true)) {
|
|
|
|
err = -EAGAIN;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
if (fatal_signal_pending(current)) {
|
|
|
|
err = -ERESTARTSYS;
|
|
|
|
goto out;
|
|
|
|
}
|
2019-06-05 11:33:25 +08:00
|
|
|
}
|
2020-04-01 02:43:07 +08:00
|
|
|
if (gc_only)
|
|
|
|
goto out;
|
2019-06-05 11:33:25 +08:00
|
|
|
|
2020-04-01 02:43:07 +08:00
|
|
|
err = f2fs_write_checkpoint(sbi, &cpc);
|
2019-06-05 11:33:25 +08:00
|
|
|
if (err)
|
2020-04-01 02:43:07 +08:00
|
|
|
goto out;
|
2019-06-05 11:33:25 +08:00
|
|
|
|
|
|
|
next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
|
|
|
|
if (next_inuse <= end) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "segno %u should be free but still inuse!",
|
|
|
|
next_inuse);
|
2019-06-05 11:33:25 +08:00
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
}
|
2020-04-01 02:43:07 +08:00
|
|
|
out:
|
|
|
|
MAIN_SECS(sbi) += secs;
|
2019-06-05 11:33:25 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void update_sb_metadata(struct f2fs_sb_info *sbi, int secs)
|
|
|
|
{
|
|
|
|
struct f2fs_super_block *raw_sb = F2FS_RAW_SUPER(sbi);
|
2020-03-03 20:09:25 +08:00
|
|
|
int section_count;
|
|
|
|
int segment_count;
|
|
|
|
int segment_count_main;
|
|
|
|
long long block_count;
|
2019-06-05 11:33:25 +08:00
|
|
|
int segs = secs * sbi->segs_per_sec;
|
|
|
|
|
2020-03-03 20:09:25 +08:00
|
|
|
down_write(&sbi->sb_lock);
|
|
|
|
|
|
|
|
section_count = le32_to_cpu(raw_sb->section_count);
|
|
|
|
segment_count = le32_to_cpu(raw_sb->segment_count);
|
|
|
|
segment_count_main = le32_to_cpu(raw_sb->segment_count_main);
|
|
|
|
block_count = le64_to_cpu(raw_sb->block_count);
|
|
|
|
|
2019-06-05 11:33:25 +08:00
|
|
|
raw_sb->section_count = cpu_to_le32(section_count + secs);
|
|
|
|
raw_sb->segment_count = cpu_to_le32(segment_count + segs);
|
|
|
|
raw_sb->segment_count_main = cpu_to_le32(segment_count_main + segs);
|
|
|
|
raw_sb->block_count = cpu_to_le64(block_count +
|
|
|
|
(long long)segs * sbi->blocks_per_seg);
|
2019-09-23 12:21:39 +08:00
|
|
|
if (f2fs_is_multi_device(sbi)) {
|
|
|
|
int last_dev = sbi->s_ndevs - 1;
|
|
|
|
int dev_segs =
|
|
|
|
le32_to_cpu(raw_sb->devs[last_dev].total_segments);
|
|
|
|
|
|
|
|
raw_sb->devs[last_dev].total_segments =
|
|
|
|
cpu_to_le32(dev_segs + segs);
|
|
|
|
}
|
2020-03-03 20:09:25 +08:00
|
|
|
|
|
|
|
up_write(&sbi->sb_lock);
|
2019-06-05 11:33:25 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
|
|
|
|
{
|
|
|
|
int segs = secs * sbi->segs_per_sec;
|
2019-09-23 12:21:39 +08:00
|
|
|
long long blks = (long long)segs * sbi->blocks_per_seg;
|
2019-06-05 11:33:25 +08:00
|
|
|
long long user_block_count =
|
|
|
|
le64_to_cpu(F2FS_CKPT(sbi)->user_block_count);
|
|
|
|
|
|
|
|
SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
|
|
|
|
MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
|
2020-04-01 02:43:07 +08:00
|
|
|
MAIN_SECS(sbi) += secs;
|
2019-06-05 11:33:25 +08:00
|
|
|
FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
|
|
|
|
FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
|
2019-09-23 12:21:39 +08:00
|
|
|
F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
|
|
|
|
|
|
|
|
if (f2fs_is_multi_device(sbi)) {
|
|
|
|
int last_dev = sbi->s_ndevs - 1;
|
|
|
|
|
|
|
|
FDEV(last_dev).total_segments =
|
|
|
|
(int)FDEV(last_dev).total_segments + segs;
|
|
|
|
FDEV(last_dev).end_blk =
|
|
|
|
(long long)FDEV(last_dev).end_blk + blks;
|
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
|
|
|
FDEV(last_dev).nr_blkz = (int)FDEV(last_dev).nr_blkz +
|
|
|
|
(int)(blks >> sbi->log_blocks_per_blkz);
|
|
|
|
#endif
|
|
|
|
}
|
2019-06-05 11:33:25 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
|
|
|
|
{
|
|
|
|
__u64 old_block_count, shrunk_blocks;
|
2020-04-01 02:43:07 +08:00
|
|
|
struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
|
2019-06-05 11:33:25 +08:00
|
|
|
unsigned int secs;
|
|
|
|
int err = 0;
|
|
|
|
__u32 rem;
|
|
|
|
|
|
|
|
old_block_count = le64_to_cpu(F2FS_RAW_SUPER(sbi)->block_count);
|
|
|
|
if (block_count > old_block_count)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2019-09-23 12:21:39 +08:00
|
|
|
if (f2fs_is_multi_device(sbi)) {
|
|
|
|
int last_dev = sbi->s_ndevs - 1;
|
|
|
|
__u64 last_segs = FDEV(last_dev).total_segments;
|
|
|
|
|
|
|
|
if (block_count + last_segs * sbi->blocks_per_seg <=
|
|
|
|
old_block_count)
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2019-06-05 11:33:25 +08:00
|
|
|
/* new fs size should align to section size */
|
|
|
|
div_u64_rem(block_count, BLKS_PER_SEC(sbi), &rem);
|
|
|
|
if (rem)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (block_count == old_block_count)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "Should run fsck to repair first.");
|
2019-06-20 11:36:14 +08:00
|
|
|
return -EFSCORRUPTED;
|
2019-06-05 11:33:25 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (test_opt(sbi, DISABLE_CHECKPOINT)) {
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "Checkpoint should be enabled.");
|
2019-06-05 11:33:25 +08:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
shrunk_blocks = old_block_count - block_count;
|
|
|
|
secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
|
2020-04-01 02:43:07 +08:00
|
|
|
|
|
|
|
/* stop other GC */
|
|
|
|
if (!down_write_trylock(&sbi->gc_lock))
|
|
|
|
return -EAGAIN;
|
|
|
|
|
|
|
|
/* stop CP to protect MAIN_SEC in free_segment_range */
|
|
|
|
f2fs_lock_op(sbi);
|
|
|
|
err = free_segment_range(sbi, secs, true);
|
|
|
|
f2fs_unlock_op(sbi);
|
|
|
|
up_write(&sbi->gc_lock);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
set_sbi_flag(sbi, SBI_IS_RESIZEFS);
|
|
|
|
|
|
|
|
freeze_super(sbi->sb);
|
|
|
|
down_write(&sbi->gc_lock);
|
2020-11-23 13:28:32 +08:00
|
|
|
down_write(&sbi->cp_global_sem);
|
2020-04-01 02:43:07 +08:00
|
|
|
|
2019-06-05 11:33:25 +08:00
|
|
|
spin_lock(&sbi->stat_lock);
|
|
|
|
if (shrunk_blocks + valid_user_blocks(sbi) +
|
|
|
|
sbi->current_reserved_blocks + sbi->unusable_block_count +
|
|
|
|
F2FS_OPTION(sbi).root_reserved_blocks > sbi->user_block_count)
|
|
|
|
err = -ENOSPC;
|
|
|
|
else
|
|
|
|
sbi->user_block_count -= shrunk_blocks;
|
|
|
|
spin_unlock(&sbi->stat_lock);
|
2020-04-01 02:43:07 +08:00
|
|
|
if (err)
|
|
|
|
goto out_err;
|
2019-06-05 11:33:25 +08:00
|
|
|
|
2020-04-01 02:43:07 +08:00
|
|
|
err = free_segment_range(sbi, secs, false);
|
2019-06-05 11:33:25 +08:00
|
|
|
if (err)
|
2020-04-01 02:43:07 +08:00
|
|
|
goto recover_out;
|
2019-06-05 11:33:25 +08:00
|
|
|
|
|
|
|
update_sb_metadata(sbi, -secs);
|
|
|
|
|
|
|
|
err = f2fs_commit_super(sbi, false);
|
|
|
|
if (err) {
|
|
|
|
update_sb_metadata(sbi, secs);
|
2020-04-01 02:43:07 +08:00
|
|
|
goto recover_out;
|
2019-06-05 11:33:25 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
update_fs_metadata(sbi, -secs);
|
|
|
|
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
|
2020-03-03 22:29:25 +08:00
|
|
|
set_sbi_flag(sbi, SBI_IS_DIRTY);
|
|
|
|
|
2020-04-01 02:43:07 +08:00
|
|
|
err = f2fs_write_checkpoint(sbi, &cpc);
|
2019-06-05 11:33:25 +08:00
|
|
|
if (err) {
|
|
|
|
update_fs_metadata(sbi, secs);
|
|
|
|
update_sb_metadata(sbi, secs);
|
|
|
|
f2fs_commit_super(sbi, false);
|
|
|
|
}
|
2020-04-01 02:43:07 +08:00
|
|
|
recover_out:
|
2019-06-05 11:33:25 +08:00
|
|
|
if (err) {
|
|
|
|
set_sbi_flag(sbi, SBI_NEED_FSCK);
|
2019-06-18 17:48:42 +08:00
|
|
|
f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");
|
2019-06-05 11:33:25 +08:00
|
|
|
|
|
|
|
spin_lock(&sbi->stat_lock);
|
|
|
|
sbi->user_block_count += shrunk_blocks;
|
|
|
|
spin_unlock(&sbi->stat_lock);
|
|
|
|
}
|
2020-04-01 02:43:07 +08:00
|
|
|
out_err:
|
2020-11-23 13:28:32 +08:00
|
|
|
up_write(&sbi->cp_global_sem);
|
2020-04-01 02:43:07 +08:00
|
|
|
up_write(&sbi->gc_lock);
|
|
|
|
thaw_super(sbi->sb);
|
2019-06-05 11:33:25 +08:00
|
|
|
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
|
|
|
|
return err;
|
|
|
|
}
|