2018-09-12 09:16:07 +08:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* fs/f2fs/segment.c
|
|
|
|
*
|
|
|
|
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
|
|
|
|
* http://www.samsung.com/
|
|
|
|
*/
|
|
|
|
#include <linux/fs.h>
|
|
|
|
#include <linux/f2fs_fs.h>
|
|
|
|
#include <linux/bio.h>
|
|
|
|
#include <linux/blkdev.h>
|
2012-12-20 05:19:30 +08:00
|
|
|
#include <linux/prefetch.h>
|
2014-04-02 14:34:36 +08:00
|
|
|
#include <linux/kthread.h>
|
2013-11-22 09:09:59 +08:00
|
|
|
#include <linux/swap.h>
|
2015-10-06 05:49:57 +08:00
|
|
|
#include <linux/timer.h>
|
2017-05-18 01:36:58 +08:00
|
|
|
#include <linux/freezer.h>
|
2017-09-10 03:03:23 +08:00
|
|
|
#include <linux/sched/signal.h>
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
#include "f2fs.h"
|
|
|
|
#include "segment.h"
|
|
|
|
#include "node.h"
|
2017-08-16 12:27:19 +08:00
|
|
|
#include "gc.h"
|
2014-12-18 11:58:58 +08:00
|
|
|
#include "trace.h"
|
2013-04-23 16:51:43 +08:00
|
|
|
#include <trace/events/f2fs.h>
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
#define __reverse_ffz(x) __reverse_ffs(~(x))
|
|
|
|
|
2013-11-15 12:55:58 +08:00
|
|
|
static struct kmem_cache *discard_entry_slab;
|
2017-01-10 06:13:03 +08:00
|
|
|
static struct kmem_cache *discard_cmd_slab;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
static struct kmem_cache *sit_entry_set_slab;
|
2014-10-07 08:39:50 +08:00
|
|
|
static struct kmem_cache *inmem_entry_slab;
|
2013-11-15 12:55:58 +08:00
|
|
|
|
2015-10-21 06:17:19 +08:00
|
|
|
static unsigned long __reverse_ulong(unsigned char *str)
|
|
|
|
{
|
|
|
|
unsigned long tmp = 0;
|
|
|
|
int shift = 24, idx = 0;
|
|
|
|
|
|
|
|
#if BITS_PER_LONG == 64
|
|
|
|
shift = 56;
|
|
|
|
#endif
|
|
|
|
while (shift >= 0) {
|
|
|
|
tmp |= (unsigned long)str[idx++] << shift;
|
|
|
|
shift -= BITS_PER_BYTE;
|
|
|
|
}
|
|
|
|
return tmp;
|
|
|
|
}
|
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
/*
|
|
|
|
* __reverse_ffs is copied from include/asm-generic/bitops/__ffs.h since
|
|
|
|
* MSB and LSB are reversed in a byte by f2fs_set_bit.
|
|
|
|
*/
|
|
|
|
static inline unsigned long __reverse_ffs(unsigned long word)
|
|
|
|
{
|
|
|
|
int num = 0;
|
|
|
|
|
|
|
|
#if BITS_PER_LONG == 64
|
2015-10-21 06:17:19 +08:00
|
|
|
if ((word & 0xffffffff00000000UL) == 0)
|
2013-11-15 09:42:51 +08:00
|
|
|
num += 32;
|
2015-10-21 06:17:19 +08:00
|
|
|
else
|
2013-11-15 09:42:51 +08:00
|
|
|
word >>= 32;
|
|
|
|
#endif
|
2015-10-21 06:17:19 +08:00
|
|
|
if ((word & 0xffff0000) == 0)
|
2013-11-15 09:42:51 +08:00
|
|
|
num += 16;
|
2015-10-21 06:17:19 +08:00
|
|
|
else
|
2013-11-15 09:42:51 +08:00
|
|
|
word >>= 16;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
|
|
|
if ((word & 0xff00) == 0)
|
2013-11-15 09:42:51 +08:00
|
|
|
num += 8;
|
2015-10-21 06:17:19 +08:00
|
|
|
else
|
2013-11-15 09:42:51 +08:00
|
|
|
word >>= 8;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
if ((word & 0xf0) == 0)
|
|
|
|
num += 4;
|
|
|
|
else
|
|
|
|
word >>= 4;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
if ((word & 0xc) == 0)
|
|
|
|
num += 2;
|
|
|
|
else
|
|
|
|
word >>= 2;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2013-11-15 09:42:51 +08:00
|
|
|
if ((word & 0x2) == 0)
|
|
|
|
num += 1;
|
|
|
|
return num;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2014-08-06 22:22:50 +08:00
|
|
|
* __find_rev_next(_zero)_bit is copied from lib/find_next_bit.c because
|
2013-11-15 09:42:51 +08:00
|
|
|
* f2fs_set_bit makes MSB and LSB reversed in a byte.
|
2015-11-12 08:43:04 +08:00
|
|
|
* @size must be integral times of unsigned long.
|
2013-11-15 09:42:51 +08:00
|
|
|
* Example:
|
2015-10-21 06:17:19 +08:00
|
|
|
* MSB <--> LSB
|
|
|
|
* f2fs_set_bit(0, bitmap) => 1000 0000
|
|
|
|
* f2fs_set_bit(7, bitmap) => 0000 0001
|
2013-11-15 09:42:51 +08:00
|
|
|
*/
|
|
|
|
static unsigned long __find_rev_next_bit(const unsigned long *addr,
|
|
|
|
unsigned long size, unsigned long offset)
|
|
|
|
{
|
|
|
|
const unsigned long *p = addr + BIT_WORD(offset);
|
2015-11-12 08:43:04 +08:00
|
|
|
unsigned long result = size;
|
2013-11-15 09:42:51 +08:00
|
|
|
unsigned long tmp;
|
|
|
|
|
|
|
|
if (offset >= size)
|
|
|
|
return size;
|
|
|
|
|
2015-11-12 08:43:04 +08:00
|
|
|
size -= (offset & ~(BITS_PER_LONG - 1));
|
2013-11-15 09:42:51 +08:00
|
|
|
offset %= BITS_PER_LONG;
|
2015-10-21 06:17:19 +08:00
|
|
|
|
2015-11-12 08:43:04 +08:00
|
|
|
while (1) {
|
|
|
|
if (*p == 0)
|
|
|
|
goto pass;
|
2013-11-15 09:42:51 +08:00
|
|
|
|
2015-10-21 06:17:19 +08:00
|
|
|
tmp = __reverse_ulong((unsigned char *)p);
|
2015-11-12 08:43:04 +08:00
|
|
|
|
|
|
|
tmp &= ~0UL >> offset;
|
|
|
|
if (size < BITS_PER_LONG)
|
|
|
|
tmp &= (~0UL << (BITS_PER_LONG - size));
|
2013-11-15 09:42:51 +08:00
|
|
|
if (tmp)
|
2015-11-12 08:43:04 +08:00
|
|
|
goto found;
|
|
|
|
pass:
|
|
|
|
if (size <= BITS_PER_LONG)
|
|
|
|
break;
|
2013-11-15 09:42:51 +08:00
|
|
|
size -= BITS_PER_LONG;
|
2015-11-12 08:43:04 +08:00
|
|
|
offset = 0;
|
2015-10-21 06:17:19 +08:00
|
|
|
p++;
|
2013-11-15 09:42:51 +08:00
|
|
|
}
|
2015-11-12 08:43:04 +08:00
|
|
|
return result;
|
|
|
|
found:
|
|
|
|
return result - size + __reverse_ffs(tmp);
|
2013-11-15 09:42:51 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static unsigned long __find_rev_next_zero_bit(const unsigned long *addr,
|
|
|
|
unsigned long size, unsigned long offset)
|
|
|
|
{
|
|
|
|
const unsigned long *p = addr + BIT_WORD(offset);
|
2015-12-05 08:51:13 +08:00
|
|
|
unsigned long result = size;
|
2013-11-15 09:42:51 +08:00
|
|
|
unsigned long tmp;
|
|
|
|
|
|
|
|
if (offset >= size)
|
|
|
|
return size;
|
|
|
|
|
2015-12-05 08:51:13 +08:00
|
|
|
size -= (offset & ~(BITS_PER_LONG - 1));
|
2013-11-15 09:42:51 +08:00
|
|
|
offset %= BITS_PER_LONG;
|
2015-12-05 08:51:13 +08:00
|
|
|
|
|
|
|
while (1) {
|
|
|
|
if (*p == ~0UL)
|
|
|
|
goto pass;
|
|
|
|
|
2015-10-21 06:17:19 +08:00
|
|
|
tmp = __reverse_ulong((unsigned char *)p);
|
2015-12-05 08:51:13 +08:00
|
|
|
|
|
|
|
if (offset)
|
|
|
|
tmp |= ~0UL << (BITS_PER_LONG - offset);
|
|
|
|
if (size < BITS_PER_LONG)
|
|
|
|
tmp |= ~0UL >> size;
|
2015-10-21 06:17:19 +08:00
|
|
|
if (tmp != ~0UL)
|
2015-12-05 08:51:13 +08:00
|
|
|
goto found;
|
|
|
|
pass:
|
|
|
|
if (size <= BITS_PER_LONG)
|
|
|
|
break;
|
2013-11-15 09:42:51 +08:00
|
|
|
size -= BITS_PER_LONG;
|
2015-12-05 08:51:13 +08:00
|
|
|
offset = 0;
|
2015-10-21 06:17:19 +08:00
|
|
|
p++;
|
2013-11-15 09:42:51 +08:00
|
|
|
}
|
2015-12-05 08:51:13 +08:00
|
|
|
return result;
|
|
|
|
found:
|
|
|
|
return result - size + __reverse_ffz(tmp);
|
2013-11-15 09:42:51 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_need_SSR(struct f2fs_sb_info *sbi)
|
2017-09-10 02:11:04 +08:00
|
|
|
{
|
|
|
|
int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
|
|
|
|
int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
|
|
|
|
int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
|
|
|
|
|
|
|
|
if (test_opt(sbi, LFS))
|
|
|
|
return false;
|
2018-05-08 05:22:40 +08:00
|
|
|
if (sbi->gc_mode == GC_URGENT)
|
2017-09-10 02:11:04 +08:00
|
|
|
return true;
|
|
|
|
|
|
|
|
return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
|
2017-10-28 16:52:33 +08:00
|
|
|
SM_I(sbi)->min_ssr_sections + reserved_sections(sbi));
|
2017-09-10 02:11:04 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_register_inmem_page(struct inode *inode, struct page *page)
|
2014-10-07 08:39:50 +08:00
|
|
|
{
|
2017-10-19 10:05:57 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
2014-10-07 08:39:50 +08:00
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
|
|
|
struct inmem_pages *new;
|
2014-12-06 02:39:49 +08:00
|
|
|
|
2014-12-18 11:58:58 +08:00
|
|
|
f2fs_trace_pid(page);
|
2014-12-06 03:58:02 +08:00
|
|
|
|
2015-08-07 18:42:09 +08:00
|
|
|
set_page_private(page, (unsigned long)ATOMIC_WRITTEN_PAGE);
|
|
|
|
SetPagePrivate(page);
|
|
|
|
|
2014-10-07 08:39:50 +08:00
|
|
|
new = f2fs_kmem_cache_alloc(inmem_entry_slab, GFP_NOFS);
|
|
|
|
|
|
|
|
/* add atomic page indices to the list */
|
|
|
|
new->page = page;
|
|
|
|
INIT_LIST_HEAD(&new->list);
|
2015-08-07 18:42:09 +08:00
|
|
|
|
2014-10-07 08:39:50 +08:00
|
|
|
/* increase reference count with clean state */
|
|
|
|
mutex_lock(&fi->inmem_lock);
|
|
|
|
get_page(page);
|
|
|
|
list_add_tail(&new->list, &fi->inmem_pages);
|
2017-10-19 10:05:57 +08:00
|
|
|
spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
|
|
|
|
if (list_empty(&fi->inmem_ilist))
|
|
|
|
list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
|
|
|
|
spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
|
2014-12-06 09:18:15 +08:00
|
|
|
inc_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES);
|
2014-10-07 08:39:50 +08:00
|
|
|
mutex_unlock(&fi->inmem_lock);
|
2015-03-18 08:58:08 +08:00
|
|
|
|
|
|
|
trace_f2fs_register_inmem_page(page, INMEM);
|
2014-10-07 08:39:50 +08:00
|
|
|
}
|
|
|
|
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
static int __revoke_inmem_pages(struct inode *inode,
|
|
|
|
struct list_head *head, bool drop, bool recover)
|
2016-02-06 14:38:29 +08:00
|
|
|
{
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
2016-02-06 14:38:29 +08:00
|
|
|
struct inmem_pages *cur, *tmp;
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
int err = 0;
|
2016-02-06 14:38:29 +08:00
|
|
|
|
|
|
|
list_for_each_entry_safe(cur, tmp, head, list) {
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
struct page *page = cur->page;
|
|
|
|
|
|
|
|
if (drop)
|
|
|
|
trace_f2fs_commit_inmem_page(page, INMEM_DROP);
|
|
|
|
|
|
|
|
lock_page(page);
|
2016-02-06 14:38:29 +08:00
|
|
|
|
2018-04-23 10:36:13 +08:00
|
|
|
f2fs_wait_on_page_writeback(page, DATA, true);
|
|
|
|
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
if (recover) {
|
|
|
|
struct dnode_of_data dn;
|
|
|
|
struct node_info ni;
|
|
|
|
|
|
|
|
trace_f2fs_commit_inmem_page(page, INMEM_REVOKE);
|
2017-08-08 19:09:08 +08:00
|
|
|
retry:
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
set_new_dnode(&dn, inode, NULL, NULL, 0);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
err = f2fs_get_dnode_of_data(&dn, page->index,
|
|
|
|
LOOKUP_NODE);
|
2017-08-08 19:09:08 +08:00
|
|
|
if (err) {
|
|
|
|
if (err == -ENOMEM) {
|
|
|
|
congestion_wait(BLK_RW_ASYNC, HZ/50);
|
|
|
|
cond_resched();
|
|
|
|
goto retry;
|
|
|
|
}
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
err = -EAGAIN;
|
|
|
|
goto next;
|
|
|
|
}
|
2018-07-17 00:02:17 +08:00
|
|
|
|
|
|
|
err = f2fs_get_node_info(sbi, dn.nid, &ni);
|
|
|
|
if (err) {
|
|
|
|
f2fs_put_dnode(&dn);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-01-10 15:49:10 +08:00
|
|
|
if (cur->old_addr == NEW_ADDR) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_invalidate_blocks(sbi, dn.data_blkaddr);
|
2018-01-10 15:49:10 +08:00
|
|
|
f2fs_update_data_blkaddr(&dn, NEW_ADDR);
|
|
|
|
} else
|
|
|
|
f2fs_replace_block(sbi, &dn, dn.data_blkaddr,
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
cur->old_addr, ni.version, true, true);
|
|
|
|
f2fs_put_dnode(&dn);
|
|
|
|
}
|
|
|
|
next:
|
2016-04-13 05:11:03 +08:00
|
|
|
/* we don't need to invalidate this in the sccessful status */
|
|
|
|
if (drop || recover)
|
|
|
|
ClearPageUptodate(page);
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
set_page_private(page, 0);
|
2016-04-29 20:13:36 +08:00
|
|
|
ClearPagePrivate(page);
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
f2fs_put_page(page, 1);
|
2016-02-06 14:38:29 +08:00
|
|
|
|
|
|
|
list_del(&cur->list);
|
|
|
|
kmem_cache_free(inmem_entry_slab, cur);
|
|
|
|
dec_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES);
|
|
|
|
}
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
return err;
|
2016-02-06 14:38:29 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
|
2017-10-19 10:05:57 +08:00
|
|
|
{
|
|
|
|
struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
|
|
|
|
struct inode *inode;
|
|
|
|
struct f2fs_inode_info *fi;
|
|
|
|
next:
|
|
|
|
spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
|
|
|
|
if (list_empty(head)) {
|
|
|
|
spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
|
|
|
|
inode = igrab(&fi->vfs_inode);
|
|
|
|
spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
|
|
|
|
|
|
|
|
if (inode) {
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
if (gc_failure) {
|
|
|
|
if (fi->i_gc_failures[GC_FAILURE_ATOMIC])
|
|
|
|
goto drop;
|
|
|
|
goto skip;
|
|
|
|
}
|
|
|
|
drop:
|
|
|
|
set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_drop_inmem_pages(inode);
|
2017-10-19 10:05:57 +08:00
|
|
|
iput(inode);
|
|
|
|
}
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
skip:
|
2017-10-19 10:05:57 +08:00
|
|
|
congestion_wait(BLK_RW_ASYNC, HZ/50);
|
|
|
|
cond_resched();
|
|
|
|
goto next;
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_drop_inmem_pages(struct inode *inode)
|
2016-02-06 14:38:29 +08:00
|
|
|
{
|
2017-10-19 10:05:57 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
2016-02-06 14:38:29 +08:00
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
|
|
|
|
|
|
|
mutex_lock(&fi->inmem_lock);
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
__revoke_inmem_pages(inode, &fi->inmem_pages, true, false);
|
2017-10-19 10:05:57 +08:00
|
|
|
spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
|
|
|
|
if (!list_empty(&fi->inmem_ilist))
|
|
|
|
list_del_init(&fi->inmem_ilist);
|
|
|
|
spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
|
2016-02-06 14:38:29 +08:00
|
|
|
mutex_unlock(&fi->inmem_lock);
|
2017-01-07 18:50:26 +08:00
|
|
|
|
|
|
|
clear_inode_flag(inode, FI_ATOMIC_FILE);
|
f2fs: avoid stucking GC due to atomic write
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-07 20:28:54 +08:00
|
|
|
fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
|
2017-01-07 18:50:26 +08:00
|
|
|
stat_dec_atomic_write(inode);
|
2016-02-06 14:38:29 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_drop_inmem_page(struct inode *inode, struct page *page)
|
2017-03-17 09:55:52 +08:00
|
|
|
{
|
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
|
|
|
struct list_head *head = &fi->inmem_pages;
|
|
|
|
struct inmem_pages *cur = NULL;
|
|
|
|
|
|
|
|
f2fs_bug_on(sbi, !IS_ATOMIC_WRITTEN_PAGE(page));
|
|
|
|
|
|
|
|
mutex_lock(&fi->inmem_lock);
|
|
|
|
list_for_each_entry(cur, head, list) {
|
|
|
|
if (cur->page == page)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2018-04-17 17:12:27 +08:00
|
|
|
f2fs_bug_on(sbi, list_empty(head) || cur->page != page);
|
2017-03-17 09:55:52 +08:00
|
|
|
list_del(&cur->list);
|
|
|
|
mutex_unlock(&fi->inmem_lock);
|
|
|
|
|
|
|
|
dec_page_count(sbi, F2FS_INMEM_PAGES);
|
|
|
|
kmem_cache_free(inmem_entry_slab, cur);
|
|
|
|
|
|
|
|
ClearPageUptodate(page);
|
|
|
|
set_page_private(page, 0);
|
|
|
|
ClearPagePrivate(page);
|
|
|
|
f2fs_put_page(page, 0);
|
|
|
|
|
|
|
|
trace_f2fs_commit_inmem_page(page, INMEM_INVALIDATE);
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
static int __f2fs_commit_inmem_pages(struct inode *inode)
|
2014-10-07 08:39:50 +08:00
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
|
|
|
struct inmem_pages *cur, *tmp;
|
|
|
|
struct f2fs_io_info fio = {
|
2015-04-24 05:38:15 +08:00
|
|
|
.sbi = sbi,
|
2017-09-29 13:59:38 +08:00
|
|
|
.ino = inode->i_ino,
|
2014-10-07 08:39:50 +08:00
|
|
|
.type = DATA,
|
2016-06-06 03:31:55 +08:00
|
|
|
.op = REQ_OP_WRITE,
|
2016-11-01 21:40:10 +08:00
|
|
|
.op_flags = REQ_SYNC | REQ_PRIO,
|
2017-08-02 23:21:48 +08:00
|
|
|
.io_type = FS_DATA_IO,
|
2014-10-07 08:39:50 +08:00
|
|
|
};
|
2018-04-23 10:36:14 +08:00
|
|
|
struct list_head revoke_list;
|
2017-02-02 08:51:22 +08:00
|
|
|
pgoff_t last_idx = ULONG_MAX;
|
2015-07-25 15:52:52 +08:00
|
|
|
int err = 0;
|
2014-10-07 08:39:50 +08:00
|
|
|
|
2018-04-23 10:36:14 +08:00
|
|
|
INIT_LIST_HEAD(&revoke_list);
|
|
|
|
|
2014-10-07 08:39:50 +08:00
|
|
|
list_for_each_entry_safe(cur, tmp, &fi->inmem_pages, list) {
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
struct page *page = cur->page;
|
|
|
|
|
|
|
|
lock_page(page);
|
|
|
|
if (page->mapping == inode->i_mapping) {
|
|
|
|
trace_f2fs_commit_inmem_page(page, INMEM);
|
|
|
|
|
|
|
|
set_page_dirty(page);
|
|
|
|
f2fs_wait_on_page_writeback(page, DATA, true);
|
2016-10-11 22:57:01 +08:00
|
|
|
if (clear_page_dirty_for_io(page)) {
|
2016-02-06 14:38:29 +08:00
|
|
|
inode_dec_dirty_pages(inode);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_remove_dirty_inode(inode);
|
2016-10-11 22:57:01 +08:00
|
|
|
}
|
2017-07-20 01:59:55 +08:00
|
|
|
retry:
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
fio.page = page;
|
2017-04-25 20:45:13 +08:00
|
|
|
fio.old_blkaddr = NULL_ADDR;
|
2017-04-27 02:11:12 +08:00
|
|
|
fio.encrypted_page = NULL;
|
2017-05-13 04:51:34 +08:00
|
|
|
fio.need_lock = LOCK_DONE;
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
err = f2fs_do_write_data_page(&fio);
|
2016-02-06 14:38:29 +08:00
|
|
|
if (err) {
|
2017-07-20 01:59:55 +08:00
|
|
|
if (err == -ENOMEM) {
|
|
|
|
congestion_wait(BLK_RW_ASYNC, HZ/50);
|
|
|
|
cond_resched();
|
|
|
|
goto retry;
|
|
|
|
}
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
unlock_page(page);
|
2016-02-06 14:38:29 +08:00
|
|
|
break;
|
2014-12-11 05:59:33 +08:00
|
|
|
}
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
/* record old blkaddr for revoking */
|
|
|
|
cur->old_addr = fio.old_blkaddr;
|
2017-02-02 08:51:22 +08:00
|
|
|
last_idx = page->index;
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
}
|
|
|
|
unlock_page(page);
|
2018-04-23 10:36:14 +08:00
|
|
|
list_move_tail(&cur->list, &revoke_list);
|
2014-10-07 08:39:50 +08:00
|
|
|
}
|
2016-02-06 14:38:29 +08:00
|
|
|
|
2017-02-02 08:51:22 +08:00
|
|
|
if (last_idx != ULONG_MAX)
|
2017-05-11 02:28:38 +08:00
|
|
|
f2fs_submit_merged_write_cond(sbi, inode, 0, last_idx, DATA);
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
|
2018-04-23 10:36:14 +08:00
|
|
|
if (err) {
|
|
|
|
/*
|
|
|
|
* try to revoke all committed pages, but still we could fail
|
|
|
|
* due to no memory or other reason, if that happened, EAGAIN
|
|
|
|
* will be returned, which means in such case, transaction is
|
|
|
|
* already not integrity, caller should use journal to do the
|
|
|
|
* recovery or rewrite & commit last transaction. For other
|
|
|
|
* error number, revoking was done by filesystem itself.
|
|
|
|
*/
|
|
|
|
err = __revoke_inmem_pages(inode, &revoke_list, false, true);
|
|
|
|
|
|
|
|
/* drop all uncommitted pages */
|
|
|
|
__revoke_inmem_pages(inode, &fi->inmem_pages, true, false);
|
|
|
|
} else {
|
|
|
|
__revoke_inmem_pages(inode, &revoke_list, false, false);
|
|
|
|
}
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
|
2016-02-06 14:38:29 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_commit_inmem_pages(struct inode *inode)
|
2016-02-06 14:38:29 +08:00
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
|
|
|
struct f2fs_inode_info *fi = F2FS_I(inode);
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
int err;
|
2016-02-06 14:38:29 +08:00
|
|
|
|
|
|
|
f2fs_balance_fs(sbi, true);
|
|
|
|
|
2018-07-25 11:11:56 +08:00
|
|
|
down_write(&fi->i_gc_rwsem[WRITE]);
|
|
|
|
|
|
|
|
f2fs_lock_op(sbi);
|
2017-01-07 18:50:26 +08:00
|
|
|
set_inode_flag(inode, FI_ATOMIC_COMMIT);
|
|
|
|
|
2016-02-06 14:38:29 +08:00
|
|
|
mutex_lock(&fi->inmem_lock);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
err = __f2fs_commit_inmem_pages(inode);
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
|
2017-10-19 10:05:57 +08:00
|
|
|
spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
|
|
|
|
if (!list_empty(&fi->inmem_ilist))
|
|
|
|
list_del_init(&fi->inmem_ilist);
|
|
|
|
spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
|
2014-10-07 08:39:50 +08:00
|
|
|
mutex_unlock(&fi->inmem_lock);
|
|
|
|
|
2017-01-07 18:50:26 +08:00
|
|
|
clear_inode_flag(inode, FI_ATOMIC_COMMIT);
|
|
|
|
|
2016-02-06 14:38:29 +08:00
|
|
|
f2fs_unlock_op(sbi);
|
2018-07-25 11:11:56 +08:00
|
|
|
up_write(&fi->i_gc_rwsem[WRITE]);
|
|
|
|
|
2015-07-25 15:52:52 +08:00
|
|
|
return err;
|
2014-10-07 08:39:50 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* This function balances dirty node and dentry pages.
|
|
|
|
* In addition, it controls garbage collection.
|
|
|
|
*/
|
2016-01-08 06:15:04 +08:00
|
|
|
void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-02-25 11:08:28 +08:00
|
|
|
if (time_to_inject(sbi, FAULT_CHECKPOINT)) {
|
|
|
|
f2fs_show_injection_info(FAULT_CHECKPOINT);
|
2016-09-26 19:45:55 +08:00
|
|
|
f2fs_stop_checkpoint(sbi, false);
|
2017-02-25 11:08:28 +08:00
|
|
|
}
|
2016-09-26 19:45:55 +08:00
|
|
|
|
2016-06-03 06:24:24 +08:00
|
|
|
/* balance_fs_bg is able to be pending */
|
2017-04-21 04:51:57 +08:00
|
|
|
if (need && excess_cached_nats(sbi))
|
2016-06-03 06:24:24 +08:00
|
|
|
f2fs_balance_fs_bg(sbi);
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
2012-12-21 16:20:21 +08:00
|
|
|
* We should do GC or end up with checkpoint, if there are so many dirty
|
|
|
|
* dir/node pages without enough free segments.
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
*/
|
2016-09-02 03:02:51 +08:00
|
|
|
if (has_not_enough_free_secs(sbi, 0, 0)) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
mutex_lock(&sbi->gc_mutex);
|
2017-04-14 06:17:00 +08:00
|
|
|
f2fs_gc(sbi, false, false, NULL_SEGNO);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-10-24 13:19:18 +08:00
|
|
|
void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2018-05-26 18:03:34 +08:00
|
|
|
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
|
|
|
|
return;
|
|
|
|
|
f2fs: enable rb-tree extent cache
This patch enables rb-tree based extent cache in f2fs.
When we mount with "-o extent_cache", f2fs will try to add recently accessed
page-block mappings into rb-tree based extent cache as much as possible, instead
of original one extent info cache.
By this way, f2fs can support more effective cache between dnode page cache and
disk. It will supply high hit ratio in the cache with fewer memory when dnode
page cache are reclaimed in environment of low memory.
Storage: Sandisk sd card 64g
1.append write file (offset: 0, size: 128M);
2.override write file (offset: 2M, size: 1M);
3.override write file (offset: 4M, size: 1M);
...
4.override write file (offset: 48M, size: 1M);
...
5.override write file (offset: 112M, size: 1M);
6.sync
7.echo 3 > /proc/sys/vm/drop_caches
8.read file (size:128M, unit: 4k, count: 32768)
(time dd if=/mnt/f2fs/128m bs=4k count=32768)
Extent Hit Ratio:
before patched
Hit Ratio 121 / 1071 1071 / 1071
Performance:
before patched
real 0m37.051s 0m35.556s
user 0m0.040s 0m0.026s
sys 0m2.990s 0m2.251s
Memory Cost:
before patched
Tree Count: 0 1 (size: 24 bytes)
Node Count: 0 45 (size: 1440 bytes)
v3:
o retest and given more details of test result.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-05 17:57:31 +08:00
|
|
|
/* try to shrink extent cache when there is no enough memory */
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
if (!f2fs_available_free_memory(sbi, EXTENT_CACHE))
|
2015-06-20 04:41:23 +08:00
|
|
|
f2fs_shrink_extent_tree(sbi, EXTENT_CACHE_SHRINK_NUMBER);
|
f2fs: enable rb-tree extent cache
This patch enables rb-tree based extent cache in f2fs.
When we mount with "-o extent_cache", f2fs will try to add recently accessed
page-block mappings into rb-tree based extent cache as much as possible, instead
of original one extent info cache.
By this way, f2fs can support more effective cache between dnode page cache and
disk. It will supply high hit ratio in the cache with fewer memory when dnode
page cache are reclaimed in environment of low memory.
Storage: Sandisk sd card 64g
1.append write file (offset: 0, size: 128M);
2.override write file (offset: 2M, size: 1M);
3.override write file (offset: 4M, size: 1M);
...
4.override write file (offset: 48M, size: 1M);
...
5.override write file (offset: 112M, size: 1M);
6.sync
7.echo 3 > /proc/sys/vm/drop_caches
8.read file (size:128M, unit: 4k, count: 32768)
(time dd if=/mnt/f2fs/128m bs=4k count=32768)
Extent Hit Ratio:
before patched
Hit Ratio 121 / 1071 1071 / 1071
Performance:
before patched
real 0m37.051s 0m35.556s
user 0m0.040s 0m0.026s
sys 0m2.990s 0m2.251s
Memory Cost:
before patched
Tree Count: 0 1 (size: 24 bytes)
Node Count: 0 45 (size: 1440 bytes)
v3:
o retest and given more details of test result.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-05 17:57:31 +08:00
|
|
|
|
2015-06-20 06:36:07 +08:00
|
|
|
/* check the # of cached NAT entries */
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
if (!f2fs_available_free_memory(sbi, NAT_ENTRIES))
|
|
|
|
f2fs_try_to_free_nats(sbi, NAT_ENTRY_PER_BLOCK);
|
2015-06-20 06:36:07 +08:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
if (!f2fs_available_free_memory(sbi, FREE_NIDS))
|
|
|
|
f2fs_try_to_free_nids(sbi, MAX_FREE_NIDS);
|
2016-06-17 07:41:49 +08:00
|
|
|
else
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_build_free_nids(sbi, false, false);
|
2015-07-28 18:33:46 +08:00
|
|
|
|
2018-09-19 16:48:47 +08:00
|
|
|
if (!is_idle(sbi, REQ_TIME) &&
|
2018-07-25 19:16:21 +08:00
|
|
|
(!excess_dirty_nats(sbi) && !excess_dirty_nodes(sbi)))
|
2016-12-06 03:37:14 +08:00
|
|
|
return;
|
2015-07-28 18:33:46 +08:00
|
|
|
|
2015-06-20 06:36:07 +08:00
|
|
|
/* checkpoint is the only way to shrink partial cached entries */
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
if (!f2fs_available_free_memory(sbi, NAT_ENTRIES) ||
|
|
|
|
!f2fs_available_free_memory(sbi, INO_ENTRIES) ||
|
f2fs: flush dirty nat entries when exceeding threshold
When testing f2fs with xfstest, generic/251 is stuck for long time,
the case uses below serials to obtain fresh released space in device,
in order to prepare for following fstrim test.
1. rm -rf /mnt/dir
2. mkdir /mnt/dir/
3. cp -axT `pwd`/ /mnt/dir/
4. goto 1
During preparing step, all nat entries will be cached in nat cache,
most of them are dirty entries with invalid blkaddr, which means
nodes related to these entries have been truncated, and they could
be reused after the dirty entries been checkpointed.
However, there was no checkpoint been triggered, so nid allocators
(e.g. mkdir, creat) will run into long journey of iterating all NAT
pages, looking for free nids in alloc_nid->build_free_nids.
Here, in f2fs_balance_fs_bg we give another chance to do checkpoint
to flush nat entries for reusing them in free nid cache when dirty
entry count exceeds 10% of max count.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-18 18:31:18 +08:00
|
|
|
excess_prefree_segs(sbi) ||
|
|
|
|
excess_dirty_nats(sbi) ||
|
2018-07-25 19:16:21 +08:00
|
|
|
excess_dirty_nodes(sbi) ||
|
2016-12-06 03:37:14 +08:00
|
|
|
f2fs_time_over(sbi, CP_TIME)) {
|
2016-02-14 18:54:33 +08:00
|
|
|
if (test_opt(sbi, DATA_FLUSH)) {
|
|
|
|
struct blk_plug plug;
|
|
|
|
|
|
|
|
blk_start_plug(&plug);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_sync_dirty_inodes(sbi, FILE_INODE);
|
2016-02-14 18:54:33 +08:00
|
|
|
blk_finish_plug(&plug);
|
|
|
|
}
|
2013-10-24 13:19:18 +08:00
|
|
|
f2fs_sync_fs(sbi->sb, true);
|
2016-01-10 05:45:17 +08:00
|
|
|
stat_inc_bg_cp_count(sbi->stat_info);
|
f2fs: support data flush in background
Previously, when finishing a checkpoint, we have persisted all fs meta
info including meta inode, node inode, dentry page of directory inode, so,
after a sudden power cut, f2fs can recover from last checkpoint with full
directory structure.
But during checkpoint, we didn't flush dirty pages of regular and symlink
inode, so such dirty datas still in memory will be lost in that moment of
power off.
In order to reduce the chance of lost data, this patch enables
f2fs_balance_fs_bg with the ability of data flushing. It will try to flush
user data before starting a checkpoint. So user's data written after last
checkpoint which may not be fsynced could be saved.
When we mount with data_flush option, after every period of cp_interval
(could be configured in sysfs: /sys/fs/f2fs/device/cp_interval) seconds
user data could be flushed into device once f2fs_balance_fs_bg was called
in kworker thread or gc thread.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-17 17:13:28 +08:00
|
|
|
}
|
2013-10-24 13:19:18 +08:00
|
|
|
}
|
|
|
|
|
2017-03-04 22:13:10 +08:00
|
|
|
static int __submit_flush_wait(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev)
|
2016-10-07 10:02:05 +08:00
|
|
|
{
|
2017-10-28 16:52:31 +08:00
|
|
|
struct bio *bio = f2fs_bio_alloc(sbi, 0, true);
|
2016-10-07 10:02:05 +08:00
|
|
|
int ret;
|
|
|
|
|
2017-05-02 23:03:47 +08:00
|
|
|
bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
|
2017-08-24 01:10:32 +08:00
|
|
|
bio_set_dev(bio, bdev);
|
2016-10-07 10:02:05 +08:00
|
|
|
ret = submit_bio_wait(bio);
|
|
|
|
bio_put(bio);
|
2017-03-04 22:13:10 +08:00
|
|
|
|
|
|
|
trace_f2fs_issue_flush(bdev, test_opt(sbi, NOBARRIER),
|
|
|
|
test_opt(sbi, FLUSH_MERGE), ret);
|
2016-10-07 10:02:05 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
static int submit_flush_wait(struct f2fs_sb_info *sbi, nid_t ino)
|
2016-10-07 10:02:05 +08:00
|
|
|
{
|
2017-09-29 13:59:38 +08:00
|
|
|
int ret = 0;
|
2016-10-07 10:02:05 +08:00
|
|
|
int i;
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
if (!sbi->s_ndevs)
|
|
|
|
return __submit_flush_wait(sbi, sbi->sb->s_bdev);
|
2017-03-04 22:13:10 +08:00
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
for (i = 0; i < sbi->s_ndevs; i++) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
if (!f2fs_is_dirty_device(sbi, ino, i, FLUSH_INO))
|
2017-09-29 13:59:38 +08:00
|
|
|
continue;
|
2017-03-04 22:13:10 +08:00
|
|
|
ret = __submit_flush_wait(sbi, FDEV(i).bdev);
|
|
|
|
if (ret)
|
|
|
|
break;
|
2016-10-07 10:02:05 +08:00
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-04-27 14:21:33 +08:00
|
|
|
static int issue_flush_thread(void *data)
|
2014-04-02 14:34:36 +08:00
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = data;
|
2017-01-10 06:13:03 +08:00
|
|
|
struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info;
|
2014-04-27 14:21:21 +08:00
|
|
|
wait_queue_head_t *q = &fcc->flush_wait_queue;
|
2014-04-02 14:34:36 +08:00
|
|
|
repeat:
|
|
|
|
if (kthread_should_stop())
|
|
|
|
return 0;
|
|
|
|
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs in below condition:
GC Thread Thread B
- sb_start_intwrite
- f2fs_sync_file
- f2fs_sync_fs
- mutex_lock(&sbi->gc_mutex)
- write_checkpoint
- block_operations
- f2fs_sync_inode_meta
- iput
- sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
|
|
|
sb_start_intwrite(sbi->sb);
|
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
if (!llist_empty(&fcc->issue_list)) {
|
2014-04-02 14:34:36 +08:00
|
|
|
struct flush_cmd *cmd, *next;
|
|
|
|
int ret;
|
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
fcc->dispatch_list = llist_del_all(&fcc->issue_list);
|
|
|
|
fcc->dispatch_list = llist_reverse_order(fcc->dispatch_list);
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
cmd = llist_entry(fcc->dispatch_list, struct flush_cmd, llnode);
|
|
|
|
|
|
|
|
ret = submit_flush_wait(sbi, cmd->ino);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_inc(&fcc->issued_flush);
|
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
llist_for_each_entry_safe(cmd, next,
|
|
|
|
fcc->dispatch_list, llnode) {
|
2014-04-02 14:34:36 +08:00
|
|
|
cmd->ret = ret;
|
|
|
|
complete(&cmd->wait);
|
|
|
|
}
|
2014-04-27 14:21:21 +08:00
|
|
|
fcc->dispatch_list = NULL;
|
2014-04-02 14:34:36 +08:00
|
|
|
}
|
|
|
|
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs in below condition:
GC Thread Thread B
- sb_start_intwrite
- f2fs_sync_file
- f2fs_sync_fs
- mutex_lock(&sbi->gc_mutex)
- write_checkpoint
- block_operations
- f2fs_sync_inode_meta
- iput
- sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
|
|
|
sb_end_intwrite(sbi->sb);
|
|
|
|
|
2014-04-27 14:21:21 +08:00
|
|
|
wait_event_interruptible(*q,
|
2014-09-05 18:31:00 +08:00
|
|
|
kthread_should_stop() || !llist_empty(&fcc->issue_list));
|
2014-04-02 14:34:36 +08:00
|
|
|
goto repeat;
|
|
|
|
}
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino)
|
2014-04-02 14:34:36 +08:00
|
|
|
{
|
2017-01-10 06:13:03 +08:00
|
|
|
struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info;
|
2014-05-08 17:00:35 +08:00
|
|
|
struct flush_cmd cmd;
|
2017-03-25 17:19:58 +08:00
|
|
|
int ret;
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2014-07-24 00:57:31 +08:00
|
|
|
if (test_opt(sbi, NOBARRIER))
|
|
|
|
return 0;
|
|
|
|
|
2017-03-25 17:19:58 +08:00
|
|
|
if (!test_opt(sbi, FLUSH_MERGE)) {
|
2017-09-29 13:59:38 +08:00
|
|
|
ret = submit_flush_wait(sbi, ino);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_inc(&fcc->issued_flush);
|
|
|
|
return ret;
|
|
|
|
}
|
2015-08-15 02:43:56 +08:00
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
if (atomic_inc_return(&fcc->issing_flush) == 1 || sbi->s_ndevs > 1) {
|
|
|
|
ret = submit_flush_wait(sbi, ino);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_dec(&fcc->issing_flush);
|
|
|
|
|
|
|
|
atomic_inc(&fcc->issued_flush);
|
2015-08-15 02:43:56 +08:00
|
|
|
return ret;
|
|
|
|
}
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
cmd.ino = ino;
|
2014-05-08 17:00:35 +08:00
|
|
|
init_completion(&cmd.wait);
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2014-09-05 18:31:00 +08:00
|
|
|
llist_add(&cmd.llnode, &fcc->issue_list);
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2017-08-21 22:53:45 +08:00
|
|
|
/* update issue_list before we wake up issue_flush thread */
|
|
|
|
smp_mb();
|
|
|
|
|
|
|
|
if (waitqueue_active(&fcc->flush_wait_queue))
|
2014-04-27 14:21:21 +08:00
|
|
|
wake_up(&fcc->flush_wait_queue);
|
2014-04-02 14:34:36 +08:00
|
|
|
|
2016-12-08 08:23:32 +08:00
|
|
|
if (fcc->f2fs_issue_flush) {
|
|
|
|
wait_for_completion(&cmd.wait);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_dec(&fcc->issing_flush);
|
2016-12-08 08:23:32 +08:00
|
|
|
} else {
|
2017-08-31 18:56:06 +08:00
|
|
|
struct llist_node *list;
|
|
|
|
|
|
|
|
list = llist_del_all(&fcc->issue_list);
|
|
|
|
if (!list) {
|
|
|
|
wait_for_completion(&cmd.wait);
|
|
|
|
atomic_dec(&fcc->issing_flush);
|
|
|
|
} else {
|
|
|
|
struct flush_cmd *tmp, *next;
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
ret = submit_flush_wait(sbi, ino);
|
2017-08-31 18:56:06 +08:00
|
|
|
|
|
|
|
llist_for_each_entry_safe(tmp, next, list, llnode) {
|
|
|
|
if (tmp == &cmd) {
|
|
|
|
cmd.ret = ret;
|
|
|
|
atomic_dec(&fcc->issing_flush);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
tmp->ret = ret;
|
|
|
|
complete(&tmp->wait);
|
|
|
|
}
|
|
|
|
}
|
2016-12-08 08:23:32 +08:00
|
|
|
}
|
2014-05-08 17:00:35 +08:00
|
|
|
|
|
|
|
return cmd.ret;
|
2014-04-02 14:34:36 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_create_flush_cmd_control(struct f2fs_sb_info *sbi)
|
2014-04-27 14:21:33 +08:00
|
|
|
{
|
|
|
|
dev_t dev = sbi->sb->s_bdev->bd_dev;
|
|
|
|
struct flush_cmd_control *fcc;
|
|
|
|
int err = 0;
|
|
|
|
|
2017-01-10 06:13:03 +08:00
|
|
|
if (SM_I(sbi)->fcc_info) {
|
|
|
|
fcc = SM_I(sbi)->fcc_info;
|
2017-06-24 15:57:19 +08:00
|
|
|
if (fcc->f2fs_issue_flush)
|
|
|
|
return err;
|
2016-12-08 08:23:32 +08:00
|
|
|
goto init_thread;
|
|
|
|
}
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
fcc = f2fs_kzalloc(sbi, sizeof(struct flush_cmd_control), GFP_KERNEL);
|
2014-04-27 14:21:33 +08:00
|
|
|
if (!fcc)
|
|
|
|
return -ENOMEM;
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_set(&fcc->issued_flush, 0);
|
|
|
|
atomic_set(&fcc->issing_flush, 0);
|
2014-04-27 14:21:33 +08:00
|
|
|
init_waitqueue_head(&fcc->flush_wait_queue);
|
2014-09-05 18:31:00 +08:00
|
|
|
init_llist_head(&fcc->issue_list);
|
2017-01-10 06:13:03 +08:00
|
|
|
SM_I(sbi)->fcc_info = fcc;
|
2017-06-01 16:43:51 +08:00
|
|
|
if (!test_opt(sbi, FLUSH_MERGE))
|
|
|
|
return err;
|
|
|
|
|
2016-12-08 08:23:32 +08:00
|
|
|
init_thread:
|
2014-04-27 14:21:33 +08:00
|
|
|
fcc->f2fs_issue_flush = kthread_run(issue_flush_thread, sbi,
|
|
|
|
"f2fs_flush-%u:%u", MAJOR(dev), MINOR(dev));
|
|
|
|
if (IS_ERR(fcc->f2fs_issue_flush)) {
|
|
|
|
err = PTR_ERR(fcc->f2fs_issue_flush);
|
|
|
|
kfree(fcc);
|
2017-01-10 06:13:03 +08:00
|
|
|
SM_I(sbi)->fcc_info = NULL;
|
2014-04-27 14:21:33 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free)
|
2014-04-27 14:21:33 +08:00
|
|
|
{
|
2017-01-10 06:13:03 +08:00
|
|
|
struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info;
|
2014-04-27 14:21:33 +08:00
|
|
|
|
2016-12-08 08:23:32 +08:00
|
|
|
if (fcc && fcc->f2fs_issue_flush) {
|
|
|
|
struct task_struct *flush_thread = fcc->f2fs_issue_flush;
|
|
|
|
|
|
|
|
fcc->f2fs_issue_flush = NULL;
|
|
|
|
kthread_stop(flush_thread);
|
|
|
|
}
|
|
|
|
if (free) {
|
|
|
|
kfree(fcc);
|
2017-01-10 06:13:03 +08:00
|
|
|
SM_I(sbi)->fcc_info = NULL;
|
2016-12-08 08:23:32 +08:00
|
|
|
}
|
2014-04-27 14:21:33 +08:00
|
|
|
}
|
|
|
|
|
2017-09-29 13:59:39 +08:00
|
|
|
int f2fs_flush_device_cache(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
int ret = 0, i;
|
|
|
|
|
|
|
|
if (!sbi->s_ndevs)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
for (i = 1; i < sbi->s_ndevs; i++) {
|
|
|
|
if (!f2fs_test_bit(i, (char *)&sbi->dirty_device))
|
|
|
|
continue;
|
|
|
|
ret = __submit_flush_wait(sbi, FDEV(i).bdev);
|
|
|
|
if (ret)
|
|
|
|
break;
|
|
|
|
|
|
|
|
spin_lock(&sbi->dev_lock);
|
|
|
|
f2fs_clear_bit(i, (char *)&sbi->dirty_device);
|
|
|
|
spin_unlock(&sbi->dev_lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
static void __locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno,
|
|
|
|
enum dirty_type dirty_type)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
|
|
|
|
/* need not be added */
|
|
|
|
if (IS_CURSEG(sbi, segno))
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (!test_and_set_bit(segno, dirty_i->dirty_segmap[dirty_type]))
|
|
|
|
dirty_i->nr_dirty[dirty_type]++;
|
|
|
|
|
|
|
|
if (dirty_type == DIRTY) {
|
|
|
|
struct seg_entry *sentry = get_seg_entry(sbi, segno);
|
2013-10-25 16:31:57 +08:00
|
|
|
enum dirty_type t = sentry->type;
|
f2fs: fix the bitmap consistency of dirty segments
Like below, there are 8 segment bitmaps for SSR victim candidates.
enum dirty_type {
DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs */
DIRTY_WARM_DATA, /* dirty segments assigned as warm data logs */
DIRTY_COLD_DATA, /* dirty segments assigned as cold data logs */
DIRTY_HOT_NODE, /* dirty segments assigned as hot node logs */
DIRTY_WARM_NODE, /* dirty segments assigned as warm node logs */
DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
DIRTY, /* to count # of dirty segments */
PRE, /* to count # of entirely obsolete segments */
NR_DIRTY_TYPE
};
The upper 6 bitmaps indicates segments dirtied by active log areas respectively.
And, the DIRTY bitmap integrates all the 6 bitmaps.
For example,
o DIRTY_HOT_DATA : 1010000
o DIRTY_WARM_DATA: 0100000
o DIRTY_COLD_DATA: 0001000
o DIRTY_HOT_NODE : 0000010
o DIRTY_WARM_NODE: 0000001
o DIRTY_COLD_NODE: 0000000
In this case,
o DIRTY : 1111011,
which means that we should guarantee the consistency between DIRTY and other
bitmaps concreately.
However, the SSR mode selects victims freely from any log types, which can set
multiple bits across the various bitmap types.
So, this patch eliminates this inconsistency.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-01 12:52:09 +08:00
|
|
|
|
2014-09-03 07:24:11 +08:00
|
|
|
if (unlikely(t >= DIRTY)) {
|
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
return;
|
|
|
|
}
|
2013-10-25 16:31:57 +08:00
|
|
|
if (!test_and_set_bit(segno, dirty_i->dirty_segmap[t]))
|
|
|
|
dirty_i->nr_dirty[t]++;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __remove_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno,
|
|
|
|
enum dirty_type dirty_type)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
|
|
|
|
if (test_and_clear_bit(segno, dirty_i->dirty_segmap[dirty_type]))
|
|
|
|
dirty_i->nr_dirty[dirty_type]--;
|
|
|
|
|
|
|
|
if (dirty_type == DIRTY) {
|
2013-10-25 16:31:57 +08:00
|
|
|
struct seg_entry *sentry = get_seg_entry(sbi, segno);
|
|
|
|
enum dirty_type t = sentry->type;
|
|
|
|
|
|
|
|
if (test_and_clear_bit(segno, dirty_i->dirty_segmap[t]))
|
|
|
|
dirty_i->nr_dirty[t]--;
|
f2fs: fix the bitmap consistency of dirty segments
Like below, there are 8 segment bitmaps for SSR victim candidates.
enum dirty_type {
DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs */
DIRTY_WARM_DATA, /* dirty segments assigned as warm data logs */
DIRTY_COLD_DATA, /* dirty segments assigned as cold data logs */
DIRTY_HOT_NODE, /* dirty segments assigned as hot node logs */
DIRTY_WARM_NODE, /* dirty segments assigned as warm node logs */
DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
DIRTY, /* to count # of dirty segments */
PRE, /* to count # of entirely obsolete segments */
NR_DIRTY_TYPE
};
The upper 6 bitmaps indicates segments dirtied by active log areas respectively.
And, the DIRTY bitmap integrates all the 6 bitmaps.
For example,
o DIRTY_HOT_DATA : 1010000
o DIRTY_WARM_DATA: 0100000
o DIRTY_COLD_DATA: 0001000
o DIRTY_HOT_NODE : 0000010
o DIRTY_WARM_NODE: 0000001
o DIRTY_COLD_NODE: 0000000
In this case,
o DIRTY : 1111011,
which means that we should guarantee the consistency between DIRTY and other
bitmaps concreately.
However, the SSR mode selects victims freely from any log types, which can set
multiple bits across the various bitmap types.
So, this patch eliminates this inconsistency.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-01 12:52:09 +08:00
|
|
|
|
2017-04-08 05:33:22 +08:00
|
|
|
if (get_valid_blocks(sbi, segno, true) == 0)
|
2017-04-08 06:08:17 +08:00
|
|
|
clear_bit(GET_SEC_FROM_SEG(sbi, segno),
|
2013-03-31 12:26:03 +08:00
|
|
|
dirty_i->victim_secmap);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Should not occur error such as -ENOMEM.
|
|
|
|
* Adding dirty entry into seglist is not critical operation.
|
|
|
|
* If a given segment is one of current working segments, it won't be added.
|
|
|
|
*/
|
2013-06-13 16:59:28 +08:00
|
|
|
static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
unsigned short valid_blocks;
|
|
|
|
|
|
|
|
if (segno == NULL_SEGNO || IS_CURSEG(sbi, segno))
|
|
|
|
return;
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
|
2017-04-08 05:33:22 +08:00
|
|
|
valid_blocks = get_valid_blocks(sbi, segno, false);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
if (valid_blocks == 0) {
|
|
|
|
__locate_dirty_segment(sbi, segno, PRE);
|
|
|
|
__remove_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
} else if (valid_blocks < sbi->blocks_per_seg) {
|
|
|
|
__locate_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
} else {
|
|
|
|
/* Recovery routine with SSR needs this */
|
|
|
|
__remove_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
}
|
|
|
|
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
static struct discard_cmd *__create_discard_cmd(struct f2fs_sb_info *sbi,
|
2017-03-08 10:02:02 +08:00
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len)
|
2016-08-29 23:58:34 +08:00
|
|
|
{
|
2017-01-12 06:40:24 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-04-15 14:09:37 +08:00
|
|
|
struct list_head *pend_list;
|
2017-01-10 06:13:03 +08:00
|
|
|
struct discard_cmd *dc;
|
2016-08-29 23:58:34 +08:00
|
|
|
|
2017-04-15 14:09:37 +08:00
|
|
|
f2fs_bug_on(sbi, !len);
|
|
|
|
|
|
|
|
pend_list = &dcc->pend_list[plist_idx(len)];
|
|
|
|
|
2017-01-10 06:13:03 +08:00
|
|
|
dc = f2fs_kmem_cache_alloc(discard_cmd_slab, GFP_NOFS);
|
|
|
|
INIT_LIST_HEAD(&dc->list);
|
2017-03-08 10:02:02 +08:00
|
|
|
dc->bdev = bdev;
|
2017-01-10 06:13:03 +08:00
|
|
|
dc->lstart = lstart;
|
2017-03-08 10:02:02 +08:00
|
|
|
dc->start = start;
|
2017-01-10 06:13:03 +08:00
|
|
|
dc->len = len;
|
2017-04-26 17:39:54 +08:00
|
|
|
dc->ref = 0;
|
2017-01-10 12:32:07 +08:00
|
|
|
dc->state = D_PREP;
|
2018-08-06 22:43:50 +08:00
|
|
|
dc->issuing = 0;
|
2017-03-08 10:02:02 +08:00
|
|
|
dc->error = 0;
|
2017-01-10 06:13:03 +08:00
|
|
|
init_completion(&dc->wait);
|
2017-04-05 18:19:48 +08:00
|
|
|
list_add_tail(&dc->list, pend_list);
|
2018-08-06 22:43:50 +08:00
|
|
|
spin_lock_init(&dc->lock);
|
|
|
|
dc->bio_ref = 0;
|
2017-03-25 17:19:59 +08:00
|
|
|
atomic_inc(&dcc->discard_cmd_cnt);
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += len;
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
return dc;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct discard_cmd *__attach_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len,
|
|
|
|
struct rb_node *parent, struct rb_node **p)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct discard_cmd *dc;
|
|
|
|
|
|
|
|
dc = __create_discard_cmd(sbi, bdev, lstart, start, len);
|
|
|
|
|
|
|
|
rb_link_node(&dc->rb_node, parent, p);
|
|
|
|
rb_insert_color(&dc->rb_node, &dcc->root);
|
|
|
|
|
|
|
|
return dc;
|
2017-01-10 12:32:07 +08:00
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
static void __detach_discard_cmd(struct discard_cmd_control *dcc,
|
|
|
|
struct discard_cmd *dc)
|
2017-01-10 12:32:07 +08:00
|
|
|
{
|
2017-01-12 02:20:04 +08:00
|
|
|
if (dc->state == D_DONE)
|
2018-08-06 22:43:50 +08:00
|
|
|
atomic_sub(dc->issuing, &dcc->issing_discard);
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
list_del(&dc->list);
|
|
|
|
rb_erase(&dc->rb_node, &dcc->root);
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks -= dc->len;
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
kmem_cache_free(discard_cmd_slab, dc);
|
|
|
|
|
|
|
|
atomic_dec(&dcc->discard_cmd_cnt);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __remove_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_cmd *dc)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2018-08-06 22:43:50 +08:00
|
|
|
unsigned long flags;
|
2017-01-12 02:20:04 +08:00
|
|
|
|
2017-10-04 09:08:36 +08:00
|
|
|
trace_f2fs_remove_discard(dc->bdev, dc->start, dc->len);
|
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
spin_lock_irqsave(&dc->lock, flags);
|
|
|
|
if (dc->bio_ref) {
|
|
|
|
spin_unlock_irqrestore(&dc->lock, flags);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
spin_unlock_irqrestore(&dc->lock, flags);
|
|
|
|
|
2017-06-05 18:29:07 +08:00
|
|
|
f2fs_bug_on(sbi, dc->ref);
|
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
if (dc->error == -EOPNOTSUPP)
|
|
|
|
dc->error = 0;
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
if (dc->error)
|
2018-08-22 17:17:47 +08:00
|
|
|
printk_ratelimited(
|
|
|
|
"%sF2FS-fs: Issue discard(%u, %u, %u) failed, ret: %d",
|
|
|
|
KERN_INFO, dc->lstart, dc->start, dc->len, dc->error);
|
2017-04-14 23:24:55 +08:00
|
|
|
__detach_discard_cmd(dcc, dc);
|
2016-08-29 23:58:34 +08:00
|
|
|
}
|
|
|
|
|
2017-03-08 10:02:02 +08:00
|
|
|
static void f2fs_submit_discard_endio(struct bio *bio)
|
|
|
|
{
|
|
|
|
struct discard_cmd *dc = (struct discard_cmd *)bio->bi_private;
|
2018-08-06 22:43:50 +08:00
|
|
|
unsigned long flags;
|
2017-03-08 10:02:02 +08:00
|
|
|
|
2017-06-03 15:38:06 +08:00
|
|
|
dc->error = blk_status_to_errno(bio->bi_status);
|
2018-08-06 22:43:50 +08:00
|
|
|
|
|
|
|
spin_lock_irqsave(&dc->lock, flags);
|
|
|
|
dc->bio_ref--;
|
|
|
|
if (!dc->bio_ref && dc->state == D_SUBMIT) {
|
|
|
|
dc->state = D_DONE;
|
|
|
|
complete_all(&dc->wait);
|
|
|
|
}
|
|
|
|
spin_unlock_irqrestore(&dc->lock, flags);
|
2017-03-08 10:02:02 +08:00
|
|
|
bio_put(bio);
|
|
|
|
}
|
|
|
|
|
2018-01-05 17:41:20 +08:00
|
|
|
static void __check_sit_bitmap(struct f2fs_sb_info *sbi,
|
2017-06-30 17:19:02 +08:00
|
|
|
block_t start, block_t end)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
struct seg_entry *sentry;
|
|
|
|
unsigned int segno;
|
|
|
|
block_t blk = start;
|
|
|
|
unsigned long offset, size, max_blocks = sbi->blocks_per_seg;
|
|
|
|
unsigned long *map;
|
|
|
|
|
|
|
|
while (blk < end) {
|
|
|
|
segno = GET_SEGNO(sbi, blk);
|
|
|
|
sentry = get_seg_entry(sbi, segno);
|
|
|
|
offset = GET_BLKOFF_FROM_SEG0(sbi, blk);
|
|
|
|
|
2017-08-04 17:07:15 +08:00
|
|
|
if (end < START_BLOCK(sbi, segno + 1))
|
|
|
|
size = GET_BLKOFF_FROM_SEG0(sbi, end);
|
|
|
|
else
|
|
|
|
size = max_blocks;
|
2017-06-30 17:19:02 +08:00
|
|
|
map = (unsigned long *)(sentry->cur_valid_map);
|
|
|
|
offset = __find_rev_next_bit(map, size, offset);
|
|
|
|
f2fs_bug_on(sbi, offset != size);
|
2017-08-04 17:07:15 +08:00
|
|
|
blk = START_BLOCK(sbi, segno + 1);
|
2017-06-30 17:19:02 +08:00
|
|
|
}
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2018-05-30 00:58:42 +08:00
|
|
|
static void __init_discard_policy(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_policy *dpolicy,
|
|
|
|
int discard_type, unsigned int granularity)
|
|
|
|
{
|
|
|
|
/* common policy */
|
|
|
|
dpolicy->type = discard_type;
|
|
|
|
dpolicy->sync = true;
|
2018-07-08 22:11:01 +08:00
|
|
|
dpolicy->ordered = false;
|
2018-05-30 00:58:42 +08:00
|
|
|
dpolicy->granularity = granularity;
|
|
|
|
|
|
|
|
dpolicy->max_requests = DEF_MAX_DISCARD_REQUEST;
|
|
|
|
dpolicy->io_aware_gran = MAX_PLIST_NUM;
|
|
|
|
|
|
|
|
if (discard_type == DPOLICY_BG) {
|
|
|
|
dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME;
|
2018-04-08 15:11:11 +08:00
|
|
|
dpolicy->mid_interval = DEF_MID_DISCARD_ISSUE_TIME;
|
2018-05-30 00:58:42 +08:00
|
|
|
dpolicy->max_interval = DEF_MAX_DISCARD_ISSUE_TIME;
|
|
|
|
dpolicy->io_aware = true;
|
2018-04-10 15:43:09 +08:00
|
|
|
dpolicy->sync = false;
|
2018-07-08 22:11:01 +08:00
|
|
|
dpolicy->ordered = true;
|
2018-05-30 00:58:42 +08:00
|
|
|
if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
|
|
|
|
dpolicy->granularity = 1;
|
|
|
|
dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
|
|
|
|
}
|
|
|
|
} else if (discard_type == DPOLICY_FORCE) {
|
|
|
|
dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME;
|
2018-04-08 15:11:11 +08:00
|
|
|
dpolicy->mid_interval = DEF_MID_DISCARD_ISSUE_TIME;
|
2018-05-30 00:58:42 +08:00
|
|
|
dpolicy->max_interval = DEF_MAX_DISCARD_ISSUE_TIME;
|
|
|
|
dpolicy->io_aware = false;
|
|
|
|
} else if (discard_type == DPOLICY_FSTRIM) {
|
|
|
|
dpolicy->io_aware = false;
|
|
|
|
} else if (discard_type == DPOLICY_UMOUNT) {
|
2018-04-04 17:29:05 +08:00
|
|
|
dpolicy->max_requests = UINT_MAX;
|
2018-05-30 00:58:42 +08:00
|
|
|
dpolicy->io_aware = false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
static void __update_discard_tree_range(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len);
|
2017-03-08 10:02:02 +08:00
|
|
|
/* this function is copied from blkdev_issue_discard from block/blk-lib.c */
|
2018-08-08 10:14:55 +08:00
|
|
|
static int __submit_discard_cmd(struct f2fs_sb_info *sbi,
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy *dpolicy,
|
2018-08-06 22:43:50 +08:00
|
|
|
struct discard_cmd *dc,
|
|
|
|
unsigned int *issued)
|
2017-03-08 10:02:02 +08:00
|
|
|
{
|
2018-08-06 22:43:50 +08:00
|
|
|
struct block_device *bdev = dc->bdev;
|
|
|
|
struct request_queue *q = bdev_get_queue(bdev);
|
|
|
|
unsigned int max_discard_blocks =
|
|
|
|
SECTOR_TO_BLOCK(q->limits.max_discard_sectors);
|
2017-03-08 10:02:02 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct list_head *wait_list = (dpolicy->type == DPOLICY_FSTRIM) ?
|
|
|
|
&(dcc->fstrim_list) : &(dcc->wait_list);
|
|
|
|
int flag = dpolicy->sync ? REQ_SYNC : 0;
|
2018-08-06 22:43:50 +08:00
|
|
|
block_t lstart, start, len, total_len;
|
|
|
|
int err = 0;
|
2017-03-08 10:02:02 +08:00
|
|
|
|
|
|
|
if (dc->state != D_PREP)
|
2018-08-08 10:14:55 +08:00
|
|
|
return 0;
|
2017-03-08 10:02:02 +08:00
|
|
|
|
2018-04-13 11:08:05 +08:00
|
|
|
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
|
2018-08-08 10:14:55 +08:00
|
|
|
return 0;
|
2018-04-13 11:08:05 +08:00
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
trace_f2fs_issue_discard(bdev, dc->start, dc->len);
|
|
|
|
|
|
|
|
lstart = dc->lstart;
|
|
|
|
start = dc->start;
|
|
|
|
len = dc->len;
|
|
|
|
total_len = len;
|
|
|
|
|
|
|
|
dc->len = 0;
|
|
|
|
|
|
|
|
while (total_len && *issued < dpolicy->max_requests && !err) {
|
|
|
|
struct bio *bio = NULL;
|
|
|
|
unsigned long flags;
|
|
|
|
bool last = true;
|
|
|
|
|
|
|
|
if (len > max_discard_blocks) {
|
|
|
|
len = max_discard_blocks;
|
|
|
|
last = false;
|
|
|
|
}
|
|
|
|
|
|
|
|
(*issued)++;
|
|
|
|
if (*issued == dpolicy->max_requests)
|
|
|
|
last = true;
|
|
|
|
|
|
|
|
dc->len += len;
|
|
|
|
|
2018-08-06 20:30:18 +08:00
|
|
|
if (time_to_inject(sbi, FAULT_DISCARD)) {
|
|
|
|
f2fs_show_injection_info(FAULT_DISCARD);
|
|
|
|
err = -EIO;
|
|
|
|
goto submit;
|
|
|
|
}
|
2018-08-06 22:43:50 +08:00
|
|
|
err = __blkdev_issue_discard(bdev,
|
|
|
|
SECTOR_FROM_BLOCK(start),
|
|
|
|
SECTOR_FROM_BLOCK(len),
|
|
|
|
GFP_NOFS, 0, &bio);
|
2018-08-06 20:30:18 +08:00
|
|
|
submit:
|
2018-08-08 10:14:55 +08:00
|
|
|
if (err) {
|
2018-08-06 22:43:50 +08:00
|
|
|
spin_lock_irqsave(&dc->lock, flags);
|
2018-08-08 10:14:55 +08:00
|
|
|
if (dc->state == D_PARTIAL)
|
2018-08-06 22:43:50 +08:00
|
|
|
dc->state = D_SUBMIT;
|
|
|
|
spin_unlock_irqrestore(&dc->lock, flags);
|
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
break;
|
|
|
|
}
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
f2fs_bug_on(sbi, !bio);
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
/*
|
|
|
|
* should keep before submission to avoid D_DONE
|
|
|
|
* right away
|
|
|
|
*/
|
|
|
|
spin_lock_irqsave(&dc->lock, flags);
|
|
|
|
if (last)
|
|
|
|
dc->state = D_SUBMIT;
|
|
|
|
else
|
|
|
|
dc->state = D_PARTIAL;
|
|
|
|
dc->bio_ref++;
|
|
|
|
spin_unlock_irqrestore(&dc->lock, flags);
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
atomic_inc(&dcc->issing_discard);
|
|
|
|
dc->issuing++;
|
|
|
|
list_move_tail(&dc->list, wait_list);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
/* sanity check on discard range */
|
|
|
|
__check_sit_bitmap(sbi, start, start + len);
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
bio->bi_private = dc;
|
|
|
|
bio->bi_end_io = f2fs_submit_discard_endio;
|
|
|
|
bio->bi_opf |= flag;
|
|
|
|
submit_bio(bio);
|
|
|
|
|
|
|
|
atomic_inc(&dcc->issued_discard);
|
|
|
|
|
|
|
|
f2fs_update_iostat(sbi, FS_DISCARD, 1);
|
2018-08-06 22:43:50 +08:00
|
|
|
|
|
|
|
lstart += len;
|
|
|
|
start += len;
|
|
|
|
total_len -= len;
|
|
|
|
len = total_len;
|
2017-03-08 10:02:02 +08:00
|
|
|
}
|
2018-08-06 22:43:50 +08:00
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
if (!err && len)
|
2018-08-06 22:43:50 +08:00
|
|
|
__update_discard_tree_range(sbi, bdev, lstart, start, len);
|
2018-08-08 10:14:55 +08:00
|
|
|
return err;
|
2017-03-08 10:02:02 +08:00
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
static struct discard_cmd *__insert_discard_tree(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len,
|
|
|
|
struct rb_node **insert_p,
|
|
|
|
struct rb_node *insert_parent)
|
2017-03-08 10:02:02 +08:00
|
|
|
{
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-19 18:58:21 +08:00
|
|
|
struct rb_node **p;
|
2017-04-14 23:24:55 +08:00
|
|
|
struct rb_node *parent = NULL;
|
|
|
|
struct discard_cmd *dc = NULL;
|
|
|
|
|
|
|
|
if (insert_p && insert_parent) {
|
|
|
|
parent = insert_parent;
|
|
|
|
p = insert_p;
|
|
|
|
goto do_insert;
|
|
|
|
}
|
2017-03-08 10:02:02 +08:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
p = f2fs_lookup_rb_tree_for_insert(sbi, &dcc->root, &parent, lstart);
|
2017-04-14 23:24:55 +08:00
|
|
|
do_insert:
|
|
|
|
dc = __attach_discard_cmd(sbi, bdev, lstart, start, len, parent, p);
|
|
|
|
if (!dc)
|
|
|
|
return NULL;
|
2017-03-08 10:02:02 +08:00
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
return dc;
|
2017-03-08 10:02:02 +08:00
|
|
|
}
|
|
|
|
|
2017-04-15 14:09:37 +08:00
|
|
|
static void __relocate_discard_cmd(struct discard_cmd_control *dcc,
|
|
|
|
struct discard_cmd *dc)
|
|
|
|
{
|
|
|
|
list_move_tail(&dc->list, &dcc->pend_list[plist_idx(dc->len)]);
|
|
|
|
}
|
|
|
|
|
2017-03-02 10:36:20 +08:00
|
|
|
static void __punch_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_cmd *dc, block_t blkaddr)
|
|
|
|
{
|
2017-04-15 14:09:37 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_info di = dc->di;
|
|
|
|
bool modified = false;
|
2017-03-02 10:36:20 +08:00
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
if (dc->state == D_DONE || dc->len == 1) {
|
2017-03-02 10:36:20 +08:00
|
|
|
__remove_discard_cmd(sbi, dc);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks -= di.len;
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
if (blkaddr > di.lstart) {
|
2017-03-02 10:36:20 +08:00
|
|
|
dc->len = blkaddr - dc->lstart;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += dc->len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
modified = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (blkaddr < di.lstart + di.len - 1) {
|
|
|
|
if (modified) {
|
|
|
|
__insert_discard_tree(sbi, dc->bdev, blkaddr + 1,
|
|
|
|
di.start + blkaddr + 1 - di.lstart,
|
|
|
|
di.lstart + di.len - 1 - blkaddr,
|
|
|
|
NULL, NULL);
|
|
|
|
} else {
|
|
|
|
dc->lstart++;
|
|
|
|
dc->len--;
|
|
|
|
dc->start++;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += dc->len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
}
|
2017-03-02 10:36:20 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
static void __update_discard_tree_range(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t lstart,
|
|
|
|
block_t start, block_t len)
|
2016-08-29 23:58:34 +08:00
|
|
|
{
|
2017-01-12 06:40:24 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-04-14 23:24:55 +08:00
|
|
|
struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
|
|
|
|
struct discard_cmd *dc;
|
|
|
|
struct discard_info di = {0};
|
|
|
|
struct rb_node **insert_p = NULL, *insert_parent = NULL;
|
2018-08-06 22:43:50 +08:00
|
|
|
struct request_queue *q = bdev_get_queue(bdev);
|
|
|
|
unsigned int max_discard_blocks =
|
|
|
|
SECTOR_TO_BLOCK(q->limits.max_discard_sectors);
|
2017-04-14 23:24:55 +08:00
|
|
|
block_t end = lstart + len;
|
2016-08-29 23:58:34 +08:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
dc = (struct discard_cmd *)f2fs_lookup_rb_tree_ret(&dcc->root,
|
2017-04-14 23:24:55 +08:00
|
|
|
NULL, lstart,
|
|
|
|
(struct rb_entry **)&prev_dc,
|
|
|
|
(struct rb_entry **)&next_dc,
|
|
|
|
&insert_p, &insert_parent, true);
|
|
|
|
if (dc)
|
|
|
|
prev_dc = dc;
|
|
|
|
|
|
|
|
if (!prev_dc) {
|
|
|
|
di.lstart = lstart;
|
|
|
|
di.len = next_dc ? next_dc->lstart - lstart : len;
|
|
|
|
di.len = min(di.len, len);
|
|
|
|
di.start = start;
|
2017-04-05 18:19:48 +08:00
|
|
|
}
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
while (1) {
|
|
|
|
struct rb_node *node;
|
|
|
|
bool merged = false;
|
|
|
|
struct discard_cmd *tdc = NULL;
|
|
|
|
|
|
|
|
if (prev_dc) {
|
|
|
|
di.lstart = prev_dc->lstart + prev_dc->len;
|
|
|
|
if (di.lstart < lstart)
|
|
|
|
di.lstart = lstart;
|
|
|
|
if (di.lstart >= end)
|
|
|
|
break;
|
|
|
|
|
|
|
|
if (!next_dc || next_dc->lstart > end)
|
|
|
|
di.len = end - di.lstart;
|
|
|
|
else
|
|
|
|
di.len = next_dc->lstart - di.lstart;
|
|
|
|
di.start = start + di.lstart - lstart;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!di.len)
|
|
|
|
goto next;
|
|
|
|
|
|
|
|
if (prev_dc && prev_dc->state == D_PREP &&
|
|
|
|
prev_dc->bdev == bdev &&
|
2018-08-06 22:43:50 +08:00
|
|
|
__is_discard_back_mergeable(&di, &prev_dc->di,
|
|
|
|
max_discard_blocks)) {
|
2017-04-14 23:24:55 +08:00
|
|
|
prev_dc->di.len += di.len;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += di.len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, prev_dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
di = prev_dc->di;
|
|
|
|
tdc = prev_dc;
|
|
|
|
merged = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (next_dc && next_dc->state == D_PREP &&
|
|
|
|
next_dc->bdev == bdev &&
|
2018-08-06 22:43:50 +08:00
|
|
|
__is_discard_front_mergeable(&di, &next_dc->di,
|
|
|
|
max_discard_blocks)) {
|
2017-04-14 23:24:55 +08:00
|
|
|
next_dc->di.lstart = di.lstart;
|
|
|
|
next_dc->di.len += di.len;
|
|
|
|
next_dc->di.start = di.start;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks += di.len;
|
2017-04-15 14:09:37 +08:00
|
|
|
__relocate_discard_cmd(dcc, next_dc);
|
2017-04-14 23:24:55 +08:00
|
|
|
if (tdc)
|
|
|
|
__remove_discard_cmd(sbi, tdc);
|
|
|
|
merged = true;
|
2016-12-30 06:07:53 +08:00
|
|
|
}
|
2017-04-14 23:24:55 +08:00
|
|
|
|
2017-04-17 18:21:43 +08:00
|
|
|
if (!merged) {
|
2017-04-14 23:24:55 +08:00
|
|
|
__insert_discard_tree(sbi, bdev, di.lstart, di.start,
|
|
|
|
di.len, NULL, NULL);
|
2017-04-17 18:21:43 +08:00
|
|
|
}
|
2017-04-14 23:24:55 +08:00
|
|
|
next:
|
|
|
|
prev_dc = next_dc;
|
|
|
|
if (!prev_dc)
|
|
|
|
break;
|
|
|
|
|
|
|
|
node = rb_next(&prev_dc->rb_node);
|
|
|
|
next_dc = rb_entry_safe(node, struct discard_cmd, rb_node);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static int __queue_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t blkstart, block_t blklen)
|
|
|
|
{
|
|
|
|
block_t lblkstart = blkstart;
|
|
|
|
|
2017-04-15 14:09:38 +08:00
|
|
|
trace_f2fs_queue_discard(bdev, blkstart, blklen);
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
if (sbi->s_ndevs) {
|
|
|
|
int devi = f2fs_target_device_index(sbi, blkstart);
|
|
|
|
|
|
|
|
blkstart -= FDEV(devi).start_blk;
|
|
|
|
}
|
2018-08-06 22:43:50 +08:00
|
|
|
mutex_lock(&SM_I(sbi)->dcc_info->cmd_lock);
|
2017-04-14 23:24:55 +08:00
|
|
|
__update_discard_tree_range(sbi, bdev, lblkstart, blkstart, blklen);
|
2018-08-06 22:43:50 +08:00
|
|
|
mutex_unlock(&SM_I(sbi)->dcc_info->cmd_lock);
|
2017-04-14 23:24:55 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-07-08 22:11:01 +08:00
|
|
|
static unsigned int __issue_discard_cmd_orderly(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_policy *dpolicy)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
|
|
|
|
struct rb_node **insert_p = NULL, *insert_parent = NULL;
|
|
|
|
struct discard_cmd *dc;
|
|
|
|
struct blk_plug plug;
|
|
|
|
unsigned int pos = dcc->next_pos;
|
|
|
|
unsigned int issued = 0;
|
|
|
|
bool io_interrupted = false;
|
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
|
|
|
dc = (struct discard_cmd *)f2fs_lookup_rb_tree_ret(&dcc->root,
|
|
|
|
NULL, pos,
|
|
|
|
(struct rb_entry **)&prev_dc,
|
|
|
|
(struct rb_entry **)&next_dc,
|
|
|
|
&insert_p, &insert_parent, true);
|
|
|
|
if (!dc)
|
|
|
|
dc = next_dc;
|
|
|
|
|
|
|
|
blk_start_plug(&plug);
|
|
|
|
|
|
|
|
while (dc) {
|
|
|
|
struct rb_node *node;
|
2018-08-08 10:14:55 +08:00
|
|
|
int err = 0;
|
2018-07-08 22:11:01 +08:00
|
|
|
|
|
|
|
if (dc->state != D_PREP)
|
|
|
|
goto next;
|
|
|
|
|
2018-09-19 16:48:47 +08:00
|
|
|
if (dpolicy->io_aware && !is_idle(sbi, DISCARD_TIME)) {
|
2018-07-08 22:11:01 +08:00
|
|
|
io_interrupted = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
dcc->next_pos = dc->lstart + dc->len;
|
2018-08-08 10:14:55 +08:00
|
|
|
err = __submit_discard_cmd(sbi, dpolicy, dc, &issued);
|
2018-07-08 22:11:01 +08:00
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
if (issued >= dpolicy->max_requests)
|
2018-07-08 22:11:01 +08:00
|
|
|
break;
|
|
|
|
next:
|
|
|
|
node = rb_next(&dc->rb_node);
|
2018-08-08 10:14:55 +08:00
|
|
|
if (err)
|
|
|
|
__remove_discard_cmd(sbi, dc);
|
2018-07-08 22:11:01 +08:00
|
|
|
dc = rb_entry_safe(node, struct discard_cmd, rb_node);
|
|
|
|
}
|
|
|
|
|
|
|
|
blk_finish_plug(&plug);
|
|
|
|
|
|
|
|
if (!dc)
|
|
|
|
dcc->next_pos = 0;
|
|
|
|
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
|
|
|
|
|
|
|
if (!issued && io_interrupted)
|
|
|
|
issued = -1;
|
|
|
|
|
|
|
|
return issued;
|
|
|
|
}
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
static int __issue_discard_cmd(struct f2fs_sb_info *sbi,
|
|
|
|
struct discard_policy *dpolicy)
|
2017-04-25 20:21:37 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct list_head *pend_list;
|
|
|
|
struct discard_cmd *dc, *tmp;
|
|
|
|
struct blk_plug plug;
|
2018-07-08 22:08:09 +08:00
|
|
|
int i, issued = 0;
|
2017-09-12 21:35:12 +08:00
|
|
|
bool io_interrupted = false;
|
2017-04-25 20:21:37 +08:00
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
for (i = MAX_PLIST_NUM - 1; i >= 0; i--) {
|
|
|
|
if (i + 1 < dpolicy->granularity)
|
|
|
|
break;
|
2018-07-08 22:11:01 +08:00
|
|
|
|
|
|
|
if (i < DEFAULT_DISCARD_GRANULARITY && dpolicy->ordered)
|
|
|
|
return __issue_discard_cmd_orderly(sbi, dpolicy);
|
|
|
|
|
2017-04-25 20:21:37 +08:00
|
|
|
pend_list = &dcc->pend_list[i];
|
2017-10-04 09:08:35 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
2018-01-08 18:48:33 +08:00
|
|
|
if (list_empty(pend_list))
|
|
|
|
goto next;
|
2018-06-22 16:06:59 +08:00
|
|
|
if (unlikely(dcc->rbtree_check))
|
|
|
|
f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi,
|
|
|
|
&dcc->root));
|
2017-10-04 09:08:35 +08:00
|
|
|
blk_start_plug(&plug);
|
2017-04-25 20:21:37 +08:00
|
|
|
list_for_each_entry_safe(dc, tmp, pend_list, list) {
|
|
|
|
f2fs_bug_on(sbi, dc->state != D_PREP);
|
|
|
|
|
2017-10-04 09:08:33 +08:00
|
|
|
if (dpolicy->io_aware && i < dpolicy->io_aware_gran &&
|
2018-09-19 16:48:47 +08:00
|
|
|
!is_idle(sbi, DISCARD_TIME)) {
|
2017-09-12 21:35:12 +08:00
|
|
|
io_interrupted = true;
|
2018-07-08 22:08:09 +08:00
|
|
|
break;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
}
|
2017-09-12 21:35:12 +08:00
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
__submit_discard_cmd(sbi, dpolicy, dc, &issued);
|
2018-07-08 22:08:09 +08:00
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
if (issued >= dpolicy->max_requests)
|
2017-10-04 09:08:35 +08:00
|
|
|
break;
|
2017-04-25 20:21:37 +08:00
|
|
|
}
|
2017-10-04 09:08:35 +08:00
|
|
|
blk_finish_plug(&plug);
|
2018-01-08 18:48:33 +08:00
|
|
|
next:
|
2017-10-04 09:08:35 +08:00
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
|
|
|
|
2018-07-08 22:08:09 +08:00
|
|
|
if (issued >= dpolicy->max_requests || io_interrupted)
|
2017-10-04 09:08:35 +08:00
|
|
|
break;
|
2017-04-25 20:21:37 +08:00
|
|
|
}
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
|
2017-09-12 21:35:12 +08:00
|
|
|
if (!issued && io_interrupted)
|
|
|
|
issued = -1;
|
|
|
|
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
return issued;
|
|
|
|
}
|
|
|
|
|
2017-10-04 09:08:37 +08:00
|
|
|
static bool __drop_discard_cmd(struct f2fs_sb_info *sbi)
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct list_head *pend_list;
|
|
|
|
struct discard_cmd *dc, *tmp;
|
|
|
|
int i;
|
2017-10-04 09:08:37 +08:00
|
|
|
bool dropped = false;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
|
|
|
for (i = MAX_PLIST_NUM - 1; i >= 0; i--) {
|
|
|
|
pend_list = &dcc->pend_list[i];
|
|
|
|
list_for_each_entry_safe(dc, tmp, pend_list, list) {
|
|
|
|
f2fs_bug_on(sbi, dc->state != D_PREP);
|
|
|
|
__remove_discard_cmd(sbi, dc);
|
2017-10-04 09:08:37 +08:00
|
|
|
dropped = true;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-10-04 09:08:37 +08:00
|
|
|
|
|
|
|
return dropped;
|
2017-04-25 20:21:37 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi)
|
2018-01-18 17:23:29 +08:00
|
|
|
{
|
|
|
|
__drop_discard_cmd(sbi);
|
|
|
|
}
|
|
|
|
|
2017-10-28 16:52:32 +08:00
|
|
|
static unsigned int __wait_one_discard_bio(struct f2fs_sb_info *sbi,
|
2017-06-05 18:29:06 +08:00
|
|
|
struct discard_cmd *dc)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-28 16:52:32 +08:00
|
|
|
unsigned int len = 0;
|
2017-06-05 18:29:06 +08:00
|
|
|
|
|
|
|
wait_for_completion_io(&dc->wait);
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
|
|
|
f2fs_bug_on(sbi, dc->state != D_DONE);
|
|
|
|
dc->ref--;
|
2017-10-28 16:52:32 +08:00
|
|
|
if (!dc->ref) {
|
|
|
|
if (!dc->error)
|
|
|
|
len = dc->len;
|
2017-06-05 18:29:06 +08:00
|
|
|
__remove_discard_cmd(sbi, dc);
|
2017-10-28 16:52:32 +08:00
|
|
|
}
|
2017-06-05 18:29:06 +08:00
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-10-28 16:52:32 +08:00
|
|
|
|
|
|
|
return len;
|
2017-06-05 18:29:06 +08:00
|
|
|
}
|
|
|
|
|
2017-10-28 16:52:32 +08:00
|
|
|
static unsigned int __wait_discard_cmd_range(struct f2fs_sb_info *sbi,
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy *dpolicy,
|
|
|
|
block_t start, block_t end)
|
2017-04-25 20:21:38 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct list_head *wait_list = (dpolicy->type == DPOLICY_FSTRIM) ?
|
|
|
|
&(dcc->fstrim_list) : &(dcc->wait_list);
|
2017-04-25 20:21:38 +08:00
|
|
|
struct discard_cmd *dc, *tmp;
|
2017-05-19 23:46:45 +08:00
|
|
|
bool need_wait;
|
2017-10-28 16:52:32 +08:00
|
|
|
unsigned int trimmed = 0;
|
2017-05-19 23:46:45 +08:00
|
|
|
|
|
|
|
next:
|
|
|
|
need_wait = false;
|
2017-04-25 20:21:38 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
|
|
|
list_for_each_entry_safe(dc, tmp, wait_list, list) {
|
2017-10-04 09:08:32 +08:00
|
|
|
if (dc->lstart + dc->len <= start || end <= dc->lstart)
|
|
|
|
continue;
|
2017-10-04 09:08:34 +08:00
|
|
|
if (dc->len < dpolicy->granularity)
|
2017-10-04 09:08:32 +08:00
|
|
|
continue;
|
2017-10-04 09:08:34 +08:00
|
|
|
if (dc->state == D_DONE && !dc->ref) {
|
2017-04-25 20:21:38 +08:00
|
|
|
wait_for_completion_io(&dc->wait);
|
2017-10-28 16:52:32 +08:00
|
|
|
if (!dc->error)
|
|
|
|
trimmed += dc->len;
|
2017-04-25 20:21:38 +08:00
|
|
|
__remove_discard_cmd(sbi, dc);
|
2017-05-19 23:46:45 +08:00
|
|
|
} else {
|
|
|
|
dc->ref++;
|
|
|
|
need_wait = true;
|
|
|
|
break;
|
2017-04-25 20:21:38 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-05-19 23:46:45 +08:00
|
|
|
|
|
|
|
if (need_wait) {
|
2017-10-28 16:52:32 +08:00
|
|
|
trimmed += __wait_one_discard_bio(sbi, dc);
|
2017-05-19 23:46:45 +08:00
|
|
|
goto next;
|
|
|
|
}
|
2017-10-28 16:52:32 +08:00
|
|
|
|
|
|
|
return trimmed;
|
2017-04-25 20:21:38 +08:00
|
|
|
}
|
|
|
|
|
2018-06-25 20:33:24 +08:00
|
|
|
static unsigned int __wait_all_discard_cmd(struct f2fs_sb_info *sbi,
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy *dpolicy)
|
2017-10-04 09:08:32 +08:00
|
|
|
{
|
2018-05-25 04:57:26 +08:00
|
|
|
struct discard_policy dp;
|
2018-06-25 20:33:24 +08:00
|
|
|
unsigned int discard_blks;
|
2018-05-25 04:57:26 +08:00
|
|
|
|
2018-06-25 20:33:24 +08:00
|
|
|
if (dpolicy)
|
|
|
|
return __wait_discard_cmd_range(sbi, dpolicy, 0, UINT_MAX);
|
2018-05-25 04:57:26 +08:00
|
|
|
|
|
|
|
/* wait all */
|
2018-05-30 00:58:42 +08:00
|
|
|
__init_discard_policy(sbi, &dp, DPOLICY_FSTRIM, 1);
|
2018-06-25 20:33:24 +08:00
|
|
|
discard_blks = __wait_discard_cmd_range(sbi, &dp, 0, UINT_MAX);
|
2018-05-30 00:58:42 +08:00
|
|
|
__init_discard_policy(sbi, &dp, DPOLICY_UMOUNT, 1);
|
2018-06-25 20:33:24 +08:00
|
|
|
discard_blks += __wait_discard_cmd_range(sbi, &dp, 0, UINT_MAX);
|
|
|
|
|
|
|
|
return discard_blks;
|
2017-10-04 09:08:32 +08:00
|
|
|
}
|
|
|
|
|
2017-04-14 23:24:55 +08:00
|
|
|
/* This should be covered by global mutex, &sit_i->sentry_lock */
|
2018-01-05 17:41:20 +08:00
|
|
|
static void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
|
2017-04-14 23:24:55 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct discard_cmd *dc;
|
2017-04-26 17:39:54 +08:00
|
|
|
bool need_wait = false;
|
2017-04-14 23:24:55 +08:00
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
dc = (struct discard_cmd *)f2fs_lookup_rb_tree(&dcc->root,
|
|
|
|
NULL, blkaddr);
|
2017-04-14 23:24:55 +08:00
|
|
|
if (dc) {
|
2017-04-26 17:39:54 +08:00
|
|
|
if (dc->state == D_PREP) {
|
|
|
|
__punch_discard_cmd(sbi, dc, blkaddr);
|
|
|
|
} else {
|
|
|
|
dc->ref++;
|
|
|
|
need_wait = true;
|
|
|
|
}
|
2016-08-29 23:58:34 +08:00
|
|
|
}
|
2017-04-05 18:19:49 +08:00
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2017-04-26 17:39:54 +08:00
|
|
|
|
2017-06-05 18:29:06 +08:00
|
|
|
if (need_wait)
|
|
|
|
__wait_one_discard_bio(sbi, dc);
|
2017-04-05 18:19:49 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi)
|
2017-06-29 23:17:45 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
|
|
|
|
if (dcc && dcc->f2fs_issue_discard) {
|
|
|
|
struct task_struct *discard_thread = dcc->f2fs_issue_discard;
|
|
|
|
|
|
|
|
dcc->f2fs_issue_discard = NULL;
|
|
|
|
kthread_stop(discard_thread);
|
2017-04-26 17:39:54 +08:00
|
|
|
}
|
2017-04-05 18:19:49 +08:00
|
|
|
}
|
|
|
|
|
2017-10-04 09:08:32 +08:00
|
|
|
/* This comes from f2fs_put_super */
|
2017-10-04 09:08:37 +08:00
|
|
|
bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi)
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy dpolicy;
|
2017-10-04 09:08:37 +08:00
|
|
|
bool dropped;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
|
2018-05-30 00:58:42 +08:00
|
|
|
__init_discard_policy(sbi, &dpolicy, DPOLICY_UMOUNT,
|
|
|
|
dcc->discard_granularity);
|
2017-10-04 09:08:34 +08:00
|
|
|
__issue_discard_cmd(sbi, &dpolicy);
|
2017-10-04 09:08:37 +08:00
|
|
|
dropped = __drop_discard_cmd(sbi);
|
|
|
|
|
2018-05-25 04:57:26 +08:00
|
|
|
/* just to make sure there is no pending discard commands */
|
|
|
|
__wait_all_discard_cmd(sbi, NULL);
|
2018-07-08 22:16:53 +08:00
|
|
|
|
|
|
|
f2fs_bug_on(sbi, atomic_read(&dcc->discard_cmd_cnt));
|
2017-10-04 09:08:37 +08:00
|
|
|
return dropped;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
}
|
|
|
|
|
2017-01-10 12:32:07 +08:00
|
|
|
static int issue_discard_thread(void *data)
|
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = data;
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
wait_queue_head_t *q = &dcc->discard_wait_queue;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy dpolicy;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
unsigned int wait_ms = DEF_MIN_DISCARD_ISSUE_TIME;
|
|
|
|
int issued;
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-05-18 01:36:58 +08:00
|
|
|
set_freezable();
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2017-05-18 01:36:58 +08:00
|
|
|
do {
|
2018-05-30 00:58:42 +08:00
|
|
|
__init_discard_policy(sbi, &dpolicy, DPOLICY_BG,
|
2017-10-04 09:08:34 +08:00
|
|
|
dcc->discard_granularity);
|
|
|
|
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
wait_event_interruptible_timeout(*q,
|
|
|
|
kthread_should_stop() || freezing(current) ||
|
|
|
|
dcc->discard_wake,
|
|
|
|
msecs_to_jiffies(wait_ms));
|
2018-05-08 17:51:34 +08:00
|
|
|
|
|
|
|
if (dcc->discard_wake)
|
|
|
|
dcc->discard_wake = 0;
|
|
|
|
|
2017-05-18 01:36:58 +08:00
|
|
|
if (try_to_freeze())
|
|
|
|
continue;
|
2018-01-25 18:57:27 +08:00
|
|
|
if (f2fs_readonly(sbi->sb))
|
|
|
|
continue;
|
2017-05-18 01:36:58 +08:00
|
|
|
if (kthread_should_stop())
|
|
|
|
return 0;
|
2018-04-13 11:08:05 +08:00
|
|
|
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK)) {
|
|
|
|
wait_ms = dpolicy.max_interval;
|
|
|
|
continue;
|
|
|
|
}
|
2017-01-10 12:32:07 +08:00
|
|
|
|
2018-05-08 05:22:40 +08:00
|
|
|
if (sbi->gc_mode == GC_URGENT)
|
2018-05-30 00:58:42 +08:00
|
|
|
__init_discard_policy(sbi, &dpolicy, DPOLICY_FORCE, 1);
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs in below condition:
GC Thread Thread B
- sb_start_intwrite
- f2fs_sync_file
- f2fs_sync_fs
- mutex_lock(&sbi->gc_mutex)
- write_checkpoint
- block_operations
- f2fs_sync_inode_meta
- iput
- sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
|
|
|
sb_start_intwrite(sbi->sb);
|
|
|
|
|
2017-10-04 09:08:34 +08:00
|
|
|
issued = __issue_discard_cmd(sbi, &dpolicy);
|
2018-04-08 15:11:11 +08:00
|
|
|
if (issued > 0) {
|
2017-10-04 09:08:34 +08:00
|
|
|
__wait_all_discard_cmd(sbi, &dpolicy);
|
|
|
|
wait_ms = dpolicy.min_interval;
|
2018-04-08 15:11:11 +08:00
|
|
|
} else if (issued == -1){
|
2018-09-19 16:48:47 +08:00
|
|
|
wait_ms = f2fs_time_to_wait(sbi, DISCARD_TIME);
|
|
|
|
if (!wait_ms)
|
2018-08-31 17:39:26 +08:00
|
|
|
wait_ms = dpolicy.mid_interval;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
} else {
|
2017-10-04 09:08:34 +08:00
|
|
|
wait_ms = dpolicy.max_interval;
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
}
|
2017-05-18 01:36:58 +08:00
|
|
|
|
f2fs: make background threads of f2fs being aware of freezing
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs in below condition:
GC Thread Thread B
- sb_start_intwrite
- f2fs_sync_file
- f2fs_sync_fs
- mutex_lock(&sbi->gc_mutex)
- write_checkpoint
- block_operations
- f2fs_sync_inode_meta
- iput
- sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-22 08:52:23 +08:00
|
|
|
sb_end_intwrite(sbi->sb);
|
2017-05-18 01:36:58 +08:00
|
|
|
|
|
|
|
} while (!kthread_should_stop());
|
|
|
|
return 0;
|
2017-01-10 12:32:07 +08:00
|
|
|
}
|
|
|
|
|
2016-10-28 16:45:06 +08:00
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
2016-10-07 10:02:05 +08:00
|
|
|
static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t blkstart, block_t blklen)
|
2016-10-28 16:45:06 +08:00
|
|
|
{
|
2017-02-23 12:18:35 +08:00
|
|
|
sector_t sector, nr_sects;
|
2017-03-08 09:49:53 +08:00
|
|
|
block_t lblkstart = blkstart;
|
2016-10-07 10:02:05 +08:00
|
|
|
int devi = 0;
|
|
|
|
|
|
|
|
if (sbi->s_ndevs) {
|
|
|
|
devi = f2fs_target_device_index(sbi, blkstart);
|
|
|
|
blkstart -= FDEV(devi).start_blk;
|
|
|
|
}
|
2016-10-28 16:45:06 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* We need to know the type of the zone: for conventional zones,
|
|
|
|
* use regular discard if the drive supports it. For sequential
|
|
|
|
* zones, reset the zone write pointer.
|
|
|
|
*/
|
2016-10-07 10:02:05 +08:00
|
|
|
switch (get_blkz_type(sbi, bdev, blkstart)) {
|
2016-10-28 16:45:06 +08:00
|
|
|
|
|
|
|
case BLK_ZONE_TYPE_CONVENTIONAL:
|
|
|
|
if (!blk_queue_discard(bdev_get_queue(bdev)))
|
|
|
|
return 0;
|
2017-03-08 10:02:02 +08:00
|
|
|
return __queue_discard_cmd(sbi, bdev, lblkstart, blklen);
|
2016-10-28 16:45:06 +08:00
|
|
|
case BLK_ZONE_TYPE_SEQWRITE_REQ:
|
|
|
|
case BLK_ZONE_TYPE_SEQWRITE_PREF:
|
2017-02-23 12:18:35 +08:00
|
|
|
sector = SECTOR_FROM_BLOCK(blkstart);
|
|
|
|
nr_sects = SECTOR_FROM_BLOCK(blklen);
|
|
|
|
|
|
|
|
if (sector & (bdev_zone_sectors(bdev) - 1) ||
|
|
|
|
nr_sects != bdev_zone_sectors(bdev)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_INFO,
|
|
|
|
"(%d) %s: Unaligned discard attempted (block %x + %x)",
|
|
|
|
devi, sbi->s_ndevs ? FDEV(devi).path: "",
|
|
|
|
blkstart, blklen);
|
|
|
|
return -EIO;
|
|
|
|
}
|
2017-02-16 03:14:06 +08:00
|
|
|
trace_f2fs_issue_reset_zone(bdev, blkstart);
|
2016-10-28 16:45:06 +08:00
|
|
|
return blkdev_reset_zones(bdev, sector,
|
|
|
|
nr_sects, GFP_NOFS);
|
|
|
|
default:
|
|
|
|
/* Unknown zone type: broken device ? */
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2016-10-07 10:02:05 +08:00
|
|
|
static int __issue_discard_async(struct f2fs_sb_info *sbi,
|
|
|
|
struct block_device *bdev, block_t blkstart, block_t blklen)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_BLK_DEV_ZONED
|
2018-02-06 12:31:17 +08:00
|
|
|
if (f2fs_sb_has_blkzoned(sbi->sb) &&
|
2016-10-07 10:02:05 +08:00
|
|
|
bdev_zoned_model(bdev) != BLK_ZONED_NONE)
|
|
|
|
return __f2fs_issue_discard_zone(sbi, bdev, blkstart, blklen);
|
|
|
|
#endif
|
2017-03-08 10:02:02 +08:00
|
|
|
return __queue_discard_cmd(sbi, bdev, blkstart, blklen);
|
2016-10-07 10:02:05 +08:00
|
|
|
}
|
|
|
|
|
2014-04-15 12:57:55 +08:00
|
|
|
static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
|
2013-11-12 15:55:17 +08:00
|
|
|
block_t blkstart, block_t blklen)
|
|
|
|
{
|
2016-10-07 10:02:05 +08:00
|
|
|
sector_t start = blkstart, len = 0;
|
|
|
|
struct block_device *bdev;
|
2015-05-01 13:37:50 +08:00
|
|
|
struct seg_entry *se;
|
|
|
|
unsigned int offset;
|
|
|
|
block_t i;
|
2016-10-07 10:02:05 +08:00
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
bdev = f2fs_target_device(sbi, blkstart, NULL);
|
|
|
|
|
|
|
|
for (i = blkstart; i < blkstart + blklen; i++, len++) {
|
|
|
|
if (i != start) {
|
|
|
|
struct block_device *bdev2 =
|
|
|
|
f2fs_target_device(sbi, i, NULL);
|
|
|
|
|
|
|
|
if (bdev2 != bdev) {
|
|
|
|
err = __issue_discard_async(sbi, bdev,
|
|
|
|
start, len);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
bdev = bdev2;
|
|
|
|
start = i;
|
|
|
|
len = 0;
|
|
|
|
}
|
|
|
|
}
|
2015-05-01 13:37:50 +08:00
|
|
|
|
|
|
|
se = get_seg_entry(sbi, GET_SEGNO(sbi, i));
|
|
|
|
offset = GET_BLKOFF_FROM_SEG0(sbi, i);
|
|
|
|
|
|
|
|
if (!f2fs_test_and_set_bit(offset, se->discard_map))
|
|
|
|
sbi->discard_blks--;
|
|
|
|
}
|
2016-10-28 16:45:06 +08:00
|
|
|
|
2016-10-07 10:02:05 +08:00
|
|
|
if (len)
|
|
|
|
err = __issue_discard_async(sbi, bdev, start, len);
|
|
|
|
return err;
|
2014-04-15 12:57:55 +08:00
|
|
|
}
|
|
|
|
|
2016-12-30 14:06:15 +08:00
|
|
|
static bool add_discard_addrs(struct f2fs_sb_info *sbi, struct cp_control *cpc,
|
|
|
|
bool check_only)
|
2014-10-29 13:27:59 +08:00
|
|
|
{
|
2013-11-12 13:49:56 +08:00
|
|
|
int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
|
|
|
|
int max_blocks = sbi->blocks_per_seg;
|
2014-09-21 13:06:39 +08:00
|
|
|
struct seg_entry *se = get_seg_entry(sbi, cpc->trim_start);
|
2013-11-12 13:49:56 +08:00
|
|
|
unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
|
|
|
|
unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
|
2015-05-01 13:37:50 +08:00
|
|
|
unsigned long *discard_map = (unsigned long *)se->discard_map;
|
2015-02-11 08:44:29 +08:00
|
|
|
unsigned long *dmap = SIT_I(sbi)->tmp_map;
|
2013-11-12 13:49:56 +08:00
|
|
|
unsigned int start = 0, end = -1;
|
2017-04-27 20:40:39 +08:00
|
|
|
bool force = (cpc->reason & CP_DISCARD);
|
2017-03-28 18:18:50 +08:00
|
|
|
struct discard_entry *de = NULL;
|
2017-04-15 14:09:36 +08:00
|
|
|
struct list_head *head = &SM_I(sbi)->dcc_info->entry_list;
|
2013-11-12 13:49:56 +08:00
|
|
|
int i;
|
|
|
|
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
if (se->valid_blocks == max_blocks || !f2fs_hw_support_discard(sbi))
|
2016-12-30 14:06:15 +08:00
|
|
|
return false;
|
2013-11-12 13:49:56 +08:00
|
|
|
|
2015-05-01 13:37:50 +08:00
|
|
|
if (!force) {
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
if (!f2fs_realtime_discard_enable(sbi) || !se->valid_blocks ||
|
2017-01-12 06:40:24 +08:00
|
|
|
SM_I(sbi)->dcc_info->nr_discards >=
|
|
|
|
SM_I(sbi)->dcc_info->max_discards)
|
2016-12-30 14:06:15 +08:00
|
|
|
return false;
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
|
|
|
|
2013-11-12 13:49:56 +08:00
|
|
|
/* SIT_VBLOCK_MAP_SIZE should be multiple of sizeof(unsigned long) */
|
|
|
|
for (i = 0; i < entries; i++)
|
2015-05-01 13:37:50 +08:00
|
|
|
dmap[i] = force ? ~ckpt_map[i] & ~discard_map[i] :
|
2014-12-13 05:53:41 +08:00
|
|
|
(cur_map[i] ^ ckpt_map[i]) & ckpt_map[i];
|
2013-11-12 13:49:56 +08:00
|
|
|
|
2017-01-12 06:40:24 +08:00
|
|
|
while (force || SM_I(sbi)->dcc_info->nr_discards <=
|
|
|
|
SM_I(sbi)->dcc_info->max_discards) {
|
2013-11-12 13:49:56 +08:00
|
|
|
start = __find_rev_next_bit(dmap, max_blocks, end + 1);
|
|
|
|
if (start >= max_blocks)
|
|
|
|
break;
|
|
|
|
|
|
|
|
end = __find_rev_next_zero_bit(dmap, max_blocks, start + 1);
|
2016-07-07 12:13:33 +08:00
|
|
|
if (force && start && end != max_blocks
|
|
|
|
&& (end - start) < cpc->trim_minlen)
|
|
|
|
continue;
|
|
|
|
|
2016-12-30 14:06:15 +08:00
|
|
|
if (check_only)
|
|
|
|
return true;
|
|
|
|
|
2017-03-28 18:18:50 +08:00
|
|
|
if (!de) {
|
|
|
|
de = f2fs_kmem_cache_alloc(discard_entry_slab,
|
|
|
|
GFP_F2FS_ZERO);
|
|
|
|
de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
|
|
|
|
list_add_tail(&de->list, head);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i = start; i < end; i++)
|
|
|
|
__set_bit_le(i, (void *)de->discard_map);
|
|
|
|
|
|
|
|
SM_I(sbi)->dcc_info->nr_discards += end - start;
|
2013-11-12 13:49:56 +08:00
|
|
|
}
|
2016-12-30 14:06:15 +08:00
|
|
|
return false;
|
2013-11-12 13:49:56 +08:00
|
|
|
}
|
|
|
|
|
2018-04-25 17:38:29 +08:00
|
|
|
static void release_discard_addr(struct discard_entry *entry)
|
|
|
|
{
|
|
|
|
list_del(&entry->list);
|
|
|
|
kmem_cache_free(discard_entry_slab, entry);
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi)
|
2014-09-21 13:06:39 +08:00
|
|
|
{
|
2017-04-15 14:09:36 +08:00
|
|
|
struct list_head *head = &(SM_I(sbi)->dcc_info->entry_list);
|
2014-09-21 13:06:39 +08:00
|
|
|
struct discard_entry *entry, *this;
|
|
|
|
|
|
|
|
/* drop caches */
|
2018-04-25 17:38:29 +08:00
|
|
|
list_for_each_entry_safe(entry, this, head, list)
|
|
|
|
release_discard_addr(entry);
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
* Should call f2fs_clear_prefree_segments after checkpoint is done.
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
*/
|
|
|
|
static void set_prefree_as_free_segments(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2014-08-04 10:10:07 +08:00
|
|
|
unsigned int segno;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
2014-09-24 02:23:01 +08:00
|
|
|
for_each_set_bit(segno, dirty_i->dirty_segmap[PRE], MAIN_SEGS(sbi))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
__set_test_and_free(sbi, segno);
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
|
|
|
|
struct cp_control *cpc)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct list_head *head = &dcc->entry_list;
|
2014-03-29 11:33:17 +08:00
|
|
|
struct discard_entry *entry, *this;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2013-11-11 08:24:37 +08:00
|
|
|
unsigned long *prefree_map = dirty_i->dirty_segmap[PRE];
|
|
|
|
unsigned int start = 0, end = -1;
|
2016-06-04 10:29:38 +08:00
|
|
|
unsigned int secno, start_segno;
|
2017-04-27 20:40:39 +08:00
|
|
|
bool force = (cpc->reason & CP_DISCARD);
|
f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-19 20:58:15 +08:00
|
|
|
bool need_align = test_opt(sbi, LFS) && sbi->segs_per_sec > 1;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
2013-11-11 08:24:37 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
while (1) {
|
2013-11-11 08:24:37 +08:00
|
|
|
int i;
|
f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-19 20:58:15 +08:00
|
|
|
|
|
|
|
if (need_align && end != -1)
|
|
|
|
end--;
|
2014-09-24 02:23:01 +08:00
|
|
|
start = find_next_bit(prefree_map, MAIN_SEGS(sbi), end + 1);
|
|
|
|
if (start >= MAIN_SEGS(sbi))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
break;
|
2014-09-24 02:23:01 +08:00
|
|
|
end = find_next_zero_bit(prefree_map, MAIN_SEGS(sbi),
|
|
|
|
start + 1);
|
2013-11-11 08:24:37 +08:00
|
|
|
|
f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-19 20:58:15 +08:00
|
|
|
if (need_align) {
|
|
|
|
start = rounddown(start, sbi->segs_per_sec);
|
|
|
|
end = roundup(end, sbi->segs_per_sec);
|
|
|
|
}
|
2013-11-11 08:24:37 +08:00
|
|
|
|
f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-19 20:58:15 +08:00
|
|
|
for (i = start; i < end; i++) {
|
|
|
|
if (test_and_clear_bit(i, prefree_map))
|
|
|
|
dirty_i->nr_dirty[PRE]--;
|
|
|
|
}
|
2013-11-11 08:24:37 +08:00
|
|
|
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
if (!f2fs_realtime_discard_enable(sbi))
|
2013-11-11 08:24:37 +08:00
|
|
|
continue;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2016-12-22 11:46:24 +08:00
|
|
|
if (force && start >= cpc->trim_start &&
|
|
|
|
(end - 1) <= cpc->trim_end)
|
|
|
|
continue;
|
|
|
|
|
2016-06-04 10:29:38 +08:00
|
|
|
if (!test_opt(sbi, LFS) || sbi->segs_per_sec == 1) {
|
|
|
|
f2fs_issue_discard(sbi, START_BLOCK(sbi, start),
|
2013-11-12 15:55:17 +08:00
|
|
|
(end - start) << sbi->log_blocks_per_seg);
|
2016-06-04 10:29:38 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
next:
|
2017-04-08 06:08:17 +08:00
|
|
|
secno = GET_SEC_FROM_SEG(sbi, start);
|
|
|
|
start_segno = GET_SEG_FROM_SEC(sbi, secno);
|
2016-06-04 10:29:38 +08:00
|
|
|
if (!IS_CURSEC(sbi, secno) &&
|
2017-04-08 05:33:22 +08:00
|
|
|
!get_valid_blocks(sbi, start, true))
|
2016-06-04 10:29:38 +08:00
|
|
|
f2fs_issue_discard(sbi, START_BLOCK(sbi, start_segno),
|
|
|
|
sbi->segs_per_sec << sbi->log_blocks_per_seg);
|
|
|
|
|
|
|
|
start = start_segno + sbi->segs_per_sec;
|
|
|
|
if (start < end)
|
|
|
|
goto next;
|
2017-02-28 03:57:11 +08:00
|
|
|
else
|
|
|
|
end = start - 1;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
2013-11-12 13:49:56 +08:00
|
|
|
|
|
|
|
/* send small discards */
|
2014-03-29 11:33:17 +08:00
|
|
|
list_for_each_entry_safe(entry, this, head, list) {
|
2017-03-28 18:18:50 +08:00
|
|
|
unsigned int cur_pos = 0, next_pos, len, total_len = 0;
|
|
|
|
bool is_valid = test_bit_le(0, entry->discard_map);
|
|
|
|
|
|
|
|
find_next:
|
|
|
|
if (is_valid) {
|
|
|
|
next_pos = find_next_zero_bit_le(entry->discard_map,
|
|
|
|
sbi->blocks_per_seg, cur_pos);
|
|
|
|
len = next_pos - cur_pos;
|
|
|
|
|
2018-02-06 12:31:17 +08:00
|
|
|
if (f2fs_sb_has_blkzoned(sbi->sb) ||
|
2017-05-26 16:04:40 +08:00
|
|
|
(force && len < cpc->trim_minlen))
|
2017-03-28 18:18:50 +08:00
|
|
|
goto skip;
|
|
|
|
|
|
|
|
f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
|
|
|
|
len);
|
|
|
|
total_len += len;
|
|
|
|
} else {
|
|
|
|
next_pos = find_next_bit_le(entry->discard_map,
|
|
|
|
sbi->blocks_per_seg, cur_pos);
|
|
|
|
}
|
2015-05-01 13:50:06 +08:00
|
|
|
skip:
|
2017-03-28 18:18:50 +08:00
|
|
|
cur_pos = next_pos;
|
|
|
|
is_valid = !is_valid;
|
|
|
|
|
|
|
|
if (cur_pos < sbi->blocks_per_seg)
|
|
|
|
goto find_next;
|
|
|
|
|
2018-04-25 17:38:29 +08:00
|
|
|
release_discard_addr(entry);
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
dcc->nr_discards -= total_len;
|
2013-11-12 13:49:56 +08:00
|
|
|
}
|
2017-04-25 00:21:34 +08:00
|
|
|
|
2017-08-23 12:15:43 +08:00
|
|
|
wake_up_discard_thread(sbi, false);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2017-01-29 13:27:02 +08:00
|
|
|
static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
|
2017-01-12 06:40:24 +08:00
|
|
|
{
|
2017-01-10 12:32:07 +08:00
|
|
|
dev_t dev = sbi->sb->s_bdev->bd_dev;
|
2017-01-12 06:40:24 +08:00
|
|
|
struct discard_cmd_control *dcc;
|
2017-04-15 14:09:37 +08:00
|
|
|
int err = 0, i;
|
2017-01-12 06:40:24 +08:00
|
|
|
|
|
|
|
if (SM_I(sbi)->dcc_info) {
|
|
|
|
dcc = SM_I(sbi)->dcc_info;
|
|
|
|
goto init_thread;
|
|
|
|
}
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
dcc = f2fs_kzalloc(sbi, sizeof(struct discard_cmd_control), GFP_KERNEL);
|
2017-01-12 06:40:24 +08:00
|
|
|
if (!dcc)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
f2fs: introduce discard_granularity sysfs entry
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-07 23:09:56 +08:00
|
|
|
dcc->discard_granularity = DEFAULT_DISCARD_GRANULARITY;
|
2017-04-15 14:09:36 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->entry_list);
|
2017-10-04 09:08:34 +08:00
|
|
|
for (i = 0; i < MAX_PLIST_NUM; i++)
|
2017-04-15 14:09:37 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->pend_list[i]);
|
2017-04-15 14:09:36 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->wait_list);
|
2017-10-04 09:08:32 +08:00
|
|
|
INIT_LIST_HEAD(&dcc->fstrim_list);
|
2017-01-10 12:32:07 +08:00
|
|
|
mutex_init(&dcc->cmd_lock);
|
2017-03-25 17:19:58 +08:00
|
|
|
atomic_set(&dcc->issued_discard, 0);
|
|
|
|
atomic_set(&dcc->issing_discard, 0);
|
2017-03-25 17:19:59 +08:00
|
|
|
atomic_set(&dcc->discard_cmd_cnt, 0);
|
2017-01-12 06:40:24 +08:00
|
|
|
dcc->nr_discards = 0;
|
2017-04-25 00:21:35 +08:00
|
|
|
dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
|
2017-04-18 19:27:39 +08:00
|
|
|
dcc->undiscard_blks = 0;
|
2018-07-08 22:11:01 +08:00
|
|
|
dcc->next_pos = 0;
|
2017-04-14 23:24:55 +08:00
|
|
|
dcc->root = RB_ROOT;
|
2018-06-22 16:06:59 +08:00
|
|
|
dcc->rbtree_check = false;
|
2017-01-12 06:40:24 +08:00
|
|
|
|
2017-01-10 12:32:07 +08:00
|
|
|
init_waitqueue_head(&dcc->discard_wait_queue);
|
2017-01-12 06:40:24 +08:00
|
|
|
SM_I(sbi)->dcc_info = dcc;
|
|
|
|
init_thread:
|
2017-01-10 12:32:07 +08:00
|
|
|
dcc->f2fs_issue_discard = kthread_run(issue_discard_thread, sbi,
|
|
|
|
"f2fs_discard-%u:%u", MAJOR(dev), MINOR(dev));
|
|
|
|
if (IS_ERR(dcc->f2fs_issue_discard)) {
|
|
|
|
err = PTR_ERR(dcc->f2fs_issue_discard);
|
|
|
|
kfree(dcc);
|
|
|
|
SM_I(sbi)->dcc_info = NULL;
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2017-01-12 06:40:24 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2017-03-27 18:14:04 +08:00
|
|
|
static void destroy_discard_cmd_control(struct f2fs_sb_info *sbi)
|
2017-01-12 06:40:24 +08:00
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
|
2017-03-27 18:14:04 +08:00
|
|
|
if (!dcc)
|
|
|
|
return;
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_stop_discard_thread(sbi);
|
2017-03-27 18:14:04 +08:00
|
|
|
|
|
|
|
kfree(dcc);
|
|
|
|
SM_I(sbi)->dcc_info = NULL;
|
2017-01-12 06:40:24 +08:00
|
|
|
}
|
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
static bool __mark_sit_entry_dirty(struct f2fs_sb_info *sbi, unsigned int segno)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
if (!__test_and_set_bit(segno, sit_i->dirty_sentries_bitmap)) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
sit_i->dirty_sentries++;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
return true;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void __set_sit_entry_type(struct f2fs_sb_info *sbi, int type,
|
|
|
|
unsigned int segno, int modified)
|
|
|
|
{
|
|
|
|
struct seg_entry *se = get_seg_entry(sbi, segno);
|
|
|
|
se->type = type;
|
|
|
|
if (modified)
|
|
|
|
__mark_sit_entry_dirty(sbi, segno);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
|
|
|
|
{
|
|
|
|
struct seg_entry *se;
|
|
|
|
unsigned int segno, offset;
|
|
|
|
long int new_vblocks;
|
2017-08-02 21:20:13 +08:00
|
|
|
bool exist;
|
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
bool mir_exist;
|
|
|
|
#endif
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
segno = GET_SEGNO(sbi, blkaddr);
|
|
|
|
|
|
|
|
se = get_seg_entry(sbi, segno);
|
|
|
|
new_vblocks = se->valid_blocks + del;
|
2014-02-04 12:01:10 +08:00
|
|
|
offset = GET_BLKOFF_FROM_SEG0(sbi, blkaddr);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, (new_vblocks >> (sizeof(unsigned short) << 3) ||
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
(new_vblocks > sbi->blocks_per_seg)));
|
|
|
|
|
|
|
|
se->valid_blocks = new_vblocks;
|
2018-06-04 23:20:17 +08:00
|
|
|
se->mtime = get_mtime(sbi, false);
|
|
|
|
if (se->mtime > SIT_I(sbi)->max_mtime)
|
|
|
|
SIT_I(sbi)->max_mtime = se->mtime;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* Update valid block bitmap */
|
|
|
|
if (del > 0) {
|
2017-08-02 21:20:13 +08:00
|
|
|
exist = f2fs_test_and_set_bit(offset, se->cur_valid_map);
|
2017-01-07 18:51:01 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
2017-08-02 21:20:13 +08:00
|
|
|
mir_exist = f2fs_test_and_set_bit(offset,
|
|
|
|
se->cur_valid_map_mir);
|
|
|
|
if (unlikely(exist != mir_exist)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_ERR, "Inconsistent error "
|
|
|
|
"when setting bitmap, blk:%u, old bit:%d",
|
|
|
|
blkaddr, exist);
|
2014-09-03 07:05:00 +08:00
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 21:20:13 +08:00
|
|
|
}
|
2017-01-07 18:51:01 +08:00
|
|
|
#endif
|
2017-08-02 21:20:13 +08:00
|
|
|
if (unlikely(exist)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_ERR,
|
|
|
|
"Bitmap was wrongly set, blk:%u", blkaddr);
|
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 22:16:54 +08:00
|
|
|
se->valid_blocks--;
|
|
|
|
del = 0;
|
2017-01-07 18:51:01 +08:00
|
|
|
}
|
2017-08-02 21:20:13 +08:00
|
|
|
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
if (!f2fs_test_and_set_bit(offset, se->discard_map))
|
2015-05-01 13:37:50 +08:00
|
|
|
sbi->discard_blks--;
|
2017-03-07 03:59:56 +08:00
|
|
|
|
|
|
|
/* don't overwrite by SSR to keep node chain */
|
2018-03-07 16:22:50 +08:00
|
|
|
if (IS_NODESEG(se->type)) {
|
2017-03-07 03:59:56 +08:00
|
|
|
if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map))
|
|
|
|
se->ckpt_valid_blocks++;
|
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
} else {
|
2017-08-02 21:20:13 +08:00
|
|
|
exist = f2fs_test_and_clear_bit(offset, se->cur_valid_map);
|
2017-01-07 18:51:01 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
2017-08-02 21:20:13 +08:00
|
|
|
mir_exist = f2fs_test_and_clear_bit(offset,
|
|
|
|
se->cur_valid_map_mir);
|
|
|
|
if (unlikely(exist != mir_exist)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_ERR, "Inconsistent error "
|
|
|
|
"when clearing bitmap, blk:%u, old bit:%d",
|
|
|
|
blkaddr, exist);
|
2014-09-03 07:05:00 +08:00
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 21:20:13 +08:00
|
|
|
}
|
2017-01-07 18:51:01 +08:00
|
|
|
#endif
|
2017-08-02 21:20:13 +08:00
|
|
|
if (unlikely(!exist)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_ERR,
|
|
|
|
"Bitmap was wrongly cleared, blk:%u", blkaddr);
|
|
|
|
f2fs_bug_on(sbi, 1);
|
2017-08-02 22:16:54 +08:00
|
|
|
se->valid_blocks++;
|
|
|
|
del = 0;
|
2017-01-07 18:51:01 +08:00
|
|
|
}
|
2017-08-02 21:20:13 +08:00
|
|
|
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
if (f2fs_test_and_clear_bit(offset, se->discard_map))
|
2015-05-01 13:37:50 +08:00
|
|
|
sbi->discard_blks++;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
if (!f2fs_test_bit(offset, se->ckpt_valid_map))
|
|
|
|
se->ckpt_valid_blocks += del;
|
|
|
|
|
|
|
|
__mark_sit_entry_dirty(sbi, segno);
|
|
|
|
|
|
|
|
/* update total number of valid blocks to be written in ckpt area */
|
|
|
|
SIT_I(sbi)->written_valid_blocks += del;
|
|
|
|
|
|
|
|
if (sbi->segs_per_sec > 1)
|
|
|
|
get_sec_entry(sbi, segno)->valid_blocks += del;
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
unsigned int segno = GET_SEGNO(sbi, addr);
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, addr == NULL_ADDR);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (addr == NEW_ADDR)
|
|
|
|
return;
|
|
|
|
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
invalidate_mapping_pages(META_MAPPING(sbi), addr, addr);
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/* add it into sit main buffer */
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
update_sit_entry(sbi, addr, -1);
|
|
|
|
|
|
|
|
/* add it into dirty seglist */
|
|
|
|
locate_dirty_segment(sbi, segno);
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr)
|
2015-10-08 03:28:41 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
unsigned int segno, offset;
|
|
|
|
struct seg_entry *se;
|
|
|
|
bool is_cp = false;
|
|
|
|
|
2018-06-05 17:44:11 +08:00
|
|
|
if (!is_valid_data_blkaddr(sbi, blkaddr))
|
2015-10-08 03:28:41 +08:00
|
|
|
return true;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_read(&sit_i->sentry_lock);
|
2015-10-08 03:28:41 +08:00
|
|
|
|
|
|
|
segno = GET_SEGNO(sbi, blkaddr);
|
|
|
|
se = get_seg_entry(sbi, segno);
|
|
|
|
offset = GET_BLKOFF_FROM_SEG0(sbi, blkaddr);
|
|
|
|
|
|
|
|
if (f2fs_test_bit(offset, se->ckpt_valid_map))
|
|
|
|
is_cp = true;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
up_read(&sit_i->sentry_lock);
|
2015-10-08 03:28:41 +08:00
|
|
|
|
|
|
|
return is_cp;
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* This function should be resided under the curseg_mutex lock
|
|
|
|
*/
|
|
|
|
static void __add_sum_entry(struct f2fs_sb_info *sbi, int type,
|
2013-06-13 16:59:27 +08:00
|
|
|
struct f2fs_summary *sum)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
void *addr = curseg->sum_blk;
|
2013-06-13 16:59:27 +08:00
|
|
|
addr += curseg->next_blkoff * sizeof(struct f2fs_summary);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
memcpy(addr, sum, sizeof(struct f2fs_summary));
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Calculate the number of current summary pages for writing
|
|
|
|
*/
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
int valid_sum_count = 0;
|
2013-10-29 16:21:47 +08:00
|
|
|
int i, sum_in_page;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
|
|
|
|
if (sbi->ckpt->alloc_type[i] == SSR)
|
|
|
|
valid_sum_count += sbi->blocks_per_seg;
|
2014-12-09 14:21:46 +08:00
|
|
|
else {
|
|
|
|
if (for_ra)
|
|
|
|
valid_sum_count += le16_to_cpu(
|
|
|
|
F2FS_CKPT(sbi)->cur_data_blkoff[i]);
|
|
|
|
else
|
|
|
|
valid_sum_count += curseg_blkoff(sbi, i);
|
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
sum_in_page = (PAGE_SIZE - 2 * SUM_JOURNAL_SIZE -
|
2013-10-29 16:21:47 +08:00
|
|
|
SUM_FOOTER_SIZE) / SUMMARY_SIZE;
|
|
|
|
if (valid_sum_count <= sum_in_page)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return 1;
|
2013-10-29 16:21:47 +08:00
|
|
|
else if ((valid_sum_count - sum_in_page) <=
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
(PAGE_SIZE - SUM_FOOTER_SIZE) / SUMMARY_SIZE)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return 2;
|
|
|
|
return 3;
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Caller should put this summary page
|
|
|
|
*/
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
struct page *f2fs_get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2018-07-17 00:02:17 +08:00
|
|
|
return f2fs_get_meta_page_nofail(sbi, GET_SUM_BLOCK(sbi, segno));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_update_meta_page(struct f2fs_sb_info *sbi,
|
|
|
|
void *src, block_t blk_addr)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
struct page *page = f2fs_grab_meta_page(sbi, blk_addr);
|
2015-05-19 17:40:04 +08:00
|
|
|
|
2017-11-02 20:41:02 +08:00
|
|
|
memcpy(page_address(page), src, PAGE_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
set_page_dirty(page);
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
}
|
|
|
|
|
2015-05-19 17:40:04 +08:00
|
|
|
static void write_sum_page(struct f2fs_sb_info *sbi,
|
|
|
|
struct f2fs_summary_block *sum_blk, block_t blk_addr)
|
|
|
|
{
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_update_meta_page(sbi, (void *)sum_blk, blk_addr);
|
2015-05-19 17:40:04 +08:00
|
|
|
}
|
|
|
|
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
static void write_current_sum_page(struct f2fs_sb_info *sbi,
|
|
|
|
int type, block_t blk_addr)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
struct page *page = f2fs_grab_meta_page(sbi, blk_addr);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
struct f2fs_summary_block *src = curseg->sum_blk;
|
|
|
|
struct f2fs_summary_block *dst;
|
|
|
|
|
|
|
|
dst = (struct f2fs_summary_block *)page_address(page);
|
2018-04-09 20:25:06 +08:00
|
|
|
memset(dst, 0, PAGE_SIZE);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
|
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
|
|
|
|
|
|
|
down_read(&curseg->journal_rwsem);
|
|
|
|
memcpy(&dst->journal, curseg->journal, SUM_JOURNAL_SIZE);
|
|
|
|
up_read(&curseg->journal_rwsem);
|
|
|
|
|
|
|
|
memcpy(dst->entries, src->entries, SUM_ENTRY_SIZE);
|
|
|
|
memcpy(&dst->footer, &src->footer, SUM_FOOTER_SIZE);
|
|
|
|
|
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
|
|
|
|
|
|
|
set_page_dirty(page);
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
}
|
|
|
|
|
2017-04-21 04:51:57 +08:00
|
|
|
static int is_next_segment_free(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
unsigned int segno = curseg->segno + 1;
|
|
|
|
struct free_segmap_info *free_i = FREE_I(sbi);
|
|
|
|
|
|
|
|
if (segno < MAIN_SEGS(sbi) && segno % sbi->segs_per_sec)
|
|
|
|
return !test_bit(segno, free_i->free_segmap);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Find a new segment from the free segments bitmap to right order
|
|
|
|
* This function should be returned with success, otherwise BUG
|
|
|
|
*/
|
|
|
|
static void get_new_segment(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int *newseg, bool new_sec, int dir)
|
|
|
|
{
|
|
|
|
struct free_segmap_info *free_i = FREE_I(sbi);
|
|
|
|
unsigned int segno, secno, zoneno;
|
2014-09-24 02:23:01 +08:00
|
|
|
unsigned int total_zones = MAIN_SECS(sbi) / sbi->secs_per_zone;
|
2017-04-08 06:08:17 +08:00
|
|
|
unsigned int hint = GET_SEC_FROM_SEG(sbi, *newseg);
|
|
|
|
unsigned int old_zoneno = GET_ZONE_FROM_SEG(sbi, *newseg);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
unsigned int left_start = hint;
|
|
|
|
bool init = true;
|
|
|
|
int go_left = 0;
|
|
|
|
int i;
|
|
|
|
|
2015-02-11 18:20:38 +08:00
|
|
|
spin_lock(&free_i->segmap_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
if (!new_sec && ((*newseg + 1) % sbi->segs_per_sec)) {
|
|
|
|
segno = find_next_zero_bit(free_i->free_segmap,
|
2017-04-08 06:08:17 +08:00
|
|
|
GET_SEG_FROM_SEC(sbi, hint + 1), *newseg + 1);
|
|
|
|
if (segno < GET_SEG_FROM_SEC(sbi, hint + 1))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
goto got_it;
|
|
|
|
}
|
|
|
|
find_other_zone:
|
2014-09-24 02:23:01 +08:00
|
|
|
secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
|
|
|
|
if (secno >= MAIN_SECS(sbi)) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (dir == ALLOC_RIGHT) {
|
|
|
|
secno = find_next_zero_bit(free_i->free_secmap,
|
2014-09-24 02:23:01 +08:00
|
|
|
MAIN_SECS(sbi), 0);
|
|
|
|
f2fs_bug_on(sbi, secno >= MAIN_SECS(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
} else {
|
|
|
|
go_left = 1;
|
|
|
|
left_start = hint - 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (go_left == 0)
|
|
|
|
goto skip_left;
|
|
|
|
|
|
|
|
while (test_bit(left_start, free_i->free_secmap)) {
|
|
|
|
if (left_start > 0) {
|
|
|
|
left_start--;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
left_start = find_next_zero_bit(free_i->free_secmap,
|
2014-09-24 02:23:01 +08:00
|
|
|
MAIN_SECS(sbi), 0);
|
|
|
|
f2fs_bug_on(sbi, left_start >= MAIN_SECS(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
secno = left_start;
|
|
|
|
skip_left:
|
2017-04-08 06:08:17 +08:00
|
|
|
segno = GET_SEG_FROM_SEC(sbi, secno);
|
|
|
|
zoneno = GET_ZONE_FROM_SEC(sbi, secno);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* give up on finding another zone */
|
|
|
|
if (!init)
|
|
|
|
goto got_it;
|
|
|
|
if (sbi->secs_per_zone == 1)
|
|
|
|
goto got_it;
|
|
|
|
if (zoneno == old_zoneno)
|
|
|
|
goto got_it;
|
|
|
|
if (dir == ALLOC_LEFT) {
|
|
|
|
if (!go_left && zoneno + 1 >= total_zones)
|
|
|
|
goto got_it;
|
|
|
|
if (go_left && zoneno == 0)
|
|
|
|
goto got_it;
|
|
|
|
}
|
|
|
|
for (i = 0; i < NR_CURSEG_TYPE; i++)
|
|
|
|
if (CURSEG_I(sbi, i)->zone == zoneno)
|
|
|
|
break;
|
|
|
|
|
|
|
|
if (i < NR_CURSEG_TYPE) {
|
|
|
|
/* zone is in user, try another */
|
|
|
|
if (go_left)
|
|
|
|
hint = zoneno * sbi->secs_per_zone - 1;
|
|
|
|
else if (zoneno + 1 >= total_zones)
|
|
|
|
hint = 0;
|
|
|
|
else
|
|
|
|
hint = (zoneno + 1) * sbi->secs_per_zone;
|
|
|
|
init = false;
|
|
|
|
goto find_other_zone;
|
|
|
|
}
|
|
|
|
got_it:
|
|
|
|
/* set it as dirty segment in free segmap */
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, test_bit(segno, free_i->free_segmap));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
__set_inuse(sbi, segno);
|
|
|
|
*newseg = segno;
|
2015-02-11 18:20:38 +08:00
|
|
|
spin_unlock(&free_i->segmap_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void reset_curseg(struct f2fs_sb_info *sbi, int type, int modified)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
struct summary_footer *sum_footer;
|
|
|
|
|
|
|
|
curseg->segno = curseg->next_segno;
|
2017-04-08 06:08:17 +08:00
|
|
|
curseg->zone = GET_ZONE_FROM_SEG(sbi, curseg->segno);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
curseg->next_blkoff = 0;
|
|
|
|
curseg->next_segno = NULL_SEGNO;
|
|
|
|
|
|
|
|
sum_footer = &(curseg->sum_blk->footer);
|
|
|
|
memset(sum_footer, 0, sizeof(struct summary_footer));
|
|
|
|
if (IS_DATASEG(type))
|
|
|
|
SET_SUM_TYPE(sum_footer, SUM_TYPE_DATA);
|
|
|
|
if (IS_NODESEG(type))
|
|
|
|
SET_SUM_TYPE(sum_footer, SUM_TYPE_NODE);
|
|
|
|
__set_sit_entry_type(sbi, type, curseg->segno, modified);
|
|
|
|
}
|
|
|
|
|
2017-03-25 08:41:45 +08:00
|
|
|
static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
2017-04-21 04:51:57 +08:00
|
|
|
/* if segs_per_sec is large than 1, we need to keep original policy. */
|
|
|
|
if (sbi->segs_per_sec != 1)
|
|
|
|
return CURSEG_I(sbi, type)->segno;
|
|
|
|
|
2018-01-29 11:37:45 +08:00
|
|
|
if (test_opt(sbi, NOHEAP) &&
|
|
|
|
(type == CURSEG_HOT_DATA || IS_NODESEG(type)))
|
2017-03-25 08:41:45 +08:00
|
|
|
return 0;
|
|
|
|
|
2017-04-14 06:17:00 +08:00
|
|
|
if (SIT_I(sbi)->last_victim[ALLOC_NEXT])
|
|
|
|
return SIT_I(sbi)->last_victim[ALLOC_NEXT];
|
2018-02-19 00:50:49 +08:00
|
|
|
|
|
|
|
/* find segments from 0 to reuse freed segments */
|
2018-03-08 14:22:56 +08:00
|
|
|
if (F2FS_OPTION(sbi).alloc_mode == ALLOC_MODE_REUSE)
|
2018-02-19 00:50:49 +08:00
|
|
|
return 0;
|
|
|
|
|
2017-03-25 08:41:45 +08:00
|
|
|
return CURSEG_I(sbi, type)->segno;
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Allocate a current working segment.
|
|
|
|
* This function always allocates a free segment in LFS manner.
|
|
|
|
*/
|
|
|
|
static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
unsigned int segno = curseg->segno;
|
|
|
|
int dir = ALLOC_LEFT;
|
|
|
|
|
|
|
|
write_sum_page(sbi, curseg->sum_blk,
|
2013-05-14 18:20:28 +08:00
|
|
|
GET_SUM_BLOCK(sbi, segno));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (type == CURSEG_WARM_DATA || type == CURSEG_COLD_DATA)
|
|
|
|
dir = ALLOC_RIGHT;
|
|
|
|
|
|
|
|
if (test_opt(sbi, NOHEAP))
|
|
|
|
dir = ALLOC_RIGHT;
|
|
|
|
|
2017-03-25 08:41:45 +08:00
|
|
|
segno = __get_next_segno(sbi, type);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
get_new_segment(sbi, &segno, new_sec, dir);
|
|
|
|
curseg->next_segno = segno;
|
|
|
|
reset_curseg(sbi, type, 1);
|
|
|
|
curseg->alloc_type = LFS;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __next_free_blkoff(struct f2fs_sb_info *sbi,
|
|
|
|
struct curseg_info *seg, block_t start)
|
|
|
|
{
|
|
|
|
struct seg_entry *se = get_seg_entry(sbi, seg->segno);
|
2013-11-15 12:21:16 +08:00
|
|
|
int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
|
2015-02-11 08:44:29 +08:00
|
|
|
unsigned long *target_map = SIT_I(sbi)->tmp_map;
|
2013-11-15 12:21:16 +08:00
|
|
|
unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
|
|
|
|
unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
|
|
|
|
int i, pos;
|
|
|
|
|
|
|
|
for (i = 0; i < entries; i++)
|
|
|
|
target_map[i] = ckpt_map[i] | cur_map[i];
|
|
|
|
|
|
|
|
pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start);
|
|
|
|
|
|
|
|
seg->next_blkoff = pos;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* If a segment is written by LFS manner, next block offset is just obtained
|
|
|
|
* by increasing the current block offset. However, if a segment is written by
|
|
|
|
* SSR manner, next block offset obtained by calling __next_free_blkoff
|
|
|
|
*/
|
|
|
|
static void __refresh_next_blkoff(struct f2fs_sb_info *sbi,
|
|
|
|
struct curseg_info *seg)
|
|
|
|
{
|
|
|
|
if (seg->alloc_type == SSR)
|
|
|
|
__next_free_blkoff(sbi, seg, seg->next_blkoff + 1);
|
|
|
|
else
|
|
|
|
seg->next_blkoff++;
|
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
2014-08-06 22:22:50 +08:00
|
|
|
* This function always allocates a used segment(from dirty seglist) by SSR
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* manner, so it should recover the existing segment information of valid blocks
|
|
|
|
*/
|
2017-08-30 18:04:48 +08:00
|
|
|
static void change_curseg(struct f2fs_sb_info *sbi, int type)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
unsigned int new_segno = curseg->next_segno;
|
|
|
|
struct f2fs_summary_block *sum_node;
|
|
|
|
struct page *sum_page;
|
|
|
|
|
|
|
|
write_sum_page(sbi, curseg->sum_blk,
|
|
|
|
GET_SUM_BLOCK(sbi, curseg->segno));
|
|
|
|
__set_test_and_inuse(sbi, new_segno);
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
__remove_dirty_segment(sbi, new_segno, PRE);
|
|
|
|
__remove_dirty_segment(sbi, new_segno, DIRTY);
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
|
|
|
|
reset_curseg(sbi, type, 1);
|
|
|
|
curseg->alloc_type = SSR;
|
|
|
|
__next_free_blkoff(sbi, curseg, 0);
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
sum_page = f2fs_get_sum_page(sbi, new_segno);
|
2018-09-18 08:36:06 +08:00
|
|
|
f2fs_bug_on(sbi, IS_ERR(sum_page));
|
2017-08-30 18:04:48 +08:00
|
|
|
sum_node = (struct f2fs_summary_block *)page_address(sum_page);
|
|
|
|
memcpy(curseg->sum_blk, sum_node, SUM_ENTRY_SIZE);
|
|
|
|
f2fs_put_page(sum_page, 1);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2013-02-04 14:11:17 +08:00
|
|
|
static int get_ssr_segment(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
const struct victim_selection *v_ops = DIRTY_I(sbi)->v_ops;
|
2017-04-14 06:17:00 +08:00
|
|
|
unsigned segno = NULL_SEGNO;
|
2017-02-24 18:46:00 +08:00
|
|
|
int i, cnt;
|
|
|
|
bool reversed = false;
|
2017-02-23 09:10:18 +08:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
/* f2fs_need_SSR() already forces to do this */
|
2017-04-14 06:17:00 +08:00
|
|
|
if (v_ops->get_victim(sbi, &segno, BG_GC, type, SSR)) {
|
|
|
|
curseg->next_segno = segno;
|
2017-02-23 09:10:18 +08:00
|
|
|
return 1;
|
2017-04-14 06:17:00 +08:00
|
|
|
}
|
2013-02-04 14:11:17 +08:00
|
|
|
|
2017-02-23 09:02:32 +08:00
|
|
|
/* For node segments, let's do SSR more intensively */
|
|
|
|
if (IS_NODESEG(type)) {
|
2017-02-24 18:46:00 +08:00
|
|
|
if (type >= CURSEG_WARM_NODE) {
|
|
|
|
reversed = true;
|
|
|
|
i = CURSEG_COLD_NODE;
|
|
|
|
} else {
|
|
|
|
i = CURSEG_HOT_NODE;
|
|
|
|
}
|
|
|
|
cnt = NR_CURSEG_NODE_TYPE;
|
2017-02-23 09:02:32 +08:00
|
|
|
} else {
|
2017-02-24 18:46:00 +08:00
|
|
|
if (type >= CURSEG_WARM_DATA) {
|
|
|
|
reversed = true;
|
|
|
|
i = CURSEG_COLD_DATA;
|
|
|
|
} else {
|
|
|
|
i = CURSEG_HOT_DATA;
|
|
|
|
}
|
|
|
|
cnt = NR_CURSEG_DATA_TYPE;
|
2017-02-23 09:02:32 +08:00
|
|
|
}
|
2013-02-04 14:11:17 +08:00
|
|
|
|
2017-02-24 18:46:00 +08:00
|
|
|
for (; cnt-- > 0; reversed ? i-- : i++) {
|
2017-02-23 09:10:18 +08:00
|
|
|
if (i == type)
|
|
|
|
continue;
|
2017-04-14 06:17:00 +08:00
|
|
|
if (v_ops->get_victim(sbi, &segno, BG_GC, i, SSR)) {
|
|
|
|
curseg->next_segno = segno;
|
2013-02-04 14:11:17 +08:00
|
|
|
return 1;
|
2017-04-14 06:17:00 +08:00
|
|
|
}
|
2017-02-23 09:10:18 +08:00
|
|
|
}
|
2013-02-04 14:11:17 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
|
|
|
* flush out current segment and replace it with new segment
|
|
|
|
* This function should be returned with success, otherwise BUG
|
|
|
|
*/
|
|
|
|
static void allocate_segment_by_default(struct f2fs_sb_info *sbi,
|
|
|
|
int type, bool force)
|
|
|
|
{
|
2017-04-21 04:51:57 +08:00
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
|
2013-08-19 09:41:15 +08:00
|
|
|
if (force)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
new_curseg(sbi, type, true);
|
2017-02-15 11:32:51 +08:00
|
|
|
else if (!is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_FLAG) &&
|
|
|
|
type == CURSEG_WARM_NODE)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
new_curseg(sbi, type, false);
|
2017-04-21 04:51:57 +08:00
|
|
|
else if (curseg->alloc_type == LFS && is_next_segment_free(sbi, type))
|
|
|
|
new_curseg(sbi, type, false);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
else if (f2fs_need_SSR(sbi) && get_ssr_segment(sbi, type))
|
2017-08-30 18:04:48 +08:00
|
|
|
change_curseg(sbi, type);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
else
|
|
|
|
new_curseg(sbi, type, false);
|
2013-10-22 19:56:10 +08:00
|
|
|
|
2017-04-21 04:51:57 +08:00
|
|
|
stat_inc_seg_type(sbi, curseg);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2016-11-12 04:31:40 +08:00
|
|
|
struct curseg_info *curseg;
|
|
|
|
unsigned int old_segno;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
int i;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&SIT_I(sbi)->sentry_lock);
|
|
|
|
|
2016-11-12 04:31:40 +08:00
|
|
|
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
|
|
|
|
curseg = CURSEG_I(sbi, i);
|
|
|
|
old_segno = curseg->segno;
|
|
|
|
SIT_I(sbi)->s_ops->allocate_segment(sbi, i, true);
|
|
|
|
locate_dirty_segment(sbi, old_segno);
|
|
|
|
}
|
2017-10-30 17:49:53 +08:00
|
|
|
|
|
|
|
up_write(&SIT_I(sbi)->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static const struct segment_allocation default_salloc_ops = {
|
|
|
|
.allocate_segment = allocate_segment_by_default,
|
|
|
|
};
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi,
|
|
|
|
struct cp_control *cpc)
|
2016-12-30 14:06:15 +08:00
|
|
|
{
|
|
|
|
__u64 trim_start = cpc->trim_start;
|
|
|
|
bool has_candidate = false;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&SIT_I(sbi)->sentry_lock);
|
2016-12-30 14:06:15 +08:00
|
|
|
for (; cpc->trim_start <= cpc->trim_end; cpc->trim_start++) {
|
|
|
|
if (add_discard_addrs(sbi, cpc, true)) {
|
|
|
|
has_candidate = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&SIT_I(sbi)->sentry_lock);
|
2016-12-30 14:06:15 +08:00
|
|
|
|
|
|
|
cpc->trim_start = trim_start;
|
|
|
|
return has_candidate;
|
|
|
|
}
|
|
|
|
|
2018-06-25 20:33:24 +08:00
|
|
|
static unsigned int __issue_discard_cmd_range(struct f2fs_sb_info *sbi,
|
2018-05-25 04:57:26 +08:00
|
|
|
struct discard_policy *dpolicy,
|
|
|
|
unsigned int start, unsigned int end)
|
|
|
|
{
|
|
|
|
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
|
|
|
|
struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
|
|
|
|
struct rb_node **insert_p = NULL, *insert_parent = NULL;
|
|
|
|
struct discard_cmd *dc;
|
|
|
|
struct blk_plug plug;
|
|
|
|
int issued;
|
2018-06-25 20:33:24 +08:00
|
|
|
unsigned int trimmed = 0;
|
2018-05-25 04:57:26 +08:00
|
|
|
|
|
|
|
next:
|
|
|
|
issued = 0;
|
|
|
|
|
|
|
|
mutex_lock(&dcc->cmd_lock);
|
2018-06-22 16:06:59 +08:00
|
|
|
if (unlikely(dcc->rbtree_check))
|
|
|
|
f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi,
|
|
|
|
&dcc->root));
|
2018-05-25 04:57:26 +08:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
dc = (struct discard_cmd *)f2fs_lookup_rb_tree_ret(&dcc->root,
|
2018-05-25 04:57:26 +08:00
|
|
|
NULL, start,
|
|
|
|
(struct rb_entry **)&prev_dc,
|
|
|
|
(struct rb_entry **)&next_dc,
|
|
|
|
&insert_p, &insert_parent, true);
|
|
|
|
if (!dc)
|
|
|
|
dc = next_dc;
|
|
|
|
|
|
|
|
blk_start_plug(&plug);
|
|
|
|
|
|
|
|
while (dc && dc->lstart <= end) {
|
|
|
|
struct rb_node *node;
|
2018-08-08 10:14:55 +08:00
|
|
|
int err = 0;
|
2018-05-25 04:57:26 +08:00
|
|
|
|
|
|
|
if (dc->len < dpolicy->granularity)
|
|
|
|
goto skip;
|
|
|
|
|
|
|
|
if (dc->state != D_PREP) {
|
|
|
|
list_move_tail(&dc->list, &dcc->fstrim_list);
|
|
|
|
goto skip;
|
|
|
|
}
|
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
err = __submit_discard_cmd(sbi, dpolicy, dc, &issued);
|
2018-05-25 04:57:26 +08:00
|
|
|
|
2018-08-06 22:43:50 +08:00
|
|
|
if (issued >= dpolicy->max_requests) {
|
2018-05-25 04:57:26 +08:00
|
|
|
start = dc->lstart + dc->len;
|
|
|
|
|
2018-08-08 10:14:55 +08:00
|
|
|
if (err)
|
|
|
|
__remove_discard_cmd(sbi, dc);
|
|
|
|
|
2018-05-25 04:57:26 +08:00
|
|
|
blk_finish_plug(&plug);
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2018-06-25 20:33:24 +08:00
|
|
|
trimmed += __wait_all_discard_cmd(sbi, NULL);
|
2018-05-25 04:57:26 +08:00
|
|
|
congestion_wait(BLK_RW_ASYNC, HZ/50);
|
|
|
|
goto next;
|
|
|
|
}
|
|
|
|
skip:
|
|
|
|
node = rb_next(&dc->rb_node);
|
2018-08-08 10:14:55 +08:00
|
|
|
if (err)
|
|
|
|
__remove_discard_cmd(sbi, dc);
|
2018-05-25 04:57:26 +08:00
|
|
|
dc = rb_entry_safe(node, struct discard_cmd, rb_node);
|
|
|
|
|
|
|
|
if (fatal_signal_pending(current))
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
blk_finish_plug(&plug);
|
|
|
|
mutex_unlock(&dcc->cmd_lock);
|
2018-06-25 20:33:24 +08:00
|
|
|
|
|
|
|
return trimmed;
|
2018-05-25 04:57:26 +08:00
|
|
|
}
|
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
|
|
|
|
{
|
2015-02-10 04:02:44 +08:00
|
|
|
__u64 start = F2FS_BYTES_TO_BLK(range->start);
|
|
|
|
__u64 end = start + F2FS_BYTES_TO_BLK(range->len) - 1;
|
2018-04-09 10:25:23 +08:00
|
|
|
unsigned int start_segno, end_segno;
|
2017-10-04 09:08:32 +08:00
|
|
|
block_t start_block, end_block;
|
2014-09-21 13:06:39 +08:00
|
|
|
struct cp_control cpc;
|
2017-10-04 09:08:34 +08:00
|
|
|
struct discard_policy dpolicy;
|
2017-10-28 16:52:32 +08:00
|
|
|
unsigned long long trimmed = 0;
|
2015-12-23 17:50:30 +08:00
|
|
|
int err = 0;
|
f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-19 20:58:15 +08:00
|
|
|
bool need_align = test_opt(sbi, LFS) && sbi->segs_per_sec > 1;
|
2014-09-21 13:06:39 +08:00
|
|
|
|
2015-05-01 13:50:06 +08:00
|
|
|
if (start >= MAX_BLKADDR(sbi) || range->len < sbi->blocksize)
|
2014-09-21 13:06:39 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2018-08-08 17:36:29 +08:00
|
|
|
if (end < MAIN_BLKADDR(sbi))
|
|
|
|
goto out;
|
2014-09-21 13:06:39 +08:00
|
|
|
|
2016-09-01 10:14:39 +08:00
|
|
|
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_WARNING,
|
|
|
|
"Found FS corruption, run fsck to fix.");
|
2018-04-08 20:39:03 +08:00
|
|
|
return -EIO;
|
2016-09-01 10:14:39 +08:00
|
|
|
}
|
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
/* start/end segment number in main_area */
|
2014-09-24 02:23:01 +08:00
|
|
|
start_segno = (start <= MAIN_BLKADDR(sbi)) ? 0 : GET_SEGNO(sbi, start);
|
|
|
|
end_segno = (end >= MAX_BLKADDR(sbi)) ? MAIN_SEGS(sbi) - 1 :
|
|
|
|
GET_SEGNO(sbi, end);
|
f2fs: issue discard align to section in LFS mode
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-19 20:58:15 +08:00
|
|
|
if (need_align) {
|
|
|
|
start_segno = rounddown(start_segno, sbi->segs_per_sec);
|
|
|
|
end_segno = roundup(end_segno + 1, sbi->segs_per_sec) - 1;
|
|
|
|
}
|
2017-10-04 09:08:32 +08:00
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
cpc.reason = CP_DISCARD;
|
2015-05-01 13:50:06 +08:00
|
|
|
cpc.trim_minlen = max_t(__u64, 1, F2FS_BYTES_TO_BLK(range->minlen));
|
2018-04-09 10:25:23 +08:00
|
|
|
cpc.trim_start = start_segno;
|
|
|
|
cpc.trim_end = end_segno;
|
2014-09-21 13:06:39 +08:00
|
|
|
|
2018-04-09 10:25:23 +08:00
|
|
|
if (sbi->discard_blks == 0)
|
|
|
|
goto out;
|
2016-08-21 23:21:30 +08:00
|
|
|
|
2018-04-09 10:25:23 +08:00
|
|
|
mutex_lock(&sbi->gc_mutex);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
err = f2fs_write_checkpoint(sbi, &cpc);
|
2018-04-09 10:25:23 +08:00
|
|
|
mutex_unlock(&sbi->gc_mutex);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
2017-10-04 09:08:32 +08:00
|
|
|
|
2018-06-01 01:20:48 +08:00
|
|
|
/*
|
|
|
|
* We filed discard candidates, but actually we don't need to wait for
|
|
|
|
* all of them, since they'll be issued in idle time along with runtime
|
|
|
|
* discard option. User configuration looks like using runtime discard
|
|
|
|
* or periodic fstrim instead of it.
|
|
|
|
*/
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
if (f2fs_realtime_discard_enable(sbi))
|
2018-06-21 12:27:21 +08:00
|
|
|
goto out;
|
|
|
|
|
|
|
|
start_block = START_BLOCK(sbi, start_segno);
|
|
|
|
end_block = START_BLOCK(sbi, end_segno + 1);
|
|
|
|
|
|
|
|
__init_discard_policy(sbi, &dpolicy, DPOLICY_FSTRIM, cpc.trim_minlen);
|
2018-06-25 20:33:24 +08:00
|
|
|
trimmed = __issue_discard_cmd_range(sbi, &dpolicy,
|
|
|
|
start_block, end_block);
|
2018-06-21 12:27:21 +08:00
|
|
|
|
2018-06-25 20:33:24 +08:00
|
|
|
trimmed += __wait_discard_cmd_range(sbi, &dpolicy,
|
2017-10-28 16:52:32 +08:00
|
|
|
start_block, end_block);
|
2018-04-09 10:25:23 +08:00
|
|
|
out:
|
2018-08-05 23:09:00 +08:00
|
|
|
if (!err)
|
|
|
|
range->len = F2FS_BLK_TO_BYTES(trimmed);
|
2015-12-23 17:50:30 +08:00
|
|
|
return err;
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
|
|
|
if (curseg->next_blkoff < sbi->blocks_per_seg)
|
|
|
|
return true;
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_rw_hint_to_seg_type(enum rw_hint hint)
|
2017-11-09 13:51:27 +08:00
|
|
|
{
|
|
|
|
switch (hint) {
|
|
|
|
case WRITE_LIFE_SHORT:
|
|
|
|
return CURSEG_HOT_DATA;
|
|
|
|
case WRITE_LIFE_EXTREME:
|
|
|
|
return CURSEG_COLD_DATA;
|
|
|
|
default:
|
|
|
|
return CURSEG_WARM_DATA;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-31 10:36:57 +08:00
|
|
|
/* This returns write hints for each segment type. This hints will be
|
|
|
|
* passed down to block layer. There are mapping tables which depend on
|
|
|
|
* the mount option 'whint_mode'.
|
|
|
|
*
|
|
|
|
* 1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
|
|
|
|
*
|
|
|
|
* 2) whint_mode=user-based. F2FS tries to pass down hints given by users.
|
|
|
|
*
|
|
|
|
* User F2FS Block
|
|
|
|
* ---- ---- -----
|
|
|
|
* META WRITE_LIFE_NOT_SET
|
|
|
|
* HOT_NODE "
|
|
|
|
* WARM_NODE "
|
|
|
|
* COLD_NODE "
|
|
|
|
* ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* extension list " "
|
|
|
|
*
|
|
|
|
* -- buffered io
|
|
|
|
* WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
|
|
|
|
* WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
|
|
|
|
* WRITE_LIFE_NONE " "
|
|
|
|
* WRITE_LIFE_MEDIUM " "
|
|
|
|
* WRITE_LIFE_LONG " "
|
|
|
|
*
|
|
|
|
* -- direct io
|
|
|
|
* WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
|
|
|
|
* WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
|
|
|
|
* WRITE_LIFE_NONE " WRITE_LIFE_NONE
|
|
|
|
* WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
|
|
|
|
* WRITE_LIFE_LONG " WRITE_LIFE_LONG
|
|
|
|
*
|
2018-01-31 10:36:58 +08:00
|
|
|
* 3) whint_mode=fs-based. F2FS passes down hints with its policy.
|
|
|
|
*
|
|
|
|
* User F2FS Block
|
|
|
|
* ---- ---- -----
|
|
|
|
* META WRITE_LIFE_MEDIUM;
|
|
|
|
* HOT_NODE WRITE_LIFE_NOT_SET
|
|
|
|
* WARM_NODE "
|
|
|
|
* COLD_NODE WRITE_LIFE_NONE
|
|
|
|
* ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* extension list " "
|
|
|
|
*
|
|
|
|
* -- buffered io
|
|
|
|
* WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
|
|
|
|
* WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_LONG
|
|
|
|
* WRITE_LIFE_NONE " "
|
|
|
|
* WRITE_LIFE_MEDIUM " "
|
|
|
|
* WRITE_LIFE_LONG " "
|
|
|
|
*
|
|
|
|
* -- direct io
|
|
|
|
* WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
|
|
|
|
* WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
|
|
|
|
* WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
|
|
|
|
* WRITE_LIFE_NONE " WRITE_LIFE_NONE
|
|
|
|
* WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
|
|
|
|
* WRITE_LIFE_LONG " WRITE_LIFE_LONG
|
2018-01-31 10:36:57 +08:00
|
|
|
*/
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
|
2018-01-31 10:36:57 +08:00
|
|
|
enum page_type type, enum temp_type temp)
|
|
|
|
{
|
2018-03-08 14:22:56 +08:00
|
|
|
if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_USER) {
|
2018-01-31 10:36:57 +08:00
|
|
|
if (type == DATA) {
|
2018-01-31 10:36:58 +08:00
|
|
|
if (temp == WARM)
|
2018-01-31 10:36:57 +08:00
|
|
|
return WRITE_LIFE_NOT_SET;
|
2018-01-31 10:36:58 +08:00
|
|
|
else if (temp == HOT)
|
|
|
|
return WRITE_LIFE_SHORT;
|
|
|
|
else if (temp == COLD)
|
|
|
|
return WRITE_LIFE_EXTREME;
|
2018-01-31 10:36:57 +08:00
|
|
|
} else {
|
|
|
|
return WRITE_LIFE_NOT_SET;
|
|
|
|
}
|
2018-03-08 14:22:56 +08:00
|
|
|
} else if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_FS) {
|
2018-01-31 10:36:58 +08:00
|
|
|
if (type == DATA) {
|
|
|
|
if (temp == WARM)
|
|
|
|
return WRITE_LIFE_LONG;
|
|
|
|
else if (temp == HOT)
|
|
|
|
return WRITE_LIFE_SHORT;
|
|
|
|
else if (temp == COLD)
|
|
|
|
return WRITE_LIFE_EXTREME;
|
|
|
|
} else if (type == NODE) {
|
|
|
|
if (temp == WARM || temp == HOT)
|
|
|
|
return WRITE_LIFE_NOT_SET;
|
|
|
|
else if (temp == COLD)
|
|
|
|
return WRITE_LIFE_NONE;
|
|
|
|
} else if (type == META) {
|
|
|
|
return WRITE_LIFE_MEDIUM;
|
|
|
|
}
|
2018-01-31 10:36:57 +08:00
|
|
|
}
|
2018-01-31 10:36:58 +08:00
|
|
|
return WRITE_LIFE_NOT_SET;
|
2018-01-31 10:36:57 +08:00
|
|
|
}
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type_2(struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
if (fio->type == DATA)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return CURSEG_HOT_DATA;
|
|
|
|
else
|
|
|
|
return CURSEG_HOT_NODE;
|
|
|
|
}
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type_4(struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
if (fio->type == DATA) {
|
|
|
|
struct inode *inode = fio->page->mapping->host;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
if (S_ISDIR(inode->i_mode))
|
|
|
|
return CURSEG_HOT_DATA;
|
|
|
|
else
|
|
|
|
return CURSEG_COLD_DATA;
|
|
|
|
} else {
|
2017-05-11 05:19:54 +08:00
|
|
|
if (IS_DNODE(fio->page) && is_cold_node(fio->page))
|
2014-11-06 12:05:53 +08:00
|
|
|
return CURSEG_WARM_NODE;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
else
|
|
|
|
return CURSEG_COLD_NODE;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type_6(struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
if (fio->type == DATA) {
|
|
|
|
struct inode *inode = fio->page->mapping->host;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
if (is_cold_data(fio->page) || file_is_cold(inode))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return CURSEG_COLD_DATA;
|
2018-02-28 17:07:27 +08:00
|
|
|
if (file_is_hot(inode) ||
|
2018-04-26 17:05:50 +08:00
|
|
|
is_inode_flag_set(inode, FI_HOT_DATA) ||
|
2018-07-17 20:41:48 +08:00
|
|
|
f2fs_is_atomic_file(inode) ||
|
|
|
|
f2fs_is_volatile_file(inode))
|
2017-03-25 08:05:13 +08:00
|
|
|
return CURSEG_HOT_DATA;
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
return f2fs_rw_hint_to_seg_type(inode->i_write_hint);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
} else {
|
2017-05-11 05:19:54 +08:00
|
|
|
if (IS_DNODE(fio->page))
|
|
|
|
return is_cold_node(fio->page) ? CURSEG_WARM_NODE :
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
CURSEG_HOT_NODE;
|
2017-03-25 08:05:13 +08:00
|
|
|
return CURSEG_COLD_NODE;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-05-11 05:19:54 +08:00
|
|
|
static int __get_segment_type(struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-05-11 02:18:25 +08:00
|
|
|
int type = 0;
|
|
|
|
|
2018-03-08 14:22:56 +08:00
|
|
|
switch (F2FS_OPTION(fio->sbi).active_logs) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
case 2:
|
2017-05-11 02:18:25 +08:00
|
|
|
type = __get_segment_type_2(fio);
|
|
|
|
break;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
case 4:
|
2017-05-11 02:18:25 +08:00
|
|
|
type = __get_segment_type_4(fio);
|
|
|
|
break;
|
|
|
|
case 6:
|
|
|
|
type = __get_segment_type_6(fio);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
f2fs_bug_on(fio->sbi, true);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2017-05-11 05:19:54 +08:00
|
|
|
|
2017-05-11 02:18:25 +08:00
|
|
|
if (IS_HOT(type))
|
|
|
|
fio->temp = HOT;
|
|
|
|
else if (IS_WARM(type))
|
|
|
|
fio->temp = WARM;
|
|
|
|
else
|
|
|
|
fio->temp = COLD;
|
|
|
|
return type;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
|
2013-12-16 18:04:05 +08:00
|
|
|
block_t old_blkaddr, block_t *new_blkaddr,
|
2017-05-19 23:37:01 +08:00
|
|
|
struct f2fs_summary *sum, int type,
|
|
|
|
struct f2fs_io_info *fio, bool add_list)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
2016-11-12 04:31:40 +08:00
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, type);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
down_read(&SM_I(sbi)->curseg_lock);
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
*new_blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
|
|
|
|
|
2016-12-30 06:07:53 +08:00
|
|
|
f2fs_wait_discard_bio(sbi, *new_blkaddr);
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
|
|
|
* __add_sum_entry should be resided under the curseg_mutex
|
|
|
|
* because, this function updates a summary entry in the
|
|
|
|
* current summary block.
|
|
|
|
*/
|
2013-06-13 16:59:27 +08:00
|
|
|
__add_sum_entry(sbi, type, sum);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
__refresh_next_blkoff(sbi, curseg);
|
2013-10-22 19:56:10 +08:00
|
|
|
|
|
|
|
stat_inc_block_count(sbi, curseg);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-10-30 09:33:41 +08:00
|
|
|
/*
|
|
|
|
* SIT information should be updated before segment allocation,
|
|
|
|
* since SSR needs latest valid block information.
|
|
|
|
*/
|
|
|
|
update_sit_entry(sbi, *new_blkaddr, 1);
|
|
|
|
if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO)
|
|
|
|
update_sit_entry(sbi, old_blkaddr, -1);
|
|
|
|
|
2017-04-05 07:45:30 +08:00
|
|
|
if (!__has_curseg_space(sbi, type))
|
|
|
|
sit_i->s_ops->allocate_segment(sbi, type, false);
|
2017-10-30 09:33:41 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
2017-10-30 09:33:41 +08:00
|
|
|
* segment dirty status should be updated after segment allocation,
|
|
|
|
* so we just need to update status only one time after previous
|
|
|
|
* segment being closed.
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
*/
|
2017-10-30 09:33:41 +08:00
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
|
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, *new_blkaddr));
|
2014-01-28 11:22:14 +08:00
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-07-31 20:19:09 +08:00
|
|
|
if (page && IS_NODESEG(type)) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
fill_node_footer_blkaddr(page, NEXT_FREE_BLKADDR(sbi, curseg));
|
|
|
|
|
2017-07-31 20:19:09 +08:00
|
|
|
f2fs_inode_chksum_set(sbi, page);
|
|
|
|
}
|
|
|
|
|
2017-05-19 23:37:01 +08:00
|
|
|
if (add_list) {
|
|
|
|
struct f2fs_bio_info *io;
|
|
|
|
|
|
|
|
INIT_LIST_HEAD(&fio->list);
|
|
|
|
fio->in_list = true;
|
2018-05-28 23:47:18 +08:00
|
|
|
fio->retry = false;
|
2017-05-19 23:37:01 +08:00
|
|
|
io = sbi->write_io[fio->type] + fio->temp;
|
|
|
|
spin_lock(&io->io_lock);
|
|
|
|
list_add_tail(&fio->list, &io->io_list);
|
|
|
|
spin_unlock(&io->io_lock);
|
|
|
|
}
|
|
|
|
|
2013-12-16 18:04:05 +08:00
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
|
|
|
|
up_read(&SM_I(sbi)->curseg_lock);
|
2013-12-16 18:04:05 +08:00
|
|
|
}
|
|
|
|
|
2017-09-29 13:59:38 +08:00
|
|
|
static void update_device_state(struct f2fs_io_info *fio)
|
|
|
|
{
|
|
|
|
struct f2fs_sb_info *sbi = fio->sbi;
|
|
|
|
unsigned int devidx;
|
|
|
|
|
|
|
|
if (!sbi->s_ndevs)
|
|
|
|
return;
|
|
|
|
|
|
|
|
devidx = f2fs_target_device_index(sbi, fio->new_blkaddr);
|
|
|
|
|
|
|
|
/* update device state for fsync */
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_set_dirty_device(sbi, fio->ino, devidx, FLUSH_INO);
|
2017-09-29 13:59:39 +08:00
|
|
|
|
|
|
|
/* update device state for checkpoint */
|
|
|
|
if (!f2fs_test_bit(devidx, (char *)&sbi->dirty_device)) {
|
|
|
|
spin_lock(&sbi->dev_lock);
|
|
|
|
f2fs_set_bit(devidx, (char *)&sbi->dirty_device);
|
|
|
|
spin_unlock(&sbi->dev_lock);
|
|
|
|
}
|
2017-09-29 13:59:38 +08:00
|
|
|
}
|
|
|
|
|
2015-04-24 05:38:15 +08:00
|
|
|
static void do_write_page(struct f2fs_summary *sum, struct f2fs_io_info *fio)
|
2013-12-16 18:04:05 +08:00
|
|
|
{
|
2017-05-11 05:19:54 +08:00
|
|
|
int type = __get_segment_type(fio);
|
2018-05-26 09:00:13 +08:00
|
|
|
bool keep_order = (test_opt(fio->sbi, LFS) && type == CURSEG_COLD_DATA);
|
2013-12-16 18:04:05 +08:00
|
|
|
|
2018-05-26 09:00:13 +08:00
|
|
|
if (keep_order)
|
|
|
|
down_read(&fio->sbi->io_order_lock);
|
2016-12-15 02:12:56 +08:00
|
|
|
reallocate:
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_allocate_data_block(fio->sbi, fio->page, fio->old_blkaddr,
|
2017-05-19 23:37:01 +08:00
|
|
|
&fio->new_blkaddr, sum, type, fio, true);
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
if (GET_SEGNO(fio->sbi, fio->old_blkaddr) != NULL_SEGNO)
|
|
|
|
invalidate_mapping_pages(META_MAPPING(fio->sbi),
|
|
|
|
fio->old_blkaddr, fio->old_blkaddr);
|
2013-12-16 18:04:05 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/* writeout dirty page into bdev */
|
2018-05-28 23:47:18 +08:00
|
|
|
f2fs_submit_page_write(fio);
|
|
|
|
if (fio->retry) {
|
2016-12-15 02:12:56 +08:00
|
|
|
fio->old_blkaddr = fio->new_blkaddr;
|
|
|
|
goto reallocate;
|
|
|
|
}
|
2018-05-28 23:47:18 +08:00
|
|
|
|
|
|
|
update_device_state(fio);
|
|
|
|
|
2018-05-26 09:00:13 +08:00
|
|
|
if (keep_order)
|
|
|
|
up_read(&fio->sbi->io_order_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_do_write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
|
2017-08-02 23:21:48 +08:00
|
|
|
enum iostat_type io_type)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2013-12-11 12:54:01 +08:00
|
|
|
struct f2fs_io_info fio = {
|
2015-04-24 05:38:15 +08:00
|
|
|
.sbi = sbi,
|
2013-12-11 12:54:01 +08:00
|
|
|
.type = META,
|
2018-01-31 10:36:57 +08:00
|
|
|
.temp = HOT,
|
2016-06-06 03:31:55 +08:00
|
|
|
.op = REQ_OP_WRITE,
|
2016-11-01 21:40:10 +08:00
|
|
|
.op_flags = REQ_SYNC | REQ_META | REQ_PRIO,
|
f2fs: trace old block address for CoWed page
This patch enables to trace old block address of CoWed page for better
debugging.
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 18:36:38 +08:00
|
|
|
.old_blkaddr = page->index,
|
|
|
|
.new_blkaddr = page->index,
|
2015-04-24 05:38:15 +08:00
|
|
|
.page = page,
|
2015-04-24 03:04:33 +08:00
|
|
|
.encrypted_page = NULL,
|
2017-05-19 23:37:01 +08:00
|
|
|
.in_list = false,
|
2013-12-11 12:54:01 +08:00
|
|
|
};
|
|
|
|
|
2015-10-12 17:04:21 +08:00
|
|
|
if (unlikely(page->index >= MAIN_BLKADDR(sbi)))
|
2016-06-06 03:31:55 +08:00
|
|
|
fio.op_flags &= ~REQ_META;
|
2015-10-12 17:04:21 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
set_page_writeback(page);
|
2018-04-12 14:09:04 +08:00
|
|
|
ClearPageError(page);
|
2017-05-11 02:28:38 +08:00
|
|
|
f2fs_submit_page_write(&fio);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
|
|
|
f2fs_update_iostat(sbi, io_type, F2FS_BLKSIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_do_write_node_page(unsigned int nid, struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct f2fs_summary sum;
|
2015-04-24 05:38:15 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
set_summary(&sum, nid, 0, 0);
|
2015-04-24 05:38:15 +08:00
|
|
|
do_write_page(&sum, fio);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
|
|
|
f2fs_update_iostat(fio->sbi, fio->io_type, F2FS_BLKSIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_outplace_write_data(struct dnode_of_data *dn,
|
|
|
|
struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2015-04-24 05:38:15 +08:00
|
|
|
struct f2fs_sb_info *sbi = fio->sbi;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
struct f2fs_summary sum;
|
|
|
|
|
2014-09-03 06:52:58 +08:00
|
|
|
f2fs_bug_on(sbi, dn->data_blkaddr == NULL_ADDR);
|
2018-07-17 00:02:17 +08:00
|
|
|
set_summary(&sum, dn->nid, dn->ofs_in_node, fio->version);
|
2015-04-24 05:38:15 +08:00
|
|
|
do_write_page(&sum, fio);
|
2016-02-24 17:16:47 +08:00
|
|
|
f2fs_update_data_blkaddr(dn, fio->new_blkaddr);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
|
|
|
f2fs_update_iostat(sbi, fio->io_type, F2FS_BLKSIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_inplace_write_data(struct f2fs_io_info *fio)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2017-08-02 23:21:48 +08:00
|
|
|
int err;
|
2018-03-26 17:32:23 +08:00
|
|
|
struct f2fs_sb_info *sbi = fio->sbi;
|
2017-08-02 23:21:48 +08:00
|
|
|
|
f2fs: trace old block address for CoWed page
This patch enables to trace old block address of CoWed page for better
debugging.
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 18:36:38 +08:00
|
|
|
fio->new_blkaddr = fio->old_blkaddr;
|
2018-01-31 10:36:57 +08:00
|
|
|
/* i/o temperature is needed for passing down write hints */
|
|
|
|
__get_segment_type(fio);
|
2018-03-26 17:32:23 +08:00
|
|
|
|
|
|
|
f2fs_bug_on(sbi, !IS_DATASEG(get_seg_entry(sbi,
|
|
|
|
GET_SEGNO(sbi, fio->new_blkaddr))->type));
|
|
|
|
|
2015-04-24 05:38:15 +08:00
|
|
|
stat_inc_inplace_blocks(fio->sbi);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
|
|
|
err = f2fs_submit_page_bio(fio);
|
2017-09-29 13:59:38 +08:00
|
|
|
if (!err)
|
|
|
|
update_device_state(fio);
|
2017-08-02 23:21:48 +08:00
|
|
|
|
|
|
|
f2fs_update_iostat(fio->sbi, fio->io_type, F2FS_BLKSIZE);
|
|
|
|
|
|
|
|
return err;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
static inline int __f2fs_get_curseg(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = CURSEG_HOT_DATA; i < NO_CHECK_TYPE; i++) {
|
|
|
|
if (CURSEG_I(sbi, i)->segno == segno)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
return i;
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
|
2015-05-06 13:08:06 +08:00
|
|
|
block_t old_blkaddr, block_t new_blkaddr,
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
bool recover_curseg, bool recover_newaddr)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
struct curseg_info *curseg;
|
|
|
|
unsigned int segno, old_cursegno;
|
|
|
|
struct seg_entry *se;
|
|
|
|
int type;
|
2015-05-06 13:08:06 +08:00
|
|
|
unsigned short old_blkoff;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
segno = GET_SEGNO(sbi, new_blkaddr);
|
|
|
|
se = get_seg_entry(sbi, segno);
|
|
|
|
type = se->type;
|
|
|
|
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
down_write(&SM_I(sbi)->curseg_lock);
|
|
|
|
|
2015-05-06 13:08:06 +08:00
|
|
|
if (!recover_curseg) {
|
|
|
|
/* for recovery flow */
|
|
|
|
if (se->valid_blocks == 0 && !IS_CURSEG(sbi, segno)) {
|
|
|
|
if (old_blkaddr == NULL_ADDR)
|
|
|
|
type = CURSEG_COLD_DATA;
|
|
|
|
else
|
|
|
|
type = CURSEG_WARM_DATA;
|
|
|
|
}
|
|
|
|
} else {
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
if (IS_CURSEG(sbi, segno)) {
|
|
|
|
/* se->type is volatile as SSR allocation */
|
|
|
|
type = __f2fs_get_curseg(sbi, segno);
|
|
|
|
f2fs_bug_on(sbi, type == NO_CHECK_TYPE);
|
|
|
|
} else {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
type = CURSEG_WARM_DATA;
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2015-05-06 13:08:06 +08:00
|
|
|
|
f2fs: check segment type in __f2fs_replace_block
In some case, the node blocks has wrong blkaddr whose segment type is
NODE, e.g., recover inode has missing xattr flag and the blkaddr is in
the xattr range. Since fsck.f2fs does not check the recovery nodes, this
will cause __f2fs_replace_block change the curseg of node and do the
update_sit_entry(sbi, new_blkaddr, 1) with no next_blkoff refresh, as a
result, when recovery process write checkpoint and sync nodes, the
next_blkoff of curseg is used in the segment bit map, then it will
cause f2fs_bug_on. So let's check segment type in __f2fs_replace_block.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-04 15:02:02 +08:00
|
|
|
f2fs_bug_on(sbi, !IS_DATASEG(type));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
curseg = CURSEG_I(sbi, type);
|
|
|
|
|
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
old_cursegno = curseg->segno;
|
2015-05-06 13:08:06 +08:00
|
|
|
old_blkoff = curseg->next_blkoff;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* change the current segment */
|
|
|
|
if (segno != curseg->segno) {
|
|
|
|
curseg->next_segno = segno;
|
2017-08-30 18:04:48 +08:00
|
|
|
change_curseg(sbi, type);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2014-02-04 12:01:10 +08:00
|
|
|
curseg->next_blkoff = GET_BLKOFF_FROM_SEG0(sbi, new_blkaddr);
|
2013-06-13 16:59:27 +08:00
|
|
|
__add_sum_entry(sbi, type, sum);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
if (!recover_curseg || recover_newaddr)
|
2015-10-08 03:28:41 +08:00
|
|
|
update_sit_entry(sbi, new_blkaddr, 1);
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO) {
|
|
|
|
invalidate_mapping_pages(META_MAPPING(sbi),
|
|
|
|
old_blkaddr, old_blkaddr);
|
2015-10-08 03:28:41 +08:00
|
|
|
update_sit_entry(sbi, old_blkaddr, -1);
|
f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.
So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.
To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.
Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.
for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done
for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done
ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
Before:
gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
gc-6549 [001] .... 214682.213902: block_plug: [gc]
gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
gc-6549 [001] .... 214682.226414: block_plug: [gc]
gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
gc-6549 [001] .... 214682.226911: block_plug: [gc]
gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1
After:
gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
gc-5678 [003] .... 214327.026046: block_plug: [gc]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 22:37:25 +08:00
|
|
|
}
|
2015-10-08 03:28:41 +08:00
|
|
|
|
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
|
|
|
|
locate_dirty_segment(sbi, GET_SEGNO(sbi, new_blkaddr));
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
locate_dirty_segment(sbi, old_cursegno);
|
|
|
|
|
2015-05-06 13:08:06 +08:00
|
|
|
if (recover_curseg) {
|
|
|
|
if (old_cursegno != curseg->segno) {
|
|
|
|
curseg->next_segno = old_cursegno;
|
2017-08-30 18:04:48 +08:00
|
|
|
change_curseg(sbi, type);
|
2015-05-06 13:08:06 +08:00
|
|
|
}
|
|
|
|
curseg->next_blkoff = old_blkoff;
|
|
|
|
}
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
up_write(&SM_I(sbi)->curseg_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2015-05-28 19:15:35 +08:00
|
|
|
void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn,
|
|
|
|
block_t old_addr, block_t new_addr,
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
unsigned char version, bool recover_curseg,
|
|
|
|
bool recover_newaddr)
|
2015-05-28 19:15:35 +08:00
|
|
|
{
|
|
|
|
struct f2fs_summary sum;
|
|
|
|
|
|
|
|
set_summary(&sum, dn->nid, dn->ofs_in_node, version);
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_do_replace_block(sbi, &sum, old_addr, new_addr,
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 14:40:34 +08:00
|
|
|
recover_curseg, recover_newaddr);
|
2015-05-28 19:15:35 +08:00
|
|
|
|
2016-02-24 17:16:47 +08:00
|
|
|
f2fs_update_data_blkaddr(dn, new_addr);
|
2015-05-28 19:15:35 +08:00
|
|
|
}
|
|
|
|
|
2013-11-30 11:51:14 +08:00
|
|
|
void f2fs_wait_on_page_writeback(struct page *page,
|
2016-01-20 23:43:51 +08:00
|
|
|
enum page_type type, bool ordered)
|
2013-11-30 11:51:14 +08:00
|
|
|
{
|
|
|
|
if (PageWriteback(page)) {
|
2014-09-03 06:31:18 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_P_SB(page);
|
|
|
|
|
2017-05-11 02:28:38 +08:00
|
|
|
f2fs_submit_merged_write_cond(sbi, page->mapping->host,
|
|
|
|
0, page->index, type);
|
2016-01-20 23:43:51 +08:00
|
|
|
if (ordered)
|
|
|
|
wait_on_page_writeback(page);
|
|
|
|
else
|
|
|
|
wait_for_stable_page(page);
|
2013-11-30 11:51:14 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-08-23 12:18:00 +08:00
|
|
|
void f2fs_wait_on_block_writeback(struct inode *inode, block_t blkaddr)
|
2015-10-08 13:27:34 +08:00
|
|
|
{
|
2018-08-23 12:18:00 +08:00
|
|
|
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
|
2015-10-08 13:27:34 +08:00
|
|
|
struct page *cpage;
|
|
|
|
|
2018-08-23 12:18:00 +08:00
|
|
|
if (!f2fs_post_read_required(inode))
|
|
|
|
return;
|
|
|
|
|
2018-06-05 17:44:11 +08:00
|
|
|
if (!is_valid_data_blkaddr(sbi, blkaddr))
|
2015-10-08 13:27:34 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
cpage = find_lock_page(META_MAPPING(sbi), blkaddr);
|
|
|
|
if (cpage) {
|
2016-01-20 23:43:51 +08:00
|
|
|
f2fs_wait_on_page_writeback(cpage, DATA, true);
|
2015-10-08 13:27:34 +08:00
|
|
|
f2fs_put_page(cpage, 1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-07-17 00:02:17 +08:00
|
|
|
static int read_compacted_summaries(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
|
|
|
|
struct curseg_info *seg_i;
|
|
|
|
unsigned char *kaddr;
|
|
|
|
struct page *page;
|
|
|
|
block_t start;
|
|
|
|
int i, j, offset;
|
|
|
|
|
|
|
|
start = start_sum_block(sbi);
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
page = f2fs_get_meta_page(sbi, start++);
|
2018-07-17 00:02:17 +08:00
|
|
|
if (IS_ERR(page))
|
|
|
|
return PTR_ERR(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kaddr = (unsigned char *)page_address(page);
|
|
|
|
|
|
|
|
/* Step 1: restore nat cache */
|
|
|
|
seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
memcpy(seg_i->journal, kaddr, SUM_JOURNAL_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* Step 2: restore sit cache */
|
|
|
|
seg_i = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
memcpy(seg_i->journal, kaddr + SUM_JOURNAL_SIZE, SUM_JOURNAL_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
offset = 2 * SUM_JOURNAL_SIZE;
|
|
|
|
|
|
|
|
/* Step 3: restore summary entries */
|
|
|
|
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
|
|
|
|
unsigned short blk_off;
|
|
|
|
unsigned int segno;
|
|
|
|
|
|
|
|
seg_i = CURSEG_I(sbi, i);
|
|
|
|
segno = le32_to_cpu(ckpt->cur_data_segno[i]);
|
|
|
|
blk_off = le16_to_cpu(ckpt->cur_data_blkoff[i]);
|
|
|
|
seg_i->next_segno = segno;
|
|
|
|
reset_curseg(sbi, i, 0);
|
|
|
|
seg_i->alloc_type = ckpt->alloc_type[i];
|
|
|
|
seg_i->next_blkoff = blk_off;
|
|
|
|
|
|
|
|
if (seg_i->alloc_type == SSR)
|
|
|
|
blk_off = sbi->blocks_per_seg;
|
|
|
|
|
|
|
|
for (j = 0; j < blk_off; j++) {
|
|
|
|
struct f2fs_summary *s;
|
|
|
|
s = (struct f2fs_summary *)(kaddr + offset);
|
|
|
|
seg_i->sum_blk->entries[j] = *s;
|
|
|
|
offset += SUMMARY_SIZE;
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
if (offset + SUMMARY_SIZE <= PAGE_SIZE -
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
SUM_FOOTER_SIZE)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
page = NULL;
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
page = f2fs_get_meta_page(sbi, start++);
|
2018-07-17 00:02:17 +08:00
|
|
|
if (IS_ERR(page))
|
|
|
|
return PTR_ERR(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kaddr = (unsigned char *)page_address(page);
|
|
|
|
offset = 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
f2fs_put_page(page, 1);
|
2018-07-17 00:02:17 +08:00
|
|
|
return 0;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int read_normal_summaries(struct f2fs_sb_info *sbi, int type)
|
|
|
|
{
|
|
|
|
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
|
|
|
|
struct f2fs_summary_block *sum;
|
|
|
|
struct curseg_info *curseg;
|
|
|
|
struct page *new;
|
|
|
|
unsigned short blk_off;
|
|
|
|
unsigned int segno = 0;
|
|
|
|
block_t blk_addr = 0;
|
2018-07-17 00:02:17 +08:00
|
|
|
int err = 0;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* get segment number and block addr */
|
|
|
|
if (IS_DATASEG(type)) {
|
|
|
|
segno = le32_to_cpu(ckpt->cur_data_segno[type]);
|
|
|
|
blk_off = le16_to_cpu(ckpt->cur_data_blkoff[type -
|
|
|
|
CURSEG_HOT_DATA]);
|
2015-01-30 03:45:33 +08:00
|
|
|
if (__exist_node_summaries(sbi))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
blk_addr = sum_blk_addr(sbi, NR_CURSEG_TYPE, type);
|
|
|
|
else
|
|
|
|
blk_addr = sum_blk_addr(sbi, NR_CURSEG_DATA_TYPE, type);
|
|
|
|
} else {
|
|
|
|
segno = le32_to_cpu(ckpt->cur_node_segno[type -
|
|
|
|
CURSEG_HOT_NODE]);
|
|
|
|
blk_off = le16_to_cpu(ckpt->cur_node_blkoff[type -
|
|
|
|
CURSEG_HOT_NODE]);
|
2015-01-30 03:45:33 +08:00
|
|
|
if (__exist_node_summaries(sbi))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
blk_addr = sum_blk_addr(sbi, NR_CURSEG_NODE_TYPE,
|
|
|
|
type - CURSEG_HOT_NODE);
|
|
|
|
else
|
|
|
|
blk_addr = GET_SUM_BLOCK(sbi, segno);
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
new = f2fs_get_meta_page(sbi, blk_addr);
|
2018-07-17 00:02:17 +08:00
|
|
|
if (IS_ERR(new))
|
|
|
|
return PTR_ERR(new);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
sum = (struct f2fs_summary_block *)page_address(new);
|
|
|
|
|
|
|
|
if (IS_NODESEG(type)) {
|
2015-01-30 03:45:33 +08:00
|
|
|
if (__exist_node_summaries(sbi)) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
struct f2fs_summary *ns = &sum->entries[0];
|
|
|
|
int i;
|
|
|
|
for (i = 0; i < sbi->blocks_per_seg; i++, ns++) {
|
|
|
|
ns->version = 0;
|
|
|
|
ns->ofs_in_node = 0;
|
|
|
|
}
|
|
|
|
} else {
|
2018-07-17 00:02:17 +08:00
|
|
|
err = f2fs_restore_node_summary(sbi, segno, sum);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* set uncompleted segment to curseg */
|
|
|
|
curseg = CURSEG_I(sbi, type);
|
|
|
|
mutex_lock(&curseg->curseg_mutex);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
|
|
|
|
/* update journal info */
|
|
|
|
down_write(&curseg->journal_rwsem);
|
|
|
|
memcpy(curseg->journal, &sum->journal, SUM_JOURNAL_SIZE);
|
|
|
|
up_write(&curseg->journal_rwsem);
|
|
|
|
|
|
|
|
memcpy(curseg->sum_blk->entries, sum->entries, SUM_ENTRY_SIZE);
|
|
|
|
memcpy(&curseg->sum_blk->footer, &sum->footer, SUM_FOOTER_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
curseg->next_segno = segno;
|
|
|
|
reset_curseg(sbi, type, 0);
|
|
|
|
curseg->alloc_type = ckpt->alloc_type[type];
|
|
|
|
curseg->next_blkoff = blk_off;
|
|
|
|
mutex_unlock(&curseg->curseg_mutex);
|
2018-07-17 00:02:17 +08:00
|
|
|
out:
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
f2fs_put_page(new, 1);
|
2018-07-17 00:02:17 +08:00
|
|
|
return err;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int restore_curseg_summaries(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2017-06-02 02:18:30 +08:00
|
|
|
struct f2fs_journal *sit_j = CURSEG_I(sbi, CURSEG_COLD_DATA)->journal;
|
|
|
|
struct f2fs_journal *nat_j = CURSEG_I(sbi, CURSEG_HOT_DATA)->journal;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
int type = CURSEG_HOT_DATA;
|
2014-03-17 16:36:24 +08:00
|
|
|
int err;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2016-09-20 11:04:18 +08:00
|
|
|
if (is_set_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG)) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int npages = f2fs_npages_for_summary_flush(sbi, true);
|
2014-12-09 14:21:46 +08:00
|
|
|
|
|
|
|
if (npages >= 2)
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_ra_meta_pages(sbi, start_sum_block(sbi), npages,
|
2015-10-12 17:05:59 +08:00
|
|
|
META_CP, true);
|
2014-12-09 14:21:46 +08:00
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/* restore for compacted data summary */
|
2018-07-17 00:02:17 +08:00
|
|
|
err = read_compacted_summaries(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
type = CURSEG_HOT_NODE;
|
|
|
|
}
|
|
|
|
|
2015-01-30 03:45:33 +08:00
|
|
|
if (__exist_node_summaries(sbi))
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_ra_meta_pages(sbi, sum_blk_addr(sbi, NR_CURSEG_TYPE, type),
|
2015-10-12 17:05:59 +08:00
|
|
|
NR_CURSEG_TYPE - type, META_CP, true);
|
2014-12-09 14:21:46 +08:00
|
|
|
|
2014-03-17 16:36:24 +08:00
|
|
|
for (; type <= CURSEG_COLD_NODE; type++) {
|
|
|
|
err = read_normal_summaries(sbi, type);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2017-06-02 02:18:30 +08:00
|
|
|
/* sanity check for summary blocks */
|
|
|
|
if (nats_in_cursum(nat_j) > NAT_JOURNAL_ENTRIES ||
|
|
|
|
sits_in_cursum(sit_j) > SIT_JOURNAL_ENTRIES)
|
|
|
|
return -EINVAL;
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr)
|
|
|
|
{
|
|
|
|
struct page *page;
|
|
|
|
unsigned char *kaddr;
|
|
|
|
struct f2fs_summary *summary;
|
|
|
|
struct curseg_info *seg_i;
|
|
|
|
int written_size = 0;
|
|
|
|
int i, j;
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
page = f2fs_grab_meta_page(sbi, blkaddr++);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kaddr = (unsigned char *)page_address(page);
|
2018-04-09 20:25:06 +08:00
|
|
|
memset(kaddr, 0, PAGE_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
/* Step 1: write nat cache */
|
|
|
|
seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
memcpy(kaddr, seg_i->journal, SUM_JOURNAL_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
written_size += SUM_JOURNAL_SIZE;
|
|
|
|
|
|
|
|
/* Step 2: write sit cache */
|
|
|
|
seg_i = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
memcpy(kaddr + written_size, seg_i->journal, SUM_JOURNAL_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
written_size += SUM_JOURNAL_SIZE;
|
|
|
|
|
|
|
|
/* Step 3: write summary entries */
|
|
|
|
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
|
|
|
|
unsigned short blkoff;
|
|
|
|
seg_i = CURSEG_I(sbi, i);
|
|
|
|
if (sbi->ckpt->alloc_type[i] == SSR)
|
|
|
|
blkoff = sbi->blocks_per_seg;
|
|
|
|
else
|
|
|
|
blkoff = curseg_blkoff(sbi, i);
|
|
|
|
|
|
|
|
for (j = 0; j < blkoff; j++) {
|
|
|
|
if (!page) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
page = f2fs_grab_meta_page(sbi, blkaddr++);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kaddr = (unsigned char *)page_address(page);
|
2018-04-09 20:25:06 +08:00
|
|
|
memset(kaddr, 0, PAGE_SIZE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
written_size = 0;
|
|
|
|
}
|
|
|
|
summary = (struct f2fs_summary *)(kaddr + written_size);
|
|
|
|
*summary = seg_i->sum_blk->entries[j];
|
|
|
|
written_size += SUMMARY_SIZE;
|
|
|
|
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
if (written_size + SUMMARY_SIZE <= PAGE_SIZE -
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
SUM_FOOTER_SIZE)
|
|
|
|
continue;
|
|
|
|
|
2013-10-24 15:08:28 +08:00
|
|
|
set_page_dirty(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
page = NULL;
|
|
|
|
}
|
|
|
|
}
|
2013-10-24 15:08:28 +08:00
|
|
|
if (page) {
|
|
|
|
set_page_dirty(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
f2fs_put_page(page, 1);
|
2013-10-24 15:08:28 +08:00
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void write_normal_summaries(struct f2fs_sb_info *sbi,
|
|
|
|
block_t blkaddr, int type)
|
|
|
|
{
|
|
|
|
int i, end;
|
|
|
|
if (IS_DATASEG(type))
|
|
|
|
end = type + NR_CURSEG_DATA_TYPE;
|
|
|
|
else
|
|
|
|
end = type + NR_CURSEG_NODE_TYPE;
|
|
|
|
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
for (i = type; i < end; i++)
|
|
|
|
write_current_sum_page(sbi, i, blkaddr + (i - type));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2016-09-20 11:04:18 +08:00
|
|
|
if (is_set_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
write_compacted_summaries(sbi, start_blk);
|
|
|
|
else
|
|
|
|
write_normal_summaries(sbi, start_blk, CURSEG_HOT_DATA);
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
2015-01-30 03:45:33 +08:00
|
|
|
write_normal_summaries(sbi, start_blk, CURSEG_HOT_NODE);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
unsigned int val, int alloc)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (type == NAT_JOURNAL) {
|
2016-02-14 18:50:40 +08:00
|
|
|
for (i = 0; i < nats_in_cursum(journal); i++) {
|
|
|
|
if (le32_to_cpu(nid_in_journal(journal, i)) == val)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return i;
|
|
|
|
}
|
2016-02-14 18:50:40 +08:00
|
|
|
if (alloc && __has_cursum_space(journal, 1, NAT_JOURNAL))
|
|
|
|
return update_nats_in_cursum(journal, 1);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
} else if (type == SIT_JOURNAL) {
|
2016-02-14 18:50:40 +08:00
|
|
|
for (i = 0; i < sits_in_cursum(journal); i++)
|
|
|
|
if (le32_to_cpu(segno_in_journal(journal, i)) == val)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return i;
|
2016-02-14 18:50:40 +08:00
|
|
|
if (alloc && __has_cursum_space(journal, 1, SIT_JOURNAL))
|
|
|
|
return update_sits_in_cursum(journal, 1);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct page *get_current_sit_page(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int segno)
|
|
|
|
{
|
2018-07-17 00:02:17 +08:00
|
|
|
return f2fs_get_meta_page_nofail(sbi, current_sit_addr(sbi, segno));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static struct page *get_next_sit_page(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int start)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
f2fs: rebuild sit page from sit info in mem
This patch rebuild sit page from sit info in mem instead
of issue a read io.
I test this method and the result is as below:
Pre:
mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Some sit flush consume more than 50ms.
Now:
mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Most sit flush consume no more than 1ms.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 17:27:11 +08:00
|
|
|
struct page *page;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
pgoff_t src_off, dst_off;
|
|
|
|
|
|
|
|
src_off = current_sit_addr(sbi, start);
|
|
|
|
dst_off = next_sit_addr(sbi, src_off);
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
page = f2fs_grab_meta_page(sbi, dst_off);
|
f2fs: rebuild sit page from sit info in mem
This patch rebuild sit page from sit info in mem instead
of issue a read io.
I test this method and the result is as below:
Pre:
mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Some sit flush consume more than 50ms.
Now:
mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Most sit flush consume no more than 1ms.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 17:27:11 +08:00
|
|
|
seg_info_to_sit_page(sbi, page, start);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: rebuild sit page from sit info in mem
This patch rebuild sit page from sit info in mem instead
of issue a read io.
I test this method and the result is as below:
Pre:
mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Some sit flush consume more than 50ms.
Now:
mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Most sit flush consume no more than 1ms.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 17:27:11 +08:00
|
|
|
set_page_dirty(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
set_to_next_sit(sit_i, start);
|
|
|
|
|
f2fs: rebuild sit page from sit info in mem
This patch rebuild sit page from sit info in mem instead
of issue a read io.
I test this method and the result is as below:
Pre:
mmc_perf_test-12061 [001] ...1 976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [003] ...1 1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [001] ...1 1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [001] ...1 1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [003] ...1 1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-12061 [002] ...1 1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-12061 [002] ...1 1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Some sit flush consume more than 50ms.
Now:
mmc_perf_test-2187 [007] ...1 196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [007] ...1 219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [007] ...1 219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [002] ...1 265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [002] ...1 265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [005] ...1 343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [005] ...1 343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [000] ...1 370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [000] ...1 370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [001] ...1 397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [001] ...1 397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187 [003] ...1 425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187 [003] ...1 425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
Most sit flush consume no more than 1ms.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 17:27:11 +08:00
|
|
|
return page;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
static struct sit_entry_set *grab_sit_entry_set(void)
|
|
|
|
{
|
|
|
|
struct sit_entry_set *ses =
|
2015-08-20 23:51:56 +08:00
|
|
|
f2fs_kmem_cache_alloc(sit_entry_set_slab, GFP_NOFS);
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
ses->entry_cnt = 0;
|
|
|
|
INIT_LIST_HEAD(&ses->set_list);
|
|
|
|
return ses;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void release_sit_entry_set(struct sit_entry_set *ses)
|
|
|
|
{
|
|
|
|
list_del(&ses->set_list);
|
|
|
|
kmem_cache_free(sit_entry_set_slab, ses);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void adjust_sit_entry_set(struct sit_entry_set *ses,
|
|
|
|
struct list_head *head)
|
|
|
|
{
|
|
|
|
struct sit_entry_set *next = ses;
|
|
|
|
|
|
|
|
if (list_is_last(&ses->set_list, head))
|
|
|
|
return;
|
|
|
|
|
|
|
|
list_for_each_entry_continue(next, head, set_list)
|
|
|
|
if (ses->entry_cnt <= next->entry_cnt)
|
|
|
|
break;
|
|
|
|
|
|
|
|
list_move_tail(&ses->set_list, &next->set_list);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void add_sit_entry(unsigned int segno, struct list_head *head)
|
|
|
|
{
|
|
|
|
struct sit_entry_set *ses;
|
|
|
|
unsigned int start_segno = START_SEGNO(segno);
|
|
|
|
|
|
|
|
list_for_each_entry(ses, head, set_list) {
|
|
|
|
if (ses->start_segno == start_segno) {
|
|
|
|
ses->entry_cnt++;
|
|
|
|
adjust_sit_entry_set(ses, head);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
ses = grab_sit_entry_set();
|
|
|
|
|
|
|
|
ses->start_segno = start_segno;
|
|
|
|
ses->entry_cnt++;
|
|
|
|
list_add(&ses->set_list, head);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void add_sits_in_set(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct f2fs_sm_info *sm_info = SM_I(sbi);
|
|
|
|
struct list_head *set_list = &sm_info->sit_entry_set;
|
|
|
|
unsigned long *bitmap = SIT_I(sbi)->dirty_sentries_bitmap;
|
|
|
|
unsigned int segno;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for_each_set_bit(segno, bitmap, MAIN_SEGS(sbi))
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
add_sit_entry(segno, set_list);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void remove_sits_in_journal(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
struct f2fs_journal *journal = curseg->journal;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
int i;
|
|
|
|
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
down_write(&curseg->journal_rwsem);
|
2016-02-14 18:50:40 +08:00
|
|
|
for (i = 0; i < sits_in_cursum(journal); i++) {
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
unsigned int segno;
|
|
|
|
bool dirtied;
|
|
|
|
|
2016-02-14 18:50:40 +08:00
|
|
|
segno = le32_to_cpu(segno_in_journal(journal, i));
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
dirtied = __mark_sit_entry_dirty(sbi, segno);
|
|
|
|
|
|
|
|
if (!dirtied)
|
|
|
|
add_sit_entry(segno, &SM_I(sbi)->sit_entry_set);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2016-02-14 18:50:40 +08:00
|
|
|
update_sits_in_cursum(journal, -i);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
up_write(&curseg->journal_rwsem);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* CP calls this function, which flushes SIT entries including sit_journal,
|
|
|
|
* and moves prefree segs to free segs.
|
|
|
|
*/
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
unsigned long *bitmap = sit_i->dirty_sentries_bitmap;
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
struct f2fs_journal *journal = curseg->journal;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
struct sit_entry_set *ses, *tmp;
|
|
|
|
struct list_head *head = &SM_I(sbi)->sit_entry_set;
|
|
|
|
bool to_journal = true;
|
2014-09-21 13:06:39 +08:00
|
|
|
struct seg_entry *se;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2015-02-27 16:52:50 +08:00
|
|
|
if (!sit_i->dirty_sentries)
|
|
|
|
goto out;
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/*
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
* add and account sit entries of dirty bitmap in sit entry
|
|
|
|
* set temporarily
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
*/
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
add_sits_in_set(sbi);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
/*
|
|
|
|
* if there are no enough space in journal to store dirty sit
|
|
|
|
* entries, remove all entries from journal and add and account
|
|
|
|
* them in sit entry set.
|
|
|
|
*/
|
2016-02-14 18:50:40 +08:00
|
|
|
if (!__has_cursum_space(journal, sit_i->dirty_sentries, SIT_JOURNAL))
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
remove_sits_in_journal(sbi);
|
2013-11-12 13:49:56 +08:00
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
/*
|
|
|
|
* there are two steps to flush sit entries:
|
|
|
|
* #1, flush sit entries to journal in current cold data summary block.
|
|
|
|
* #2, flush sit entries to sit page.
|
|
|
|
*/
|
|
|
|
list_for_each_entry_safe(ses, tmp, head, set_list) {
|
2014-10-17 02:43:30 +08:00
|
|
|
struct page *page = NULL;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
struct f2fs_sit_block *raw_sit = NULL;
|
|
|
|
unsigned int start_segno = ses->start_segno;
|
|
|
|
unsigned int end = min(start_segno + SIT_ENTRY_PER_BLOCK,
|
2014-09-24 02:23:01 +08:00
|
|
|
(unsigned long)MAIN_SEGS(sbi));
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
unsigned int segno = start_segno;
|
|
|
|
|
|
|
|
if (to_journal &&
|
2016-02-14 18:50:40 +08:00
|
|
|
!__has_cursum_space(journal, ses->entry_cnt, SIT_JOURNAL))
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
to_journal = false;
|
|
|
|
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
if (to_journal) {
|
|
|
|
down_write(&curseg->journal_rwsem);
|
|
|
|
} else {
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
page = get_next_sit_page(sbi, start_segno);
|
|
|
|
raw_sit = page_address(page);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
/* flush dirty sit entries in region of current sit set */
|
|
|
|
for_each_set_bit_from(segno, bitmap, end) {
|
|
|
|
int offset, sit_offset;
|
2014-09-21 13:06:39 +08:00
|
|
|
|
|
|
|
se = get_seg_entry(sbi, segno);
|
2018-04-09 04:28:41 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
if (memcmp(se->cur_valid_map, se->cur_valid_map_mir,
|
|
|
|
SIT_VBLOCK_MAP_SIZE))
|
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
#endif
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
/* add discard candidates */
|
2017-04-27 20:40:39 +08:00
|
|
|
if (!(cpc->reason & CP_DISCARD)) {
|
2014-09-21 13:06:39 +08:00
|
|
|
cpc->trim_start = segno;
|
2016-12-30 14:06:15 +08:00
|
|
|
add_discard_addrs(sbi, cpc, false);
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
if (to_journal) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
offset = f2fs_lookup_journal_in_cursum(journal,
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
SIT_JOURNAL, segno, 1);
|
|
|
|
f2fs_bug_on(sbi, offset < 0);
|
2016-02-14 18:50:40 +08:00
|
|
|
segno_in_journal(journal, offset) =
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
cpu_to_le32(segno);
|
|
|
|
seg_info_to_raw_sit(se,
|
2016-02-14 18:50:40 +08:00
|
|
|
&sit_in_journal(journal, offset));
|
2018-04-09 04:28:41 +08:00
|
|
|
check_block_count(sbi, segno,
|
|
|
|
&sit_in_journal(journal, offset));
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
} else {
|
|
|
|
sit_offset = SIT_ENTRY_OFFSET(sit_i, segno);
|
|
|
|
seg_info_to_raw_sit(se,
|
|
|
|
&raw_sit->entries[sit_offset]);
|
2018-04-09 04:28:41 +08:00
|
|
|
check_block_count(sbi, segno,
|
|
|
|
&raw_sit->entries[sit_offset]);
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
__clear_bit(segno, bitmap);
|
|
|
|
sit_i->dirty_sentries--;
|
|
|
|
ses->entry_cnt--;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
if (to_journal)
|
|
|
|
up_write(&curseg->journal_rwsem);
|
|
|
|
else
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
|
|
|
|
f2fs_bug_on(sbi, ses->entry_cnt);
|
|
|
|
release_sit_entry_set(ses);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
|
|
|
f2fs_bug_on(sbi, !list_empty(head));
|
|
|
|
f2fs_bug_on(sbi, sit_i->dirty_sentries);
|
|
|
|
out:
|
2017-04-27 20:40:39 +08:00
|
|
|
if (cpc->reason & CP_DISCARD) {
|
2016-12-22 11:46:24 +08:00
|
|
|
__u64 trim_start = cpc->trim_start;
|
|
|
|
|
2014-09-21 13:06:39 +08:00
|
|
|
for (; cpc->trim_start <= cpc->trim_end; cpc->trim_start++)
|
2016-12-30 14:06:15 +08:00
|
|
|
add_discard_addrs(sbi, cpc, false);
|
2016-12-22 11:46:24 +08:00
|
|
|
|
|
|
|
cpc->trim_start = trim_start;
|
2014-09-21 13:06:39 +08:00
|
|
|
}
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
set_prefree_as_free_segments(sbi);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int build_sit_info(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
|
|
|
|
struct sit_info *sit_i;
|
|
|
|
unsigned int sit_segs, start;
|
2017-01-07 18:52:34 +08:00
|
|
|
char *src_bitmap;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
unsigned int bitmap_size;
|
|
|
|
|
|
|
|
/* allocate memory for SIT information */
|
2017-11-30 19:28:17 +08:00
|
|
|
sit_i = f2fs_kzalloc(sbi, sizeof(struct sit_info), GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->sit_info = sit_i;
|
|
|
|
|
treewide: Use array_size in f2fs_kvzalloc()
The f2fs_kvzalloc() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:
f2fs_kvzalloc(handle, a * b, gfp)
with:
f2fs_kvzalloc(handle, array_size(a, b), gfp)
as well as handling cases of:
f2fs_kvzalloc(handle, a * b * c, gfp)
with:
f2fs_kvzalloc(handle, array3_size(a, b, c), gfp)
This does, however, attempt to ignore constant size factors like:
f2fs_kvzalloc(handle, 4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
expression HANDLE;
type TYPE;
expression THING, E;
@@
(
f2fs_kvzalloc(HANDLE,
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
f2fs_kvzalloc(HANDLE,
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression HANDLE;
expression COUNT;
typedef u8;
typedef __u8;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(char) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
expression HANDLE;
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT_ID
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT_ID
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
)
// 2-factor product, only identifiers.
@@
expression HANDLE;
identifier SIZE, COUNT;
@@
f2fs_kvzalloc(HANDLE,
- SIZE * COUNT
+ array_size(COUNT, SIZE)
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression HANDLE;
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression HANDLE;
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
expression HANDLE;
identifier STRIDE, SIZE, COUNT;
@@
(
f2fs_kvzalloc(HANDLE,
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression HANDLE;
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
f2fs_kvzalloc(HANDLE, C1 * C2 * C3, ...)
|
f2fs_kvzalloc(HANDLE,
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants.
@@
expression HANDLE;
expression E1, E2;
constant C1, C2;
@@
(
f2fs_kvzalloc(HANDLE, C1 * C2, ...)
|
f2fs_kvzalloc(HANDLE,
- E1 * E2
+ array_size(E1, E2)
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-13 05:28:35 +08:00
|
|
|
sit_i->sentries =
|
|
|
|
f2fs_kvzalloc(sbi, array_size(sizeof(struct seg_entry),
|
|
|
|
MAIN_SEGS(sbi)),
|
|
|
|
GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i->sentries)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
|
2017-11-30 19:28:18 +08:00
|
|
|
sit_i->dirty_sentries_bitmap = f2fs_kvzalloc(sbi, bitmap_size,
|
|
|
|
GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i->dirty_sentries_bitmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (start = 0; start < MAIN_SEGS(sbi); start++) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
sit_i->sentries[start].cur_valid_map
|
2017-11-30 19:28:17 +08:00
|
|
|
= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
sit_i->sentries[start].ckpt_valid_map
|
2017-11-30 19:28:17 +08:00
|
|
|
= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
|
2015-05-01 13:37:50 +08:00
|
|
|
if (!sit_i->sentries[start].cur_valid_map ||
|
2016-08-03 01:56:40 +08:00
|
|
|
!sit_i->sentries[start].ckpt_valid_map)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return -ENOMEM;
|
2016-08-03 01:56:40 +08:00
|
|
|
|
2017-01-07 18:51:01 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
sit_i->sentries[start].cur_valid_map_mir
|
2017-11-30 19:28:17 +08:00
|
|
|
= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
|
2017-01-07 18:51:01 +08:00
|
|
|
if (!sit_i->sentries[start].cur_valid_map_mir)
|
|
|
|
return -ENOMEM;
|
|
|
|
#endif
|
|
|
|
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
sit_i->sentries[start].discard_map
|
|
|
|
= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE,
|
|
|
|
GFP_KERNEL);
|
|
|
|
if (!sit_i->sentries[start].discard_map)
|
|
|
|
return -ENOMEM;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
sit_i->tmp_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
|
2015-02-11 08:44:29 +08:00
|
|
|
if (!sit_i->tmp_map)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (sbi->segs_per_sec > 1) {
|
treewide: Use array_size in f2fs_kvzalloc()
The f2fs_kvzalloc() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:
f2fs_kvzalloc(handle, a * b, gfp)
with:
f2fs_kvzalloc(handle, array_size(a, b), gfp)
as well as handling cases of:
f2fs_kvzalloc(handle, a * b * c, gfp)
with:
f2fs_kvzalloc(handle, array3_size(a, b, c), gfp)
This does, however, attempt to ignore constant size factors like:
f2fs_kvzalloc(handle, 4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
expression HANDLE;
type TYPE;
expression THING, E;
@@
(
f2fs_kvzalloc(HANDLE,
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
f2fs_kvzalloc(HANDLE,
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression HANDLE;
expression COUNT;
typedef u8;
typedef __u8;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(char) * COUNT
+ COUNT
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
expression HANDLE;
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT_ID
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT_ID
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
)
// 2-factor product, only identifiers.
@@
expression HANDLE;
identifier SIZE, COUNT;
@@
f2fs_kvzalloc(HANDLE,
- SIZE * COUNT
+ array_size(COUNT, SIZE)
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression HANDLE;
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression HANDLE;
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
f2fs_kvzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
expression HANDLE;
identifier STRIDE, SIZE, COUNT;
@@
(
f2fs_kvzalloc(HANDLE,
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kvzalloc(HANDLE,
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression HANDLE;
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
f2fs_kvzalloc(HANDLE, C1 * C2 * C3, ...)
|
f2fs_kvzalloc(HANDLE,
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants.
@@
expression HANDLE;
expression E1, E2;
constant C1, C2;
@@
(
f2fs_kvzalloc(HANDLE, C1 * C2, ...)
|
f2fs_kvzalloc(HANDLE,
- E1 * E2
+ array_size(E1, E2)
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-13 05:28:35 +08:00
|
|
|
sit_i->sec_entries =
|
|
|
|
f2fs_kvzalloc(sbi, array_size(sizeof(struct sec_entry),
|
|
|
|
MAIN_SECS(sbi)),
|
|
|
|
GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!sit_i->sec_entries)
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* get information related with SIT */
|
|
|
|
sit_segs = le32_to_cpu(raw_super->segment_count_sit) >> 1;
|
|
|
|
|
|
|
|
/* setup SIT bitmap from ckeckpoint pack */
|
|
|
|
bitmap_size = __bitmap_size(sbi, SIT_BITMAP);
|
|
|
|
src_bitmap = __bitmap_ptr(sbi, SIT_BITMAP);
|
|
|
|
|
2017-01-07 18:52:34 +08:00
|
|
|
sit_i->sit_bitmap = kmemdup(src_bitmap, bitmap_size, GFP_KERNEL);
|
|
|
|
if (!sit_i->sit_bitmap)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
|
2017-01-07 18:52:34 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
sit_i->sit_bitmap_mir = kmemdup(src_bitmap, bitmap_size, GFP_KERNEL);
|
|
|
|
if (!sit_i->sit_bitmap_mir)
|
|
|
|
return -ENOMEM;
|
|
|
|
#endif
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/* init SIT information */
|
|
|
|
sit_i->s_ops = &default_salloc_ops;
|
|
|
|
|
|
|
|
sit_i->sit_base_addr = le32_to_cpu(raw_super->sit_blkaddr);
|
|
|
|
sit_i->sit_blocks = sit_segs << sbi->log_blocks_per_seg;
|
2016-11-15 10:20:10 +08:00
|
|
|
sit_i->written_valid_blocks = 0;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
sit_i->bitmap_size = bitmap_size;
|
|
|
|
sit_i->dirty_sentries = 0;
|
|
|
|
sit_i->sents_per_block = SIT_ENTRY_PER_BLOCK;
|
|
|
|
sit_i->elapsed_time = le64_to_cpu(sbi->ckpt->elapsed_time);
|
2017-05-09 06:59:10 +08:00
|
|
|
sit_i->mounted_time = ktime_get_real_seconds();
|
2017-10-30 17:49:53 +08:00
|
|
|
init_rwsem(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int build_free_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct free_segmap_info *free_i;
|
|
|
|
unsigned int bitmap_size, sec_bitmap_size;
|
|
|
|
|
|
|
|
/* allocate memory for free segmap information */
|
2017-11-30 19:28:17 +08:00
|
|
|
free_i = f2fs_kzalloc(sbi, sizeof(struct free_segmap_info), GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!free_i)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->free_info = free_i;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
|
2017-11-30 19:28:18 +08:00
|
|
|
free_i->free_segmap = f2fs_kvmalloc(sbi, bitmap_size, GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!free_i->free_segmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
sec_bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
|
2017-11-30 19:28:18 +08:00
|
|
|
free_i->free_secmap = f2fs_kvmalloc(sbi, sec_bitmap_size, GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!free_i->free_secmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
/* set all segments as dirty temporarily */
|
|
|
|
memset(free_i->free_segmap, 0xff, bitmap_size);
|
|
|
|
memset(free_i->free_secmap, 0xff, sec_bitmap_size);
|
|
|
|
|
|
|
|
/* init free segmap information */
|
2014-09-24 02:23:01 +08:00
|
|
|
free_i->start_segno = GET_SEGNO_FROM_SEG0(sbi, MAIN_BLKADDR(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
free_i->free_segments = 0;
|
|
|
|
free_i->free_sections = 0;
|
2015-02-11 18:20:38 +08:00
|
|
|
spin_lock_init(&free_i->segmap_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int build_curseg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
2012-12-01 09:56:13 +08:00
|
|
|
struct curseg_info *array;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
int i;
|
|
|
|
|
treewide: Use array_size() in f2fs_kzalloc()
The f2fs_kzalloc() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:
f2fs_kzalloc(handle, a * b, gfp)
with:
f2fs_kzalloc(handle, array_size(a, b), gfp)
as well as handling cases of:
f2fs_kzalloc(handle, a * b * c, gfp)
with:
f2fs_kzalloc(handle, array3_size(a, b, c), gfp)
This does, however, attempt to ignore constant size factors like:
f2fs_kzalloc(handle, 4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
expression HANDLE;
type TYPE;
expression THING, E;
@@
(
f2fs_kzalloc(HANDLE,
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
f2fs_kzalloc(HANDLE,
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression HANDLE;
expression COUNT;
typedef u8;
typedef __u8;
@@
(
f2fs_kzalloc(HANDLE,
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(char) * COUNT
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
expression HANDLE;
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * COUNT_ID
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * COUNT_ID
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
)
// 2-factor product, only identifiers.
@@
expression HANDLE;
identifier SIZE, COUNT;
@@
f2fs_kzalloc(HANDLE,
- SIZE * COUNT
+ array_size(COUNT, SIZE)
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression HANDLE;
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression HANDLE;
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
f2fs_kzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
expression HANDLE;
identifier STRIDE, SIZE, COUNT;
@@
(
f2fs_kzalloc(HANDLE,
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression HANDLE;
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
f2fs_kzalloc(HANDLE, C1 * C2 * C3, ...)
|
f2fs_kzalloc(HANDLE,
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants.
@@
expression HANDLE;
expression E1, E2;
constant C1, C2;
@@
(
f2fs_kzalloc(HANDLE, C1 * C2, ...)
|
f2fs_kzalloc(HANDLE,
- E1 * E2
+ array_size(E1, E2)
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-13 05:28:23 +08:00
|
|
|
array = f2fs_kzalloc(sbi, array_size(NR_CURSEG_TYPE, sizeof(*array)),
|
|
|
|
GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!array)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->curseg_array = array;
|
|
|
|
|
|
|
|
for (i = 0; i < NR_CURSEG_TYPE; i++) {
|
|
|
|
mutex_init(&array[i].curseg_mutex);
|
2017-11-30 19:28:17 +08:00
|
|
|
array[i].sum_blk = f2fs_kzalloc(sbi, PAGE_SIZE, GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!array[i].sum_blk)
|
|
|
|
return -ENOMEM;
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
init_rwsem(&array[i].journal_rwsem);
|
2017-11-30 19:28:17 +08:00
|
|
|
array[i].journal = f2fs_kzalloc(sbi,
|
|
|
|
sizeof(struct f2fs_journal), GFP_KERNEL);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
if (!array[i].journal)
|
|
|
|
return -ENOMEM;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
array[i].segno = NULL_SEGNO;
|
|
|
|
array[i].next_blkoff = 0;
|
|
|
|
}
|
|
|
|
return restore_curseg_summaries(sbi);
|
|
|
|
}
|
|
|
|
|
2017-12-20 11:16:34 +08:00
|
|
|
static int build_sit_entries(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
struct f2fs_journal *journal = curseg->journal;
|
2016-09-24 12:29:18 +08:00
|
|
|
struct seg_entry *se;
|
|
|
|
struct f2fs_sit_entry sit;
|
2013-11-22 09:09:59 +08:00
|
|
|
int sit_blk_cnt = SIT_BLK_CNT(sbi);
|
|
|
|
unsigned int i, start, end;
|
|
|
|
unsigned int readed, start_blk = 0;
|
2017-12-20 11:16:34 +08:00
|
|
|
int err = 0;
|
2018-04-25 11:34:05 +08:00
|
|
|
block_t total_node_blocks = 0;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2013-11-22 09:09:59 +08:00
|
|
|
do {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
readed = f2fs_ra_meta_pages(sbi, start_blk, BIO_MAX_PAGES,
|
2016-10-19 02:07:45 +08:00
|
|
|
META_SIT, true);
|
2013-11-22 09:09:59 +08:00
|
|
|
|
|
|
|
start = start_blk * sit_i->sents_per_block;
|
|
|
|
end = (start_blk + readed) * sit_i->sents_per_block;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (; start < end && start < MAIN_SEGS(sbi); start++) {
|
2013-11-22 09:09:59 +08:00
|
|
|
struct f2fs_sit_block *sit_blk;
|
|
|
|
struct page *page;
|
|
|
|
|
2016-09-24 12:29:18 +08:00
|
|
|
se = &sit_i->sentries[start];
|
2013-11-22 09:09:59 +08:00
|
|
|
page = get_current_sit_page(sbi, start);
|
2018-09-18 08:36:06 +08:00
|
|
|
if (IS_ERR(page))
|
|
|
|
return PTR_ERR(page);
|
2013-11-22 09:09:59 +08:00
|
|
|
sit_blk = (struct f2fs_sit_block *)page_address(page);
|
|
|
|
sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, start)];
|
|
|
|
f2fs_put_page(page, 1);
|
2016-08-19 23:13:47 +08:00
|
|
|
|
2017-12-20 11:16:34 +08:00
|
|
|
err = check_block_count(sbi, start, &sit);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2013-11-22 09:09:59 +08:00
|
|
|
seg_info_from_raw_sit(se, &sit);
|
2018-04-25 11:34:05 +08:00
|
|
|
if (IS_NODESEG(se->type))
|
|
|
|
total_node_blocks += se->valid_blocks;
|
2015-05-01 13:37:50 +08:00
|
|
|
|
|
|
|
/* build discard map only one time */
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
|
|
|
|
memset(se->discard_map, 0xff,
|
|
|
|
SIT_VBLOCK_MAP_SIZE);
|
|
|
|
} else {
|
|
|
|
memcpy(se->discard_map,
|
|
|
|
se->cur_valid_map,
|
|
|
|
SIT_VBLOCK_MAP_SIZE);
|
|
|
|
sbi->discard_blks +=
|
|
|
|
sbi->blocks_per_seg -
|
|
|
|
se->valid_blocks;
|
2016-08-03 01:56:40 +08:00
|
|
|
}
|
2015-05-01 13:37:50 +08:00
|
|
|
|
2016-08-19 23:13:47 +08:00
|
|
|
if (sbi->segs_per_sec > 1)
|
|
|
|
get_sec_entry(sbi, start)->valid_blocks +=
|
|
|
|
se->valid_blocks;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
2013-11-22 09:09:59 +08:00
|
|
|
start_blk += readed;
|
|
|
|
} while (start_blk < sit_blk_cnt);
|
2016-08-19 23:13:47 +08:00
|
|
|
|
|
|
|
down_read(&curseg->journal_rwsem);
|
|
|
|
for (i = 0; i < sits_in_cursum(journal); i++) {
|
|
|
|
unsigned int old_valid_blocks;
|
|
|
|
|
|
|
|
start = le32_to_cpu(segno_in_journal(journal, i));
|
2018-04-25 05:44:16 +08:00
|
|
|
if (start >= MAIN_SEGS(sbi)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_ERR,
|
|
|
|
"Wrong journal entry on segno %u",
|
|
|
|
start);
|
|
|
|
set_sbi_flag(sbi, SBI_NEED_FSCK);
|
|
|
|
err = -EINVAL;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2016-08-19 23:13:47 +08:00
|
|
|
se = &sit_i->sentries[start];
|
|
|
|
sit = sit_in_journal(journal, i);
|
|
|
|
|
|
|
|
old_valid_blocks = se->valid_blocks;
|
2018-04-25 11:34:05 +08:00
|
|
|
if (IS_NODESEG(se->type))
|
|
|
|
total_node_blocks -= old_valid_blocks;
|
2016-08-19 23:13:47 +08:00
|
|
|
|
2017-12-20 11:16:34 +08:00
|
|
|
err = check_block_count(sbi, start, &sit);
|
|
|
|
if (err)
|
|
|
|
break;
|
2016-08-19 23:13:47 +08:00
|
|
|
seg_info_from_raw_sit(se, &sit);
|
2018-04-25 11:34:05 +08:00
|
|
|
if (IS_NODESEG(se->type))
|
|
|
|
total_node_blocks += se->valid_blocks;
|
2016-08-19 23:13:47 +08:00
|
|
|
|
f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-04 03:52:17 +08:00
|
|
|
if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
|
|
|
|
memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE);
|
|
|
|
} else {
|
|
|
|
memcpy(se->discard_map, se->cur_valid_map,
|
|
|
|
SIT_VBLOCK_MAP_SIZE);
|
|
|
|
sbi->discard_blks += old_valid_blocks;
|
|
|
|
sbi->discard_blks -= se->valid_blocks;
|
2016-08-19 23:13:47 +08:00
|
|
|
}
|
|
|
|
|
2018-04-25 19:38:17 +08:00
|
|
|
if (sbi->segs_per_sec > 1) {
|
2016-08-19 23:13:47 +08:00
|
|
|
get_sec_entry(sbi, start)->valid_blocks +=
|
2018-04-25 19:38:17 +08:00
|
|
|
se->valid_blocks;
|
|
|
|
get_sec_entry(sbi, start)->valid_blocks -=
|
|
|
|
old_valid_blocks;
|
|
|
|
}
|
2016-08-19 23:13:47 +08:00
|
|
|
}
|
|
|
|
up_read(&curseg->journal_rwsem);
|
2018-04-25 11:34:05 +08:00
|
|
|
|
|
|
|
if (!err && total_node_blocks != valid_node_count(sbi)) {
|
|
|
|
f2fs_msg(sbi->sb, KERN_ERR,
|
|
|
|
"SIT is corrupted node# %u vs %u",
|
|
|
|
total_node_blocks, valid_node_count(sbi));
|
|
|
|
set_sbi_flag(sbi, SBI_NEED_FSCK);
|
|
|
|
err = -EINVAL;
|
|
|
|
}
|
|
|
|
|
2017-12-20 11:16:34 +08:00
|
|
|
return err;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void init_free_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
unsigned int start;
|
|
|
|
int type;
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (start = 0; start < MAIN_SEGS(sbi); start++) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
struct seg_entry *sentry = get_seg_entry(sbi, start);
|
|
|
|
if (!sentry->valid_blocks)
|
|
|
|
__set_free(sbi, start);
|
2016-11-15 10:20:10 +08:00
|
|
|
else
|
|
|
|
SIT_I(sbi)->written_valid_blocks +=
|
|
|
|
sentry->valid_blocks;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* set use the current segments */
|
|
|
|
for (type = CURSEG_HOT_DATA; type <= CURSEG_COLD_NODE; type++) {
|
|
|
|
struct curseg_info *curseg_t = CURSEG_I(sbi, type);
|
|
|
|
__set_test_and_inuse(sbi, curseg_t->segno);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void init_dirty_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
struct free_segmap_info *free_i = FREE_I(sbi);
|
2014-09-24 02:23:01 +08:00
|
|
|
unsigned int segno = 0, offset = 0;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
unsigned short valid_blocks;
|
|
|
|
|
2013-06-16 08:49:11 +08:00
|
|
|
while (1) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
/* find dirty segment based on free segmap */
|
2014-09-24 02:23:01 +08:00
|
|
|
segno = find_next_inuse(free_i, MAIN_SEGS(sbi), offset);
|
|
|
|
if (segno >= MAIN_SEGS(sbi))
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
break;
|
|
|
|
offset = segno + 1;
|
2017-04-08 05:33:22 +08:00
|
|
|
valid_blocks = get_valid_blocks(sbi, segno, false);
|
2014-09-03 07:24:11 +08:00
|
|
|
if (valid_blocks == sbi->blocks_per_seg || !valid_blocks)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
continue;
|
2014-09-03 07:24:11 +08:00
|
|
|
if (valid_blocks > sbi->blocks_per_seg) {
|
|
|
|
f2fs_bug_on(sbi, 1);
|
|
|
|
continue;
|
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
|
|
|
__locate_dirty_segment(sbi, segno, DIRTY);
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-03-31 12:26:03 +08:00
|
|
|
static int init_victim_secmap(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2014-09-24 02:23:01 +08:00
|
|
|
unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2017-11-30 19:28:18 +08:00
|
|
|
dirty_i->victim_secmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
|
2013-03-31 12:26:03 +08:00
|
|
|
if (!dirty_i->victim_secmap)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int build_dirty_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i;
|
|
|
|
unsigned int bitmap_size, i;
|
|
|
|
|
|
|
|
/* allocate memory for dirty segments list information */
|
2017-11-30 19:28:17 +08:00
|
|
|
dirty_i = f2fs_kzalloc(sbi, sizeof(struct dirty_seglist_info),
|
|
|
|
GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!dirty_i)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
SM_I(sbi)->dirty_info = dirty_i;
|
|
|
|
mutex_init(&dirty_i->seglist_lock);
|
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
for (i = 0; i < NR_DIRTY_TYPE; i++) {
|
2017-11-30 19:28:18 +08:00
|
|
|
dirty_i->dirty_segmap[i] = f2fs_kvzalloc(sbi, bitmap_size,
|
|
|
|
GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!dirty_i->dirty_segmap[i])
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
init_dirty_segmap(sbi);
|
2013-03-31 12:26:03 +08:00
|
|
|
return init_victim_secmap(sbi);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
2012-11-29 12:28:09 +08:00
|
|
|
/*
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
* Update min, max modified time for cost-benefit GC algorithm
|
|
|
|
*/
|
|
|
|
static void init_min_max_mtime(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
unsigned int segno;
|
|
|
|
|
2017-10-30 17:49:53 +08:00
|
|
|
down_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2018-05-15 18:59:55 +08:00
|
|
|
sit_i->min_mtime = ULLONG_MAX;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
2014-09-24 02:23:01 +08:00
|
|
|
for (segno = 0; segno < MAIN_SEGS(sbi); segno += sbi->segs_per_sec) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
unsigned int i;
|
|
|
|
unsigned long long mtime = 0;
|
|
|
|
|
|
|
|
for (i = 0; i < sbi->segs_per_sec; i++)
|
|
|
|
mtime += get_seg_entry(sbi, segno + i)->mtime;
|
|
|
|
|
|
|
|
mtime = div_u64(mtime, sbi->segs_per_sec);
|
|
|
|
|
|
|
|
if (sit_i->min_mtime > mtime)
|
|
|
|
sit_i->min_mtime = mtime;
|
|
|
|
}
|
2018-06-04 23:20:17 +08:00
|
|
|
sit_i->max_mtime = get_mtime(sbi, false);
|
2017-10-30 17:49:53 +08:00
|
|
|
up_write(&sit_i->sentry_lock);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int f2fs_build_segment_manager(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
|
|
|
|
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
|
2012-12-01 09:56:13 +08:00
|
|
|
struct f2fs_sm_info *sm_info;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
int err;
|
|
|
|
|
2017-11-30 19:28:17 +08:00
|
|
|
sm_info = f2fs_kzalloc(sbi, sizeof(struct f2fs_sm_info), GFP_KERNEL);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
if (!sm_info)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
/* init sm info */
|
|
|
|
sbi->sm_info = sm_info;
|
|
|
|
sm_info->seg0_blkaddr = le32_to_cpu(raw_super->segment0_blkaddr);
|
|
|
|
sm_info->main_blkaddr = le32_to_cpu(raw_super->main_blkaddr);
|
|
|
|
sm_info->segment_count = le32_to_cpu(raw_super->segment_count);
|
|
|
|
sm_info->reserved_segments = le32_to_cpu(ckpt->rsvd_segment_count);
|
|
|
|
sm_info->ovp_segments = le32_to_cpu(ckpt->overprov_segment_count);
|
|
|
|
sm_info->main_segments = le32_to_cpu(raw_super->segment_count_main);
|
|
|
|
sm_info->ssa_blkaddr = le32_to_cpu(raw_super->ssa_blkaddr);
|
2014-03-19 13:17:21 +08:00
|
|
|
sm_info->rec_prefree_segments = sm_info->main_segments *
|
|
|
|
DEF_RECLAIM_PREFREE_SEGMENTS / 100;
|
2016-07-14 09:23:35 +08:00
|
|
|
if (sm_info->rec_prefree_segments > DEF_MAX_RECLAIM_PREFREE_SEGMENTS)
|
|
|
|
sm_info->rec_prefree_segments = DEF_MAX_RECLAIM_PREFREE_SEGMENTS;
|
|
|
|
|
2016-06-14 00:47:48 +08:00
|
|
|
if (!test_opt(sbi, LFS))
|
|
|
|
sm_info->ipu_policy = 1 << F2FS_IPU_FSYNC;
|
2013-11-07 12:13:42 +08:00
|
|
|
sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
|
2014-09-11 07:53:02 +08:00
|
|
|
sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
|
2018-08-10 08:53:34 +08:00
|
|
|
sm_info->min_seq_blocks = sbi->blocks_per_seg * sbi->segs_per_sec;
|
2017-03-25 08:05:13 +08:00
|
|
|
sm_info->min_hot_blocks = DEF_MIN_HOT_BLOCKS;
|
2017-10-28 16:52:33 +08:00
|
|
|
sm_info->min_ssr_sections = reserved_sections(sbi);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
INIT_LIST_HEAD(&sm_info->sit_entry_set);
|
|
|
|
|
f2fs: fix summary info corruption
Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.
The root cause is race in between __f2fs_replace_block and change_curseg
as below:
Thread A Thread B
- __clone_blkaddrs
- f2fs_replace_block
- __f2fs_replace_block
- segnoA = GET_SEGNO(sbi, blkaddrA);
- type = se->type:=CURSEG_HOT_DATA
- if (!IS_CURSEG(sbi, segnoA))
type = CURSEG_WARM_DATA
- allocate_data_block
- allocate_segment
- get_ssr_segment
- change_curseg(segnoA, CURSEG_HOT_DATA)
- change_curseg(segnoA, CURSEG_WARM_DATA)
- reset_curseg
- __set_sit_entry_type
- change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.
But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.
This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-02 20:41:03 +08:00
|
|
|
init_rwsem(&sm_info->curseg_lock);
|
|
|
|
|
2017-06-01 16:43:51 +08:00
|
|
|
if (!f2fs_readonly(sbi->sb)) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
err = f2fs_create_flush_cmd_control(sbi);
|
2014-04-27 14:21:33 +08:00
|
|
|
if (err)
|
2014-04-27 14:21:21 +08:00
|
|
|
return err;
|
2014-04-02 14:34:36 +08:00
|
|
|
}
|
|
|
|
|
2017-01-12 06:40:24 +08:00
|
|
|
err = create_discard_cmd_control(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
err = build_sit_info(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
err = build_free_segmap(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
err = build_curseg(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
/* reinit free segmap based on SIT */
|
2017-12-20 11:16:34 +08:00
|
|
|
err = build_sit_entries(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
init_free_segmap(sbi);
|
|
|
|
err = build_dirty_segmap(sbi);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
init_min_max_mtime(sbi);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void discard_dirty_segmap(struct f2fs_sb_info *sbi,
|
|
|
|
enum dirty_type dirty_type)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
|
|
|
|
mutex_lock(&dirty_i->seglist_lock);
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(dirty_i->dirty_segmap[dirty_type]);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
dirty_i->nr_dirty[dirty_type] = 0;
|
|
|
|
mutex_unlock(&dirty_i->seglist_lock);
|
|
|
|
}
|
|
|
|
|
2013-03-31 12:26:03 +08:00
|
|
|
static void destroy_victim_secmap(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(dirty_i->victim_secmap);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_dirty_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!dirty_i)
|
|
|
|
return;
|
|
|
|
|
|
|
|
/* discard pre-free/dirty segments list */
|
|
|
|
for (i = 0; i < NR_DIRTY_TYPE; i++)
|
|
|
|
discard_dirty_segmap(sbi, i);
|
|
|
|
|
2013-03-31 12:26:03 +08:00
|
|
|
destroy_victim_secmap(sbi);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
SM_I(sbi)->dirty_info = NULL;
|
|
|
|
kfree(dirty_i);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_curseg(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct curseg_info *array = SM_I(sbi)->curseg_array;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!array)
|
|
|
|
return;
|
|
|
|
SM_I(sbi)->curseg_array = NULL;
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
for (i = 0; i < NR_CURSEG_TYPE; i++) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kfree(array[i].sum_blk);
|
f2fs: split journal cache from curseg cache
In curseg cache, f2fs caches two different parts:
- datas of current summay block, i.e. summary entries, footer info.
- journal info, i.e. sparse nat/sit entries or io stat info.
With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.
So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem
When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-19 18:08:46 +08:00
|
|
|
kfree(array[i].journal);
|
|
|
|
}
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kfree(array);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_free_segmap(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct free_segmap_info *free_i = SM_I(sbi)->free_info;
|
|
|
|
if (!free_i)
|
|
|
|
return;
|
|
|
|
SM_I(sbi)->free_info = NULL;
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(free_i->free_segmap);
|
|
|
|
kvfree(free_i->free_secmap);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kfree(free_i);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void destroy_sit_info(struct f2fs_sb_info *sbi)
|
|
|
|
{
|
|
|
|
struct sit_info *sit_i = SIT_I(sbi);
|
|
|
|
unsigned int start;
|
|
|
|
|
|
|
|
if (!sit_i)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (sit_i->sentries) {
|
2014-09-24 02:23:01 +08:00
|
|
|
for (start = 0; start < MAIN_SEGS(sbi); start++) {
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kfree(sit_i->sentries[start].cur_valid_map);
|
2017-01-07 18:51:01 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
kfree(sit_i->sentries[start].cur_valid_map_mir);
|
|
|
|
#endif
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kfree(sit_i->sentries[start].ckpt_valid_map);
|
2015-05-01 13:37:50 +08:00
|
|
|
kfree(sit_i->sentries[start].discard_map);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
}
|
|
|
|
}
|
2015-02-11 08:44:29 +08:00
|
|
|
kfree(sit_i->tmp_map);
|
|
|
|
|
2015-09-23 04:50:47 +08:00
|
|
|
kvfree(sit_i->sentries);
|
|
|
|
kvfree(sit_i->sec_entries);
|
|
|
|
kvfree(sit_i->dirty_sentries_bitmap);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
|
|
|
|
SM_I(sbi)->sit_info = NULL;
|
|
|
|
kfree(sit_i->sit_bitmap);
|
2017-01-07 18:52:34 +08:00
|
|
|
#ifdef CONFIG_F2FS_CHECK_FS
|
|
|
|
kfree(sit_i->sit_bitmap_mir);
|
|
|
|
#endif
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
kfree(sit_i);
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi)
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
{
|
|
|
|
struct f2fs_sm_info *sm_info = SM_I(sbi);
|
2014-04-27 14:21:21 +08:00
|
|
|
|
2013-11-06 09:12:04 +08:00
|
|
|
if (!sm_info)
|
|
|
|
return;
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
f2fs_destroy_flush_cmd_control(sbi, true);
|
2017-03-27 18:14:04 +08:00
|
|
|
destroy_discard_cmd_control(sbi);
|
f2fs: add segment operations
This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.
- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.
- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.
- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.
- To cache SIT entries, a simple array is used. The index for the array is the
segment number.
- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.
- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.
- This patch adds a default block allocation function which supports heap-based
allocation policy.
- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-11-02 16:09:16 +08:00
|
|
|
destroy_dirty_segmap(sbi);
|
|
|
|
destroy_curseg(sbi);
|
|
|
|
destroy_free_segmap(sbi);
|
|
|
|
destroy_sit_info(sbi);
|
|
|
|
sbi->sm_info = NULL;
|
|
|
|
kfree(sm_info);
|
|
|
|
}
|
2013-11-15 12:55:58 +08:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
int __init f2fs_create_segment_manager_caches(void)
|
2013-11-15 12:55:58 +08:00
|
|
|
{
|
|
|
|
discard_entry_slab = f2fs_kmem_cache_create("discard_entry",
|
2014-03-07 18:43:28 +08:00
|
|
|
sizeof(struct discard_entry));
|
2013-11-15 12:55:58 +08:00
|
|
|
if (!discard_entry_slab)
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
goto fail;
|
|
|
|
|
2017-01-10 06:13:03 +08:00
|
|
|
discard_cmd_slab = f2fs_kmem_cache_create("discard_cmd",
|
|
|
|
sizeof(struct discard_cmd));
|
|
|
|
if (!discard_cmd_slab)
|
2016-09-05 12:28:26 +08:00
|
|
|
goto destroy_discard_entry;
|
2016-08-29 23:58:34 +08:00
|
|
|
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
sit_entry_set_slab = f2fs_kmem_cache_create("sit_entry_set",
|
2014-11-21 14:42:07 +08:00
|
|
|
sizeof(struct sit_entry_set));
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
if (!sit_entry_set_slab)
|
2017-01-10 06:13:03 +08:00
|
|
|
goto destroy_discard_cmd;
|
2014-10-07 08:39:50 +08:00
|
|
|
|
|
|
|
inmem_entry_slab = f2fs_kmem_cache_create("inmem_page_entry",
|
|
|
|
sizeof(struct inmem_pages));
|
|
|
|
if (!inmem_entry_slab)
|
|
|
|
goto destroy_sit_entry_set;
|
2013-11-15 12:55:58 +08:00
|
|
|
return 0;
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
|
2014-10-07 08:39:50 +08:00
|
|
|
destroy_sit_entry_set:
|
|
|
|
kmem_cache_destroy(sit_entry_set_slab);
|
2017-01-10 06:13:03 +08:00
|
|
|
destroy_discard_cmd:
|
|
|
|
kmem_cache_destroy(discard_cmd_slab);
|
2016-09-05 12:28:26 +08:00
|
|
|
destroy_discard_entry:
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
kmem_cache_destroy(discard_entry_slab);
|
|
|
|
fail:
|
|
|
|
return -ENOMEM;
|
2013-11-15 12:55:58 +08:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 00:20:41 +08:00
|
|
|
void f2fs_destroy_segment_manager_caches(void)
|
2013-11-15 12:55:58 +08:00
|
|
|
{
|
f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:
"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."
Actually, we have the same problem in using SIT journal area.
In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.
In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.
In my testing environment, it shows this patch can help to reduce SIT block
update obviously.
virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070
Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-04 18:13:01 +08:00
|
|
|
kmem_cache_destroy(sit_entry_set_slab);
|
2017-01-10 06:13:03 +08:00
|
|
|
kmem_cache_destroy(discard_cmd_slab);
|
2013-11-15 12:55:58 +08:00
|
|
|
kmem_cache_destroy(discard_entry_slab);
|
2014-10-07 08:39:50 +08:00
|
|
|
kmem_cache_destroy(inmem_entry_slab);
|
2013-11-15 12:55:58 +08:00
|
|
|
}
|