linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-18 08:35:08 +08:00

Author	SHA1	Message	Date
Chao Yu	34e159da41	f2fs: delay awaking discard thread It's better to delay awaking discard thread while queuing discard commands in checkpoint, it will help to give more chances for merging big and small discard. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-25 14:18:44 -07:00
Yunlei He	66a82d1fc7	f2fs: seperate read nat page from nat_tree_lock This patch seperate nat page read io from nat_tree_lock. -lock_page -get_node_info() -current_nat_addr ...... -> write_checkpoint -get_meta_page Because we lock node page, we can make sure no other threads modify this nid concurrently. So we just obtain current_nat_addr under nat_tree_lock, node info is always same in both nat pack. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-25 14:16:39 -07:00
Sheng Yong	d3bb910c15	f2fs: fix multiple f2fs_add_link() having same name for inline dentry Commit `88c5c13a50` (f2fs: fix multiple f2fs_add_link() calls having same name) does not cover the scenario where inline dentry is enabled. In that case, F2FS_I(dir)->task will be NULL, and __f2fs_add_link will lookup dentries one more time. This patch fixes it by moving the assigment of current task to a upper level to cover both normal and inline dentry. Cc: <stable@vger.kernel.org> Fixes: `88c5c13a50` (f2fs: fix multiple f2fs_add_link() calls having same name) Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-25 14:16:31 -07:00
Hou Pengyang	4086d3f61b	f2fs: skip encrypted inode in ASYNC IPU policy Async request may be throttled in block layer, so page for async may keep WRITE_BACK for a long time. For encrytped inode, we need wait on page writeback no matter if the device supports BDI_CAP_STABLE_WRITES. This may result in a higher waiting page writeback time for async encrypted inode page. This patch skips IPU for encrypted inode's updating write. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-24 13:13:24 -07:00
Jaegeuk Kim	a788189305	f2fs: fix out-of free segments This patch also reverts `d0db7703ac` ("f2fs: do SSR in higher priority"). This patch fixes out of free segments caused by many small file creation by 1) mkfs -s 1 2G 2) mount 3) untar - preoduce 60000 small files burstly 4) sync - flush node pages - flush imeta Here, when we do f2fs_balance_fs, we missed # of imeta blocks, resulting in skipping to check has_not_enough_free_secs. Another test is done by 1) mkfs -s 12 2G 2) mount 3) untar - preoduce 60000 small files burstly 4) sync - flush node pages - flush imeta In this case, this patch also fixes wrong block allocation under large section size. Reported-by: William Brana <wbrana@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-24 13:13:23 -07:00
Arnd Bergmann	d66450e773	f2fs: improve definition of statistic macros With a recent addition of f2fs_lookup_extent_tree(), we get a warning about the use of empty macros: fs/f2fs/extent_cache.c: In function 'f2fs_lookup_extent_tree': fs/f2fs/extent_cache.c:358:32: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body] stat_inc_rbtree_node_hit(sbi); A good way to avoid the warning and make the code more robust is to define all no-op macros as 'do { } while (0)'. Fixes: `54c2258cd6` ("f2fs: extract rb-tree operation infrastructure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reivewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-24 13:13:22 -07:00
Jaegeuk Kim	d579324998	f2fs: assign allocation hint for warm/cold data This patch gives slower device region to warm/cold data area more eagerly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-24 13:06:53 -07:00
Jaegeuk Kim	d07efb5077	f2fs: fix _IOW usage This patch fixes wrong _IOW usage. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-24 12:55:45 -07:00
Jaegeuk Kim	e066b83c9b	f2fs: add ioctl to flush data from faster device to cold area This patch adds an ioctl to flush data in faster device to cold area. User can give device number and number of segments to move. It doesn't move it if there is only one device. The parameter looks like: struct f2fs_flush_device { u32 dev_num; /* device number to flush / u32 segments; / # of segments to flush */ }; Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-24 12:55:41 -07:00
Hou Pengyang	04485987f0	f2fs: introduce async IPU policy This patch introduces an ASYNC IPU policy. Under senario of large # of async updating(e.g. log writing in Android), disk would be seriously fragmented, and higher frequent gc would be triggered. This patch uses IPU to rewrite the async update writting, since async is NOT sensitive to io latency. Signed-off-by: Hou Pengyang <houpengyang@huawei.com>	2017-04-19 11:00:46 -07:00
Chao Yu	d84d1cbdec	f2fs: add undiscard blocks stat This patch adds to account undiscard blocks. Signed-off-by: Chao Yu <yuchao0@huawei.com>	2017-04-19 11:00:45 -07:00
Chao Yu	001c584cca	f2fs: unlock cp_rwsem early for IPU writes For IPU writes, there won't be any udpates in dnode page since we will reuse old block address instead of allocating new one, so we don't need to lock cp_rwsem during IPU IO submitting. Signed-off-by: Chao Yu <yuchao0@huawei.com>	2017-04-19 11:00:44 -07:00
Chao Yu	df0f6b44dd	f2fs: introduce __check_rb_tree_consistence Introduce __check_rb_tree_consistence to check consistence of rb-tree based discard cache in runtime. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-19 11:00:44 -07:00
Chao Yu	0243a5f9da	f2fs: trace __submit_discard_cmd Add an even class f2fs_discard for introducing f2fs_queue_discard, then use f2fs_{queue,issue}_discard to trace __{queue,submit}_discard_cmd. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-19 11:00:43 -07:00
Chao Yu	ba48a33ef6	f2fs: in prior to issue big discard Keep issuing big size discard in prior instead of the one with random size, so that we expect that it will help to: - be quick to recycle unused large space in flash storage device. - give a chance for a) wait to merge small piece discards into bigger one, or b) avoid issuing discards while they have being reallocated by SSR. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-19 11:00:42 -07:00
Chao Yu	46f84c2c05	f2fs: clean up discard_cmd_control structure Avoid long variable name in discard_cmd_control structure, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-19 11:00:41 -07:00
Chao Yu	004b686218	f2fs: use rb-tree to track pending discard commands Introduce rb-tree based discard cache infrastructure to speed up lookup and merge operation of discard entry. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: initialize dc to avoid build warning] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-19 11:00:40 -07:00
Jaegeuk Kim	d40d30c5aa	f2fs: avoid dirty node pages in check_only recovery In the check_only mode, we should not make any dirty node pages. Otherwise, we can get this panic: F2FS-fs (nvme0n1p1): Need to recover fsync data ------------[ cut here ]------------ kernel BUG at fs/f2fs/node.c:2204! CPU: 7 PID: 19923 Comm: mount Tainted: G OE 4.9.8 #2 RIP: 0010:[<ffffffffc0979c0b>] [<ffffffffc0979c0b>] flush_nat_entries+0x43b/0x7d0 [f2fs] Call Trace: [<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs] [<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs] [<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs] [<ffffffff860e450f>] ? up_write+0x1f/0x40 [<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs] [<ffffffffc0969f04>] write_checkpoint+0x2f4/0xf20 [f2fs] [<ffffffff860e938d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc0960bd5>] f2fs_sync_fs+0x85/0x190 [f2fs] [<ffffffffc097b6de>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs] [<ffffffffc0977b64>] f2fs_write_node_pages+0x34/0x350 [f2fs] [<ffffffff860e5f42>] ? __lock_is_held+0x52/0x70 [<ffffffff861d9b31>] do_writepages+0x21/0x30 [<ffffffff86298ce1>] __writeback_single_inode+0x61/0x760 [<ffffffff86909127>] ? _raw_spin_unlock+0x27/0x40 [<ffffffff8629a735>] writeback_single_inode+0xd5/0x190 [<ffffffff8629a889>] write_inode_now+0x99/0xc0 [<ffffffff86283876>] iput+0x1f6/0x2c0 [<ffffffffc0964b52>] f2fs_fill_super+0xc32/0x10c0 [f2fs] [<ffffffff86266462>] mount_bdev+0x182/0x1b0 [<ffffffffc0963f20>] ? f2fs_commit_super+0x100/0x100 [f2fs] [<ffffffffc0960da5>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffff86266e08>] mount_fs+0x38/0x170 [<ffffffff86288bab>] vfs_kern_mount+0x6b/0x160 [<ffffffff8628bcfe>] do_mount+0x1be/0xd60 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-18 13:37:49 -07:00
Jaegeuk Kim	d29fd17218	f2fs: fix not to set fsync/dentry mark Otherwise, we can see stale fsync/dentry mark given by previous calls, resulting in giving up roll-forward recovery due to wrong dentry mark. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-12 12:57:09 -07:00
Jaegeuk Kim	6c3acd9757	f2fs: allocate hot_data for atomic writes We'd better allocate atomic writes to hot_data zone. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-12 12:57:08 -07:00
Jaegeuk Kim	3097388354	f2fs: give time to flush dirty pages for checkpoint If all the threads are waiting for checkpoint, we have no chance to flush required dirty pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-12 12:57:07 -07:00
Jaegeuk Kim	9bb02c3627	f2fs: fix fs corruption due to zero inode page This patch fixes the following scenario. - f2fs_create/f2fs_mkdir - write_checkpoint - f2fs_mark_inode_dirty_sync - block_operations - f2fs_lock_all - f2fs_sync_inode_meta - f2fs_unlock_all - sync_inode_metadata - f2fs_lock_op - f2fs_write_inode - update_inode_page - get_node_page return -ENOENT - new_inode_page - fill_node_footer - f2fs_mark_inode_dirty_sync - ... - f2fs_unlock_op - f2fs_inode_synced - f2fs_lock_all - do_checkpoint In this checkpoint, we can get an inode page which contains zeros having valid node footer only. Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-12 12:57:06 -07:00
Chao Yu	a54455f5ee	f2fs: shrink blk plug region Don't use blk plug covering area where there won't be any IOs being issued. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-12 12:57:05 -07:00
Chao Yu	54c2258cd6	f2fs: extract rb-tree operation infrastructure rb-tree lookup/update functions are deeply coupled into extent cache codes, it's very hard to reuse these basic functions, this patch extracts common rb-tree operation infrastructure for latter reusing. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-11 15:13:52 -07:00
Jaegeuk Kim	8fd5a37efa	f2fs: avoid frequent checkpoint during f2fs_gc Now we're doing SSR aggressively more than ever before, so once we reach to the reserved_segment, f2fs_balance_fs will call f2fs_gc, which triggers checkpoint everytime. We actually must avoid that. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-11 15:12:39 -07:00
Jaegeuk Kim	4ddb1a4d4d	f2fs: clean up some macros in terms of GET_SEGNO This patch cleans several macros by introducing: - BLKS_PER_SEC - GET_SEC_FROM_SEG - GET_SEG_FROM_SEC - GET_ZONE_FROM_SEC - GET_ZONE_FROM_SEG Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-10 19:48:13 -07:00
Jaegeuk Kim	302bd34882	f2fs: clean up get_valid_blocks with consistent parameter This patch cleans up get_valid_blocks, which has no functional change. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-10 19:48:12 -07:00
Jaegeuk Kim	63fcf8e8d6	f2fs: use segment number for get_valid_blocks This patch fixes to submit a segment number for get_valid_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-10 19:48:11 -07:00
Tomohiro Kusumi	68afcf2d38	f2fs: guard macro variables with braces Add braces around variables used within macros for those make sense to do it. Many of the macros in f2fs already do this. What this commit doesn't do is anything that changes line# as a result of adding braces, which usually affects the binary via __LINE__. Confirmed no diff in fs/f2fs/f2fs.ko before/after this commit on x86_64, to make sure this has no functional change as well as there's been no unexpected side effect due to callers' arithmetics within the existing code. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-10 19:48:10 -07:00
Tomohiro Kusumi	771a9a7177	f2fs: fix comment on f2fs_flush_merged_bios() after `86531d6b` Callers are to unlock the page on failure after `86531d6b`. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-10 19:48:10 -07:00
Chao Yu	fa64a0036c	f2fs: prevent waiter encountering incorrect discard states In f2fs_submit_discard_endio, we will wake up waiter before setting discard command states, so waiter may use incorrect states. Change the order between complete() and states setting to fix this issue. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-10 19:48:09 -07:00
Chao Yu	d431413f00	f2fs: introduce f2fs_wait_discard_bios Split f2fs_wait_discard_bios from f2fs_wait_discard_bio, just for cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-10 19:48:08 -07:00
Chao Yu	22d375dd9c	f2fs: split discard_cmd_list Split discard_cmd_list to discard_{pend,wait}_list, so while sending/waiting discard command, we can avoid traversing unneeded entries in original list. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-10 19:48:07 -07:00
Jaegeuk Kim	c6f82fe90d	Revert "f2fs: put allocate_segment after refresh_sit_entry" This reverts commit `3436c4bdb3`. This makes a leak to register dirty segments. I reproduced the issue by modified postmark which injects a lot of file create/delete/update and finally triggers huge number of SSR allocations. Cc: <stable@vger.kernel.org> # v4.10+ [Jaegeuk Kim: Change missing incorrect comment] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-10 19:48:06 -07:00
Tomohiro Kusumi	64c24ecb3c	f2fs: split make_dentry_ptr() into block and inline versions Since callers statically know which type to use, make_dentry_ptr() can simply be splitted into two inline functions. This way, the code has less inlined, fewer arguments, and no cast. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:08 -07:00
Jaegeuk Kim	d1b3e72d54	f2fs: submit bio of in-place-update pages This patch tries to split in-place-update bios from sequential bios. Suggested-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:07 -07:00
Kaixu Xia	fc2e2875d5	f2fs: remove the redundant variable definition The variable 'i' has been defined before, so here we can use it directly. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:07 -07:00
Jaegeuk Kim	687de7f101	f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE If two threads try to flush dirty pages in different inodes respectively, f2fs_write_data_pages() will produce WRITE and WRITE_SYNC one at a time, resulting in a lot of 4KB seperated IOs. So, this patch gives higher priority to WB_SYNC_ALL IOs and gathers write IOs with a big WRITE_SYNC'ed bio. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:06 -07:00
Jaegeuk Kim	ef095d19e8	f2fs: write small sized IO to hot log It would better split small and large IOs separately in order to get more consecutive big writes. The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:05 -07:00
Chao Yu	a7eeb82385	f2fs: use bitmap in discard_entry This patch changes to use bitmap instead of extent in struct discard_entry to indicate discard range in one segment, for fragmented space, this implementation can save memory footprint. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:04 -07:00
Chao Yu	f099405fc8	f2fs: clean up destroy_discard_cmd_control Remove unneeded parameter and simply change flow in destroy_discard_cmd_control. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:04 -07:00
Chao Yu	5f32366a29	f2fs: count discard command entry Adds to count discard command entry and show the number in debugfs, also fix to add cost of discard command cache into total comsumed memory footprint. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:03 -07:00
Chao Yu	8b8dd65f72	f2fs: show issued flush/discard count Show historical count of flush command and discard command. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:02 -07:00
Jaegeuk Kim	c13ff37e35	f2fs: relax node version check for victim data in gc - has_not_enough_free_secs node_secs: 0 dent_secs: 0 freed:0 free_segments:103 reserved:104 - f2fs_gc - get_victim_by_default alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1 - do_garbage_collect start_segno 3976, end_segno 3977 type 0 - is_alive nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0 - gc_data_segment 766, segno 3976, block 512/426 not alive So, this patch fixes subtle corrupted case where node version does not match to summary version which results in infinite loop by gc. Reported-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-29 17:34:40 -07:00
Jaegeuk Kim	796dbbfe4e	f2fs: start SSR much eariler to avoid FG_GC This patch initiates SSR much eariler, resulting in less FG_GC. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-29 17:34:39 -07:00
Jaegeuk Kim	7a20b8a61e	f2fs: allocate node and hot data in the beginning of partition In order to give more spatial locality, this patch changes the block allocation policy which assigns beginning of partition for small and hot data/node blocks. In order to do this, we set noheap allocation by default and introduce another mount option, heap, to reset it back. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-29 17:34:37 -07:00
Jaegeuk Kim	c541a51b8c	f2fs: fix wrong max cost initialization This patch fixes missing increased max cost caused by a patch that we increased cose of data segments in greedy algorithm. Cc: <stable@vger.kernel.org> # v4.10+ Fixes: `b9cd20619` "f2fs: node segment is prior to data segment selected victim" Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-28 15:49:27 -07:00
Yunlei He	59c9081bc8	f2fs: allow write page cache when writting cp This patch allow write data to normal file when writting new checkpoint. We relax three limitations for write_begin path: 1. data allocation 2. node allocation 3. variables in checkpoint Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-25 00:19:37 -07:00
Chao Yu	22588f8773	f2fs: don't reserve additional space in xattr block In this patch, we change xattr block disk layout as below: Before: xattr node block layout +---------------------------------------------+---------------+-------------+ \| node block xattr entries \| reserved \| node footer \| \| 4068 Bytes \| 4 Bytes \| 24 Bytes \| In memory layout +--------------------+---------------------------------+--------------------+ \| inline xattr \| node block xattr entries \| reserved \| \| 200 Bytes \| 4068 Bytes \| 4 Bytes \| After: xattr node block layout +-------------------------------------------------------------+-------------+ \| node block xattr entries \| node footer \| \| 4072 Bytes \| 24 Bytes \| In memory layout +--------------------+---------------------------------+--------------------+ \| inline xattr \| node block xattr entries \| reserved \| \| 200 Bytes \| 4072 Bytes \| 4 Bytes \| With this change, we don't need to reserve additional space in node block, just keep reserved space in logical in-memory layout. So that it would help to enlarge valid free space of xattr node block. As tested, generic/026 shows max stored xattr entires number increases from 531 to 532 when inline_xattr option is enabled. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:53 -04:00
Chao Yu	89e9eabd7d	f2fs: clean up xattr operation 1. don't allocate redundant memory in read_all_xattrs. 2. introduce RESERVED_XATTR_SIZE for cleanup. Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:52 -04:00
Chao Yu	99f4b917b0	f2fs: don't track volatile file in dirty inode list Don't track volatile file in dirty inode list, otherwise with data_flush option, background thread will entry into endless loop for flushing journal file's pages. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:51 -04:00
Chao Yu	648d50ba12	f2fs: show the max number of volatile operations This patch adds to show the max number of volatile operations which are conducting concurrently. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:50 -04:00
Chao Yu	30a61ddf81	f2fs: fix race condition in between free nid allocator/initializer In below concurrent case, allocated nid can be loaded into free nid cache and be allocated again. Thread A Thread B - f2fs_create - f2fs_new_inode - alloc_nid - __insert_nid_to_list(ALLOC_NID_LIST) - f2fs_balance_fs_bg - build_free_nids - __build_free_nids - scan_nat_page - add_free_nid - __lookup_nat_cache - f2fs_add_link - init_inode_metadata - new_inode_page - new_node_page - set_node_addr - alloc_nid_done - __remove_nid_from_list(ALLOC_NID_LIST) - __insert_nid_to_list(FREE_NID_LIST) This patch makes nat cache lookup and free nid list operation being atomical to avoid this race condition. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:50 -04:00
Yunlei He	5f4c3dec22	f2fs: use set_page_private marcro in f2fs_trace_pid Use set_page_private marcro instead of operte page struct directly Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:49 -04:00
Chao Yu	9897159a7b	f2fs: fix recording invalid last_victim When doing garbage collection, we try to record segment offset which locates at next one of last victim, using it as the start offset in next searching. But in some corner cases, recorded offset may cross the end of main segment area, it will cause incorrectly searching in dirty_segmap bitmap. This patch adds modular operation to avoid this issue. Reported-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:48 -04:00
Kinglong Mee	8f73cbb7d4	f2fs: more reasonable mem_size calculating of ino_entry Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:37 -04:00
Kinglong Mee	70874fb34f	f2fs: calculate the f2fs_stat_info into base_mem The memory size of f2fs_stat_info also should be calculated. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:36 -04:00
Kinglong Mee	684ca7e55d	f2fs: avoid stat_inc_atomic_write for non-atomic file After filemap_write_and_wait_range fail, the FI_ATOMIC_FILE flags is removed, so that f2fs should not increase the stat of atomic_write. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:36 -04:00
Kinglong Mee	c6f89dfd52	f2fs: sanity check of crc_offset from raw checkpoint The crc_offset towards or beyond the end of block is wrong, sanity check it. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:34 -04:00
Kinglong Mee	d03ba4cc3f	f2fs: cleanup the disk level filename updating As discuss with Jaegeuk and Chao, "Once checkpoint is done, f2fs doesn't need to update there-in filename at all." The disk-level filename is used only one case, 1. create a file A under a dir 2. sync A 3. godown 4. umount 5. mount (roll_forward) Only the rename/cross_rename changes the filename, if it happens, a. between step 1 and 2, the sync A will caused checkpoint, so that, the roll_forward at step 5 never happens. b. after step 2, the roll_forward happens, file A will roll forward to the result as after step 1. So that, any updating the disk filename is useless, just cleanup it. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:33 -04:00
Chao Yu	346fe752c4	f2fs: cover update_free_nid_bitmap with nid_list_lock free_nid_bitmap and free_nid_count in update_free_nid_bitmap should be updated atomically, use nid_list_lock cover them to avoid race in concurrent scenario. Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:32 -04:00
Kinglong Mee	a83d50bc16	f2fs: fix bad prefetchw of NULL page For f2fs_read_data_pages, the f2fs_mpage_readpages gets "page == NULL", so that, the prefetchw(&page->flags) is operated on NULL. Fixes: `f1e8866016` ("f2fs: expose f2fs_mpage_readpages") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:31 -04:00
Kinglong Mee	bd4667cb4b	f2fs: clear FI_DATA_EXIST flag in truncate_inline_inode Clear FI_DATA_EXIST flag atomically in truncate_inline_inode, and the return value from truncate_inline_inode isn't used, remove it. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:31 -04:00
Kinglong Mee	d756386172	f2fs: move mnt_want_write_file after arguments checking It's needless of mnt_want_write_file for arguments checking. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:30 -04:00
Kinglong Mee	46e82fb1b5	f2fs: check new size by inode_newsize_ok in f2fs_insert_range The inode_newsize_ok is better than only checking the maxbytes, eg. the rlimit etc. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:29 -04:00
Kinglong Mee	3cecfa5f67	f2fs: avoid copy date to user-space if move file range fail If move file range return error, the data copied to user-space is duplicate. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:28 -04:00
Kinglong Mee	087d3d8bae	f2fs: drop duplicate new_size assign in f2fs_zero_range Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:28 -04:00
Fan Li	8a6aa32502	f2fs: adjust the way of calculating nat block use a slightly simpler expression to calculate nat block with nid. Signed-off-by: Fan Li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:27 -04:00
Jaegeuk Kim	14b44d238c	f2fs: add fault injection on f2fs_truncate Inject a fault during f2fs_truncate(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:26 -04:00
Sheng Yong	1941d7bcb4	f2fs: check range before defragment This patch checks the parameter range passed by ioctl to void that range exceeds the max_file_blocks limit. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:25 -04:00
Sheng Yong	b0beab5016	f2fs: use parameter max_items instead of PIDVEC_SIZE Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:24 -04:00
Yunlei He	3d6a650feb	f2fs: add a punch discard command function This patch add a function to punch discard command if one segment reuse before discard. Split this segment from multi-segments discard range, and discard the left bigger range. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:23 -04:00
Jaegeuk Kim	c81abe34fe	f2fs: allocate a bio for discarding when actually issuing it Let's allocate a bio when issuing discard commands later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:23 -04:00
Yunlei He	a29d0e0bc0	f2fs: skip writeback meta pages if cp_mutex acquire failed Skip writeback meta pages if cp_mutex lock acquire failed, cp will flush dirty pages instead. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:22 -04:00
Jaegeuk Kim	5ce4738a02	f2fs: show more precise message on orphan recovery failure This case is not caused by fsck.f2fs. User needs to retry mount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:21 -04:00
Kinglong Mee	fc7d5bc427	f2fs: remove dead macro PGOFS_OF_NEXT_DNODE Fixes: `3cf4574705` ("f2fs: introduce get_next_page_offset to speed up SEEK_DATA") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:21 -04:00
Kinglong Mee	0b28b71e29	f2fs: drop duplicate radix tree lookup of nat_entry_set The nat entry is listed from the set list for freeing, it's duplicate to do radix tree lookup again. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> [Jaegeuk Kim: remove unnecessary f2fs_bug_on] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:20 -04:00
Kinglong Mee	20fda56b01	f2fs: make sure trace all f2fs_issue_flush The root device's issue flush trace is missing, add it and tracing the result from submit. Fixes `d50aaeec90` ("f2fs: show actual device info in tracepoints") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:19 -04:00
Chao Yu	8ff0971f15	f2fs: don't allow volatile writes for non-regular file Now f2fs only supports volatile writes for journal db regular file. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:18 -04:00
Jaegeuk Kim	e811898c97	f2fs: don't allow atomic writes for not regular files The atomic writes only supports regular files for database. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:17 -04:00
Jaegeuk Kim	8c242db9b8	f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer When I forced to enable atomic operations intentionally, I could hit the below panic, since we didn't clear page->private in f2fs_invalidate_page called by file truncation. The panic occurs due to NULL mapping having page->private. BUG: unable to handle kernel paging request at ffffffffffffffff IP: drop_buffers+0x38/0xe0 PGD 5d00c067 PUD 5d00e067 PMD 0 CPU: 3 PID: 1648 Comm: fsstress Tainted: G D OE 4.10.0+ #5 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 task: ffff9151952863c0 task.stack: ffffaaec40db4000 RIP: 0010:drop_buffers+0x38/0xe0 RSP: 0018:ffffaaec40db74c8 EFLAGS: 00010292 Call Trace: ? page_referenced+0x8b/0x170 try_to_free_buffers+0xc5/0xe0 try_to_release_page+0x49/0x50 shrink_page_list+0x8bc/0x9f0 shrink_inactive_list+0x1dd/0x500 ? shrink_active_list+0x2c0/0x430 shrink_node_memcg+0x5eb/0x7c0 shrink_node+0xe1/0x320 do_try_to_free_pages+0xef/0x2e0 try_to_free_pages+0xe9/0x190 __alloc_pages_slowpath+0x390/0xe70 __alloc_pages_nodemask+0x291/0x2b0 alloc_pages_current+0x95/0x140 __page_cache_alloc+0xc4/0xe0 pagecache_get_page+0xab/0x2a0 grab_cache_page_write_begin+0x20/0x40 get_read_data_page+0x2e6/0x4c0 [f2fs] ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs] ? truncate_data_blocks_range+0x238/0x2b0 [f2fs] get_lock_data_page+0x30/0x190 [f2fs] __exchange_data_block+0xaaf/0xf40 [f2fs] f2fs_fallocate+0x418/0xd00 [f2fs] vfs_fallocate+0x157/0x220 SyS_fallocate+0x48/0x80 Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> [Chao Yu: use INMEM_INVALIDATE for better tracing] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:10 -04:00
Jaegeuk Kim	aa51d08a11	f2fs: build stat_info before orphan inode recovery f2fs_sync_fs() -> write_checkpoint() calls stat_inc_cp_count(sbi->stat_info), which needs stat_info allocation. Otherwise, we can hit: [254042.598623] ? count_shadow_nodes+0xa0/0xa0 [254042.598633] f2fs_sync_fs+0x65/0xd0 [f2fs] [254042.598645] f2fs_balance_fs_bg+0xe4/0x1c0 [f2fs] [254042.598657] f2fs_write_node_pages+0x34/0x1a0 [f2fs] [254042.598664] ? pagevec_lookup_entries+0x1e/0x30 [254042.598673] do_writepages+0x1e/0x30 [254042.598682] __writeback_single_inode+0x45/0x330 [254042.598688] writeback_single_inode+0xd7/0x190 [254042.598694] write_inode_now+0x86/0xa0 [254042.598699] iput+0x122/0x200 [254042.598709] f2fs_fill_super+0xd4a/0x14d0 [f2fs] [254042.598717] mount_bdev+0x184/0x1c0 [254042.598934] ? f2fs_commit_super+0x100/0x100 [f2fs] [254042.599142] f2fs_mount+0x15/0x20 [f2fs] [254042.599349] mount_fs+0x39/0x160 [254042.599554] ? __alloc_percpu+0x15/0x20 [254042.599759] vfs_kern_mount+0x67/0x110 [254042.599972] do_mount+0x1bb/0xc80 [254042.600175] ? memdup_user+0x42/0x60 [254042.600380] SyS_mount+0x83/0xd0 [254042.600583] entry_SYSCALL_64_fastpath+0x1e/0xad Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Kinglong Mee	10a875f82b	f2fs: fix the fault of calculating blkstart twice When the zone type is BLK_ZONE_TYPE_CONVENTIONAL, the blkstart is calculated twice. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Kinglong Mee	e2f0e962ac	f2fs: fix the fault of checking F2FS_LINK_MAX for rename inode The parent directory's nlink will change, not the inode. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	4f1bca9f0d	f2fs: don't allow to get pino when filename is encrypted After renaming an encrypted file, we have no way to get its encrypted filename from its dentry. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	8c1b3c0fb6	f2fs: fix wrong error injection for evict_inode The previous one was not a proper location to inject an error, since there is no point to get errors. Instead, we can emulate EIO during truncation, and the below logic should handle it correctly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Kinglong Mee	10047f537c	f2fs: le32_to_cpu for ckpt->cp_pack_total_block_count Fixes: `22ad0b6ab4` ("f2fs: add bitmaps for empty or full NAT blocks") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	b71deadbc4	f2fs: le16_to_cpu for xattr->e_value_size This patch fixes missing le16 conversion, reported by kbuild test robot. Fixes: `5f35a2cd5` ("f2fs: Don't update the xattr data that same as the exist") Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	4f295443bf	f2fs: don't need to invalidate wrong node page If f2fs_new_inode() is failed, the bad inode will invalidate 0'th node page during f2fs_evict_inode(), which doesn't need to do. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Yunlei He	a78aaa2c3c	f2fs: fix an error return value in truncate_partial_data_page This patch fix a error return value in truncate_partial_data_page Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Chao Yu	7041d5d286	f2fs: combine nat_bits and free_nid_bitmap cache Both nat_bits cache and free_nid_bitmap cache provide same functionality as a intermediate cache between free nid cache and disk, but with different granularity of indicating free nid range, and different persistence policy. nat_bits cache provides better persistence ability, and free_nid_bitmap provides better granularity. In this patch we combine advantage of both caches, so finally policy of the intermediate cache would be: - init: load free nid status from nat_bits into free_nid_bitmap - lookup: scan free_nid_bitmap before load NAT blocks - update: update free_nid_bitmap in real-time - persistence: udpate and persist nat_bits in checkpoint This patch also resolves performance regression reported by lkp-robot. commit: `4ac912427c` ("f2fs: introduce free nid bitmap") d00030cf9cd0bb96fdccc41e33d3c91dcbb672ba ("f2fs: use __set{__clear}_bit_le") 1382c0f3f9d3f936c8bc42ed1591cf7a593ef9f7 ("f2fs: combine nat_bits and free_nid_bitmap cache") `4ac912427c` d00030cf9cd0bb96fdccc41e33 1382c0f3f9d3f936c8bc42ed15 ---------------- -------------------------- -------------------------- %stddev %change %stddev %change %stddev \ \| \ \| \ 77863 ± 0% +2.1% 79485 ± 1% +50.8% 117404 ± 0% aim7.jobs-per-min 231.63 ± 0% -2.0% 227.01 ± 1% -33.6% 153.80 ± 0% aim7.time.elapsed_time 231.63 ± 0% -2.0% 227.01 ± 1% -33.6% 153.80 ± 0% aim7.time.elapsed_time.max 896604 ± 0% -0.8% 889221 ± 3% -20.2% 715260 ± 1% aim7.time.involuntary_context_switches 2394 ± 1% +4.6% 2503 ± 1% +3.7% 2481 ± 2% aim7.time.maximum_resident_set_size 6240 ± 0% -1.5% 6145 ± 1% -14.1% 5360 ± 1% aim7.time.system_time 1111357 ± 3% +1.9% 1132509 ± 2% -6.2% 1041932 ± 2% aim7.time.voluntary_context_switches ... Signed-off-by: Chao Yu <yuchao0@huawei.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:18 -04:00
Chao Yu	586d1492f3	f2fs: skip scanning free nid bitmap of full NAT blocks This patch adds to account free nids for each NAT blocks, and while scanning all free nid bitmap, do check count and skip lookuping in full NAT block. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:17 -04:00
Jaegeuk Kim	23380b8568	f2fs: use __set{__clear}_bit_le This patch uses __set{__clear}_bit_le for highter speed. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:16 -04:00
Jaegeuk Kim	9f7e4a2c49	f2fs: declare static functions This is to avoid build warning reported by kbuild test robot. Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:15 -04:00
Jaegeuk Kim	720037f939	f2fs: don't overwrite node block by SSR This patch fixes that SSR can overwrite previous warm node block consisting of a node chain since the last checkpoint. Fixes: `5b6c6be2d8` ("f2fs: use SSR for warm node as well") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:14 -04:00
Linus Torvalds	590dce2d49	Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs 'statx()' update from Al Viro. This adds the new extended stat() interface that internally subsumes our previous stat interfaces, and allows user mode to specify in more detail what kind of information it wants. It also allows for some explicit synchronization information to be passed to the filesystem, which can be relevant for network filesystems: is the cached value ok, or do you need open/close consistency, or what? From David Howells. Andreas Dilger points out that the first version of the extended statx interface was posted June 29, 2010: https://www.spinics.net/lists/linux-fsdevel/msg33831.html * 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: statx: Add a system call to make enhanced file info available	2017-03-03 11:38:56 -08:00
David Howells	a528d35e8b	statx: Add a system call to make enhanced file info available Add a system call to make extended file information available, including file creation and some attribute flags where available through the underlying filesystem. The getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr() function. Functions like vfs_stat() are now inline wrappers around new functions vfs_statx() and vfs_statx_fd() to reduce stack usage. ======== OVERVIEW ======== The idea was initially proposed as a set of xattrs that could be retrieved with getxattr(), but the general preference proved to be for a new syscall with an extended stat structure. A number of requests were gathered for features to be included. The following have been included: (1) Make the fields a consistent size on all arches and make them large. (2) Spare space, request flags and information flags are provided for future expansion. (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an __s64). (4) Creation time: The SMB protocol carries the creation time, which could be exported by Samba, which will in turn help CIFS make use of FS-Cache as that can be used for coherency data (stx_btime). This is also specified in NFSv4 as a recommended attribute and could be exported by NFSD [Steve French]. (5) Lightweight stat: Ask for just those details of interest, and allow a netfs (such as NFS) to approximate anything not of interest, possibly without going to the server [Trond Myklebust, Ulrich Drepper, Andreas Dilger] (AT_STATX_DONT_SYNC). (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks its cached attributes are up to date [Trond Myklebust] (AT_STATX_FORCE_SYNC). And the following have been left out for future extension: (7) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. Can also be used to modify fill_post_wcc() in NFSD which retrieves i_version directly, but has just called vfs_getattr(). It could get it from the kstat struct if it used vfs_xgetattr() instead. (There's disagreement on the exact semantics of a single field, since not all filesystems do this the same way). (8) BSD stat compatibility: Including more fields from the BSD stat such as creation time (st_btime) and inode generation number (st_gen) [Jeremy Allison, Bernd Schubert]. (9) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd Schubert]. (This was asked for but later deemed unnecessary with the open-by-handle capability available and caused disagreement as to whether it's a security hole or not). (10) Extra coherency data may be useful in making backups [Andreas Dilger]. (No particular data were offered, but things like last backup timestamp, the data version number and the DOS archive bit would come into this category). (11) Allow the filesystem to indicate what it can/cannot provide: A filesystem can now say it doesn't support a standard stat feature if that isn't available, so if, for instance, inode numbers or UIDs don't exist or are fabricated locally... (This requires a separate system call - I have an fsinfo() call idea for this). (12) Store a 16-byte volume ID in the superblock that can be returned in struct xstat [Steve French]. (Deferred to fsinfo). (13) Include granularity fields in the time data to indicate the granularity of each of the times (NFSv4 time_delta) [Steve French]. (Deferred to fsinfo). (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. Note that the Linux IOC flags are a mess and filesystems such as Ext4 define flags that aren't in linux/fs.h, so translation in the kernel may be a necessity (or, possibly, we provide the filesystem type too). (Some attributes are made available in stx_attributes, but the general feeling was that the IOC flags were to ext[234]-specific and shouldn't be exposed through statx this way). (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, Michael Kerrisk]. (Deferred, probably to fsinfo. Finding out if there's an ACL or seclabal might require extra filesystem operations). (16) Femtosecond-resolution timestamps [Dave Chinner]. (A __reserved field has been left in the statx_timestamp struct for this - if there proves to be a need). (17) A set multiple attributes syscall to go with this. =============== NEW SYSTEM CALL =============== The new system call is: int ret = statx(int dfd, const char filename, unsigned int flags, unsigned int mask, struct statx buffer); The dfd, filename and flags parameters indicate the file to query, in a similar way to fstatat(). There is no equivalent of lstat() as that can be emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is also no equivalent of fstat() as that can be emulated by passing a NULL filename to statx() with the fd of interest in dfd. Whether or not statx() synchronises the attributes with the backing store can be controlled by OR'ing a value into the flags argument (this typically only affects network filesystems): (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this respect. (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. mask is a bitmask indicating the fields in struct statx that are of interest to the caller. The user should set this to STATX_BASIC_STATS to get the basic set returned by stat(). It should be noted that asking for more information may entail extra I/O operations. buffer points to the destination for the data. This must be 256 bytes in size. ====================== MAIN ATTRIBUTES RECORD ====================== The following structures are defined in which to return the main attribute set: struct statx_timestamp { __s64 tv_sec; __s32 tv_nsec; __s32 __reserved; }; struct statx { __u32 stx_mask; __u32 stx_blksize; __u64 stx_attributes; __u32 stx_nlink; __u32 stx_uid; __u32 stx_gid; __u16 stx_mode; __u16 __spare0[1]; __u64 stx_ino; __u64 stx_size; __u64 stx_blocks; __u64 __spare1[1]; struct statx_timestamp stx_atime; struct statx_timestamp stx_btime; struct statx_timestamp stx_ctime; struct statx_timestamp stx_mtime; __u32 stx_rdev_major; __u32 stx_rdev_minor; __u32 stx_dev_major; __u32 stx_dev_minor; __u64 __spare2[14]; }; The defined bits in request_mask and stx_mask are: STATX_TYPE Want/got stx_mode & S_IFMT STATX_MODE Want/got stx_mode & ~S_IFMT STATX_NLINK Want/got stx_nlink STATX_UID Want/got stx_uid STATX_GID Want/got stx_gid STATX_ATIME Want/got stx_atime{,_ns} STATX_MTIME Want/got stx_mtime{,_ns} STATX_CTIME Want/got stx_ctime{,_ns} STATX_INO Want/got stx_ino STATX_SIZE Want/got stx_size STATX_BLOCKS Want/got stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want/got stx_btime{,_ns} STATX_ALL [All currently available stuff] stx_btime is the file creation time, stx_mask is a bitmask indicating the data provided and __spares[] are where as-yet undefined fields can be placed. Time fields are structures with separate seconds and nanoseconds fields plus a reserved field in case we want to add even finer resolution. Note that times will be negative if before 1970; in such a case, the nanosecond fields will also be negative if not zero. The bits defined in the stx_attributes field convey information about a file, how it is accessed, where it is and what it does. The following attributes map to FS__FL flags and are the same numerical value: STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs Within the kernel, the supported flags are listed by: KSTAT_ATTR_FS_IOC_FLAGS [Are any other IOC flags of sufficient general interest to be exposed through this interface?] New flags include: STATX_ATTR_AUTOMOUNT Object is an automount trigger These are for the use of GUI tools that might want to mark files specially, depending on what they are. Fields in struct statx come in a number of classes: (0) stx_dev_, stx_blksize. These are local system information and are always available. (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino, stx_size, stx_blocks. These will be returned whether the caller asks for them or not. The corresponding bits in stx_mask will be set to indicate whether they actually have valid values. If the caller didn't ask for them, then they may be approximated. For example, NFS won't waste any time updating them from the server, unless as a byproduct of updating something requested. If the values don't actually exist for the underlying object (such as UID or GID on a DOS file), then the bit won't be set in the stx_mask, even if the caller asked for the value. In such a case, the returned value will be a fabrication. Note that there are instances where the type might not be valid, for instance Windows reparse points. (2) stx_rdev_*. This will be set only if stx_mode indicates we're looking at a blockdev or a chardev, otherwise will be 0. (3) stx_btime. Similar to (1), except this will be set to 0 if it doesn't exist. ======= TESTING ======= The following test program can be used to test the statx system call: samples/statx/test-statx.c Just compile and run, passing it paths to the files you want to examine. The file is built automatically if CONFIG_SAMPLES is enabled. Here's some example output. Firstly, an NFS directory that crosses to another FSID. Note that the AUTOMOUNT attribute is set because transiting this directory will cause d_automount to be invoked by the VFS. [root@andromeda ~]# /tmp/test-statx -A /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:26 Inode: 1703937 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------) Secondly, the result of automounting on that directory. [root@andromeda ~]# /tmp/test-statx /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:27 Inode: 2 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-03-02 20:51:15 -05:00
Ingo Molnar	174cd4b1e5	sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:32 +01:00
Linus Torvalds	25c4e6c3f0	for-f2fs-4.11 This round introduces several interesting features such as on-disk NAT bitmaps, IO alignment, and a discard thread. And it includes a couple of major bug fixes as below. == Enhancement == - introduce on-disk bitmaps to avoid scanning NAT blocks when getting free nids - support IO alignment to prepare open-channel SSD integration in future - introduce a discard thread to avoid long latency during checkpoint and fstrim - use SSR for warm node and enable inline_xattr by default - introduce in-memory bitmaps to check FS consistency for debugging - improve write_begin by avoiding needless read IO == Bug fix == - fix broken zone_reset behavior for SMR drive - fix wrong victim selection policy during GC - fix missing behavior when preparing discard commands - fix bugs in atomic write support and fiemap - workaround to handle multiple f2fs_add_link calls having same name And it includes a bunch of clean-up patches as well. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYtmdOAAoJEEAUqH6CSFDSs0UP/AzngT37xVIhVBD13J9oHIuv rFA/eHVGRJmU1xc4SG1bghKm45xq8rwUX7irarfvLLc5aL+6VPGSdaRBykUr4A5N MN/bgK//EPp7If8EF+8PpY+9x7g67i0mtz5iD8dDrK+bUKV/IDKV1LWw5pR3g/g6 RwMH0dUVOiD/HJ5iFp1ykTdVPe4vFY013uVmyPxUq+nCBlqlQm1nOvrGjF/HeYyX kqcD2LEc79GPfS5ebQIKfCfLE0rsWVnnS6YaqlDNCD5/oRim71CUtA4MPTYv29vp R/SebWlayEm+u68+uQUu6AyIk/1IdP0+AtRuQd/VxuteoyXmkTMHER662DqN4F8J npPdNrbNdlzwuAP77avy+hplqbD19yUa7o7Fl1No5rfheT3CiNTSj2uoriyEAffH 1AM6tES7S7n5ttrXOr9iOxrK0u/vuaf7fbKVtK+RI09hwzdvyGB5HUdQB0iP/XR+ obw8dru79ISMVZ9YuDhSfjI5ohAcfthfuqgjUt2RAfDv19IRsg5eayAp3T6nUfEX AGQbV/52dkO9svZztMbcBW95zmqkE0cMeX66KIMCPXNuDiE474t8k115K6kHpFwP e4Kx+mTSNhR1LEAaVdmCjbLb0gVrumVHTdjaZopnxTFmE70u/M6h1vY90m1LkReF ZDK5mhfMmGzU4wkvbgP8 =tw8c -----END PGP SIGNATURE----- Merge tag 'for-f2fs-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This round introduces several interesting features such as on-disk NAT bitmaps, IO alignment, and a discard thread. And it includes a couple of major bug fixes as below. Enhancements: - introduce on-disk bitmaps to avoid scanning NAT blocks when getting free nids - support IO alignment to prepare open-channel SSD integration in future - introduce a discard thread to avoid long latency during checkpoint and fstrim - use SSR for warm node and enable inline_xattr by default - introduce in-memory bitmaps to check FS consistency for debugging - improve write_begin by avoiding needless read IO Bug fixes: - fix broken zone_reset behavior for SMR drive - fix wrong victim selection policy during GC - fix missing behavior when preparing discard commands - fix bugs in atomic write support and fiemap - workaround to handle multiple f2fs_add_link calls having same name ... and it includes a bunch of clean-up patches as well" * tag 'for-f2fs-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (97 commits) f2fs: avoid to flush nat journal entries f2fs: avoid to issue redundant discard commands f2fs: fix a plint compile warning f2fs: add f2fs_drop_inode tracepoint f2fs: Fix zoned block device support f2fs: remove redundant set_page_dirty() f2fs: fix to enlarge size of write_io_dummy mempool f2fs: fix memory leak of write_io_dummy mempool during umount f2fs: fix to update F2FS_{CP_}WB_DATA count correctly f2fs: use MAX_FREE_NIDS for the free nids target f2fs: introduce free nid bitmap f2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint f2fs: update the comment of default nr_pages to skipping f2fs: drop the duplicate pval in f2fs_getxattr f2fs: Don't update the xattr data that same as the exist f2fs: kill __is_extent_same f2fs: avoid bggc->fggc when enough free segments are avaliable after cp f2fs: select target segment with closer temperature in SSR mode f2fs: show simple call stack in fault injection message f2fs: no need lock_op in f2fs_write_inline_data ...	2017-03-01 15:55:04 -08:00
Jaegeuk Kim	900f736251	f2fs: avoid to flush nat journal entries This patch adds a missing condition which flushes nat journal entries unnecessarily introduced by: f2fs: add bitmaps for empty or full NAT blocks Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-28 09:57:13 -08:00
Jaegeuk Kim	8b107f5b97	f2fs: avoid to issue redundant discard commands If segs_per_sec is over 1 like under SMR, previously f2fs issues discard commands redundantly on the same section, since we didn't move end position for the previous discard command. E.g., start end \| \| prefree_bitmap = [01111100111100] And, after issue discard for this section, end start \| \| prefree_bitmap = [01111100111100] Select this section again by searching from (end + 1), start end \| \| prefree_bitmap = [01111100111100] Fixes: `36abef4e79` ("f2fs: introduce mode=lfs mount option") Cc: <stable@vger.kernel.org> Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 13:44:08 -08:00
Hou Pengyang	37e79cd31c	f2fs: fix a plint compile warning fix such pclint warning: ... Loss of precision (arg. no. 2) (unsigned long long to unsigned int)) Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:51:21 -08:00
Hou Pengyang	b8d96a30b6	f2fs: add f2fs_drop_inode tracepoint Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:51:20 -08:00
Masato Suzuki	7bb3a371d1	f2fs: Fix zoned block device support The introduction of the multi-device feature partially broke the support for zoned block devices. In the function f2fs_scan_devices, sbi->devs allocation and initialization is skipped in the case of a single device mount. This result in no device information structure being allocated for the device. This is fine if the device is a regular device, but in the case of a zoned block device, the device zone type array is not initialized, which causes the function __f2fs_issue_discard_zone to fail as get_blkz_type is unable to determine the zone type of a section. Fix this by always allocating and initializing the sbi->devs device information array even in the case of a single device if that device is zoned. For this particular case, make sure to obtain a reference on the single device so that the call to blkdev_put() in destroy_device_list operates as expected. Fixes: `3c62be17d4` ("f2fs: support multiple devices") Cc: <stable@vger.kernel.org> # v4.10 Signed-off-by: Masato Suzuki <masato.suzuki@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:40:12 -08:00
Yunlei He	4fcf589ad4	f2fs: remove redundant set_page_dirty() This patch remove redundant set_page_dirty in truncate_blocks Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:40:11 -08:00
Chao Yu	a3ebfe4fd8	f2fs: fix to enlarge size of write_io_dummy mempool It needs to double cache size of write_io_dummy mempool, otherwise we may run out of cache in scenraio of Data/Node IOs were issued concurrently. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:40:10 -08:00
Chao Yu	b6895e8f99	f2fs: fix memory leak of write_io_dummy mempool during umount Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:40:09 -08:00
Chao Yu	540faedb00	f2fs: fix to update F2FS_{CP_}WB_DATA count correctly We should only account F2FS_{CP_}WB_DATA IOs for write path, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:40:08 -08:00
Kinglong Mee	f0cdbfe6ef	f2fs: use MAX_FREE_NIDS for the free nids target F2FS has define MAX_FREE_NIDS for maximum of cached free nids target. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:48 -08:00
Chao Yu	4ac912427c	f2fs: introduce free nid bitmap In scenario of intensively node allocation, free nids will be ran out soon, then it needs to stop to load free nids by traversing NAT blocks, in worse case, if NAT blocks does not be cached in memory, it generates IOs which slows down our foreground operations. In order to speed up node allocation, in this patch we introduce a new free_nid_bitmap array, so there is an bitmap table for each NAT block, Once the NAT block is loaded, related bitmap cache will be switched on, and bitmap will be set during traversing nat entries in NAT block, later we can query and update nid usage status in memory completely. With such implementation, I expect performance of node allocation can be improved in the long-term after filesystem image is mounted. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:47 -08:00
Kinglong Mee	ced2c7ea8e	f2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint There are four places that getting the crc value in f2fs_checkpoint, just add a new helper cur_cp_crc for them. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:47 -08:00
Kinglong Mee	727ebb091e	f2fs: update the comment of default nr_pages to skipping Fixes: `2c237ebaa4` ("f2fs: avoid writing node/metapages during writes") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:46 -08:00
Kinglong Mee	0ec4a5b647	f2fs: drop the duplicate pval in f2fs_getxattr Fixes: `ba38c27eb9` ("f2fs: enhance lookup xattr") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:45 -08:00
Kinglong Mee	5f35a2cd5b	f2fs: Don't update the xattr data that same as the exist f2fs removes the old xattr data and appends the new data although the new data is same as the exist. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:44 -08:00
Chao Yu	317e130096	f2fs: kill __is_extent_same Since commit `ee6d182f2a` ("f2fs: remove syncing inode page in all the cases") delayed inode element updating from inode cache to node page cache, so once largest cached extent is updated, we can make inode dirty immediately instead of checking and updating it in the end of extent cache update. The above commit didn't clean up unneeded codes in extent_cache.c, let's finish the job in this patch. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:43 -08:00
Hou Pengyang	19f4e688f8	f2fs: avoid bggc->fggc when enough free segments are avaliable after cp We use has_not_enough_free_secs to check if there are enough free segments, (free_sections(sbi) + freed) <= (node_secs + 2 * dent_secs + imeta_secs + reserved_sections(sbi) + needed); Under scenario with large number of dirty nodes, these nodes would be flushed during cp, as a result, right side of the inequality would be decreased, while left side stays unchanged if these nodes are flushed in SSR way, which means there are enough free segments after this cp. For this case, we just do a bggc instead of fggc. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:37 -08:00
Chao Yu	d27c3d89db	f2fs: select target segment with closer temperature in SSR mode In SSR mode, we can allocate target segment which has different temperature type from the type of current block, in order to avoid mixing coldest and hottest data/node as much as possible, change SSR allocation policy to select closer temperature for current block prior. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:56 -08:00
Chao Yu	55523519bc	f2fs: show simple call stack in fault injection message Previously kernel message can show that in which function we do the injection, but unfortunately, most of the caller are the same, for tracking more information of injection path, it needs to show upper caller's name. This patch supports that ability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:55 -08:00
Yunlei He	dd7b2333e6	f2fs: no need lock_op in f2fs_write_inline_data Similar as f2fs_write_inode, f2fs_write_inline_data just mark inode page dirty, so it's no need to write inline data under read lock of cp_rwsem. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:55 -08:00
Jaegeuk Kim	22ad0b6ab4	f2fs: add bitmaps for empty or full NAT blocks This patches adds bitmaps to represent empty or full NAT blocks containing free nid entries. If we can find valid crc\|cp_ver in the last block of checkpoint pack, we'll use these bitmaps when building free nids. In order to avoid checkpointing burden, up-to-date bitmaps will be flushed only during umount time. So, normally we can get this gain, but when power-cut happens, we rely on fsck.f2fs which recovers this bitmap again. After this patch, we build free nids from nid #0 at mount time to make more full NAT blocks, but in runtime, we check empty NAT blocks to load free nids without loading any NAT pages from disk. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:54 -08:00
Yunlei He	5e8256ac2e	f2fs: replace rw semaphore extent_tree_lock with mutex lock This patch replace rw semaphore extent_tree_lock with mutex lock for no read cases with this lock. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:53 -08:00
Kinglong Mee	3f2be04304	f2fs: avoid m_flags overlay when allocating more data blocks When more than one data blocks are allocated, the F2FS_MAP_UNWRITTEN/MAPPED flags will be overlapped by F2FS_MAP_NEW at the later times. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:52 -08:00
Hou Pengyang	6bfaf7b150	f2fs: remove unsafe bitmap checking proc A: proc B: - writeback_sb_inodes - __writeback_single_inode - do_writepages - f2fs_write_node_pages - f2fs_balance_fs_bg - write_checkpoint - build_free_nids - flush_nat_entries - __build_free_nids - __flush_nat_entry_set - ra_meta_pages - get_next_nat_page - current_nat_addr - set_to_next_nat [do nat_bitmap checking] - f2fs_change_bit For proc A, nat_bitmap and nat_bitmap_mir would be compared without lock_op and nm_i->nat_tree_lock, while proc B is changing nat_bitmap/nat_bitmap_ver in cp. So it is normal for nat_bitmap/nat_bitmap diffrence under such scenario. This patch fix this by removing the monitoring point. [Fix: `599a09b` f2fs: check in-memory nat version bitmap] Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:51 -08:00
Hou Pengyang	e15882b6c6	f2fs: init local extent_info to avoid stale stack info in tp To avoid such stale(fops, blk, len) info in f2fs_lookup_extent_tree_end tp dio-23095 [005] ...1 17878.856859: f2fs_lookup_extent_tree_end: dev = (259,30), ino = 856, pgofs = 0, ext_info(fofs: 3441207040, blk: 4294967232, len: 3481143808) Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:50 -08:00
Yunlong Song	77190e1f31	f2fs: remove unnecessary condition check for write_checkpoint in f2fs_gc Since has_not_enough_free_secs(sbi, 0, 0) must be true if has_not_enough_ free_secs(sbi, sec_freed, 0) is true, write_checkpoint is sure to execute in both conditions. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:50 -08:00
Jaegeuk Kim	9259228571	f2fs: check discard alignment only for SEQWRITE zones For converntional zones, we don't need to align discard commands to exact zone size. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:46 -08:00
Jaegeuk Kim	40465257ac	f2fs: wait for discard completion after submission We don't need to wait for each discard commands when unmounting the image. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:39 -08:00
Jaegeuk Kim	47b8980816	f2fs: much larger batched trim_fs job We have a kernel thread to issue discard commands, so we can increase the number of batched discard sections. By default, now it becomes 4GB range. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:30 -08:00
Jaegeuk Kim	ad4d307fce	f2fs: avoid very large discard command This patch adds MAX_DISCARD_BLOCKS() to avoid issuing too much large single discard command. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:20 -08:00
Dave Jiang	11bac80004	mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [arnd@arndb.de: fix ARM build] Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:54 -08:00
Jaegeuk Kim	70d625cbdb	f2fs: do SSR for node segments more aggresively This patch gives more SSR chances for node blocks. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 10:01:41 -08:00
Jaegeuk Kim	c192f7a477	f2fs: find data segments across all the types Previously, if type is CURSEG_HOT_DATA, we only check CURSEG_HOT_DATA only. This patch fixes to search all the different types for SSR. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 10:01:08 -08:00
Jaegeuk Kim	d0db7703ac	f2fs: do SSR in higher priority Let's check SSR in prior to LFS allocation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 09:39:53 -08:00
Yunlong Song	035e97adab	f2fs: do SSR for data when there is enough free space In allocate_segment_by_default(), need_SSR() already detected it's time to do SSR. So, let's try to find victims for data segments more aggressively in time. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 09:39:52 -08:00
Hou Pengyang	b9cd20619e	f2fs: node segment is prior to data segment selected victim As data segment gc may lead dnode dirty, so the greedy cost for data segment should be valid blocks * 2, that is data segment is prior to node segment. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 09:39:40 -08:00
Yunlong Song	3436c4bdb3	f2fs: put allocate_segment after refresh_sit_entry SIT information should be updated before segment allocation, since SSR needs latest valid block information. Current code does not update the old_blkaddr info in sit_entry, so adjust the allocate_segment to its proper location. Commit `5e443818fa` ("f2fs: handle dirty segments inside refresh_sit_entry") puts it into wrong location. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 09:37:30 -08:00
Hou Pengyang	e93b986525	f2fs: add ovp valid_blocks check for bg gc victim to fg_gc For foreground gc, greedy algorithm should be adapted, which makes this formula work well: (2 * (100 / config.overprovision + 1) + 6) But currently, we fg_gc have a prior to select bg_gc victim segments to gc first, these victims are selected by cost-benefit algorithm, we can't guarantee such segments have the small valid blocks, which may destroy the f2fs rule, on the worstest case, would consume all the free segments. This patch fix this by add a filter in check_bg_victims, if segment's has # of valid blocks over overprovision ratio, skip such segments. Cc: <stable@vger.kernel.org> Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:28:20 -08:00
Jaegeuk Kim	86d54795c9	f2fs: do not wait for writeback in write_begin Otherwise we can get livelock like below. [79880.428136] dbench D 0 18405 18404 0x00000000 [79880.428139] Call Trace: [79880.428142] __schedule+0x219/0x6b0 [79880.428144] schedule+0x36/0x80 [79880.428147] schedule_timeout+0x243/0x2e0 [79880.428152] ? update_sd_lb_stats+0x16b/0x5f0 [79880.428155] ? ktime_get+0x3c/0xb0 [79880.428157] io_schedule_timeout+0xa6/0x110 [79880.428161] __lock_page+0xf7/0x130 [79880.428164] ? unlock_page+0x30/0x30 [79880.428167] pagecache_get_page+0x16b/0x250 [79880.428171] grab_cache_page_write_begin+0x20/0x40 [79880.428182] f2fs_write_begin+0xa2/0xdb0 [f2fs] [79880.428192] ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs] [79880.428197] ? kmem_cache_free+0x79/0x200 [79880.428203] ? __mark_inode_dirty+0x17f/0x360 [79880.428206] generic_perform_write+0xbb/0x190 [79880.428213] ? file_update_time+0xa4/0xf0 [79880.428217] __generic_file_write_iter+0x19b/0x1e0 [79880.428226] f2fs_file_write_iter+0x9c/0x180 [f2fs] [79880.428231] __vfs_write+0xc5/0x140 [79880.428235] vfs_write+0xb2/0x1b0 [79880.428238] SyS_write+0x46/0xa0 [79880.428242] entry_SYSCALL_64_fastpath+0x1e/0xad Fixes: cae96a5c8ab6 ("f2fs: check io submission more precisely") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:23:27 -08:00
Yunlei He	05eeb118a0	f2fs: replace __get_victim by dirty_segments in FG_GC In FG_GC process, it will search victim section twice. This will cause some dirty section with less valid blocks skip garbage collection. section # 26425 : valid blocks # 3 142.037567: get_victim_by_default: victim 26425 : valid blocks # 3 142.037585: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244 142.039494: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19022 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24 142.070247: new_curseg: Debug: alloc new segment 26746 142.244341: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26054 ofs_unit = 1, pre_victim_secno = 26054, prefree = 0, free = 243 142.254475: do_garbage_collect: Debug: FG_GC, seg_freed = 1 142.293131: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23466 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244 142.319001: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23467 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244 142.368879: get_victim_by_default: victim 26425 : valid blocks # 3 142.368894: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244 142.378127: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19612 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24 142.416917: new_curseg: Debug: alloc new segment 26054 142.656794: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 25404 ofs_unit = 1, pre_victim_secno = 25404, prefree = 0, free = 243 142.662139: do_garbage_collect: Debug: FG_GC, seg_freed = 1 142.684159: new_curseg: Debug: alloc new segment 25197 142.685059: get_victim_by_default: victim 26425 : valid blocks # 3 142.685079: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 243 142.701427: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26238 ofs_unit = 1, pre_victim_secno = 26238, prefree = 0, free = 243 142.707105: do_garbage_collect: Debug: FG_GC, seg_freed = 1 142.802444: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23473 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244 142.804422: get_victim_by_default: victim 26425 : valid blocks # 3 142.804443: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244 142.851567: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19092 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24 142.865014: new_curseg: Debug: alloc new segment 26238 143.082245: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26307 ofs_unit = 1, pre_victim_secno = 26307, prefree = 0, free = 244 143.088252: do_garbage_collect: Debug: FG_GC, seg_freed = 1 143.128307: new_curseg: Debug: alloc new segment 25404 143.181846: get_victim_by_default: victim 26425 : valid blocks # 3 143.181872: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244 Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:23:26 -08:00
Jaegeuk Kim	88c5c13a50	f2fs: fix multiple f2fs_add_link() calls having same name It turns out a stakable filesystem like sdcardfs in AOSP can trigger multiple vfs_create() to lower filesystem. In that case, f2fs will add multiple dentries having same name which breaks filesystem consistency. Until upper layer fixes, let's work around by f2fs, which shows actually not much performance regression. Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:23:25 -08:00
Jaegeuk Kim	d50aaeec90	f2fs: show actual device info in tracepoints This patch shows actual device information in the tracepoints. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:23:24 -08:00
Jaegeuk Kim	5b6c6be2d8	f2fs: use SSR for warm node as well We have had node chains, but haven't used it so far due to stale node blocks. Now, we have crc\|cp_ver in node footer and give random cp_ver at format time, we can start to use it again. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:23:22 -08:00
Chao Yu	39133a5015	f2fs: enable inline_xattr by default In android, since SElinux is enable, security policy will be appliedd for each file, it stores in inode as an xattr entry, so it will take one 4k size node block additionally for each file. Let's enable inline_xattr by default in order to save storage space. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:21:49 -08:00
Chao Yu	23cf7212a1	f2fs: introduce noinline_xattr mount option This patch introduces new mount option 'noinline_xattr', so we can disable inline xattr functionality which is already set as a default mount option. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:21:48 -08:00
Jaegeuk Kim	25cc5d3b9d	f2fs: avoid reading NAT page by get_node_info We've not seen this buggy case for a long time, so it's time to avoid this unnecessary get_node_info() call which reading NAT page to cache nat entry. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:21:47 -08:00
Jaegeuk Kim	9b064f7d0c	f2fs: remove build_free_nids() during checkpoint Let's avoid build_free_nids() in checkpoint path. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:53 -08:00
Chao Yu	d260081ccf	f2fs: change recovery policy of xattr node block Currently, if we call fsync after updating the xattr date belongs to the file, f2fs needs to trigger checkpoint to keep xattr data consistent. But, this policy cause low performance as checkpoint will block most foreground operations and cause unneeded and unrelated IOs around checkpoint. This patch will reuse regular file recovery policy for xattr node block, so, we change to write xattr node block tagged with fsync flag to warm area instead of cold area, and during recovery, we search warm node chain for fsynced xattr block, and do the recovery. So, for below application IO pattern, performance can be improved obviously: - touch file - create/update/delete xattr entry in file - fsync file Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:52 -08:00
Bhumika Goyal	2ad0ef846b	f2fs: super: constify fscrypt_operations structure Declare fscrypt_operations structure as const as it is only stored in the s_cop field of a super_block structure. This field is of type const, so fscrypt_operations structure having this property can be made const too. File size before: fs/f2fs/super.o text data bss dec hex filename 54131 31355 184 85670 14ea6 fs/f2fs/super.o File size after: fs/f2fs/super.o text data bss dec hex filename 54227 31259 184 85670 14ea6 fs/f2fs/super.o Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:51 -08:00
Jaegeuk Kim	1200abb26f	f2fs: show checkpoint version at mount time If we mounted f2fs successfully, let's show current checkpoint version. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:50 -08:00
Jaegeuk Kim	7f54f51f46	f2fs: remove preflush for nobarrier case This patch removes REQ_PREFLUSH in the nobarrier case. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:48 -08:00
Jaegeuk Kim	942fd3192f	f2fs: check last page index in cached bio to decide submission If the cached bio has the last page's index, then we need to submit it. Otherwise, we don't need to submit it and can wait for further IO merges. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:48 -08:00
Jaegeuk Kim	d68f735b3b	f2fs: check io submission more precisely This patch check IO submission more precisely than previous rough check. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:47 -08:00
Jaegeuk Kim	f566bae846	f2fs: call internal __write_data_page directly This patch introduces __write_data_page to call it by f2fs_write_cache_pages directly.. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:46 -08:00
Jaegeuk Kim	e7c75ab099	f2fs: avoid out-of-order execution of atomic writes We need to flush data writes before flushing last node block writes by using FUA with PREFLUSH. We don't need to guarantee precedent node writes since if those are not written, we can't reach to the last node block when scanning node block chain during roll-forward recovery. Afterwards f2fs_wait_on_page_writeback guarantees all the IO submission to disk, which builds a valid node block chain. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:35 -08:00
Jaegeuk Kim	faa24895ac	f2fs: move write_node_page above fsync_node_pages This patch just moves write_node_page and introduces an inner function. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:09:43 -08:00
Jaegeuk Kim	c1b221078b	f2fs: move flush tracepoint This patch moves the tracepoint location for flush command. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:08:43 -08:00
Jaegeuk Kim	a00861dbca	f2fs: show # of APPEND and UPDATE inodes This patch shows cached # of APPEND and UPDATE inode entries. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:54:53 -08:00
DongOh Shin	cac5a3d8f5	f2fs: fix 446 coding style warnings in f2fs.h 1) Nine coding style warnings below have been resolved: "Missing a blank line after declarations" 2) 435 coding style warnings below have been resolved: "function definition argument 'x' should also have an identifier name" 3) Two coding style warnings below have been resolved: "macros should not use a trailing semicolon" Signed-off-by: DongOh Shin <doscode.kr@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:55 -08:00
DongOh Shin	c64ab12e36	f2fs: fix 3 coding style errors in f2fs.h Two coding style errors below have been resolved: "Macros with complex values should be enclosed in parentheses" And a coding style error below has been resolved: "space prohibited before that ',' (ctx:WxW)" Signed-off-by: DongOh Shin <doscode.kr@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:55 -08:00
Jaegeuk Kim	8ed5974552	f2fs: declare missing static function We missed two functions declared as static functions. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:54 -08:00
Kaixu Xia	0cc0dec2b6	f2fs: show the fault injection mount option This patch shows the fault injection mount option in f2fs_show_options(). Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:53 -08:00
Chao Yu	73545817c9	f2fs: fix null pointer dereference when issuing flush in ->fsync We only allocate flush merge control structure sbi::sm_info::fcc_info when flush_merge option is on, but in f2fs_issue_flush we still try to access member of the control structure without that option, it incurs panic as show below, fix it. Call Trace: __remove_ino_entry+0xa9/0xc0 [f2fs] f2fs_do_sync_file.isra.27+0x214/0x6d0 [f2fs] f2fs_sync_file+0x18/0x20 [f2fs] vfs_fsync_range+0x3d/0xb0 __do_page_fault+0x261/0x4d0 do_fsync+0x3d/0x70 SyS_fsync+0x10/0x20 do_syscall_64+0x6e/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f18ce260de0 RSP: 002b:00007ffdd4589258 EFLAGS: 00000246 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f18ce260de0 RDX: 0000000000000006 RSI: 00000000016c0360 RDI: 0000000000000003 RBP: 00000000016c0360 R08: 000000000000ffff R09: 000000000000001f R10: 00007ffdd4589020 R11: 0000000000000246 R12: 00000000016c0100 R13: 0000000000000000 R14: 00000000016c1f00 R15: 00000000016c0100 Code: fb 81 e3 00 08 00 00 48 89 45 a0 0f 1f 44 00 00 31 c0 85 db 75 27 41 81 e7 00 04 00 00 74 0c 41 8b 45 20 85 c0 0f 85 81 00 00 00 <f0> 41 ff 45 20 4c 89 e7 e8 f8 e9 ff ff f0 41 ff 4d 20 48 83 c4 RIP: f2fs_issue_flush+0x5b/0x170 [f2fs] RSP: ffffc90003b5fd78 CR2: 0000000000000020 ---[ end trace a09314c24f037648 ]--- Reported-by: Shuoran Liu <liushuoran@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:52 -08:00
Chao Yu	dba79f38bc	f2fs: fix to avoid overflow when left shifting page offset We use following method to calculate size with current page index: size = index << PAGE_SHIFT If type of index has only 32-bits size, left shifting will incur overflow, which makes result incorrect. So let's cast index with 64-bits type to avoid such issue. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:51 -08:00
Chao Yu	ba38c27eb9	f2fs: enhance lookup xattr Previously, in getxattr we will load all entries both in inline xattr and xattr node block, and then do the lookup in all entries, but our lookup flow shows low efficiency, since if we can lookup and hit in inline xattr of inode page cache first, we don't need to load and lookup xattr node block, which can obviously save cpu time and IO latency. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: initialize NULL to avoid warning] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:51 -08:00
Wei Fang	b86e33075e	f2fs: fix a dead loop in f2fs_fiemap() A dead loop can be triggered in f2fs_fiemap() using the test case as below: ... fd = open(); fallocate(fd, 0, 0, 4294967296); ioctl(fd, FS_IOC_FIEMAP, fiemap_buf); ... It's caused by an overflow in __get_data_block(): ... bh->b_size = map.m_len << inode->i_blkbits; ... map.m_len is an unsigned int, and bh->b_size is a size_t which is 64 bits on 64 bits archtecture, type conversion from an unsigned int to a size_t will result in an overflow. In the above-mentioned case, bh->b_size will be zero, and f2fs_fiemap() will call get_data_block() at block 0 again an again. Fix this by adding a force conversion before left shift. Signed-off-by: Wei Fang <fangwei1@huawei.com> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:49 -08:00
Jaegeuk Kim	dc91de78e5	f2fs: do not preallocate blocks which has wrong buffer Sheng Yong reports needless preallocation if write(small_buffer, large_size) is called. In that case, f2fs preallocates large_size, but vfs returns early due to small_buffer size. Let's detect it before preallocation phase in f2fs. Reported-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:48 -08:00
Jaegeuk Kim	dcc9165dbf	f2fs: show # of on-going flush and discard bios This patch adds stat information for flush and discard commands. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:47 -08:00
Jaegeuk Kim	1546996348	f2fs: add a kernel thread to issue discard commands asynchronously This patch adds a kernel thread to issue discard commands. It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current bio status. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:45 -08:00
Jaegeuk Kim	0b54fb8458	f2fs: factor out discard command info into discard_cmd_control This patch adds discard_cmd_control with the existing discarding controls. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:53 -08:00
Jaegeuk Kim	d4adb30f25	f2fs: reorganize stat information This patch modifies stat information more clearly. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:52 -08:00
Jaegeuk Kim	b01a92019c	f2fs: clean up flush/discard command namings This patch simply cleans up the names for flush/discard commands. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:51 -08:00
Chao Yu	ae27d62e6b	f2fs: check in-memory sit version bitmap This patch adds a mirror for sit version bitmap, and use it to detect in-memory bitmap corruption which may be caused by bit-transition of cache or memory overflow. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:50 -08:00
Chao Yu	599a09b2c1	f2fs: check in-memory nat version bitmap This patch adds a mirror for nat version bitmap, and use it to detect in-memory bitmap corruption which may be caused by bit-transition of cache or memory overflow. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:49 -08:00
Chao Yu	355e78913c	f2fs: check in-memory block bitmap This patch adds a mirror for valid block bitmap, and use it to detect in-memory bitmap corruption which may be caused by bit-transition of cache or memory overflow. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:49 -08:00
Chao Yu	5fe457430e	f2fs: introduce FI_ATOMIC_COMMIT This patch introduces a new flag to indicate inode status of doing atomic write committing, so that, we can keep atomic write status for inode during atomic committing, then we can skip GCing pages of atomic write inode, that avoids random GCed datas being mixed with current transaction, so isolation of transaction can be kept. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:48 -08:00
Chao Yu	939afa943c	f2fs: clean up with list_{first, last}_entry Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:47 -08:00
Jaegeuk Kim	25290fa559	f2fs: return fs_trim if there is no candidate If there is no candidate to submit discard command during f2fs_trim_fs, let's return without checkpoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:40 -08:00
Jaegeuk Kim	0333ad4e4f	f2fs: avoid needless checkpoint in f2fs_trim_fs The f2fs_trim_fs() doesn't need to do checkpoint if there are newly allocated data blocks only which didn't change the critical checkpoint data such as nat and sit entries. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 13:16:36 -08:00
Linus Torvalds	6c24337f22	Various cleanups for the file system encryption feature. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlirP6wACgkQ8vlZVpUN gaMwpQgApR67CxzlstxYjZpWPAqC8McJ2FBDX+mCOle5Vkc1WQDklwr0oCfQThTj eDSFRhNfIvyPh0DJ589PxBCsWOqN5h6Si7hD5ZinomVNI+IL89OytaU5EV2OpWaW iKdJgO9Tm8U7LuY6FOIoVdX57kUXVdkWoj61rC056B1SNhnNiVeofi7lYDM8Ix4q IGSQ9W24iQKmCk4hCwgObhJBRK9RnlOH0GLUmpMaS+jnfnj/uePwdxWEFsPuCOob 8acAJ49lr55kjIw79E0BAyWxhEZ2aiArHk8PaWynT/DyNq3ftcapPlpftoeba8vo glBJRX70QxPvt0iHEp0ykfExkhWhFA== =Joki -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Various cleanups for the file system encryption feature" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: constify struct fscrypt_operations fscrypt: properly declare on-stack completion fscrypt: split supp and notsupp declarations into their own headers fscrypt: remove redundant assignment of res fscrypt: make fscrypt_operations.key_prefix a string fscrypt: remove unused 'mode' member of fscrypt_ctx ext4: don't allow encrypted operations without keys fscrypt: make test_dummy_encryption require a keyring key fscrypt: factor out bio specific functions fscrypt: pass up error codes from ->get_context() fscrypt: remove user-triggerable warning messages fscrypt: use EEXIST when file already uses different policy fscrypt: use ENOTDIR when setting encryption policy on nondirectory fscrypt: use ENOKEY when file cannot be created w/o key	2017-02-20 18:22:31 -08:00
Eric Biggers	6f69f0ed61	fscrypt: constify struct fscrypt_operations Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Richard Weinberger <richard@nod.at>	2017-02-08 10:59:57 -05:00
Eric Biggers	46f47e4800	fscrypt: split supp and notsupp declarations into their own headers Previously, each filesystem configured without encryption support would define all the public fscrypt functions to their notsupp_* stubs. This list of #defines had to be updated in every filesystem whenever a change was made to the public fscrypt functions. To make things more maintainable now that we have three filesystems using fscrypt, split the old header fscrypto.h into several new headers. fscrypt_supp.h contains the real declarations and is included by filesystems when configured with encryption support, whereas fscrypt_notsupp.h contains the inline stubs and is included by filesystems when configured without encryption support. fscrypt_common.h contains common declarations needed by both. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-02-06 23:26:43 -05:00
Jaegeuk Kim	4e6a8d9b22	f2fs: relax async discard commands more This patch relaxes async discard commands to avoid waiting its end_io during checkpoint. Instead of waiting them during checkpoint, it will be done when actually reusing them. Test on initial partition of nvme drive. # time fstrim /mnt/test Before : 6.158s After : 4.822s Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	bb95d9ab2a	f2fs: drop exist_data for inline_data when truncated to 0 A test program gets the SEEK_DATA with two values between a new created file and the exist file on f2fs filesystem. F2FS filesystem, (the first "test1" is a new file) SEEK_DATA size != 0 (offset = 8192) SEEK_DATA size != 0 (offset = 4096) PNFS filesystem, (the first "test1" is a new file) SEEK_DATA size != 0 (offset = 4096) SEEK_DATA size != 0 (offset = 4096) int main(int argc, char *argv) { char filename = argv[1]; int offset = 1, i = 0, fd = -1; if (argc < 2) { printf("Usage: %s f2fsfilename\n", argv[0]); return -1; } /* if (!access(filename, F_OK) \|\| errno != ENOENT) { printf("Needs a new file for test, %m\n"); return -1; }/ fd = open(filename, O_RDWR \| O_CREAT, 0777); if (fd < 0) { printf("Create test file %s failed, %m\n", filename); return -1; } for (i = 0; i < 20; i++) { offset = 1 << i; ftruncate(fd, 0); lseek(fd, offset, SEEK_SET); write(fd, "test", 5); / Get the alloc size by seek data equal zero*/ if (lseek(fd, 0, SEEK_DATA)) { printf("SEEK_DATA size != 0 (offset = %d)\n", offset); break; } } close(fd); return 0; } Reported-and-Tested-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	363fa4e078	f2fs: don't allow encrypted operations without keys This patch fixes the renaming bug on encrypted filenames, which was pointed by (ext4: don't allow encrypted operations without keys) Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	26a28a0c1e	f2fs: show the max number of atomic operations This patch adds to show the max number of atomic operations which are conducting concurrently. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	ec91538dcc	f2fs: get io size bit from mount option This patch adds to set io_size_bits from mount option. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	0a595ebaaa	f2fs: support IO alignment for DATA and NODE writes This patch implements IO alignment by filling dummy blocks in DATA and NODE write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs, we can eliminate underlying dummy page problem which FTL conducts in order to close MLC or TLC partial written pages. Note that, - it requires "-o mode=lfs". - IO size should be power of 2, not exceed BIO_MAX_PAGES, 256. - read IO is still 4KB. - do checkpoint at fsync, if dummy NODE page was written. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	554b5125f5	f2fs: add submit_bio tracepoint This patch adds final submit_bio() tracepoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	9d52a504db	f2fs: reassign new segment for mode=lfs Otherwise we can remain wrong curseg->next_blkoff, resulting in fsck failure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Yunlei He	650d3c4e56	f2fs: fix a missing discard prefree segments If userspace issue a fstrim with a range not involve prefree segments, it will reuse these segments without discard. This patch fix it. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Geliang Tang	ed0b56209f	f2fs: use rb_entry_safe Use rb_entry_safe() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Yunlei He	746e240392	f2fs: add a case of no need to read a page in write begin If the range we write cover the whole valid data in the last page, we do not need to read it. Signed-off-by: Yunlei He <heyunlei@huawei.com> [Jaegeuk Kim: nullify the remaining area (fix: xfstests/f2fs/001)] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Yunlei He	7855eba4d6	f2fs: fix a problem of using memory after free This patch fix a problem of using memory after free in function __try_merge_extent_node. Fixes: `0f825ee6e8` ("f2fs: add new interfaces for extent tree") Cc: <stable@vger.kernel.org> Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Dan Carpenter	07fe8d4440	f2fs: remove unneeded condition We checked that "inode" is not an error pointer earlier so there is no need to check again here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:00 +09:00
Chao Yu	5c9e418436	f2fs: don't cache nat entry if out of memory If we run out of memory, in cache_nat_entry, it's better to avoid loop for allocating memory to cache nat entry, so in low memory scenario, for read path of node block, I expect this can avoid unneeded latency. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:00 +09:00
Yunlei He	fed2466848	f2fs: remove unused values in recover_fsync_data This patch remove unused values in function recover_fsync_data Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:00 +09:00
Damien Le Moal	f99e86485c	block: Rename blk_queue_zone_size and bdev_zone_size All block device data fields and functions returning a number of 512B sectors are by convention named xxx_sectors while names in the form xxx_size are generally used for a number of bytes. The blk_queue_zone_size and bdev_zone_size functions were not following this convention so rename them. No functional change is introduced by this patch. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Collapsed the two patches, they were nonsensically split and broke bisection. Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-12 07:58:32 -07:00
Eric Biggers	a5d431eff2	fscrypt: make fscrypt_operations.key_prefix a string There was an unnecessary amount of complexity around requesting the filesystem-specific key prefix. It was unclear why; perhaps it was envisioned that different instances of the same filesystem type could use different key prefixes, or that key prefixes could be binary. However, neither of those things were implemented or really make sense at all. So simplify the code by making key_prefix a const char *. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-08 01:03:41 -05:00
Eric Biggers	54475f531b	fscrypt: use ENOKEY when file cannot be created w/o key As part of an effort to clean up fscrypt-related error codes, make attempting to create a file in an encrypted directory that hasn't been "unlocked" fail with ENOKEY. Previously, several error codes were used for this case, including ENOENT, EACCES, and EPERM, and they were not consistent between and within filesystems. ENOKEY is a better choice because it expresses that the failure is due to lacking the encryption key. It also matches the error code returned when trying to open an encrypted regular file without the key. I am not aware of any users who might be relying on the previous inconsistent error codes, which were never documented anywhere. This failure case will be exercised by an xfstest. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-31 16:26:20 -05:00
Linus Torvalds	231753ef78	Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull partial readlink cleanups from Miklos Szeredi. This is the uncontroversial part of the readlink cleanup patch-set that simplifies the default readlink handling. Miklos and Al are still discussing the rest of the series. * git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: vfs: make generic_readlink() static vfs: remove ".readlink = generic_readlink" assignments vfs: default to generic_readlink() vfs: replace calling i_op->readlink with vfs_readlink() proc/self: use generic_readlink ecryptfs: use vfs_get_link() bad_inode: add missing i_op initializers	2016-12-17 19:16:12 -08:00

... 2 3 4 5 6 ...

2000 Commits