Commit Graph

1312 Commits

Author SHA1 Message Date
Jaegeuk Kim
957efb0c21 Revert "f2fs: check the node block address of newly allocated nid"
Original issue is fixed by:

  f2fs: cover more area with nat_tree_lock

This reverts commit 24928634f8.
2016-01-06 19:15:49 -08:00
Jaegeuk Kim
a51311938e f2fs: cover more area with nat_tree_lock
There was a subtle bug on nat cache management which incurs wrong nid allocation
or wrong block addresses when try_to_free_nats is triggered heavily.
This patch enlarges the previous coverage of nat_tree_lock to avoid data race.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-06 19:15:48 -08:00
Dmitry Monakhov
a1c6f05733 fs: use block_device name vsprintf helper
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 13:03:18 -05:00
Chao Yu
e0afc4d6d0 f2fs: introduce max_file_blocks in sbi
Introduce max_file_blocks in sbi to store max block index of file in f2fs,
it could be used to avoid unneeded calculation of max block index in
runtime.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix overflow of sbi->max_file_blocks]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-03 21:40:04 -08:00
Chao Yu
3a9e6433a3 f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink
Add missed CONFIG_F2FS_FS_XATTR for encrypted symlink inode in order
to avoid unneeded registry of ->{get,set,remove}xattr.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 18:54:50 -08:00
Jaegeuk Kim
137d09f002 f2fs: introduce zombie list for fast shrinking extent trees
This patch removes refcount, and instead, adds zombie_list to shrink directly
without radix tree traverse.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 15:39:22 -08:00
Jaegeuk Kim
c00ba55485 f2fs: monitor zombie_tree count
This patch adds an entry to show the number of zombie extent_tree.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 15:33:00 -08:00
Jaegeuk Kim
c46a155bdf f2fs: use IPU for fdatasync
This patch fixes missing IPU condition when fdatasync is called.
With this patch, fdatasync is able to avoid additional node writes for recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 13:49:17 -08:00
Jaegeuk Kim
8d4ea29b64 f2fs: write pending bios when cp_error is set
When testing ioc_shutdown, put_super is able to be hanged by waiting for
writebacking pages as follows.

INFO: task umount:2723 blocked for more than 120 seconds.
      Tainted: G           O    4.4.0-rc3+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount          D ffff88000859f9d8     0  2723   2110 0x00000000
 ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540
 ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff
 ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c
Call Trace:
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff8182310c>] schedule+0x3c/0x90
 [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430
 [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0
 [<ffffffff8111614d>] ? ktime_get+0x7d/0x140
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30
 [<ffffffff8111617c>] ? ktime_get+0xac/0x140
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110
 [<ffffffff81823a25>] bit_wait_io+0x35/0x50
 [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90
 [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0
 [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840
 [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60
 [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs]
 [<ffffffff812639bc>] evict+0xbc/0x190
 [<ffffffff81263d19>] iput+0x229/0x2c0
 [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs]
 [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0
 [<ffffffff812478f7>] kill_block_super+0x27/0x70
 [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs]
 [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70
 [<ffffffff81247f4c>] deactivate_super+0x5c/0x60
 [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90
 [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20
 [<ffffffff810ac463>] task_work_run+0x73/0xa0
 [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0
 [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0
 [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 13:08:02 -08:00
Jaegeuk Kim
1f6fa26199 f2fs: remove f2fs_bug_on in terms of max_depth
There is no report on this bug_on case, but if malicious attacker changed this
field intentionally, we can just reset it as a MAX value.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 11:42:46 -08:00
Jaegeuk Kim
732d56489f f2fs: fix f2fs_ioc_abort_volatile_write
There are two rules to handle aborting volatile or atomic writes.

1. drop atomic writes
 - we don't need to keep any stale db data.

2. write journal data
 - we should keep the journal data with fsync for db recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:53:25 -08:00
Chao Yu
4e0d836d5f f2fs: fix to skip recovering dot dentries in a readonly fs
If filesystem is readonly, leave user message info instead of recovering
inline dot inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:51:28 -08:00
Jaegeuk Kim
ed3d12561a f2fs: load largest extent all the time
Otherwise, we can get mismatched largest extent information.

One example is:
1. mount f2fs w/ extent_cache
2. make a small extent
3. umount
4. mount f2fs w/o extent_cache
5. update the largest extent
6. umount
7. mount f2fs w/ extent_cache
8. get the old extent made by #2

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:20 -08:00
Jaegeuk Kim
819d9153d4 f2fs: use i_size_read to get i_size
We need to use i_size_read() to get inode->i_size.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:19 -08:00
Jaegeuk Kim
8dc0d6a11e f2fs: early check broken symlink length in the encrypted case
If link is broken, its len is zero, and we don't need to move forward.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:19 -08:00
Chao Yu
e96248bb45 f2fs: clean up f2fs_ioc_write_checkpoint
Use f2fs_sync_fs to clean up codes in f2fs_ioc_write_checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove unused err variable]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:18 -08:00
Yunlei He
179448bfe4 f2fs: add a max block check for get_data_block_bmap
This patch adds a max block check for get_data_block_bmap.

Trinity test program will send a block number as parameter into
ioctl_fibmap, which will be used in get_node_path(), when the block
number large than f2fs max blocks, it will trigger kernel bug.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com>
[Jaegeuk Kim: fix missing condition, pointed by Chao Yu]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:17 -08:00
Fan Li
9a950d52b7 f2fs: fix bugs and simplify codes of f2fs_fiemap
fix bugs:
1. len could be updated incorrectly when start+len is beyond isize.
2. If there is a hole consisting of more than two blocks, it could
   fail to add FIEMAP_EXTENT_LAST flag for the last extent.
3. If there is an extent beyond isize, when we search extents in a range
   that ends at isize, it will also return the extent beyond isize,
   which is outside the range.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:16 -08:00
Chao Yu
6d5a1495ee f2fs: let user being aware of IO error
Sometimes we keep dumb when IO error occur in lower layer device, so user
will not receive any error return value for some operation, but actually,
the operation did not succeed.

This sould be avoided, so this patch reports such kind of error to user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:15 -08:00
Chao Yu
d53841740f f2fs: add missing f2fs_balance_fs in __recover_dot_dentries
__recover_do_dentries will try to grab free space in storage, so fix to
add missing f2fs_balance_fs here.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:14 -08:00
Jaegeuk Kim
06d6f22639 f2fs: declare static function
The __f2fs_commit_super is static.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:14 -08:00
Jaegeuk Kim
b4d07a3e1a f2fs: avoid f2fs_lock_op in f2fs_write_begin
If f2fs_write_begin is to update data, we can bypass calling f2fs_lock_op() in
order to avoid the checkpoint latency in the write syscall.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:13 -08:00
Jaegeuk Kim
4aa69d5667 f2fs: return early when trying to read null nid
If get_node_page() gets zero nid, we can return early without getting a wrong
page. For example, get_dnode_of_data() can try to do that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:12 -08:00
Jaegeuk Kim
2aadac085c f2fs: introduce prepare_write_begin to clean up
This patch adds prepare_write_begin to clean f2fs_write_begin.
The major role of this function is to convert any inline_data and allocate
or find block address.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:11 -08:00
Chao Yu
fba48a8b14 f2fs: don't convert inline inode when inline_data option is disable
If inline_data option is disable, when truncating an inline inode with
size which is not exceed maxinum inline size, we should not convert
inline inode to regular one to avoid the overhead of synchronizing
conversion.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:10 -08:00
Chao Yu
c34f42e2cb f2fs: report error of do_checkpoint
do_checkpoint and write_checkpoint can fail due to reasons like triggering
in a readonly fs or encountering IO error of storage device.

So it's better to report such error info to user, let user be aware of
failure of doing checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:09 -08:00
Jaegeuk Kim
2a34076070 f2fs: call f2fs_balance_fs only when node was changed
If user tries to update or read data, we don't need to call f2fs_balance_fs
which triggers f2fs_gc, which increases unnecessary long latency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:09 -08:00
Chao Yu
3104af35eb f2fs: reduce covered region of sbi->cp_rwsem in f2fs_map_blocks
Only cover sbi->cp_rwsem on one dnode page's allocation and modification
instead of multiple's in f2fs_map_blocks, it can reduce the covered region
of cp_rwsem, then we can avoid potential long time delay for concurrent
checkpointer.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:08 -08:00
Jaegeuk Kim
93bae099ea f2fs: record node block allocation in dnode_of_data
This patch introduces recording node block allocation in dnode_of_data.
This information helps to figure out whether any node block is allocated during
specific file operations.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:07 -08:00
Jaegeuk Kim
00623e6bcf f2fs: avoid unnecessary f2fs_gc for dir operations
The f2fs_balance_fs doesn't need to cover f2fs_new_inode or f2fs_find_entry
works.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:06 -08:00
Jaegeuk Kim
b9d777b85f f2fs: check inline_data flag at converting time
We can check inode's inline_data flag  when calling to convert it.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:05 -08:00
Jaegeuk Kim
74fd8d9927 f2fs: speed up shrinking extent tree entries
If there is no candidates for shrinking slab entries, we don't need to traverse
any trees at all.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix missing initialization reported by Yunlei He]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:13:00 -08:00
Al Viro
fceef393a5 switch ->get_link() to delayed_call, kill ->put_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30 13:01:03 -05:00
Jaegeuk Kim
7441ccef33 f2fs: use atomic variable for total_extent_tree
It would be better to use atomic variable for total_extent_tree.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-22 10:31:41 -08:00
Chao Yu
4cf185379b f2fs: add a tracepoint for sync_dirty_inodes
This patch adds a tracepoint for sync_dirty_inodes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-17 09:55:27 -08:00
Fan Li
7df3a4318d f2fs: optimize the flow of f2fs_map_blocks
check map->m_len right after it changes to avoid excess call
to update dnode_of_data.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-17 09:54:42 -08:00
Chao Yu
36b35a0dbe f2fs: support data flush in background
Previously, when finishing a checkpoint, we have persisted all fs meta
info including meta inode, node inode, dentry page of directory inode, so,
after a sudden power cut, f2fs can recover from last checkpoint with full
directory structure.

But during checkpoint, we didn't flush dirty pages of regular and symlink
inode, so such dirty datas still in memory will be lost in that moment of
power off.

In order to reduce the chance of lost data, this patch enables
f2fs_balance_fs_bg with the ability of data flushing. It will try to flush
user data before starting a checkpoint. So user's data written after last
checkpoint which may not be fsynced could be saved.

When we mount with data_flush option, after every period of cp_interval
(could be configured in sysfs: /sys/fs/f2fs/device/cp_interval) seconds
user data could be flushed into device once f2fs_balance_fs_bg was called
in kworker thread or gc thread.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-17 09:53:26 -08:00
Chao Yu
33fbd5100d f2fs: stat dirty regular/symlink inodes
Add to stat dirty regular and symlink inode for showing in debugfs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-17 09:53:19 -08:00
Chao Yu
343f40f0a7 f2fs: introduce new option for controlling data flush
Add a new option 'data_flush' to enable data flush functionality.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-16 09:25:48 -08:00
Chao Yu
c227f91273 f2fs: record dirty status of regular/symlink inode
Maintain regular/symlink inode which has dirty pages in global dirty list
and record their total dirty pages count like the way of handling directory
inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-16 08:58:12 -08:00
Chao Yu
b3980910f7 f2fs: introduce __f2fs_commit_super
Introduce __f2fs_commit_super to include duplicated codes in
f2fs_commit_super for cleanup.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-16 08:58:07 -08:00
Jaegeuk Kim
55d1cdb25a f2fs: relocate tracepoint of write_checkpoint
It needs to relocate its location to see exact trace logs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-16 08:58:06 -08:00
Chao Yu
e8240f656d f2fs: don't grab super block buffer header all the time
We have already got one copy of valid super block in memory, do not grab
buffer header of super block all the time.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-16 08:58:06 -08:00
Yunlei He
b39f0de23d f2fs: backup raw_super in sbi
f2fs use fields of f2fs_super_block struct directly in a grabbed buffer.

Once the buffer happen to be destroyed (e.g. through dd), it may bring
in unpredictable effect on f2fs.

This patch fixes to allocate additional buffer to store datas of super
block rather than using grabbed block buffer directly.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-16 08:58:05 -08:00
Fan Li
a1c1e9b74f f2fs: fix to reset variable correctlly
f2fs_map_blocks will set m_flags and m_len to 0, so we don't need to
reset m_flags ourselves, but have to reset m_len to correct value
before use it again.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-16 08:58:04 -08:00
Chao Yu
6ad7609a18 f2fs: introduce __remove_dirty_inode
Introduce __remove_dirty_inode to clean up codes in remove_dirty_dir_inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-15 13:31:28 -08:00
Chao Yu
2710fd7e00 f2fs: introduce dirty list node in inode info
Add a new dirt list node member in inode info for linking the inode to
global dirty list in superblock, instead of old implementation which
allocate slab cache memory as an entry to inode.

It avoids memory pressure due to slab cache allocation, and also makes
codes more clean.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-15 13:24:19 -08:00
Chao Yu
a49324f127 f2fs: rename {add,remove,release}_dirty_inode to {add,remove,release}_ino_entry
remove_dirty_dir_inode will be renamed to remove_dirty_inode as a generic
function in following patch for removing directory/regular/symlink inode
in global dirty list.

Here rename ino management related functions for readability, also in
order to avoid name conflict.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-15 13:23:43 -08:00
Chao Yu
9a59b62fd8 f2fs: do more integrity verification for superblock
Do more sanity check for superblock during ->mount.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-14 18:58:13 -08:00
Fan Li
e8931582bd f2fs: fix to update variable correctly when skip a unmapped block
map.m_len should be reduced after skip a block

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-14 10:17:54 -08:00
Fan Li
d8fe4f0e74 f2fs: write only the pages in range during defragment
@lend of filemap_write_and_wait_range is supposed to be a "offset
in bytes where the range ends (inclusive)". Subtract 1 to avoid
writing an extra page.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-14 10:17:54 -08:00
Chao Yu
e1c51b9f1d f2fs: clean up node page updating flow
If read_node_page return LOCKED_PAGE, in its caller it's better a) skip
unneeded 'Update' flag and mapping info verfication; b) check nid value
stored in footer structure of node page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-14 09:09:17 -08:00
Andreas Gruenbacher
764a5c6b1f xattr handlers: Simplify list operation
Change the list operation to only return whether or not an attribute
should be listed.  Copying the attribute names into the buffer is moved
to the callers.

Since the result only depends on the dentry and not on the attribute
name, we do not pass the attribute name to list operations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-13 19:46:12 -05:00
Jaegeuk Kim
ea212a4a7a f2fs: use lock_buffer when changing superblock
When modifying sb contents, we need to use lock its buffer.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-09 09:53:18 -08:00
Jaegeuk Kim
5d909cdbbb f2fs: refactor f2fs_commit_super
Previously, f2fs_commit_super hacks the bh->blocknr to write the broken
alternate superblock.
Instead of it, we should use the correct logic to retrieve its buffer head
with locking it appropriately.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-09 09:51:07 -08:00
Jaegeuk Kim
80609448cd f2fs: enhance the bit operation for SSR
This patch enhances the existing bit operation when f2fs allocates SSR
blocks.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-09 09:50:32 -08:00
Al Viro
6b2553918d replace ->follow_link() with new method that could stay in RCU mode
new method: ->get_link(); replacement of ->follow_link().  The differences
are:
	* inode and dentry are passed separately
	* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
	* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted.  Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode.  That'll change
in the next commits.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08 22:41:54 -05:00
Al Viro
21fc61c73c don't put symlink bodies in pagecache into highmem
kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.

new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases.  page_follow_link_light()
instrumented to yell about anything missed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08 22:41:36 -05:00
Andreas Gruenbacher
98e9cb5711 vfs: Distinguish between full xattr names and proper prefixes
Add an additional "name" field to struct xattr_handler.  When the name
is set, the handler matches attributes with exactly that name.  When the
prefix is set instead, the handler matches attributes with the given
prefix and with a non-empty suffix.

This patch should avoid bugs like the one fixed in commit c361016a in
the future.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 21:33:52 -05:00
Al Viro
886f56f970 f2fs: it's umode_t, not mode_t...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 21:18:00 -05:00
Chao Yu
0cab80ee0c f2fs: fix to convert inline inode in ->setattr
In commit 3c45414527 ("f2fs: do not trim preallocated blocks when
truncating after i_size"), in order to follow the regulation: "truncate(x)
where x > i_size will not trim all blocks past i_size." like other file
systems, in ->setattr we invoked truncate_setsize instead of f2fs_truncate
to avoid unneeded block trimming in such case, but forgot to call
f2fs_convert_inline_inode keep consistency of inline data conversion rule.

This patch fixes to convert inline data if necessary.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 13:14:44 -08:00
Chao Yu
3519e3f992 f2fs: use sbi->blocks_per_seg to avoid unnecessary calculation
Use sbi->blocks_per_seg directly to avoid unnecessary calculation when using
1 << sbi->log_blocks_per_seg.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 12:07:57 -08:00
Chao Yu
9006f2c93f f2fs: kill f2fs_drop_largest_extent
For direct IO, f2fs only allocate new address for the block which is not
exist in the disk before, its mapping info should not exist in extent
cache previously, so here we do not need to call f2fs_drop_largest_extent
to drop related cache.

Due to no more callers for f2fs_drop_largest_extent now, kill it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 12:07:57 -08:00
Chao Yu
b7973f2378 f2fs: clean up argument of recover_data
In recover_data, value of argument 'type' will be CURSEG_WARM_NODE all
the time, remove it for cleanup.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 12:07:56 -08:00
Chao Yu
855639deca f2fs: clean up code with __has_cursum_space
Clean up codes in lookup_journal_in_cursum() with __has_cursum_space().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 12:07:55 -08:00
Chao Yu
e9837bc2a4 f2fs: clean up error path in f2fs_readdir
No logic changes, just clean up the error path.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 12:07:55 -08:00
Jaegeuk Kim
807b1e1c8e f2fs: do not recover from previous remained wrong dnodes
If device does not support discard, some obsolete dnodes can be recovered
by roll-forward. This patch enhances the recovery flow.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:36 -08:00
Jaegeuk Kim
760de7914e f2fs: avoid deadlock in f2fs_shrink_extent_tree
While handling extent trees, we can enter into a reclaiming path anytime.
If it tries to release some extent nodes in the same extent tree,
write_lock(&et->lock) would be hanged.
In order to avoid the deadlock, we can just skip it.

Note that, if it is an unreferenced tree, we should get write_lock(&et->lock)
successfully and release all of therein nodes.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:36 -08:00
Chao Yu
57b62d29ad f2fs: fix to report error in f2fs_readdir
get_lock_data_page in f2fs_readdir can fail due to a lot of reasons (i.e.
no memory or IO error...), it's better to report this kind of error to
user rather than ignoring it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:35 -08:00
Chao Yu
f478f43fa0 f2fs: clear page uptodate when dropping cache for atomic write
We should clear uptodate flag for all pages atomic written when we drop
them, otherwise before these cached pages were reclaimed or invalidated
eventually, we will see invalid data when hitting them again.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:35 -08:00
Fan Li
692223d132 f2fs: optimize __find_rev_next_bit
1. Skip __reverse_ulong if the bitmap is empty.
2. Reduce branches and codes.
According to my test, the performance of this new version is 5% higher on
an empty bitmap of 64bytes, and remains about the same in the worst scenario.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:35 -08:00
Chao Yu
eb7e813cc7 f2fs: fix to remove directory inode from dirty list
If last dirty dentry page was writebacked in reclaim path, we should
remove its directory inode from global dirty list to avoid unnecessary
flush for this inode when doing checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:34 -08:00
Chao Yu
04ef4b626c f2fs: fix to enable missing ioctl interfaces in ->compat_ioctl
In 64-bit kernel f2fs can supports 32-bit ioctl system call by identifying
encoded code which is converted from 32-bit one to 64-bit one in
->compat_ioctl.

When we introduced new interfaces in ->ioctl, we forgot to enable them in
->compat_ioctl, so enable them for fixing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix wrongly added spaces together]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:34 -08:00
Chao Yu
29ba108d9b f2fs: fix memory leak of kobject in error path of fill_super
f2fs_sb_info::s_kobj should be released in error path of fill_super,
otherwise it will lead to memory leak.

This bug was found by kmemleak:

dmesg:
kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800838dc358 (size 8):
  comm "mount", pid 4154, jiffies 4297482839 (age 1911.412s)
  hex dump (first 8 bytes):
    7a 72 61 6d 31 00 ff ff                          zram1...
  backtrace:
    [<ffffffff817a3668>] kmemleak_alloc+0x28/0x50
    [<ffffffff811dc99f>] __kmalloc_track_caller+0xef/0x1c0
    [<ffffffff8119d1c5>] kstrdup+0x45/0x80
    [<ffffffff8119d228>] kstrdup_const+0x28/0x30
    [<ffffffff813d2333>] kvasprintf_const+0x63/0xa0
    [<ffffffff813c59fc>] kobject_set_name_vargs+0x3c/0xa0
    [<ffffffff813c5a85>] kobject_add_varg+0x25/0x60
    [<ffffffff813c5b13>] kobject_init_and_add+0x53/0x70
    [<ffffffffa07ced19>] f2fs_fill_super+0x9d9/0xc40 [f2fs]
    [<ffffffff811ff0b2>] mount_bdev+0x192/0x1d0
    [<ffffffffa07cd3e5>] f2fs_mount+0x15/0x20 [f2fs]
    [<ffffffff811fec93>] mount_fs+0x43/0x170
    [<ffffffff81220256>] vfs_kern_mount+0x76/0x160
    [<ffffffff812211c8>] do_mount+0x258/0xdc0
    [<ffffffff81221dab>] SyS_mount+0x7b/0xc0
    [<ffffffff817aecd7>] entry_SYSCALL_64_fastpath+0x12/0x6f
...

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:34 -08:00
Chao Yu
d323d005ac f2fs: support file defragment
This patch introduces a new ioctl F2FS_IOC_DEFRAGMENT to support file
defragment in a specified range of regular file.

This ioctl can be used in very limited workload: if user expects high
sequential read performance in randomly written file, this interface
can be used for defragmentation, after that file can be written as
continuous as possible in the device.

Meanwhile, it has side-effect, it will make holes in segments where
blocks located originally, so it's better to trigger GC to eliminate
fragment in segments.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:33 -08:00
Chao Yu
2da3e02746 f2fs: commit atomic written page in LFS mode
We should always commit atomic written pages in LFS mode, otherwise data
will become corrupted if we encounter suddent power cut after partial
pages committed in IPU mode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:33 -08:00
Chao Yu
787c7b8cb3 f2fs: report error of f2fs_create_root_stats
f2fs_create_root_stats can fail due to no memory, report it to user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:33 -08:00
Linus Torvalds
5d2eb548b3 Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr cleanups from Al Viro.

* 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  f2fs: xattr simplifications
  squashfs: xattr simplifications
  9p: xattr simplifications
  xattr handlers: Pass handler to operations instead of flags
  jffs2: Add missing capability check for listing trusted xattrs
  hfsplus: Remove unused xattr handler list operations
  ubifs: Remove unused security xattr handler
  vfs: Fix the posix_acl_xattr_list return value
  vfs: Check attribute names in posix acl xattr handers
2015-11-13 18:02:30 -08:00
Andreas Gruenbacher
29608d208b f2fs: xattr simplifications
Now that the xattr handler is passed to the xattr handler operations, we
have access to the attribute name prefix, so simplify
f2fs_xattr_generic_list.

Also, f2fs_xattr_advise_list is only ever called for
f2fs_xattr_advise_handler; there is no need to double check for that.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Changman Lee <cm224.lee@samsung.com>
Cc: Chao Yu <chao2.yu@samsung.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13 20:34:34 -05:00
Andreas Gruenbacher
d9a82a0403 xattr handlers: Pass handler to operations instead of flags
The xattr_handler operations are currently all passed a file system
specific flags value which the operations can use to disambiguate between
different handlers; some file systems use that to distinguish the xattr
namespace, for example.  In some oprations, it would be useful to also have
access to the handler prefix.  To allow that, pass a pointer to the handler
to operations instead of the flags value alone.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13 20:34:32 -05:00
Yaowei Bai
a8415e4b13 fs/f2fs/namei.c: remove unnecessary new_valid_dev() check
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Changman Lee <cm224.lee@samsung.com>
Cc: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Linus Torvalds
1873499e13 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem update from James Morris:
 "This is mostly maintenance updates across the subsystem, with a
  notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
  maintainer of that"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
  apparmor: clarify CRYPTO dependency
  selinux: Use a kmem_cache for allocation struct file_security_struct
  selinux: ioctl_has_perm should be static
  selinux: use sprintf return value
  selinux: use kstrdup() in security_get_bools()
  selinux: use kmemdup in security_sid_to_context_core()
  selinux: remove pointless cast in selinux_inode_setsecurity()
  selinux: introduce security_context_str_to_sid
  selinux: do not check open perm on ftruncate call
  selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
  KEYS: Merge the type-specific data with the payload data
  KEYS: Provide a script to extract a module signature
  KEYS: Provide a script to extract the sys cert list from a vmlinux file
  keys: Be more consistent in selection of union members used
  certs: add .gitignore to stop git nagging about x509_certificate_list
  KEYS: use kvfree() in add_key
  Smack: limited capability for changing process label
  TPM: remove unnecessary little endian conversion
  vTPM: support little endian guests
  char: Drop owner assignment from i2c_driver
  ...
2015-11-05 15:32:38 -08:00
Chao Yu
beaa57dd98 f2fs: fix to skip shrinking extent nodes
In f2fs_shrink_extent_tree we should stop shrink flow if we have already
shrunk enough nodes in extent cache.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-22 09:39:35 -07:00
Chao Yu
a6be014e1d f2fs: fix error path of ->symlink
Now, in ->symlink of f2fs, we kept the fixed invoking order between
f2fs_add_link and page_symlink since we should init node info firstly
in f2fs_add_link, then such node info can be used in page_symlink.

But we didn't fix to release meta info which was done before page_symlink
in our error path, so this will leave us corrupt symlink entry in its
parent's dentry page. Fix this issue by adding f2fs_unlink in the error
path for removing such linking.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-22 09:39:24 -07:00
Chao Yu
7fee740697 f2fs: fix to clear GCed flag for atomic written page
Atomic write page can be GCed, after committing this kind of page, we should
clear the GCed flag for it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-22 09:37:13 -07:00
Jaegeuk Kim
2b246fb0f6 f2fs: don't need to submit bio on error case
If commit_atomic_write is failed, we don't need to submit any bio.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-21 19:05:53 -07:00
Jaegeuk Kim
d7b8b384b0 f2fs: fix leakage of inmemory atomic pages
If we got failure during commit_atomic_write, abort_volatile_write will be
called, but will not drop the inmemory pages due to no FI_ATOMIC_FILE.
Actually, there is no reason to check the flag in abort_volatile_write.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-21 19:04:17 -07:00
Jaegeuk Kim
f96999c35f f2fs: refactor __find_rev_next_{zero}_bit
This patch refactors __find_rev_next_{zero}_bit which was disabled previously
due to bugs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-21 15:26:00 -07:00
David Howells
146aa8b145 KEYS: Merge the type-specific data with the payload data
Merge the type-specific data with the payload data into one four-word chunk
as it seems pointless to keep them separate.

Use user_key_payload() for accessing the payloads of overloaded
user-defined keys.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cifs@vger.kernel.org
cc: ecryptfs@vger.kernel.org
cc: linux-ext4@vger.kernel.org
cc: linux-f2fs-devel@lists.sourceforge.net
cc: linux-nfs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: linux-ima-devel@lists.sourceforge.net
2015-10-21 15:18:36 +01:00
Jaegeuk Kim
67f8cf3cee f2fs: support fiemap for inline_data
There is a FIEMAP_EXTENT_INLINE_DATA, pointed out by Marc.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-20 11:33:21 -07:00
Jaegeuk Kim
1d373a0ef7 f2fs: flush dirty data for bmap
Users expect bmap will give allocated block addresses.
Let's play likewise ext4.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-20 11:33:11 -07:00
Jaegeuk Kim
84e4214f08 f2fs: relocate the tracepoint for background_gc
Once f2fs_gc is done, wait_ms is changed once more.
So, its tracepoint would be located after it.

Reported-by: He YunLei <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-13 10:02:01 -07:00
Chao Yu
08b39fbd59 f2fs crypto: fix racing of accessing encrypted page among
different competitors

Since we use different page cache (normally inode's page cache for R/W
and meta inode's page cache for GC) to cache the same physical block
which is belong to an encrypted inode. Writeback of these two page
cache should be exclusive, but now we didn't handle writeback state
well, so there may be potential racing problem:

a)
kworker:				f2fs_gc:
 - f2fs_write_data_pages
  - f2fs_write_data_page
   - do_write_data_page
    - write_data_page
     - f2fs_submit_page_mbio
(page#1 in inode's page cache was queued
in f2fs bio cache, and be ready to write
to new blkaddr)
					 - gc_data_segment
					  - move_encrypted_block
					   - pagecache_get_page
					(page#2 in meta inode's page cache
					was cached with the invalid datas
					of physical block located in new
					blkaddr)
					   - f2fs_submit_page_mbio
					(page#1 was submitted, later, page#2
					with invalid data will be submitted)

b)
f2fs_gc:
 - gc_data_segment
  - move_encrypted_block
   - f2fs_submit_page_mbio
(page#1 in meta inode's page cache was
queued in f2fs bio cache, and be ready
to write to new blkaddr)
					user thread:
					 - f2fs_write_begin
					  - f2fs_submit_page_bio
					(we submit the request to block layer
					to update page#2 in inode's page cache
					with physical block located in new
					blkaddr, so here we may read gabbage
					data from new blkaddr since GC hasn't
					writebacked the page#1 yet)

This patch fixes above potential racing problem for encrypted inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-13 09:52:34 -07:00
Chao Yu
ea1a29a0bd f2fs: export ra_nid_pages to sysfs
After finishing building free nid cache, we will try to readahead
asynchronously 4 more pages for the next reloading, the count of
readahead nid pages is fixed.

In some case, like SMR drive, read less sectors with fixed count
each time we trigger RA may be low efficient, since we will face
high seeking overhead, so we'd better let user to configure this
parameter from sysfs in specific workload.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 14:03:43 -07:00
Chao Yu
2db2388fcf f2fs: readahead for free nids building
When there is no free nid in nid cache, all new node allocaters stop their
job to wait for reloading of free nids, however reloading is synchronous as
we will read 4 NAT pages for building nid cache, it cause the long latency.

This patch tries to readahead more NAT pages with READA request flag after
reloading of free nids. It helps to improve performance when users allocate
node id intensively.

Env: Sandisk 32G sd card
time for i in `seq 1 60000`; { echo -n > /mnt/f2fs/$i; echo XXXXXX > /mnt/f2fs/$i;}

Before:
real    0m2.814s
user    0m1.220s
sys     0m1.536s

After:
real    0m2.711s
user    0m1.136s
sys     0m1.568s

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 14:03:22 -07:00
Chao Yu
26879fb101 f2fs: support lower priority asynchronous readahead in ra_meta_pages
Now, we use ra_meta_pages to reads continuous physical blocks as much as
possible to improve performance of following reads. However, ra_meta_pages
uses a synchronous readahead approach by submitting bio with READ, as READ
is with high priority, it can not be used in the case of preloading blocks,
and it's not sure when these RAed pages will be used.

This patch supports asynchronous readahead in ra_meta_pages by tagging bio
with READA flag in order to allow preloading.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 14:03:15 -07:00
Chao Yu
2b947003fa f2fs: don't tag REQ_META for temporary non-meta pages
In recovery or checkpoint flow, we grab pages temperarily in meta inode's
mapping for caching temperary data, actually, datas in these pages were
not meta data of f2fs, but still we tag them with REQ_META flag. However,
lower device like eMMC may do some optimization for data of such type.
So in order to avoid wrong optimization, we'd better remove such flag
for temperary non-meta pages.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 14:01:46 -07:00
Chao Yu
b8c2940048 f2fs: add a tracepoint for f2fs_read_data_pages
This patch adds a tracepoint for f2fs_read_data_pages to trace when pages
are readahead by VFS.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 14:00:34 -07:00
Jaegeuk Kim
a56c7c6fb3 f2fs: set GFP_NOFS for grab_cache_page
For normal inodes, their pages are allocated with __GFP_FS, which can cause
filesystem calls when reclaiming memory.
This can incur a dead lock condition accordingly.

So, this patch addresses this problem by introducing
f2fs_grab_cache_page(.., bool for_write), which calls
grab_cache_page_write_begin() with AOP_FLAG_NOFS.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 13:38:03 -07:00
Jaegeuk Kim
6e2c64ad7c f2fs: fix SSA updates resulting in corruption
The f2fs_collapse_range and f2fs_insert_range changes the block addresses
directly. But that can cause uncovered SSA updates.
In that case, we need to give up to change the block addresses and do buffered
writes to keep filesystem consistency.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 13:38:02 -07:00
Jaegeuk Kim
a125702326 Revert "f2fs: do not skip dentry block writes"
The periodic checkpoint can resolve the previous issue.
So, now we can use this again to improve the reported performance regression:

https://lkml.org/lkml/2015/10/8/20

This reverts commit 15bec0ff5a9ba6d203178fa8772259df6207942a.
2015-10-12 13:38:02 -07:00
Jaegeuk Kim
c912a8298c f2fs: add F2FS_GOING_DOWN_METAFLUSH to test power-failure
This patch introduces F2FS_GOING_DOWN_METAFLUSH which flushes meta pages like
SSA blocks and then blocks all the writes.
This can be used by power-failure tests.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 13:37:54 -07:00
Jaegeuk Kim
6066d8cdb6 f2fs: merge meta writes as many possible
This patch tries to merge IOs as many as possible when background flusher
conducts flushing the dirty meta pages.

[Before]

...
2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124320, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124560, size = 32768
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95720, size = 987136
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123928, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123944, size = 8192
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123968, size = 45056
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124064, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 97648, size = 1007616
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123776, size = 8192
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123800, size = 32768
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124624, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 99616, size = 921600
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123608, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123624, size = 77824
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123792, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123864, size = 32768
...

[After]

...
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 92168, size = 892928
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 93912, size = 753664
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95384, size = 716800
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 96784, size = 712704
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104160, size = 364544
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104872, size = 356352
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 105568, size = 278528
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106112, size = 319488
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106736, size = 258048
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107240, size = 270336
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107768, size = 180224
...

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:57 -07:00
Jaegeuk Kim
60b99b486b f2fs: introduce a periodic checkpoint flow
This patch introduces a periodic checkpoint feature.
Note that, this is not enforcing to conduct checkpoints very strictly in terms
of trigger timing, instead just hope to help user experiences.
The default value is 60 seconds.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:57 -07:00
Jaegeuk Kim
5c26743474 f2fs: add a tracepoint for background gc
This patch introduces a tracepoint to monitor background gc behaviors.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:57 -07:00
Jaegeuk Kim
6aefd93b01 f2fs: introduce background_gc=sync mount option
This patch introduce background_gc=sync enabling synchronous cleaning in
background.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:57 -07:00
Chao Yu
456b88e4d1 f2fs: introduce a new ioctl F2FS_IOC_WRITE_CHECKPOINT
This patch introduce a new ioctl for those users who want to trigger
checkpoint from userspace through ioctl.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:56 -07:00
Chao Yu
d530d4d8e2 f2fs: support synchronous gc in ioctl
This patch drops in batches gc triggered through ioctl, since user
can easily control the gc by designing the loop around the ->ioctl.

We support synchronous gc by forcing using FG_GC in f2fs_gc, so with
it, user can make sure that in this round all blocks gced were
persistent in the device until ioctl returned.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:56 -07:00
Chao Yu
3342bb303b f2fs: skip searching dirty map if dirty segment is not exist
When searching victim during gc, if there are no dirty segments in
filesystem, we will still take the time to search the whole dirty segment
map, it's not needed, it's better to skip in this condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:56 -07:00
Chao Yu
a43f7ec327 f2fs: fix to avoid redundant searching in dirty map during gc
When doing gc, we search a victim in dirty map, starting from position of
last victim, we will reset the current searching position until we touch
the end of dirty map, and then search the whole diryt map. So sometimes we
will search the range [victim, last] twice, it's redundant, this patch
avoids this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:55 -07:00
Chao Yu
5b7ee37414 f2fs: use atomic64_t for extent cache hit stat
Our hit stat of extent cache will increase all the time until remount,
and we use atomic_t type for the stat variable, so it may easily incur
overflow when we query extent cache frequently in a long time running
fs.

So to avoid that, this patch uses atomic64_t for hit stat variables.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:55 -07:00
Jaegeuk Kim
39307a8e24 f2fs: use vmalloc to handle -ENOMEM error
This patch introduces f2fs_kvmalloc to avoid -ENOMEM during mount.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:55 -07:00
Jaegeuk Kim
ab126cfc30 f2fs: should get a victim from retrials
If we do not call get_victim first, we cannot get a new victim for retrial
path.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:55 -07:00
Chao Yu
45fe8492cc f2fs: fix to correct freed section number during gc
This patch fixes to maintain the right section count freed in garbage
collecting when triggering a foreground gc.

Besides, when a foreground gc is running on current selected section, once
we fail to gc one segment, it's better to abandon gcing the left segments
in current section, because anyway we will select next victim for
foreground gc, so gc on the left segments in previous section will become
overhead and also cause the long latency for caller.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:54 -07:00
Chao Yu
345a6b2ee2 f2fs: fix to update {m,c}time correctly when truncating larger
This patch fixes to update ctime and atime correctly when truncating
larger in ->setattr.

The bug is reported by xfstest generic/313 as below:

generic/313 2s ... - output mismatch (see ./results/generic/313.out.bad)
    --- tests/generic/313.out   2015-08-04 15:28:53.430798882 +0800
    +++ results/generic/313.out.bad   2015-09-28 17:04:27.294278016 +0800
    @@ -1,2 +1,4 @@
     QA output created by 313
     Silence is golden
    +ctime not updated after truncate up
    +mtime not updated after truncate up
    ...
    (Run 'diff -u tests/generic/313.out tests/generic/313.out.bad'  to see the entire diff)
Ran: generic/313
Failures: generic/313
Failed 1 of 1 tests

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:54 -07:00
Jaegeuk Kim
90b803e6fb f2fs: do not skip dentry block writes
Previously, we skip dentry block writes when wbc is SYNC_NONE with no memory
pressure and the number of dirty pages is pretty small.

But, we didn't skip for normal data writes, which gives us not much big impact
on overall performance.
Moreover, by skipping some data writes, kworker falls into infinite loop to try
to write blocks, when many dir inodes have only one dentry block.

So, this patch removes skipping data writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:54 -07:00
Chao Yu
7223554133 f2fs: remove unneeded f2fs_{,un}lock_op in do_recover_data()
Protecting recovery flow by using cp_rwsem is not needed, since we have
prevent triggering any checkpoint by locking cp_mutex previously.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:54 -07:00
Chao Yu
1d7e10d58a f2fs: fix incorrect bimodal calculation
In update_sit_info, we use div_u64 to handle 'u64 divide u64' case, but
div_u64 can only handle 32-bits divisor, so our divisor with u64 type
passed to div_u64 will overflow, result in the wrong calculation when
show debug info of f2fs as below:

BDF: 464, avg. vblocks: 23509
(BDF should never exceed 100)

So change to use div64_u64 to handle this case correctly.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:53 -07:00
Chao Yu
4abd3f5ac4 f2fs: introduce __try_update_largest_extent
This patch adds a new helper __try_update_largest_extent for cleanup.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:53 -07:00
Nicholas Krause
545fe4210d f2fs: fix error handling for calls to various functions in the function recover_inline_data
This fixes error handling for calls to various functions in the
function  recover_inline_data to check if these particular functions
either return a error code or the boolean value false to signal their
caller they have failed internally and if this arises return false
to signal failure immediately to the caller of recover_inline_data
as we cannot continue after failures to calling either the function
truncate_inline_inode or truncate_blocks.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:53 -07:00
Chao Yu
9cd81ce3c2 f2fs: disallow switch extent_cache option dynamically
Swith extent_cache option dynamically when remount may casue consistency
issue between extent cache and dnode page. Fix in this patch to avoid
that condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:53 -07:00
Chao Yu
46c9e1413f f2fs: use correct flag in f2fs_map_blocks()
We introduce F2FS_GET_BLOCK_READ in commit e2b4e2bc88 ("f2fs: fix
incorrect mapping for bmap"), but forget to use this flag in the right
place, fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Chao Yu
f9811703fe f2fs: fix to handle io error in ->direct_IO
Here is a oops reported as following message when testing generic/019 of
xfstest:

 ------------[ cut here ]------------
 kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882!
 invalid opcode: 0000 [#1] SMP
 Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4
nf_def
 CPU: 2 PID: 25441 Comm: fio Tainted: G           O    4.3.0-rc1+ #6
 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013
 task: ffff8803f4e85580 ti: ffff8803fd61c000 task.ti: ffff8803fd61c000
 RIP: 0010:[<ffffffffa0784981>]  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
 RSP: 0018:ffff8803fd61f918  EFLAGS: 00010246
 RAX: 00000000000007ed RBX: 0000000000000224 RCX: 000000000000001f
 RDX: 0000000000000800 RSI: ffffffffffffffff RDI: ffff8803f56f4300
 RBP: ffff8803fd61f978 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000024 R11: ffff8800d23bbd78 R12: ffff8800d0ef0000
 R13: 0000000000000224 R14: 0000000000000000 R15: 0000000000000001
 FS:  00007f827ff85700(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffff600000 CR3: 00000003fef17000 CR4: 00000000001406e0
 Stack:
  000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b
  0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000
  ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358
 Call Trace:
  [<ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs]
  [<ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs]
  [<ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs]
  [<ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs]
  [<ffffffff811510ae>] generic_file_direct_write+0xae/0x160
  [<ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0
  [<ffffffff81151e07>] generic_file_write_iter+0xf7/0x200
  [<ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20
  [<ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs]
  [<ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs]
  [<ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290
  [<ffffffff81700f7e>] ? mutex_lock+0x1e/0x50
  [<ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0
  [<ffffffff8120b913>] do_io_submit+0x373/0x630
  [<ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0
  [<ffffffff8120bbe0>] SyS_io_submit+0x10/0x20
  [<ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a
 Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb
 RIP  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
  RSP <ffff8803fd61f918>
 ---[ end trace 2e577d7f711ddb86 ]---

The reason is that: in the test of generic/019, we will trigger a manmade
IO error in block layer through debugfs, after that, prefree segment will
no longer be freed, because we always skip doing gc or checkpoint when
there occurs an IO error.

Meanwhile fio with aio engine generated a large number of direct IOs,
which continue allocating spaces in free segment until we run out of them,
eventually, results in panic in new_curseg as no more free segment was
found.

So, this patch changes to return EIO in direct_IO for this condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Chao Yu
ea58711e88 f2fs: do in batches truncation in truncate_hole
truncate_data_blocks_range can do in batches truncation which makes all
changes in dnode page content, dnode page status, extent cache, block
count updating together.

But previously, truncate_hole() always truncates one block in dnode page
at a time by invoking truncate_data_blocks_range(,1), which make thing
slow.

This patch changes truncate_hole() to do in batches truncation for all
target blocks in one direct node inside truncate_data_blocks_range, which
can make our punch hole operation in ->fallocate more efficent.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Fan Li
4d1fa815f2 f2fs: optimize code of f2fs_update_extent_tree_range
Fix 2 potential problems:
1. when largest extent needs to be invalidated, it will be reset in
   __drop_largest_extent, which makes __is_extent_same after always
   return false, and largest extent unchanged. Now we update it properly.

2. when extent is split and the latter part remains in tree, next_en
   should be the latter part instead of next extent of original extent.
   It will cause merge failure if there is in-place update, although
   there is not, I think this fix will still makes codes less ambiguous.

This patch also simplifies codes of invalidating extents, and optimizes the
procedues that split extent into two.
There are a few modifications after last patch:
1. prev_en now is updated properly.
2. more codes and branches are simplified.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Fan Li
41a099de3a f2fs: drop largest extent by range
now we update extent by range, fofs may not be on the largest
extent if the new extent overlaps with it. so add a new function
to drop largest extent properly.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Jaegeuk Kim
a7230d16d5 f2fs: check end_io for metapages before making next checkpoint blocks
This patch avoids to produce new checkpoint blocks before the previous meta
pages were written completely.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Jaegeuk Kim
569cf1876a f2fs crypto: allocate buffer for decrypting filename
We got dentry pages from high_mem, and its address space directly goes into the
decryption path via f2fs_fname_disk_to_usr.
But, sg_init_one assumes the address is not from high_mem, so we can get this
panic since it doesn't call kmap_high but kunmap_high is triggered at the end.

kernel BUG at ../../../../../../kernel/mm/highmem.c:290!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
 (kunmap_high+0xb0/0xb8) from [<c0114534>] (__kunmap_atomic+0xa0/0xa4)
 (__kunmap_atomic+0xa0/0xa4) from [<c035f028>] (blkcipher_walk_done+0x128/0x1ec)
 (blkcipher_walk_done+0x128/0x1ec) from [<c0366c24>] (crypto_cbc_decrypt+0xc0/0x170)
 (crypto_cbc_decrypt+0xc0/0x170) from [<c0367148>] (crypto_cts_decrypt+0xc0/0x114)
 (crypto_cts_decrypt+0xc0/0x114) from [<c035ea98>] (async_decrypt+0x40/0x48)
 (async_decrypt+0x40/0x48) from [<c032ca34>] (f2fs_fname_disk_to_usr+0x124/0x304)
 (f2fs_fname_disk_to_usr+0x124/0x304) from [<c03056fc>] (f2fs_fill_dentries+0xac/0x188)
 (f2fs_fill_dentries+0xac/0x188) from [<c03059c8>] (f2fs_readdir+0x1f0/0x300)
 (f2fs_readdir+0x1f0/0x300) from [<c0218054>] (vfs_readdir+0x90/0xb4)
 (vfs_readdir+0x90/0xb4) from [<c0218418>] (SyS_getdents64+0x64/0xcc)
 (SyS_getdents64+0x64/0xcc) from [<c0105ba0>] (ret_fast_syscall+0x0/0x30)

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Chao Yu
973163fc0c f2fs: reorganize f2fs_map_blocks
In this patch, we try to reorganize f2fs_map_blocks to make block mapping
flow more clear by using following structure:

/* check status of mapping */

if (unmapped) {
	/* blkaddr == NULL_ADDR || blkaddr == NEW_ADDR */

	if (create) {
		/* write path, handle dio write case here */
		alloc_and_map;
	} else {
		/*
		 * handle read cases from all call paths:
		 *     1. generic read;
		 *     2. dio read;
		 *     3. fiemap;
		 *     4. bmap
		 */
	}
}

/* map buffer_header */

Besides, this patch handles the missing case correctly for dio write:
When we fail in __allocate_data_blocks, then in f2fs_map_blocks, we will
not allocate blocks correctly for preallocated blocks, but returning with
an unmapped buffer head, which will result in failure of dio write.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Jaegeuk Kim
514053e454 f2fs: declare f2fs_update_extent_tree_range as static
This function should be static.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Chao Yu
9edcdabf36 f2fs: fix overflow of size calculation
We have potential overflow issue when calculating size of object, when
we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only
32-bits space in 32-bit architecture, left shifting will incur overflow,
i.e:

pgoff_t index =  0xFFFFFFFF;
loff_t size = index << PAGE_CACHE_SHIFT;
size: 0xFFFFF000

So we should cast index with 64-bits type to avoid this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Chao Yu
100136acfb f2fs: fix incorrect searching position when shrinking extent cache
When shrinking extent cache, we have two steps in the flow:
1) shrink objects which are unreferenced by inodes;
2) shrink objects from LRU list of extent cache.

In step 1, if we haven't shrunk enough number of objects, we will try
step 2, but before that we didn't update the searching position which
may point to last inode index in global extent tree, result in failing
to shrink objects by traversing the all inodes' extent tree.

In this patch, we reset searching position to beginning of global extent
tree for fixing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Chao Yu
c998012b0b f2fs: verify file type early in f2fs_fallocate
This patch changes to verify file type early in f2fs_fallocate for
cleanup, meanwhile this also fixes to add missing verification for
expand_inode_data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Jaegeuk Kim
c5cd29d21c f2fs: no need to lock for update_inode_page all the time
As comment says, we don't need to call f2fs_lock_op in write_inode to prevent
from producing dirty node pages all the time.
That happens only when there is not enough free sections and we can avoid that
by calling balance_fs in prior to that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Jaegeuk Kim
25b93346a6 f2fs: cover number of dirty node pages under node_write lock
This number is referenced by checkpoint under node_write lock.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:49 -07:00
Nicholas Krause
538e17e7e9 f2fs: fix incorrect return statement in the function f2fs_ioc_release_volatile_write
This fixes the incorrect return statement at the end of the function
f2fs_ioc_release_volatile_write's body for returning zero as this is
incorrect due to the function call before this return statement to
the function punch_hole being able to fail and we should return this
function's return fail directly in order to signal to callers of the
function f2fs_ioc_release_volatile if a failure arises with this call
to punch_hole fails.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:49 -07:00
Chao Yu
744288c721 f2fs: trace in batches extent info update
Rename trace_f2fs_update_extent_tree to trace_f2fs_update_extent_tree_range,
then expand and enable it to trace in batches extent info updates.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:49 -07:00
Linus Torvalds
4c12ab7e5e Merge tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "The major work includes fixing and enhancing the existing extent_cache
  feature, which has been well settling down so far and now it becomes a
  default mount option accordingly.

  Also, this version newly registers a f2fs memory shrinker to reclaim
  several objects consumed by a couple of data structures in order to
  avoid memory pressures.

  Another new feature is to add ioctl(F2FS_GARBAGE_COLLECT) which
  triggers a cleaning job explicitly by users.

  Most of the other patches are to fix bugs occurred in the corner cases
  across the whole code area"

* tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (85 commits)
  f2fs: upset segment_info repair
  f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent
  f2fs: update extent tree in batches
  f2fs: fix to release inode correctly
  f2fs: handle f2fs_truncate error correctly
  f2fs: avoid unneeded initializing when converting inline dentry
  f2fs: atomically set inode->i_flags
  f2fs: fix wrong pointer access during try_to_free_nids
  f2fs: use __GFP_NOFAIL to avoid infinite loop
  f2fs: lookup neighbor extent nodes for merging later
  f2fs: split __insert_extent_tree_ret for readability
  f2fs: kill dead code in __insert_extent_tree
  f2fs: adjust showing of extent cache stat
  f2fs: add largest/cached stat in extent cache
  f2fs: fix incorrect mapping for bmap
  f2fs: add annotation for space utilization of regular/inline dentry
  f2fs: fix to update cached_en of extent tree properly
  f2fs: fix typo
  f2fs: check the node block address of newly allocated nid
  f2fs: go out for insert_inode_locked failure
  ...
2015-09-03 13:10:22 -07:00
Linus Torvalds
1081230b74 Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
 "This first core part of the block IO changes contains:

   - Cleanup of the bio IO error signaling from Christoph.  We used to
     rely on the uptodate bit and passing around of an error, now we
     store the error in the bio itself.

   - Improvement of the above from myself, by shrinking the bio size
     down again to fit in two cachelines on x86-64.

   - Revert of the max_hw_sectors cap removal from a revision again,
     from Jeff Moyer.  This caused performance regressions in various
     tests.  Reinstate the limit, bump it to a more reasonable size
     instead.

   - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
     Most devices have huge trim limits, which can cause nasty latencies
     when deleting files.  Enable the admin to configure the size down.
     We will look into having a more sane default instead of UINT_MAX
     sectors.

   - Improvement of the SGP gaps logic from Keith Busch.

   - Enable the block core to handle arbitrarily sized bios, which
     enables a nice simplification of bio_add_page() (which is an IO hot
     path).  From Kent.

   - Improvements to the partition io stats accounting, making it
     faster.  From Ming Lei.

   - Also from Ming Lei, a basic fixup for overflow of the sysfs pending
     file in blk-mq, as well as a fix for a blk-mq timeout race
     condition.

   - Ming Lin has been carrying Kents above mentioned patches forward
     for a while, and testing them.  Ming also did a few fixes around
     that.

   - Sasha Levin found and fixed a use-after-free problem introduced by
     the bio->bi_error changes from Christoph.

   - Small blk cgroup cleanup from Viresh Kumar"

* 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
  blk: Fix bio_io_vec index when checking bvec gaps
  block: Replace SG_GAPS with new queue limits mask
  block: bump BLK_DEF_MAX_SECTORS to 2560
  Revert "block: remove artifical max_hw_sectors cap"
  blk-mq: fix race between timeout and freeing request
  blk-mq: fix buffer overflow when reading sysfs file of 'pending'
  Documentation: update notes in biovecs about arbitrarily sized bios
  block: remove bio_get_nr_vecs()
  fs: use helper bio_add_page() instead of open coding on bi_io_vec
  block: kill merge_bvec_fn() completely
  md/raid5: get rid of bio_fits_rdev()
  md/raid5: split bio for chunk_aligned_read
  block: remove split code in blkdev_issue_{discard,write_same}
  btrfs: remove bio splitting and merge_bvec_fn() calls
  bcache: remove driver private bio splitting code
  block: simplify bio_add_page()
  block: make generic_make_request handle arbitrarily sized bios
  blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
  block: don't access bio->bi_error after bio_put()
  block: shrink struct bio down to 2 cache lines again
  ...
2015-09-02 13:10:25 -07:00
Yunlei He
01a5ad827a f2fs: upset segment_info repair
upset segment_info like this:

276000|161 0|0   4|70  3|0   3|0   0|0   0|91  4|0   4|232 4|39
276104|0   4|0   4|1   4|0   4|0   4|280 4|0   4|42  4|262 4|38
276204|179 4|89  4|39  4|24  4|0   4|96  4|3   4|428 4|0   4|118
276304|112 4|97  4|0   4|0   4|0   4|68  4|0   4|0   4|86  4|138
276404|0   4|0   0|166 5|39  4|101 0|111

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-09-01 14:45:27 -07:00
Chao Yu
54d7185642 f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent
If extent cache is disable, we will encounter oops when triggering direct
IO as below:

BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs]
*pdpt = 000000002bb9a001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in: f2fs(O) fuse bnep rfcomm bluetooth nfsd dm_crypt nfs_acl auth_rpcgss oid_registry nfs binfmt_misc fscache lockd
sunrpc grace snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
snd_seq_device snd soundcore joydev psmouse hid_generic i2c_piix4 serio_raw ppdev mac_hid parport_pc lp parport ext4 jbd2 mbcache
usbhid hid e1000
CPU: 3 PID: 3608 Comm: dd Tainted: G           O    4.2.0-rc4 #12
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ef161600 ti: ebd5e000 task.ti: ebd5e000
EIP: 0060:[<f0b9c61e>] EFLAGS: 00010202 CPU: 3
EIP is at f2fs_drop_largest_extent+0xe/0x30 [f2fs]
EAX: 00000000 EBX: ddebc000 ECX: 00000000 EDX: 00000000
ESI: ebd5fdf8 EDI: 00000000 EBP: ebd5fd58 ESP: ebd5fd58
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 0000000c CR3: 2c24ee40 CR4: 000006f0
Stack:
 ebd5fda4 f0b8c005 00000000 00000001 00000000 f0b8c430 c816cd68 ddebc000
 ddebc088 00001000 00000555 00000555 ffffffff c160bb00 00055501 00000000
 00000000 00000100 00000000 ebd5fe20 f0b8c430 00000046 ef161600 00001000
Call Trace:
 [<f0b8c005>] __allocate_data_block+0x1a5/0x260 [f2fs]
 [<f0b8c430>] ? f2fs_direct_IO+0x370/0x440 [f2fs]
 [<c160bb00>] ? down_read+0x30/0x50
 [<f0b8c430>] f2fs_direct_IO+0x370/0x440 [f2fs]
 [<c113e115>] generic_file_direct_write+0xa5/0x260
 [<c10b53f8>] ? current_fs_time+0x18/0x50
 [<c113e38b>] __generic_file_write_iter+0xbb/0x210
 [<c113e50f>] ? generic_file_write_iter+0x2f/0x320
 [<c113e63c>] generic_file_write_iter+0x15c/0x320
 [<f0b77f29>] f2fs_file_write_iter+0x39/0x80 [f2fs]
 [<c11984d9>] __vfs_write+0xa9/0xe0
 [<c1199227>] vfs_write+0x97/0x180
 [<c119955b>] SyS_write+0x5b/0xd0
 [<c160dcd0>] sysenter_do_call+0x12/0x12
Code: 10 8b 50 1c 89 53 14 eb ca 8d 74 26 00 85 f6 74 86 eb a6 0f 0b 90 8d b4 26 00 00 00 00 55 89 e5 3e 8d 74 26 00 8b 80 d4 02 00
00 <8b> 48 0c 39 d1 77 0e 03 48 14 39 ca 73 07 c7 40 14 00 00 00 00
EIP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs] SS:ESP 0068:ebd5fd58
CR2: 000000000000000c
---[ end trace a38c07026a1afffd ]---

This is because when extent cache is disable, extent_tree pointer in struct
f2fs_inode_info should be NULL, but in f2fs_drop_largest_extent we access
this NULL pointer directly without checking state of extent cache, then,
the oops occurs. Let's fix it by checking state of extent cache before
accessing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-28 10:14:26 -07:00
Chao Yu
19b2c30d3c f2fs: update extent tree in batches
This patch introduce a new helper f2fs_update_extent_tree_range which can
do extent mapping update at a specified range.

The main idea is:
1) punch all mapping info in extent node(s) which are at a specified range;
2) try to merge new extent mapping with adjacent node, or failing that,
   insert the mapping into extent tree as a new node.

In order to see the benefit, I add a function for stating time stamping
count as below:

uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
	return (uint64_t)hi << 32 | lo;
}

My test environment is: ubuntu, intel i7-3770, 16G memory, 256g micron ssd.

truncation path:	update extent cache from truncate_data_blocks_range
non-truncataion path:	update extent cache from other paths
total:			all update paths

a) Removing 128MB file which has one extent node mapping whole range of
file:
1. dd if=/dev/zero of=/mnt/f2fs/128M bs=1M count=128
2. sync
3. rm /mnt/f2fs/128M

Before:
		total		count		average
truncation:	7651022		32768		233.49

Patched:
		total		count		average
truncation:	3321		33		100.64

b) fsstress:
fsstress -d /mnt/f2fs -l 5 -n 100 -p 20
Test times:		5 times.

Before:
		total		count		average
truncation:	5812480.6	20911.6		277.95
non-truncation:	7783845.6	13440.8		579.12
total:		13596326.2	34352.4		395.79

Patched:
		total		count		average
truncation:	1281283.0	3041.6		421.25
non-truncation:	7355844.4	13662.8		538.38
total:		8637127.4	16704.4		517.06

1) For the updates in truncation path:
 - we can see updating in batches leads total tsc and update count reducing
   explicitly;
 - besides, for a single batched updating, punching multiple extent nodes
   in a loop, result in executing more operations, so our average tsc
   increase intensively.
2) For the updates in non-truncation path:
 - there is a little improvement, that is because for the scenario that we
   just need to update in the head or tail of extent node, new interface
   optimize to update info in extent node directly, rather than removing
   original extent node for updating and then inserting that updated one
   into cache as new node.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-26 11:50:35 -07:00
Chao Yu
13ec7297e5 f2fs: fix to release inode correctly
In following call stack, if unfortunately we lose all chances to truncate
inode page in remove_inode_page, eventually we will add the nid allocated
previously into free nid cache, this nid is with NID_NEW status and with
NEW_ADDR in its blkaddr pointer:

 - f2fs_create
  - f2fs_add_link
   - __f2fs_add_link
    - init_inode_metadata
     - new_inode_page
      - new_node_page
       - set_node_addr(, NEW_ADDR)
     - f2fs_init_acl   failed
     - remove_inode_page  failed
  - handle_failed_inode
   - remove_inode_page  failed
   - iput
    - f2fs_evict_inode
     - remove_inode_page  failed
     - alloc_nid_failed   cache a nid with valid blkaddr: NEW_ADDR

This may not only cause resource leak of previous inode, but also may cause
incorrect use of the previous blkaddr which is located in NO.nid node entry
when this nid is reused by others.

This patch tries to add this inode to orphan list if we fail to truncate
inode, so that we can obtain a second chance to release it in orphan
recovery flow.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24 16:35:59 -07:00
Chao Yu
b01548919c f2fs: handle f2fs_truncate error correctly
This patch fixes to return error number of f2fs_truncate, so that we
can handle the error correctly in callers.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24 09:39:56 -07:00
Chao Yu
4ec17d688d f2fs: avoid unneeded initializing when converting inline dentry
When converting inline dentry, we will zero out target dentry page before
duplicating data of inline dentry into target page, it become overhead
since inline dentry size is not small.

So this patch tries to remove unneeded initializing in the space of target
dentry page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24 09:38:20 -07:00
Zhang Zhen
6a6788576d f2fs: atomically set inode->i_flags
According to commit 5f16f3225b ("ext4: atomically set inode->i_flags in
ext4_set_inode_flags()").

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24 09:37:53 -07:00
Jaegeuk Kim
f7409d0fae f2fs: fix wrong pointer access during try_to_free_nids
If we release the lock in list_for_each_entry_safe, we can lose the tmp
pointer by alloc_nid.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24 09:37:42 -07:00
Jaegeuk Kim
80c545055d f2fs: use __GFP_NOFAIL to avoid infinite loop
__GFP_NOFAIL can avoid retrying the whole path of kmem_cache_alloc and
bio_alloc.
And, it also fixes the use cases of GFP_ATOMIC correctly.

Suggested-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24 09:37:21 -07:00
Chao Yu
dac2ddefe6 f2fs: lookup neighbor extent nodes for merging later
In __lookup_extent_tree_ret we will not try to find neighbor nodes if
we find the target node, in this condition, we will lost the chance to
merge the new mapping with exist extent node later.

So our extent cache of inode will be fragmented after overwrite exist
file, we can see the number of extent node increases intensively in
following test case:

dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024

Extent Cache:
  - Hit Count: L1-1:0 L1-2:0 L2:0
  - Hit Ratio: 0% (0 / 3072)
  - Inner Struct Count: tree: 1, node: 1

dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024 conv=notrunc

Extent Cache:
  - Hit Count: L1-1:2048 L1-2:0 L2:0
  - Hit Ratio: 33% (2048 / 6144)
  - Inner Struct Count: tree: 1, node: 961

This patch fixes to lookup neighbors of target node for further
merging.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-21 22:45:18 -07:00
Chao Yu
ef05e22199 f2fs: split __insert_extent_tree_ret for readability
This patch splits __insert_extent_tree_ret into __try_merge_extent_node &
__insert_extent_tree for code readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-21 22:45:17 -07:00