linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-25 21:24:08 +08:00

Author	SHA1	Message	Date
Chao Yu	530e070420	f2fs: don't mark compressed inode dirty during f2fs_iget() - f2fs_iget - do_read_inode - set_inode_flag(, FI_COMPRESSED_FILE) - __mark_inode_dirty_flag(, true) It's unnecessary, so let's just mark compressed inode dirty while compressed inode conversion. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-30 20:46:23 -07:00
Christoph Hellwig	c6a564ffad	block: move the part_stat* helpers from genhd.h to a new header These macros are just used by a few files. Move them out of genhd.h, which is included everywhere into a new standalone header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-25 09:50:09 -06:00
Chao Yu	1a67cbe141	f2fs: fix to account compressed blocks in f2fs_compressed_blocks() por_fsstress reports inconsistent status in orphan inode, the root cause of this is in f2fs_write_raw_pages() we decrease i_compr_blocks incorrectly due to wrong calculation in f2fs_compressed_blocks(). So this patch exposes below two functions based on __f2fs_cluster_blocks: - f2fs_compressed_blocks: get count of compressed blocks in compressed cluster - f2fs_cluster_blocks: get count of valid blocks (including reserved blocks) in compressed cluster. Then use f2fs_compress_blocks() to get correct compressed blocks count in f2fs_write_raw_pages(). sanity_check_inode: inode (ino=ad80) hash inconsistent i_compr_blocks:2, i_blocks:1, run fsck to fix Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-22 21:16:29 -07:00
Gustavo A. R. Silva	50b1203d8c	f2fs: xattr.h: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-22 21:16:29 -07:00
Chao Yu	a4ba5dfc5c	f2fs: fix to update f2fs_super_block fields under sb_lock Fields in struct f2fs_super_block should be updated under coverage of sb_lock, fix to adjust update_sb_metadata() for that rule. Fixes: `04f0b2eaa3` ("f2fs: ioctl for removing a range from F2FS") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-22 21:16:29 -07:00
Sahitya Tummala	c84ef3c5e6	f2fs: Add a new CP flag to help fsck fix resize SPO issues Add and set a new CP flag CP_RESIZEFS_FLAG during online resize FS to help fsck fix the metadata mismatch that may happen due to SPO during resize, where SB got updated but CP data couldn't be written yet. fsck errors - Info: CKPT version = 6ed7bccb Wrong user_block_count(2233856) [f2fs_do_mount:3365] Checkpoint is polluted Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-22 21:16:28 -07:00
Sahitya Tummala	6827568275	f2fs: Fix mount failure due to SPO after a successful online resize FS Even though online resize is successfully done, a SPO immediately after resize, still causes below error in the next mount. [ 11.294650] F2FS-fs (sda8): Wrong user_block_count: 2233856 [ 11.300272] F2FS-fs (sda8): Failed to get valid F2FS checkpoint This is because after FS metadata is updated in update_fs_metadata() if the SBI_IS_DIRTY is not dirty, then CP will not be done to reflect the new user_block_count. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-22 21:16:28 -07:00
Chao Yu	a999150f4f	f2fs: use kmem_cache pool during inline xattr lookups It's been observed that kzalloc() on lookup_all_xattrs() are called millions of times on Android, quickly becoming the top abuser of slub memory allocator. Use a dedicated kmem cache pool for xattr lookups to mitigate this. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-22 21:16:27 -07:00
Eric Biggers	ee446e1af4	f2fs: wire up FS_IOC_GET_ENCRYPTION_NONCE This new ioctl retrieves a file's encryption nonce, which is useful for testing. See the corresponding fs/crypto/ patch for more details. Link: https://lore.kernel.org/r/20200314205052.93294-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-03-19 21:57:06 -07:00
Jaegeuk Kim	dabfbbc8f9	f2fs: skip migration only when BG_GC is called FG_GC needs to move entire section more quickly. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:26 -07:00
Chao Yu	7bd2935870	f2fs: fix to show tracepoint correctly Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:26 -07:00
Chao Yu	ca9e968a5e	f2fs: avoid __GFP_NOFAIL in f2fs_bio_alloc __f2fs_bio_alloc() won't fail due to memory pool backend, remove unneeded __GFP_NOFAIL flag in __f2fs_bio_alloc(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:26 -07:00
Chao Yu	439dfb1062	f2fs: introduce F2FS_IOC_GET_COMPRESS_BLOCKS With this newly introduced interface, user can get block number compression saved in target inode. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:26 -07:00
Chao Yu	0683728ada	f2fs: fix to avoid triggering IO in write path If we are in write IO path, we need to avoid using GFP_KERNEL. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:26 -07:00
Chao Yu	985100035e	f2fs: add prefix for f2fs slab cache name In order to avoid polluting global slab cache namespace. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:26 -07:00
Chao Yu	5df7731f60	f2fs: introduce DEFAULT_IO_TIMEOUT As Geert Uytterhoeven reported: for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50); On some platforms, HZ can be less than 50, then unexpected 0 timeout jiffies will be set in congestion_wait(). This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue. Quoted from Geert Uytterhoeven: "A timeout of HZ means 1 second. HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50. If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20), as that takes care of the special cases, and never returns 0." Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:26 -07:00
Jaegeuk Kim	2bac07635d	f2fs: skip GC when section is full This fixes skipping GC when segment is full in large section. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:26 -07:00
Jaegeuk Kim	8c7b9ac129	f2fs: add migration count iff migration happens If first segment is empty and migration_granularity is 1, we can't move this at all. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:26 -07:00
Chao Yu	bbbc34fd66	f2fs: clean up bggc mount option There are three status for background gc: on, off and sync, it's a little bit confused to use test_opt(BG_GC) and test_opt(FORCE_FG_GC) combinations to indicate status of background gc. So let's remove F2FS_MOUNT_BG_GC and F2FS_MOUNT_FORCE_FG_GC mount options, and add F2FS_OPTION().bggc_mode with below three status to clean up codes and enhance bggc mode's scalability. enum { BGGC_MODE_ON, /* background gc is on / BGGC_MODE_OFF, / background gc is off / BGGC_MODE_SYNC, / * background gc is on, migrating blocks * like foreground gc */ }; Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:25 -07:00
Chao Yu	b0332a0f95	f2fs: clean up lfs/adaptive mount option This patch removes F2FS_MOUNT_ADAPTIVE and F2FS_MOUNT_LFS mount options, and add F2FS_OPTION.fs_mode with below two status to indicate filesystem mode. enum { FS_MODE_ADAPTIVE, /* use both lfs/ssr allocation / FS_MODE_LFS, / use lfs allocation only */ }; It can enhance code readability and fs mode's scalability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:25 -07:00
Chao Yu	a9117eca1d	f2fs: fix to show norecovery mount option Previously, 'norecovery' mount option will be shown as 'disable_roll_forward', fix to show original option name correctly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:25 -07:00
Chao Yu	ba3b583cff	f2fs: clean up parameter of macro XATTR_SIZE() Just cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:25 -07:00
Chao Yu	a2ced1ce10	f2fs: clean up codes with {f2fs_,}data_blkaddr() - rename datablock_addr() to data_blkaddr(). - wrap data_blkaddr() with f2fs_data_blkaddr() to clean up parameters. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:25 -07:00
Jaegeuk Kim	a7e679b533	f2fs: show mounted time Let's show mounted time. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:41:25 -07:00
Takashi Iwai	c6d5789bea	f2fs: Use scnprintf() for avoiding potential buffer overflow Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-19 11:37:56 -07:00
Chao Yu	2536ac6872	f2fs: allow to clear F2FS_COMPR_FL flag If regular inode has no compressed cluster, allow using 'chattr -c' to remove its compress flag, recovering it to a non-compressed file. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-11 08:25:38 -07:00
Chao Yu	6cfdf15fdb	f2fs: fix to check dirty pages during compressed inode conversion Compressed cluster can be generated during dirty data writeback, if there is dirty pages on compressed inode, it needs to disable converting compressed inode to non-compressed one. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-11 08:25:38 -07:00
Chao Yu	96f5b4fa56	f2fs: fix to account compressed inode correctly stat_inc_compr_inode() needs to check FI_COMPRESSED_FILE flag, so in f2fs_disable_compressed_file(), we should call stat_dec_compr_inode() before clearing FI_COMPRESSED_FILE flag. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-11 08:25:38 -07:00
Jaegeuk Kim	99eabb914e	f2fs: fix wrong check on F2FS_IOC_FSSETXATTR This fixes the incorrect failure when enabling project quota on casefold-enabled file. Cc: Daniel Rosenberg <drosen@google.com> Cc: kernel-team@android.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-10 09:18:33 -07:00
Chao Yu	95978caa13	f2fs: fix to avoid use-after-free in f2fs_write_multi_pages() In compress cluster, if physical block number is less than logic page number, race condition will cause use-after-free issue as described below: - f2fs_write_compressed_pages - fio.page = cic->rpages[0]; - f2fs_outplace_write_data - f2fs_compress_write_end_io - kfree(cic->rpages); - kfree(cic); - fio.page = cic->rpages[1]; f2fs_write_multi_pages+0xfd0/0x1a98 f2fs_write_data_pages+0x74c/0xb5c do_writepages+0x64/0x108 __writeback_single_inode+0xdc/0x4b8 writeback_sb_inodes+0x4d0/0xa68 __writeback_inodes_wb+0x88/0x178 wb_writeback+0x1f0/0x424 wb_workfn+0x2f4/0x574 process_one_work+0x210/0x48c worker_thread+0x2e8/0x44c kthread+0x110/0x120 ret_from_fork+0x10/0x18 Fixes: `4c8ff7095b` ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-10 09:18:33 -07:00
Chao Yu	06c7540fd2	f2fs: fix to avoid using uninitialized variable In f2fs_vm_page_mkwrite(), if inode is compress one, and current mmapped page locates in compressed cluster, we have to call f2fs_get_dnode_of_data() to get its physical block address before f2fs_wait_on_block_writeback(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-10 09:18:33 -07:00
Chao Yu	7a88ddb560	f2fs: fix inconsistent comments Lack of maintenance on comments may mislead developers, fix them. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-10 09:18:33 -07:00
Chao Yu	3addc1aed3	f2fs: remove i_sem lock coverage in f2fs_setxattr() f2fs_inode.xattr_ver field was gone after commit `d260081ccf` ("f2fs: change recovery policy of xattr node block"), remove i_sem lock coverage in f2fs_setxattr() Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-10 09:18:32 -07:00
Chao Yu	c10c982032	f2fs: cover last_disk_size update with spinlock This change solves below hangtask issue: INFO: task kworker/u16:1:58 blocked for more than 122 seconds. Not tainted 5.6.0-rc2-00590-g9983bdae4974e #11 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u16:1 D 0 58 2 0x00000000 Workqueue: writeback wb_workfn (flush-179:0) Backtrace: (__schedule) from [<c0913234>] (schedule+0x78/0xf4) (schedule) from [<c017ec74>] (rwsem_down_write_slowpath+0x24c/0x4c0) (rwsem_down_write_slowpath) from [<c0915f2c>] (down_write+0x6c/0x70) (down_write) from [<c0435b80>] (f2fs_write_single_data_page+0x608/0x7ac) (f2fs_write_single_data_page) from [<c0435fd8>] (f2fs_write_cache_pages+0x2b4/0x7c4) (f2fs_write_cache_pages) from [<c043682c>] (f2fs_write_data_pages+0x344/0x35c) (f2fs_write_data_pages) from [<c0267ee8>] (do_writepages+0x3c/0xd4) (do_writepages) from [<c0310cbc>] (__writeback_single_inode+0x44/0x454) (__writeback_single_inode) from [<c03112d0>] (writeback_sb_inodes+0x204/0x4b0) (writeback_sb_inodes) from [<c03115cc>] (__writeback_inodes_wb+0x50/0xe4) (__writeback_inodes_wb) from [<c03118f4>] (wb_writeback+0x294/0x338) (wb_writeback) from [<c0312dac>] (wb_workfn+0x35c/0x54c) (wb_workfn) from [<c014f2b8>] (process_one_work+0x214/0x544) (process_one_work) from [<c014f634>] (worker_thread+0x4c/0x574) (worker_thread) from [<c01564fc>] (kthread+0x144/0x170) (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c) Reported-and-tested-by: Ondřej Jirman <megi@xff.cz> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-10 09:18:32 -07:00
Chao Yu	d940aa07ed	f2fs: fix to check i_compr_blocks correctly inode.i_blocks counts based on 512byte sector, we need to convert to 4kb sized block count before comparing to i_compr_blocks. In addition, add to print message when sanity check on inode compression configs failed. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-03-10 09:18:29 -07:00
Chao Yu	df77fbd8c5	f2fs: fix to avoid potential deadlock Using f2fs_trylock_op() in f2fs_write_compressed_pages() to avoid potential deadlock like we did in f2fs_write_single_data_page(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-02-27 10:16:45 -08:00
Chao Yu	097a768650	f2fs: add missing function name in kernel message Otherwise, we can not distinguish the exact location of messages, when there are more than one places printing same message. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-02-27 10:16:45 -08:00
Chao Yu	0b32dc1864	f2fs: recycle unused compress_data.chksum feild In Struct compress_data, chksum field was never used, remove it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-02-27 10:16:45 -08:00
Chao Yu	61fbae2b2b	f2fs: fix to avoid NULL pointer dereference Unable to handle kernel NULL pointer dereference at virtual address 00000000 PC is at f2fs_free_dic+0x60/0x2c8 LR is at f2fs_decompress_pages+0x3c4/0x3e8 f2fs_free_dic+0x60/0x2c8 f2fs_decompress_pages+0x3c4/0x3e8 __read_end_io+0x78/0x19c f2fs_post_read_work+0x6c/0x94 process_one_work+0x210/0x48c worker_thread+0x2e8/0x44c kthread+0x110/0x120 ret_from_fork+0x10/0x18 In f2fs_free_dic(), we can not use f2fs_put_page(,1) to release dic->tpages[i], as the page's mapping is NULL. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-02-27 10:16:45 -08:00
Eric Biggers	7fa6d59816	f2fs: fix leaking uninitialized memory in compressed clusters When the compressed data of a cluster doesn't end on a page boundary, the remainder of the last page must be zeroed in order to avoid leaking uninitialized memory to disk. Fixes: `4c8ff7095b` ("f2fs: support data compression") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-02-27 10:16:44 -08:00
Sahitya Tummala	bf22c3cc8c	f2fs: fix the panic in do_checkpoint() There could be a scenario where f2fs_sync_meta_pages() will not ensure that all F2FS_DIRTY_META pages are submitted for IO. Thus, resulting in the below panic in do_checkpoint() - f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) && !f2fs_cp_error(sbi)); This can happen in a low-memory condition, where shrinker could also be doing the writepage operation (stack shown below) at the same time when checkpoint is running on another core. schedule down_write f2fs_submit_page_write -> by this time, this page in page cache is tagged as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY is cleared, due to which f2fs_sync_meta_pages() cannot sync this page in do_checkpoint() path. f2fs_do_write_meta_page __f2fs_write_meta_page f2fs_write_meta_page shrink_page_list shrink_inactive_list shrink_node_memcg shrink_node kswapd Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-02-27 10:16:44 -08:00
Chao Yu	dc5a941223	f2fs: fix to wait all node page writeback There is a race condition that we may miss to wait for all node pages writeback, fix it. - fsync() - shrink - f2fs_do_sync_file - __write_node_page - set_page_writeback(page#0) : remove DIRTY/TOWRITE flag - f2fs_fsync_node_pages : won't find page #0 as TOWRITE flag was removeD - f2fs_wait_on_node_pages_writeback : wont' wait page #0 writeback as it was not in fsync_node_list list. - f2fs_add_fsync_node_entry Fixes: `50fa53eccf` ("f2fs: fix to avoid broken of dnode block list") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-02-27 10:16:44 -08:00
Linus Torvalds	236f453294	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: - bmap series from cmaiolino - getting rid of convolutions in copy_mount_options() (use a couple of copy_from_user() instead of the __get_user() crap) * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: saner copy_mount_options() fibmap: Reject negative block numbers fibmap: Use bmap instead of ->bmap method in ioctl_fibmap ecryptfs: drop direct calls to ->bmap cachefiles: drop direct usage of ->bmap method. fs: Enable bmap() function to properly return errors	2020-02-08 13:04:49 -08:00
Linus Torvalds	bddea11b1b	Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs timestamp updates from Al Viro: "More 64bit timestamp work" * 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kernfs: don't bother with timestamp truncation fs: Do not overload update_time fs: Delete timespec64_trunc() fs: ubifs: Eliminate timespec64_trunc() usage fs: ceph: Delete timespec64_trunc() usage fs: cifs: Delete usage of timespec64_trunc fs: fat: Eliminate timespec64_trunc() usage utimes: Clamp the timestamps in notify_change()	2020-02-05 05:02:42 +00:00
Masahiro Yamada	45586c7078	treewide: remove redundant IS_ERR() before error code check 'PTR_ERR(p) == -E*' is a stronger condition than IS_ERR(p). Hence, IS_ERR(p) is unneeded. The semantic patch that generates this commit is as follows: // <smpl> @@ expression ptr; constant error_code; @@ -IS_ERR(ptr) && (PTR_ERR(ptr) == - error_code) +PTR_ERR(ptr) == - error_code // </smpl> Link: http://lkml.kernel.org/r/20200106045833.1725-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Julia Lawall <julia.lawall@lip6.fr> Acked-by: Stephen Boyd <sboyd@kernel.org> [drivers/clk/clk.c] Acked-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> [GPIO] Acked-by: Wolfram Sang <wsa@the-dreams.de> [drivers/i2c] Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [acpi/scan.c] Acked-by: Rob Herring <robh@kernel.org> Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-04 03:05:27 +00:00
Carlos Maiolino	30460e1ea3	fs: Enable bmap() function to properly return errors By now, bmap() will either return the physical block number related to the requested file offset or 0 in case of error or the requested offset maps into a hole. This patch makes the needed changes to enable bmap() to proper return errors, using the return value as an error return, and now, a pointer must be passed to bmap() to be filled with the mapped physical block. It will change the behavior of bmap() on return: - negative value in case of error - zero on success or map fell into a hole In case of a hole, the *block will be zero too Since this is a prep patch, by now, the only error return is -EINVAL if ->bmap doesn't exist. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-02-03 08:05:37 -05:00
Linus Torvalds	6e135baed8	f2fs-for-5.6 In this series, we've implemented transparent compression experimentally. It supports LZO and LZ4, but will add more later as we investigate in the field more. At this point, the feature doesn't expose compressed space to user directly in order to guarantee potential data updates later to the space. Instead, the main goal is to reduce data writes to flash disk as much as possible, resulting in extending disk life time as well as relaxing IO congestion. Alternatively, we're also considering to add ioctl() to reclaim compressed space and show it to user after putting the immutable bit. Enhancement: - add compression support - avoid unnecessary locks in quota ops - harden power-cut scenario for zoned block devices - use private bio_set to avoid IO congestion - replace GC mutex with rwsem to serialize callers Bug fix: - fix dentry consistency and memory corruption in rename()'s error case - fix wrong swap extent reports - fix casefolding bugs - change lock coverage to avoid deadlock - avoid GFP_KERNEL under f2fs_lock_op And, we've cleaned up sysfs entries to prepare no debugfs. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl4zInwACgkQQBSofoJI UNL4Tg/+JBbVEFa3IUBGMdbjfgd/g0Jye++iMAYYGRWT6Ll/IGcHRV9NunITjgWU mBZqdhI28kXeiGCcewB1ZvivjLx22X4n6yevHk2B5A6PNe9IDCHi0HOAhJJHkjPH ecv2L+vX3Oj4y0+H7JNz9Fo3OIPJvMPtCQWlg1z+VQyhB85zNP7fZlvvIY4tG8yw ERo0YNotLqwcF1BxCwNbAhV3aJGDxar+MI//yNzpiwDX7IptVpqestfcoIYc9kKL 4kSWRyEIGwcuIeyoM6aofGS9t4Z/Oe/gdqcxNr6l5n0Q/tMTpb4b/fJFGNr6RRx9 X9NQo8flkQb2DEIOP0DVpO2aPebzsVtzg3LZUOLA83+wCHfwINtHai2Dy2zDJ2my BrVdou8fe2oxoaYihJg/Tz9cd0nA/6mZArtpYvDImAmX/xuGOvVk9zZkXNwc9nVX EyVzy0vW4lA6gAIJ95aG6DDhJcAtVoy0MhBRWG92Pufxhn9aW24AV63ChWUf9DRx /3RqpMAuQ3UC2gOxXKKnr54lsdhUIMn/y9sjROkVvQ1BvgRVxO8I4GFvMHMKv9pR 9KXiVRdzyYERyoL4+MF7A2zTnw+RHL4RVILa85p2ALGy2jQ1UuNUQi0BN9x2u1v8 S1ifNNX8SwOP+83ImFJhhn3HybpFQ45aLO3F7ZjKBQAnufJu+xw= =zeoY -----END PGP SIGNATURE----- Merge tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this series, we've implemented transparent compression experimentally. It supports LZO and LZ4, but will add more later as we investigate in the field more. At this point, the feature doesn't expose compressed space to user directly in order to guarantee potential data updates later to the space. Instead, the main goal is to reduce data writes to flash disk as much as possible, resulting in extending disk life time as well as relaxing IO congestion. Alternatively, we're also considering to add ioctl() to reclaim compressed space and show it to user after putting the immutable bit. Enhancements: - add compression support - avoid unnecessary locks in quota ops - harden power-cut scenario for zoned block devices - use private bio_set to avoid IO congestion - replace GC mutex with rwsem to serialize callers Bug fixes: - fix dentry consistency and memory corruption in rename()'s error case - fix wrong swap extent reports - fix casefolding bugs - change lock coverage to avoid deadlock - avoid GFP_KERNEL under f2fs_lock_op And, we've cleaned up sysfs entries to prepare no debugfs" * tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (31 commits) f2fs: fix race conditions in ->d_compare() and ->d_hash() f2fs: fix dcache lookup of !casefolded directories f2fs: Add f2fs stats to sysfs f2fs: delete duplicate information on sysfs nodes f2fs: change to use rwsem for gc_mutex f2fs: update f2fs document regarding to fsync_mode f2fs: add a way to turn off ipu bio cache f2fs: code cleanup for f2fs_statfs_project() f2fs: fix miscounted block limit in f2fs_statfs_project() f2fs: show the CP_PAUSE reason in checkpoint traces f2fs: fix deadlock allocating bio_post_read_ctx from mempool f2fs: remove unneeded check for error allocating bio_post_read_ctx f2fs: convert inline_dir early before starting rename f2fs: fix memleak of kobject f2fs: fix to add swap extent correctly f2fs: run fsck when getting bad inode during GC f2fs: support data compression f2fs: free sysfs kobject f2fs: declare nested quota_sem and remove unnecessary sems f2fs: don't put new_page twice in f2fs_rename ...	2020-01-30 15:39:24 -08:00
Linus Torvalds	c8994374d9	fsverity updates for 5.6 - Optimize fs-verity sequential read performance by implementing readahead of Merkle tree pages. This allows the Merkle tree to be read in larger chunks. - Optimize FS_IOC_ENABLE_VERITY performance in the uncached case by implementing readahead of data pages. - Allocate the hash requests from a mempool in order to eliminate the possibility of allocation failures during I/O. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXi+OuhQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK/ZIAP452KKPs6AGXrClZ2l+5nFbkDLN9Or8 w277B0BeRnu5ogEApmKnYsmRsduLZRJbni7VCpkJLAYI2kmFCwGkFfe3tAQ= =svdR -----END PGP SIGNATURE----- Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fsverity updates from Eric Biggers: - Optimize fs-verity sequential read performance by implementing readahead of Merkle tree pages. This allows the Merkle tree to be read in larger chunks. - Optimize FS_IOC_ENABLE_VERITY performance in the uncached case by implementing readahead of data pages. - Allocate the hash requests from a mempool in order to eliminate the possibility of allocation failures during I/O. * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fs-verity: use u64_to_user_ptr() fs-verity: use mempool for hash requests fs-verity: implement readahead of Merkle tree pages fs-verity: implement readahead for FS_IOC_ENABLE_VERITY	2020-01-28 15:31:03 -08:00
Eric Biggers	80f2388afa	f2fs: fix race conditions in ->d_compare() and ->d_hash() Since ->d_compare() and ->d_hash() can be called in RCU-walk mode, ->d_parent and ->d_inode can be concurrently modified, and in particular, ->d_inode may be changed to NULL. For f2fs_d_hash() this resulted in a reproducible NULL dereference if a lookup is done in a directory being deleted, e.g. with: int main() { if (fork()) { for (;;) { mkdir("subdir", 0700); rmdir("subdir"); } } else { for (;;) access("subdir/file", 0); } } ... or by running the 't_encrypted_d_revalidate' program from xfstests. Both repros work in any directory on a filesystem with the encoding feature, even if the directory doesn't actually have the casefold flag. I couldn't reproduce a crash in f2fs_d_compare(), but it appears that a similar crash is possible there. Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and falling back to the case sensitive behavior if the inode is NULL. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: `2c2eb7a300` ("f2fs: Support case-insensitive file name lookups") Cc: <stable@vger.kernel.org> # v5.4+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-24 10:04:09 -08:00
Eric Biggers	5515eae647	f2fs: fix dcache lookup of !casefolded directories Do the name comparison for non-casefolded directories correctly. This is analogous to ext4's commit `66883da1ee` ("ext4: fix dcache lookup of !casefolded directories"). Fixes: `2c2eb7a300` ("f2fs: Support case-insensitive file name lookups") Cc: <stable@vger.kernel.org> # v5.4+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-24 09:53:02 -08:00
Hridya Valsaraju	fc7100ea2a	f2fs: Add f2fs stats to sysfs Currently f2fs stats are only available from /d/f2fs/status. This patch adds some of the f2fs stats to sysfs so that they are accessible even when debugfs is not mounted. The following sysfs nodes are added: -/sys/fs/f2fs/<disk>/free_segments -/sys/fs/f2fs/<disk>/cp_foreground_calls -/sys/fs/f2fs/<disk>/cp_background_calls -/sys/fs/f2fs/<disk>/gc_foreground_calls -/sys/fs/f2fs/<disk>/gc_background_calls -/sys/fs/f2fs/<disk>/moved_blocks_foreground -/sys/fs/f2fs/<disk>/moved_blocks_background -/sys/fs/f2fs/<disk>/avg_vblocks Signed-off-by: Hridya Valsaraju <hridya@google.com> [Jaegeuk Kim: allow STAT_FS without DEBUG_FS] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-23 09:24:25 -08:00
Chao Yu	fb24fea75c	f2fs: change to use rwsem for gc_mutex Mutex lock won't serialize callers, in order to avoid starving of unlucky caller, let's use rwsem lock instead. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:44 -08:00
Jaegeuk Kim	d7b0a23d81	f2fs: update f2fs document regarding to fsync_mode This patch adds missing fsync_mode entry in f2fs document. Fixes: `04485987f0` ("f2fs: introduce async IPU policy") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:44 -08:00
Jaegeuk Kim	0e7f41974e	f2fs: add a way to turn off ipu bio cache Setting 0x40 in /sys/fs/f2fs/dev/ipu_policy gives a way to turn off bio cache, which is useufl to check whether block layer using hardware encryption engine merges IOs correctly. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:43 -08:00
Chengguang Xu	bf2cbd3c57	f2fs: code cleanup for f2fs_statfs_project() Calling min_not_zero() to simplify complicated prjquota limit comparison in f2fs_statfs_project(). Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:43 -08:00
Chengguang Xu	acdf217217	f2fs: fix miscounted block limit in f2fs_statfs_project() statfs calculates Total/Used/Avail disk space in block unit, so we should translate soft/hard prjquota limit to block unit as well. Below testing result shows the block/inode numbers of Total/Used/Avail from df command are all correct afer applying this patch. [root@localhost quota-tools]\# ./repquota -P /dev/sdb1 *** Report for project quotas on device /dev/sdb1 Block grace time: 7days; Inode grace time: 7days Block limits File limits Project used soft hard grace used soft hard grace ----------------------------------------------------------- \#0 -- 4 0 0 1 0 0 \#101 -- 0 0 0 2 0 0 \#102 -- 0 10240 0 2 10 0 \#103 -- 0 0 20480 2 0 20 \#104 -- 0 10240 20480 2 10 20 \#105 -- 0 20480 10240 2 20 10 [root@localhost sdb1]\# lsattr -p t{1,2,3,4,5} 101 ----------------N-- t1/a1 102 ----------------N-- t2/a2 103 ----------------N-- t3/a3 104 ----------------N-- t4/a4 105 ----------------N-- t5/a5 [root@localhost sdb1]\# df -hi t{1,2,3,4,5} Filesystem Inodes IUsed IFree IUse% Mounted on /dev/sdb1 2.4M 21 2.4M 1% /mnt/sdb1 /dev/sdb1 10 2 8 20% /mnt/sdb1 /dev/sdb1 20 2 18 10% /mnt/sdb1 /dev/sdb1 10 2 8 20% /mnt/sdb1 /dev/sdb1 10 2 8 20% /mnt/sdb1 [root@localhost sdb1]\# df -h t{1,2,3,4,5} Filesystem Size Used Avail Use% Mounted on /dev/sdb1 10G 489M 9.6G 5% /mnt/sdb1 /dev/sdb1 10M 0 10M 0% /mnt/sdb1 /dev/sdb1 20M 0 20M 0% /mnt/sdb1 /dev/sdb1 10M 0 10M 0% /mnt/sdb1 /dev/sdb1 10M 0 10M 0% /mnt/sdb1 Fixes: `909110c060` ("f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()") Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:43 -08:00
Eric Biggers	644c8c92ad	f2fs: fix deadlock allocating bio_post_read_ctx from mempool Without any form of coordination, any case where multiple allocations from the same mempool are needed at a time to make forward progress can deadlock under memory pressure. This is the case for struct bio_post_read_ctx, as one can be allocated to decrypt a Merkle tree page during fsverity_verify_bio(), which itself is running from a post-read callback for a data bio which has its own struct bio_post_read_ctx. Fix this by freeing first bio_post_read_ctx before calling fsverity_verify_bio(). This works because verity (if enabled) is always the last post-read step. This deadlock can be reproduced by trying to read from an encrypted verity file after reducing NUM_PREALLOC_POST_READ_CTXS to 1 and patching mempool_alloc() to pretend that pool->alloc() always fails. Note that since NUM_PREALLOC_POST_READ_CTXS is actually 128, to actually hit this bug in practice would require reading from lots of encrypted verity files at the same time. But it's theoretically possible, as N available objects doesn't guarantee forward progress when > N/2 threads each need 2 objects at a time. Fixes: `95ae251fe8` ("f2fs: add fs-verity support") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:43 -08:00
Eric Biggers	e8ce5749d7	f2fs: remove unneeded check for error allocating bio_post_read_ctx Since allocating an object from a mempool never fails when __GFP_DIRECT_RECLAIM (which is included in GFP_NOFS) is set, the check for failure to allocate a bio_post_read_ctx is unnecessary. Remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:42 -08:00
Jaegeuk Kim	b06af2aff2	f2fs: convert inline_dir early before starting rename If we hit an error during rename, we'll get two dentries in different directories. Chao adds to check the room in inline_dir which can avoid needless inversion. This should be done by inode_lock(&old_dir). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:42 -08:00
Chao Yu	fe396ad8e7	f2fs: fix memleak of kobject If kobject_init_and_add() failed, caller needs to invoke kobject_put() to release kobject explicitly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:42 -08:00
Chao Yu	3e5e479a39	f2fs: fix to add swap extent correctly As Youling reported in mailing list: https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/ https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/ There is a test case can corrupt f2fs image: - dd if=/dev/zero of=/swapfile bs=1M count=4096 - chmod 600 /swapfile - mkswap /swapfile - swapon --discard /swapfile The root cause is f2fs_swap_activate() intends to return zero value to setup_swap_extents() to enable SWP_FS mode (swap file goes through fs), in this flow, setup_swap_extents() setups swap extent with wrong block address range, result in discard_swap() erasing incorrect address. Because f2fs_swap_activate() has pinned swapfile, its data block address will not change, it's safe to let swap to handle IO through raw device, so we can get rid of SWAP_FS mode and initial swap extents inside f2fs_swap_activate(), by this way, later discard_swap() can trim in right address range. Fixes: `4969c06a0d` ("f2fs: support swap file w/ DIO") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:42 -08:00
Jaegeuk Kim	4eea93e3ff	f2fs: run fsck when getting bad inode during GC This is to avoid inifinite GC when trying to disable checkpoint. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:42 -08:00
Chao Yu	4c8ff7095b	f2fs: support data compression This patch tries to support compression in f2fs. - New term named cluster is defined as basic unit of compression, file can be divided into multiple clusters logically. One cluster includes 4 << n (n >= 0) logical pages, compression size is also cluster size, each of cluster can be compressed or not. - In cluster metadata layout, one special flag is used to indicate cluster is compressed one or normal one, for compressed cluster, following metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores data including compress header and compressed data. - In order to eliminate write amplification during overwrite, F2FS only support compression on write-once file, data can be compressed only when all logical blocks in file are valid and cluster compress ratio is lower than specified threshold. - To enable compression on regular inode, there are three ways: * chattr +c file * chattr +c dir; touch dir/file * mount w/ -o compress_extension=ext; touch file.ext Compress metadata layout: [Dnode Structure] +-----------------------------------------------+ \| cluster 1 \| cluster 2 \| ......... \| cluster N \| +-----------------------------------------------+ . . . . . . . . . Compressed Cluster . . Normal Cluster . +----------+---------+---------+---------+ +---------+---------+---------+---------+ \|compr flag\| block 1 \| block 2 \| block 3 \| \| block 1 \| block 2 \| block 3 \| block 4 \| +----------+---------+---------+---------+ +---------+---------+---------+---------+ . . . . . . +-------------+-------------+----------+----------------------------+ \| data length \| data chksum \| reserved \| compressed data \| +-------------+-------------+----------+----------------------------+ Changelog: 20190326: - fix error handling of read_end_io(). - remove unneeded comments in f2fs_encrypt_one_page(). 20190327: - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages(). - don't jump into loop directly to avoid uninitialized variables. - add TODO tag in error path of f2fs_write_cache_pages(). 20190328: - fix wrong merge condition in f2fs_read_multi_pages(). - check compressed file in f2fs_post_read_required(). 20190401 - allow overwrite on non-compressed cluster. - check cluster meta before writing compressed data. 20190402 - don't preallocate blocks for compressed file. - add lz4 compress algorithm - process multiple post read works in one workqueue Now f2fs supports processing post read work in multiple workqueue, it shows low performance due to schedule overhead of multiple workqueue executing orderly. 20190921 - compress: support buffered overwrite C: compress cluster flag V: valid block address N: NEW_ADDR One cluster contain 4 blocks before overwrite after overwrite - VVVV -> CVNN - CVNN -> VVVV - CVNN -> CVNN - CVNN -> CVVV - CVVV -> CVNN - CVVV -> CVVV 20191029 - add kconfig F2FS_FS_COMPRESSION to isolate compression related codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm. note that: will remove lzo backend if Jaegeuk agreed that too. - update codes according to Eric's comments. 20191101 - apply fixes from Jaegeuk 20191113 - apply fixes from Jaegeuk - split workqueue for fsverity 20191216 - apply fixes from Jaegeuk 20200117 - fix to avoid NULL pointer dereference [Jaegeuk Kim] - add tracepoint for f2fs_{,de}compress_pages() - fix many bugs and add some compression stats - fix overwrite/mmap bugs - address 32bit build error, reported by Geert. - bug fixes when handling errors and i_compressed_blocks Reported-by: <noreply@ellerman.id.au> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-17 16:48:07 -08:00
Jaegeuk Kim	820d366736	f2fs: free sysfs kobject Detected kmemleak. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-15 13:43:49 -08:00
Jaegeuk Kim	2c4e0c528e	f2fs: declare nested quota_sem and remove unnecessary sems 1. f2fs_quota_sync -> down_read(&sbi->quota_sem) -> dquot_writeback_dquots -> f2fs_dquot_commit -> down_read(&sbi->quota_sem) 2. f2fs_quota_sync -> down_read(&sbi->quota_sem) -> f2fs_write_data_pages -> f2fs_write_single_data_page -> down_write(&F2FS_I(inode)->i_sem) f2fs_mkdir -> f2fs_do_add_link -> down_write(&F2FS_I(inode)->i_sem) -> f2fs_init_inode_metadata -> f2fs_new_node_page -> dquot_alloc_inode -> f2fs_dquot_mark_dquot_dirty -> down_read(&sbi->quota_sem) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-15 13:43:49 -08:00
Jaegeuk Kim	762e4db545	f2fs: don't put new_page twice in f2fs_rename In f2fs_rename(), new_page is gone after f2fs_set_link(), but it tries to put again when whiteout is failed and jumped to put_out_dir. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-15 13:43:49 -08:00
Jaegeuk Kim	5b1dbb082f	f2fs: set I_LINKABLE early to avoid wrong access by vfs This patch moves setting I_LINKABLE early in rename2(whiteout) to avoid the below warning. [ 3189.163385] WARNING: CPU: 3 PID: 59523 at fs/inode.c:358 inc_nlink+0x32/0x40 [ 3189.246979] Call Trace: [ 3189.248707] f2fs_init_inode_metadata+0x2d6/0x440 [f2fs] [ 3189.251399] f2fs_add_inline_entry+0x162/0x8c0 [f2fs] [ 3189.254010] f2fs_add_dentry+0x69/0xe0 [f2fs] [ 3189.256353] f2fs_do_add_link+0xc5/0x100 [f2fs] [ 3189.258774] f2fs_rename2+0xabf/0x1010 [f2fs] [ 3189.261079] vfs_rename+0x3f8/0xaa0 [ 3189.263056] ? tomoyo_path_rename+0x44/0x60 [ 3189.265283] ? do_renameat2+0x49b/0x550 [ 3189.267324] do_renameat2+0x49b/0x550 [ 3189.269316] __x64_sys_renameat2+0x20/0x30 [ 3189.271441] do_syscall_64+0x5a/0x230 [ 3189.273410] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 3189.275848] RIP: 0033:0x7f270b4d9a49 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-15 13:43:48 -08:00
Eric Biggers	542989b674	f2fs: don't keep META_MAPPING pages used for moving verity file blocks META_MAPPING is used to move blocks for both encrypted and verity files. So the META_MAPPING invalidation condition in do_checkpoint() should consider verity too, not just encrypt. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-15 13:43:48 -08:00
Chao Yu	f543805fcd	f2fs: introduce private bioset In low memory scenario, we can allocate multiple bios without submitting any of them. - f2fs_write_checkpoint() - block_operations() - f2fs_sync_node_pages() step 1) flush cold nodes, allocate new bio from mempool - bio_alloc() - mempool_alloc() step 2) flush hot nodes, allocate a bio from mempool - bio_alloc() - mempool_alloc() step 3) flush warm nodes, be stuck in below call path - bio_alloc() - mempool_alloc() - loop to wait mempool element release, as we only reserved memory for two bio allocation, however above allocated two bios may never be submitted. So we need avoid using default bioset, in this patch we introduce a private bioset, in where we enlarg mempool element count to total number of log header, so that we can make sure we have enough backuped memory pool in scenario of allocating/holding multiple bios. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-15 13:43:48 -08:00
Sahitya Tummala	0e6d01643c	f2fs: cleanup duplicate stats for atomic files Remove duplicate sbi->aw_cnt stats counter that tracks the number of atomic files currently opened (it also shows incorrect value sometimes). Use more relit lable sbi->atomic_files to show in the stats. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-15 13:43:48 -08:00
Shin'ichiro Kawasaki	d508c94e45	f2fs: Check write pointer consistency of non-open zones To catch f2fs bugs in write pointer handling code for zoned block devices, check write pointers of non-open zones that current segments do not point to. Do this check at mount time, after the fsync data recovery and current segments' write pointer consistency fix. Or when fsync data recovery is disabled by mount option, do the check when there is no fsync data. Check two items comparing write pointers with valid block maps in SIT. The first item is check for zones with no valid blocks. When there is no valid blocks in a zone, the write pointer should be at the start of the zone. If not, next write operation to the zone will cause unaligned write error. If write pointer is not at the zone start, reset the write pointer to place at the zone start. The second item is check between the write pointer position and the last valid block in the zone. It is unexpected that the last valid block position is beyond the write pointer. In such a case, report as a bug. Fix is not required for such zone, because the zone is not selected for next write operation until the zone get discarded. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-15 13:43:48 -08:00
Shin'ichiro Kawasaki	c426d99127	f2fs: Check write pointer consistency of open zones On sudden f2fs shutdown, write pointers of zoned block devices can go further but f2fs meta data keeps current segments at positions before the write operations. After remounting the f2fs, this inconsistency causes write operations not at write pointers and "Unaligned write command" error is reported. To avoid the error, compare current segments with write pointers of open zones the current segments point to, during mount operation. If the write pointer position is not aligned with the current segment position, assign a new zone to the current segment. Also check the newly assigned zone has write pointer at zone start. If not, reset write pointer of the zone. Perform the consistency check during fsync recovery. Not to lose the fsync data, do the check after fsync data gets restored and before checkpoint commit which flushes data at current segment positions. Not to cause conflict with kworker's dirfy data/node flush, do the fix within SBI_POR_DOING protection. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-15 13:42:14 -08:00
Eric Biggers	fd39073dba	fs-verity: implement readahead of Merkle tree pages When fs-verity verifies data pages, currently it reads each Merkle tree page synchronously using read_mapping_page(). Therefore, when the Merkle tree pages aren't already cached, fs-verity causes an extra 4 KiB I/O request for every 512 KiB of data (assuming that the Merkle tree uses SHA-256 and 4 KiB blocks). This results in more I/O requests and performance loss than is strictly necessary. Therefore, implement readahead of the Merkle tree pages. For simplicity, we take advantage of the fact that the kernel already does readahead of the file's data, just like it does for any other file. Due to this, we don't really need a separate readahead state (struct file_ra_state) just for the Merkle tree, but rather we just need to piggy-back on the existing data readahead requests. We also only really need to bother with the first level of the Merkle tree, since the usual fan-out factor is 128, so normally over 99% of Merkle tree I/O requests are for the first level. Therefore, make fsverity_verify_bio() enable readahead of the first Merkle tree level, for up to 1/4 the number of pages in the bio, when it sees that the REQ_RAHEAD flag is set on the bio. The readahead size is then passed down to ->read_merkle_tree_page() for the filesystem to (optionally) implement if it sees that the requested page is uncached. While we're at it, also make build_merkle_tree_level() set the Merkle tree readahead size, since it's easy to do there. However, for now don't set the readahead size in fsverity_verify_page(), since currently it's only used to verify holes on ext4 and f2fs, and it would need parameters added to know how much to read ahead. This patch significantly improves fs-verity sequential read performance. Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches: On an ARM64 phone (using sha256-ce): Before: 217 MB/s After: 263 MB/s (compare to sha256sum of non-verity file: 357 MB/s) In an x86_64 VM (using sha256-avx2): Before: 173 MB/s After: 215 MB/s (compare to sha256sum of non-verity file: 223 MB/s) Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-01-14 13:27:32 -08:00
Herbert Xu	ede7a09fc8	fscrypt: Allow modular crypto algorithms The commit `643fa9612b` ("fscrypt: remove filesystem specific build config option") removed modular support for fs/crypto. This causes the Crypto API to be built-in whenever fscrypt is enabled. This makes it very difficult for me to test modular builds of the Crypto API without disabling fscrypt which is a pain. As fscrypt is still evolving and it's developing new ties with the fs layer, it's hard to build it as a module for now. However, the actual algorithms are not required until a filesystem is mounted. Therefore we can allow them to be built as modules. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/20191227024700.7vrzuux32uyfdgum@gondor.apana.org.au Signed-off-by: Eric Biggers <ebiggers@google.com>	2019-12-31 10:33:51 -06:00
Eric Biggers	3b1ada55b9	fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info() fscrypt_get_encryption_info() returns 0 if the encryption key is unavailable; it never returns ENOKEY. So remove checks for ENOKEY. Link: https://lore.kernel.org/r/20191209212348.243331-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2019-12-31 10:33:51 -06:00
Jaegeuk Kim	dd973007bf	f2fs: set GFP_NOFS when moving inline dentries Otherwise, it can cause circular locking dependency reported by mm. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-12-12 13:24:34 -08:00
Jaegeuk Kim	4f4460c08a	f2fs: should avoid recursive filesystem ops We need to use GFP_NOFS, since we did f2fs_lock_op(). Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-12-12 13:24:34 -08:00
Jaegeuk Kim	3f188c23d7	f2fs: keep quota data on write_begin failure This patch avoids some unnecessary locks for quota files when write_begin fails. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-12-12 13:24:34 -08:00
Jaegeuk Kim	bdf0329924	f2fs: call f2fs_balance_fs outside of locked page Otherwise, we can hit deadlock by waiting for the locked page in move_data_block in GC. Thread A Thread B - do_page_mkwrite - f2fs_vm_page_mkwrite - lock_page - f2fs_balance_fs - mutex_lock(gc_mutex) - f2fs_gc - do_garbage_collect - ra_data_block - grab_cache_page - f2fs_balance_fs - mutex_lock(gc_mutex) Fixes: `39a8695824` ("f2fs: refactor ->page_mkwrite() flow") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-12-10 16:03:55 -08:00
Jaegeuk Kim	47501f87c6	f2fs: preallocate DIO blocks when forcing buffered_io The previous preallocation and DIO decision like below. allow_outplace_dio !allow_outplace_dio f2fs_force_buffered_io () No_Prealloc / Buffered_IO Prealloc / Buffered_IO !f2fs_force_buffered_io No_Prealloc / DIO Prealloc / DIO But, Javier reported Case () where zoned device bypassed preallocation but fell back to buffered writes in f2fs_direct_IO(), resulting in stale data being read. In order to fix the issue, actually we need to preallocate blocks whenever we fall back to buffered IO like this. No change is made in the other cases. allow_outplace_dio !allow_outplace_dio f2fs_force_buffered_io (*) Prealloc / Buffered_IO Prealloc / Buffered_IO !f2fs_force_buffered_io No_Prealloc / DIO Prealloc / DIO Reported-and-tested-by: Javier Gonzalez <javier@javigon.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-12-09 15:57:45 -08:00
Amir Goldstein	eb31e2f63d	utimes: Clamp the timestamps in notify_change() Push clamping timestamps into notify_change(), so in-kernel callers like nfsd and overlayfs will get similar timestamp set behavior as utimes. AV: get rid of clamping in ->setattr() instances; we don't need to bother with that there, with notify_change() doing normalization in all cases now (it already did for implicit case, since current_time() clamps). Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `42e729b9dd` ("utimes: Clamp the timestamps before update") Cc: stable@vger.kernel.org # v5.4 Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-12-08 19:10:50 -05:00
Linus Torvalds	0da522107e	compat_ioctl: remove most of fs/compat_ioctl.c As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite. Signed-off-by: Arnd Bergmann <arnd@arndb.de> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJdsHCdAAoJEJpsee/mABjZtYkP/1JGl3jFv3Iq/5BCdPkaePP1 RtMJRNfURgK3GeuHUui330PvVjI/pLWXU/VXMK2MPTASpJLzYz3uCaZrpVWEMpDZ +ImzGmgJkITlW1uWU3zOcQhOxTyb1hCZ0Ci+2xn9QAmyOL7prXoXCXDWv3h6iyiF lwG+nW+HNtyx41YG+9bRfKNoG0ZJ+nkJ70BV6u0acQHXWn7Xuupa9YUmBL87hxAL 6dlJfLTJg6q8QSv/Q6LxslfWk2Ti8OOJZOwtFM5R8Bgl0iUcvshiRCKfv/3t9jXD dJNvF1uq8z+gracWK49Qsfq5dnZ2ZxHFUo9u0NjbCrxNvWH/sdvhbaUBuJI75seH VIznCkdxFhrqitJJ8KmxANxG08u+9zSKjSlxG2SmlA4qFx/AoStoHwQXcogJscNb YIXYKmWBvwPzYu09QFAXdHFPmZvp/3HhMWU6o92lvDhsDwzkSGt3XKhCJea4DCaT m+oCcoACqSWhMwdbJOEFofSub4bY43s5iaYuKes+c8O261/Dwg6v/pgIVez9mxXm TBnvCsotq5m8wbwzv99eFqGeJH8zpDHrXxEtRR5KQqMqjLq/OQVaEzmpHZTEuK7n e/V/PAKo2/V63g4k6GApQXDxnjwT+m0aWToWoeEzPYXS6KmtWC91r4bWtslu3rdl bN65armTm7bFFR32Avnu =lgCl -----END PGP SIGNATURE----- Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...	2019-12-01 13:46:15 -08:00
Linus Torvalds	b8072d5b3c	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl3hAFIACgkQnJ2qBz9k QNkV/gf+Kwn7xHg76YXd15lZYBzhgj/ABAYEEAAVY49OOCK5+XVmmAufHesMZ2lU Solt8PvbQ8d5786bWpaYXgrTU3JW37c6x1MDUPDLQ8goXWzx7pZWvD+Yup558rDa H1aoqvFKLgpeVVqkUdvvv2CDbgZyOgGlkDqWeS+c5pZd1NPFZzUAoU26slvQ5h4f t41mbavOIm5DChQ5UjwRNw+pb09GXaHrPBRJwa1XuJYJWAansBcQIsxiiqt/43Gn AzwUGrsz4vrPBk+Kcd0SGb8vinFVQr19gBFKFeN3rPFUEUn6T0FPBqaYeiNTNE37 AqASYKlIuhcSf0Wdvx6vxwSHsFl5VA== =NGxV -----END PGP SIGNATURE----- Merge tag 'for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, quota, reiserfs cleanups and fixes from Jan Kara: - Refactor the quota on/off kernel internal interfaces (mostly for ubifs quota support as ubifs does not want to have inodes holding quota information) - A few other small quota fixes and cleanups - Various small ext2 fixes and cleanups - Reiserfs xattr fix and one cleanup * tag 'for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits) ext2: code cleanup for descriptor_loc() fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long ext2: fix improper function comment ext2: code cleanup for ext2_try_to_allocate() ext2: skip unnecessary operations in ext2_try_to_allocate() ext2: Simplify initialization in ext2_try_to_allocate() ext2: code cleanup by calling ext2_group_last_block_no() ext2: introduce new helper ext2_group_last_block_no() reiserfs: replace open-coded atomic_dec_and_mutex_lock() ext2: check err when partial != NULL quota: Handle quotas without quota inodes in dquot_get_state() quota: Make dquot_disable() work without quota inodes quota: Drop dquot_enable() fs: Use dquot_load_quota_inode() from filesystems quota: Rename vfs_load_quota_inode() to dquot_load_quota_inode() quota: Simplify dquot_resume() quota: Factor out setup of quota inode quota: Check that quota is not dirty before release quota: fix livelock in dquot_writeback_dquots ext2: don't set *count in the case of failure in ext2_try_to_allocate() ...	2019-11-30 11:16:07 -08:00
Linus Torvalds	8f45533e9d	f2fs-for-5.5-rc1 In this round, we've introduced fairly small number of patches as below. Enhancement: - improve the in-place-update IO flow - allocate segment to guarantee no GC for pinned files Bug fix: - fix updatetime in lazytime mode - potential memory leak in f2fs_listxattr - record parent inode number in rename2 correctly - fix deadlock in f2fs_gc along with atomic writes - avoid needless data migration in GC -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl3e1XkACgkQQBSofoJI UNJ0GhAAhVIX4J91CLnVSh0ik1XCaI6h/dFeS6kbDd8oxzQm/qt64b59aZqgy7Rk iblGWfj8uPP5yO60pqb5uN4a0hybptVZSEldbhF0Xv0zUeVoT7C1ksTMrdUd1p7d YkO8G+V4QBBrtpKG1KKKEncrvcdx4n9QHxGsRh4z5vXZH7sEmH7+N8OE88MaPjdZ UWqYk0S0GoZBhPe7c8pQuD/PM+WJJH4Lewgw5kK21eAjOKI+yZKb+bY2tGjo5dA1 nzYO72CRMV4VEKsnxTZ/LCB2kCXeexaGuiVPyHjCmgAh990cLjsCWIbJ8EJu7uAa vAo6/EMfgfPkPt5Y7uWGR4EeNT7AFhUoMuoQ9zdXzecY48D4Gz58o87Q+OFY3ipZ W2OSf92pEJyfumE5o8wN435gaRYUjjCo1SMoIQABNav411XrBVoRwjvkV3DyA6af Bs1bafz2hR/E1q0uoZvLWC5waiHy9605OkKMs/y8IRsn6yhRep/tv3KLk2Dz3fOO LxenhuVO9bQDCheEcH15qIljxTuyfTyUOa9UrFXOwn4mK61J8A/Gs+SiqW0y28oA feSw7cLPxK0OlYQgql24JfJN/Xt523WmCSfXfe7TCUDTDkBpmsdhFwHYZyCLzqt+ FyBhf2DF/BGzKMT28oc7StO43mIvOc1Wk+jfJFW+hld5ncAJxCE= =qyrd -----END PGP SIGNATURE----- Merge tag 'f2fs-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've introduced fairly small number of patches as below. Enhancements: - improve the in-place-update IO flow - allocate segment to guarantee no GC for pinned files Bug fixes: - fix updatetime in lazytime mode - potential memory leak in f2fs_listxattr - record parent inode number in rename2 correctly - fix deadlock in f2fs_gc along with atomic writes - avoid needless data migration in GC" * tag 'f2fs-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: stop GC when the victim becomes fully valid f2fs: expose main_blkaddr in sysfs f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project() f2fs: Fix deadlock in f2fs_gc() context during atomic files handling f2fs: show f2fs instance in printk_ratelimited f2fs: fix potential overflow f2fs: fix to update dir's i_pino during cross_rename f2fs: support aligned pinned file f2fs: avoid kernel panic on corruption test f2fs: fix wrong description in document f2fs: cache global IPU bio f2fs: fix to avoid memory leakage in f2fs_listxattr f2fs: check total_segments from devices in raw_super f2fs: update multi-dev metadata in resize_fs f2fs: mark recovery flag correctly in read_raw_super_block() f2fs: fix to update time in lazytime mode	2019-11-30 11:02:30 -08:00
Linus Torvalds	1c1ff4836f	fsverity updates for 5.5 Expose the fs-verity bit through statx(). -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXdtWqhQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK+C9AQCCf8C2KP6DynoGQb9KRYYreJk8js8G IgtlhazJ3j1RJAD/VijFbdwbxGCmiR1Y6BhKq5eaCYD1El68wSwkKuNO3ww= =7WpU -----END PGP SIGNATURE----- Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fsverity updates from Eric Biggers: "Expose the fs-verity bit through statx()" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: docs: fs-verity: mention statx() support f2fs: support STATX_ATTR_VERITY ext4: support STATX_ATTR_VERITY statx: define STATX_ATTR_VERITY docs: fs-verity: document first supported kernel version	2019-11-25 12:21:23 -08:00
Linus Torvalds	ea4b71bc0b	fscrypt updates for 5.5 - Add the IV_INO_LBLK_64 encryption policy flag which modifies the encryption to be optimized for UFS inline encryption hardware. - For AES-128-CBC, use the crypto API's implementation of ESSIV (which was added in 5.4) rather than doing ESSIV manually. - A few other cleanups. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXdtVMxQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK8MVAP44iRzj8ZXu62BhqNOYYcF60s/58QfZ Jo1VdmvO/8MNrAD+P/jW5sqzcB5BLdNzS7pLKGIzsC55uMyp/79xyKK8wQc= =XKWV -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fscrypt updates from Eric Biggers: - Add the IV_INO_LBLK_64 encryption policy flag which modifies the encryption to be optimized for UFS inline encryption hardware. - For AES-128-CBC, use the crypto API's implementation of ESSIV (which was added in 5.4) rather than doing ESSIV manually. - A few other cleanups. * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: f2fs: add support for IV_INO_LBLK_64 encryption policies ext4: add support for IV_INO_LBLK_64 encryption policies fscrypt: add support for IV_INO_LBLK_64 policies fscrypt: avoid data race on fscrypt_mode::logged_impl_name docs: ioctl-number: document fscrypt ioctl numbers fscrypt: zeroize fscrypt_info before freeing fscrypt: remove struct fscrypt_ctx fscrypt: invoke crypto API for ESSIV handling	2019-11-25 12:19:28 -08:00
Jaegeuk Kim	803e74be04	f2fs: stop GC when the victim becomes fully valid We must stop GC, once the segment becomes fully valid. Otherwise, it can produce another dirty segments by moving valid blocks in the segment partially. Ramon hit no free segment panic sometimes and saw this case happens when validating reliable file pinning feature. Signed-off-by: Ramon Pantin <pantin@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-11-25 10:01:28 -08:00
Jaegeuk Kim	a4db59ac90	f2fs: expose main_blkaddr in sysfs Expose in /sys/fs/f2fs/<blockdev>/main_blkaddr the block address where the main area starts. This allows user mode programs to determine: - That pinned files that are made exclusively of fully allocated 2MB segments will never be unpinned by the file system. - Where the main area starts. This is required by programs that want to verify if a file is made exclusively of 2MB f2fs segments, the alignment boundary for segments starts at this address. Testing for 2MB alignment relative to the start of the device is incorrect, because for some filesystems main_blkaddr is not at a 2MB boundary relative to the start of the device. The entry will be used when validating reliable pinning file feature proposed by "f2fs: support aligned pinned file". Signed-off-by: Ramon Pantin <pantin@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-11-25 10:01:27 -08:00
Chengguang Xu	909110c060	f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project() Setting softlimit larger than hardlimit seems meaningless for disk quota but currently it is allowed. In this case, there may be a bit of comfusion for users when they run df comamnd to directory which has project quota. For example, we set 20M softlimit and 10M hardlimit of block usage limit for project quota of test_dir(project id 123). [root@hades f2fs]# repquota -P -a *** Report for project quotas on device /dev/nvme0n1p8 Block grace time: 7days; Inode grace time: 7days Block limits File limits Project used soft hard grace used soft hard grace ---------------------------------------------------------------------- 0 -- 4 0 0 1 0 0 123 +- 10248 20480 10240 2 0 0 The result of df command as below: [root@hades f2fs]# df -h /mnt/f2fs/test Filesystem Size Used Avail Use% Mounted on /dev/nvme0n1p8 20M 11M 10M 51% /mnt/f2fs Even though it looks like there is another 10M free space to use, if we write new data to diretory test(inherit project id), the write will fail with errno(-EDQUOT). After this patch, the df result looks like below. [root@hades f2fs]# df -h /mnt/f2fs/test Filesystem Size Used Avail Use% Mounted on /dev/nvme0n1p8 10M 10M 0 100% /mnt/f2fs Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-11-25 10:01:12 -08:00
Sahitya Tummala	677017d196	f2fs: Fix deadlock in f2fs_gc() context during atomic files handling The FS got stuck in the below stack when the storage is almost full/dirty condition (when FG_GC is being done). schedule_timeout io_schedule_timeout congestion_wait f2fs_drop_inmem_pages_all f2fs_gc f2fs_balance_fs __write_node_page f2fs_fsync_node_pages f2fs_do_sync_file f2fs_ioctl The root cause for this issue is there is a potential infinite loop in f2fs_drop_inmem_pages_all() for the case where gc_failure is true and when there an inode whose i_gc_failures[GC_FAILURE_ATOMIC] is not set. Fix this by keeping track of the total atomic files currently opened and using that to exit from this condition. Fix-suggested-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-11-19 14:41:21 -08:00
Chao Yu	c45d6002ff	f2fs: show f2fs instance in printk_ratelimited As Eric mentioned, bare printk{,_ratelimited} won't show which filesystem instance these message is coming from, this patch tries to show fs instance with sb->s_id field in all places we missed before. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-11-19 14:41:21 -08:00
Eric Biggers	924e319416	f2fs: support STATX_ATTR_VERITY Set the STATX_ATTR_VERITY bit when the statx() system call is used on a verity file on f2fs. Signed-off-by: Eric Biggers <ebiggers@google.com>	2019-11-13 12:15:34 -08:00
Christoph Hellwig	d41003513e	block: rework zone reporting Avoid the need to allocate a potentially large array of struct blk_zone in the block layer by switching the ->report_zones method interface to a callback model. Now the caller simply supplies a callback that is executed on each reported zone, and private data for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-12 19:12:07 -07:00
Chao Yu	1f0d5c911b	f2fs: fix potential overflow We expect 64-bit calculation result from below statement, however in 32-bit machine, looped left shift operation on pgoff_t type variable may cause overflow issue, fix it by forcing type cast. page->index << PAGE_SHIFT; Fixes: `26de9b1171` ("f2fs: avoid unnecessary updating inode during fsync") Fixes: `0a2aa8fbb9` ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-11-07 11:17:39 -08:00
Chao Yu	2a60637f06	f2fs: fix to update dir's i_pino during cross_rename As Eric reported: RENAME_EXCHANGE support was just added to fsstress in xfstests: commit 65dfd40a97b6bbbd2a22538977bab355c5bc0f06 Author: kaixuxia <xiakaixu1987@gmail.com> Date: Thu Oct 31 14:41:48 2019 +0800 fsstress: add EXCHANGE renameat2 support This is causing xfstest generic/579 to fail due to fsck.f2fs reporting errors. I'm not sure what the problem is, but it still happens even with all the fs-verity stuff in the test commented out, so that the test just runs fsstress. generic/579 23s ... [10:02:25] [ 7.745370] run fstests generic/579 at 2019-11-04 10:02:25 _check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /results/f2fs/results-default/generic/579.full for details) [10:02:47] Ran: generic/579 Failures: generic/579 Failed 1 of 1 tests Xunit report: /results/f2fs/results-default/result.xml Here's the contents of 579.full: _check_generic_filesystem: filesystem on /dev/vdc is inconsistent * fsck.f2fs output * [ASSERT] (__chk_dots_dentries:1378) --> Bad inode number[0x24] for '..', parent parent ino is [0xd10] The root cause is that we forgot to update directory's i_pino during cross_rename, fix it. Fixes: `32f9bc25cb` ("f2fs: support ->rename2()") Signed-off-by: Chao Yu <yuchao0@huawei.com> Tested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-11-07 11:15:39 -08:00
Jaegeuk Kim	f5a53edcf0	f2fs: support aligned pinned file This patch supports 2MB-aligned pinned file, which can guarantee no GC at all by allocating fully valid 2MB segment. Check free segments by has_not_enough_free_secs() with large budget. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-11-07 10:40:59 -08:00
Jaegeuk Kim	bc005a4d53	f2fs: avoid kernel panic on corruption test xfstests/generic/475 complains kernel warn/panic while testing corrupted disk. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2019-11-07 10:40:59 -08:00
Ajay Joshi	6c1b1da58f	block: add zone open, close and finish operations Zoned block devices (ZBC and ZAC devices) allow an explicit control over the condition (state) of zones. The operations allowed are: * Open a zone: Transition to open condition to indicate that a zone will actively be written * Close a zone: Transition to closed condition to release the drive resources used for writing to a zone * Finish a zone: Transition an open or closed zone to the full condition to prevent write operations To enable this control for in-kernel zoned block device users, define the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH as well as the generic function blkdev_zone_mgmt() for submitting these operations on a range of zones. This results in blkdev_reset_zones() removal and replacement with this new zone magement function. Users of blkdev_reset_zones() (f2fs and dm-zoned) are updated accordingly. Contains contributions from Matias Bjorling, Hans Holmberg, Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-07 06:31:48 -07:00
Eric Biggers	0eee17e332	f2fs: add support for IV_INO_LBLK_64 encryption policies f2fs inode numbers are stable across filesystem resizing, and f2fs inode and file logical block numbers are always 32-bit. So f2fs can always support IV_INO_LBLK_64 encryption policies. Wire up the needed fscrypt_operations to declare support. Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>	2019-11-06 12:34:42 -08:00
Jan Kara	7212b95e61	fs: Use dquot_load_quota_inode() from filesystems Use dquot_load_quota_inode from filesystems instead of dquot_enable(). In all three cases we want to load quota inode and never use the function to update quota flags. Signed-off-by: Jan Kara <jack@suse.cz>	2019-11-04 09:58:05 +01:00

1 2 3 4 5 ...

2892 Commits