2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-22 20:23:57 +08:00
Commit Graph

2828 Commits

Author SHA1 Message Date
Chao Yu
985100035e f2fs: add prefix for f2fs slab cache name
In order to avoid polluting global slab cache namespace.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 11:41:26 -07:00
Chao Yu
5df7731f60 f2fs: introduce DEFAULT_IO_TIMEOUT
As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 11:41:26 -07:00
Jaegeuk Kim
2bac07635d f2fs: skip GC when section is full
This fixes skipping GC when segment is full in large section.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 11:41:26 -07:00
Jaegeuk Kim
8c7b9ac129 f2fs: add migration count iff migration happens
If first segment is empty and migration_granularity is 1, we can't move this
at all.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 11:41:26 -07:00
Chao Yu
bbbc34fd66 f2fs: clean up bggc mount option
There are three status for background gc: on, off and sync, it's
a little bit confused to use test_opt(BG_GC) and test_opt(FORCE_FG_GC)
combinations to indicate status of background gc.

So let's remove F2FS_MOUNT_BG_GC and F2FS_MOUNT_FORCE_FG_GC mount
options, and add F2FS_OPTION().bggc_mode with below three status
to clean up codes and enhance bggc mode's scalability.

enum {
	BGGC_MODE_ON,		/* background gc is on */
	BGGC_MODE_OFF,		/* background gc is off */
	BGGC_MODE_SYNC,		/*
				 * background gc is on, migrating blocks
				 * like foreground gc
				 */
};

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 11:41:25 -07:00
Chao Yu
b0332a0f95 f2fs: clean up lfs/adaptive mount option
This patch removes F2FS_MOUNT_ADAPTIVE and F2FS_MOUNT_LFS mount options,
and add F2FS_OPTION.fs_mode with below two status to indicate filesystem
mode.

enum {
	FS_MODE_ADAPTIVE,	/* use both lfs/ssr allocation */
	FS_MODE_LFS,		/* use lfs allocation only */
};

It can enhance code readability and fs mode's scalability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 11:41:25 -07:00
Chao Yu
a9117eca1d f2fs: fix to show norecovery mount option
Previously, 'norecovery' mount option will be shown as
'disable_roll_forward', fix to show original option name correctly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 11:41:25 -07:00
Chao Yu
ba3b583cff f2fs: clean up parameter of macro XATTR_SIZE()
Just cleanup, no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 11:41:25 -07:00
Chao Yu
a2ced1ce10 f2fs: clean up codes with {f2fs_,}data_blkaddr()
- rename datablock_addr() to data_blkaddr().
- wrap data_blkaddr() with f2fs_data_blkaddr() to clean up
parameters.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 11:41:25 -07:00
Jaegeuk Kim
a7e679b533 f2fs: show mounted time
Let's show mounted time.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 11:41:25 -07:00
Takashi Iwai
c6d5789bea f2fs: Use scnprintf() for avoiding potential buffer overflow
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit.  Fix it by replacing with scnprintf().

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 11:37:56 -07:00
Chao Yu
2536ac6872 f2fs: allow to clear F2FS_COMPR_FL flag
If regular inode has no compressed cluster, allow using 'chattr -c'
to remove its compress flag, recovering it to a non-compressed file.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-11 08:25:38 -07:00
Chao Yu
6cfdf15fdb f2fs: fix to check dirty pages during compressed inode conversion
Compressed cluster can be generated during dirty data writeback,
if there is dirty pages on compressed inode, it needs to disable
converting compressed inode to non-compressed one.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-11 08:25:38 -07:00
Chao Yu
96f5b4fa56 f2fs: fix to account compressed inode correctly
stat_inc_compr_inode() needs to check FI_COMPRESSED_FILE flag, so
in f2fs_disable_compressed_file(), we should call stat_dec_compr_inode()
before clearing FI_COMPRESSED_FILE flag.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-11 08:25:38 -07:00
Jaegeuk Kim
99eabb914e f2fs: fix wrong check on F2FS_IOC_FSSETXATTR
This fixes the incorrect failure when enabling project quota on casefold-enabled
file.

Cc: Daniel Rosenberg <drosen@google.com>
Cc: kernel-team@android.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-10 09:18:33 -07:00
Chao Yu
95978caa13 f2fs: fix to avoid use-after-free in f2fs_write_multi_pages()
In compress cluster, if physical block number is less than logic
page number, race condition will cause use-after-free issue as
described below:

- f2fs_write_compressed_pages
 - fio.page = cic->rpages[0];
 - f2fs_outplace_write_data
					- f2fs_compress_write_end_io
					 - kfree(cic->rpages);
					 - kfree(cic);
 - fio.page = cic->rpages[1];

f2fs_write_multi_pages+0xfd0/0x1a98
f2fs_write_data_pages+0x74c/0xb5c
do_writepages+0x64/0x108
__writeback_single_inode+0xdc/0x4b8
writeback_sb_inodes+0x4d0/0xa68
__writeback_inodes_wb+0x88/0x178
wb_writeback+0x1f0/0x424
wb_workfn+0x2f4/0x574
process_one_work+0x210/0x48c
worker_thread+0x2e8/0x44c
kthread+0x110/0x120
ret_from_fork+0x10/0x18

Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-10 09:18:33 -07:00
Chao Yu
06c7540fd2 f2fs: fix to avoid using uninitialized variable
In f2fs_vm_page_mkwrite(), if inode is compress one, and current mmapped
page locates in compressed cluster, we have to call f2fs_get_dnode_of_data()
to get its physical block address before f2fs_wait_on_block_writeback().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-10 09:18:33 -07:00
Chao Yu
7a88ddb560 f2fs: fix inconsistent comments
Lack of maintenance on comments may mislead developers, fix them.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-10 09:18:33 -07:00
Chao Yu
3addc1aed3 f2fs: remove i_sem lock coverage in f2fs_setxattr()
f2fs_inode.xattr_ver field was gone after commit d260081ccf
("f2fs: change recovery policy of xattr node block"), remove i_sem
lock coverage in f2fs_setxattr()

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-10 09:18:32 -07:00
Chao Yu
c10c982032 f2fs: cover last_disk_size update with spinlock
This change solves below hangtask issue:

INFO: task kworker/u16:1:58 blocked for more than 122 seconds.
      Not tainted 5.6.0-rc2-00590-g9983bdae4974e #11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:1   D    0    58      2 0x00000000
Workqueue: writeback wb_workfn (flush-179:0)
Backtrace:
 (__schedule) from [<c0913234>] (schedule+0x78/0xf4)
 (schedule) from [<c017ec74>] (rwsem_down_write_slowpath+0x24c/0x4c0)
 (rwsem_down_write_slowpath) from [<c0915f2c>] (down_write+0x6c/0x70)
 (down_write) from [<c0435b80>] (f2fs_write_single_data_page+0x608/0x7ac)
 (f2fs_write_single_data_page) from [<c0435fd8>] (f2fs_write_cache_pages+0x2b4/0x7c4)
 (f2fs_write_cache_pages) from [<c043682c>] (f2fs_write_data_pages+0x344/0x35c)
 (f2fs_write_data_pages) from [<c0267ee8>] (do_writepages+0x3c/0xd4)
 (do_writepages) from [<c0310cbc>] (__writeback_single_inode+0x44/0x454)
 (__writeback_single_inode) from [<c03112d0>] (writeback_sb_inodes+0x204/0x4b0)
 (writeback_sb_inodes) from [<c03115cc>] (__writeback_inodes_wb+0x50/0xe4)
 (__writeback_inodes_wb) from [<c03118f4>] (wb_writeback+0x294/0x338)
 (wb_writeback) from [<c0312dac>] (wb_workfn+0x35c/0x54c)
 (wb_workfn) from [<c014f2b8>] (process_one_work+0x214/0x544)
 (process_one_work) from [<c014f634>] (worker_thread+0x4c/0x574)
 (worker_thread) from [<c01564fc>] (kthread+0x144/0x170)
 (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)

Reported-and-tested-by: Ondřej Jirman <megi@xff.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-10 09:18:32 -07:00
Chao Yu
d940aa07ed f2fs: fix to check i_compr_blocks correctly
inode.i_blocks counts based on 512byte sector, we need to convert
to 4kb sized block count before comparing to i_compr_blocks.

In addition, add to print message when sanity check on inode
compression configs failed.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-10 09:18:29 -07:00
Chao Yu
df77fbd8c5 f2fs: fix to avoid potential deadlock
Using f2fs_trylock_op() in f2fs_write_compressed_pages() to avoid potential
deadlock like we did in f2fs_write_single_data_page().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-02-27 10:16:45 -08:00
Chao Yu
097a768650 f2fs: add missing function name in kernel message
Otherwise, we can not distinguish the exact location of messages,
when there are more than one places printing same message.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-02-27 10:16:45 -08:00
Chao Yu
0b32dc1864 f2fs: recycle unused compress_data.chksum feild
In Struct compress_data, chksum field was never used, remove it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-02-27 10:16:45 -08:00
Chao Yu
61fbae2b2b f2fs: fix to avoid NULL pointer dereference
Unable to handle kernel NULL pointer dereference at virtual address 00000000
PC is at f2fs_free_dic+0x60/0x2c8
LR is at f2fs_decompress_pages+0x3c4/0x3e8
f2fs_free_dic+0x60/0x2c8
f2fs_decompress_pages+0x3c4/0x3e8
__read_end_io+0x78/0x19c
f2fs_post_read_work+0x6c/0x94
process_one_work+0x210/0x48c
worker_thread+0x2e8/0x44c
kthread+0x110/0x120
ret_from_fork+0x10/0x18

In f2fs_free_dic(), we can not use f2fs_put_page(,1) to release dic->tpages[i],
as the page's mapping is NULL.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-02-27 10:16:45 -08:00
Eric Biggers
7fa6d59816 f2fs: fix leaking uninitialized memory in compressed clusters
When the compressed data of a cluster doesn't end on a page boundary,
the remainder of the last page must be zeroed in order to avoid leaking
uninitialized memory to disk.

Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-02-27 10:16:44 -08:00
Sahitya Tummala
bf22c3cc8c f2fs: fix the panic in do_checkpoint()
There could be a scenario where f2fs_sync_meta_pages() will not
ensure that all F2FS_DIRTY_META pages are submitted for IO. Thus,
resulting in the below panic in do_checkpoint() -

f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) &&
				!f2fs_cp_error(sbi));

This can happen in a low-memory condition, where shrinker could
also be doing the writepage operation (stack shown below)
at the same time when checkpoint is running on another core.

schedule
down_write
f2fs_submit_page_write -> by this time, this page in page cache is tagged
			as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY
			is cleared, due to which f2fs_sync_meta_pages()
			cannot sync this page in do_checkpoint() path.
f2fs_do_write_meta_page
__f2fs_write_meta_page
f2fs_write_meta_page
shrink_page_list
shrink_inactive_list
shrink_node_memcg
shrink_node
kswapd

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-02-27 10:16:44 -08:00
Chao Yu
dc5a941223 f2fs: fix to wait all node page writeback
There is a race condition that we may miss to wait for all node pages
writeback, fix it.

- fsync()				- shrink
 - f2fs_do_sync_file
					 - __write_node_page
					  - set_page_writeback(page#0)
					  : remove DIRTY/TOWRITE flag
  - f2fs_fsync_node_pages
  : won't find page #0 as TOWRITE flag was removeD
  - f2fs_wait_on_node_pages_writeback
  : wont' wait page #0 writeback as it was not in fsync_node_list list.
					   - f2fs_add_fsync_node_entry

Fixes: 50fa53eccf ("f2fs: fix to avoid broken of dnode block list")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-02-27 10:16:44 -08:00
Linus Torvalds
236f453294 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:

 - bmap series from cmaiolino

 - getting rid of convolutions in copy_mount_options() (use a couple of
   copy_from_user() instead of the __get_user() crap)

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  saner copy_mount_options()
  fibmap: Reject negative block numbers
  fibmap: Use bmap instead of ->bmap method in ioctl_fibmap
  ecryptfs: drop direct calls to ->bmap
  cachefiles: drop direct usage of ->bmap method.
  fs: Enable bmap() function to properly return errors
2020-02-08 13:04:49 -08:00
Linus Torvalds
bddea11b1b Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs timestamp updates from Al Viro:
 "More 64bit timestamp work"

* 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  kernfs: don't bother with timestamp truncation
  fs: Do not overload update_time
  fs: Delete timespec64_trunc()
  fs: ubifs: Eliminate timespec64_trunc() usage
  fs: ceph: Delete timespec64_trunc() usage
  fs: cifs: Delete usage of timespec64_trunc
  fs: fat: Eliminate timespec64_trunc() usage
  utimes: Clamp the timestamps in notify_change()
2020-02-05 05:02:42 +00:00
Masahiro Yamada
45586c7078 treewide: remove redundant IS_ERR() before error code check
'PTR_ERR(p) == -E*' is a stronger condition than IS_ERR(p).
Hence, IS_ERR(p) is unneeded.

The semantic patch that generates this commit is as follows:

// <smpl>
@@
expression ptr;
constant error_code;
@@
-IS_ERR(ptr) && (PTR_ERR(ptr) == - error_code)
+PTR_ERR(ptr) == - error_code
// </smpl>

Link: http://lkml.kernel.org/r/20200106045833.1725-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Stephen Boyd <sboyd@kernel.org> [drivers/clk/clk.c]
Acked-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> [GPIO]
Acked-by: Wolfram Sang <wsa@the-dreams.de> [drivers/i2c]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [acpi/scan.c]
Acked-by: Rob Herring <robh@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:27 +00:00
Carlos Maiolino
30460e1ea3 fs: Enable bmap() function to properly return errors
By now, bmap() will either return the physical block number related to
the requested file offset or 0 in case of error or the requested offset
maps into a hole.
This patch makes the needed changes to enable bmap() to proper return
errors, using the return value as an error return, and now, a pointer
must be passed to bmap() to be filled with the mapped physical block.

It will change the behavior of bmap() on return:

- negative value in case of error
- zero on success or map fell into a hole

In case of a hole, the *block will be zero too

Since this is a prep patch, by now, the only error return is -EINVAL if
->bmap doesn't exist.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-03 08:05:37 -05:00
Linus Torvalds
6e135baed8 f2fs-for-5.6
In this series, we've implemented transparent compression experimentally. It
 supports LZO and LZ4, but will add more later as we investigate in the field
 more. At this point, the feature doesn't expose compressed space to user
 directly in order to guarantee potential data updates later to the space.
 Instead, the main goal is to reduce data writes to flash disk as much as
 possible, resulting in extending disk life time as well as relaxing IO
 congestion. Alternatively, we're also considering to add ioctl() to reclaim
 compressed space and show it to user after putting the immutable bit.
 
 Enhancement:
  - add compression support
  - avoid unnecessary locks in quota ops
  - harden power-cut scenario for zoned block devices
  - use private bio_set to avoid IO congestion
  - replace GC mutex with rwsem to serialize callers
 
 Bug fix:
  - fix dentry consistency and memory corruption in rename()'s error case
  - fix wrong swap extent reports
  - fix casefolding bugs
  - change lock coverage to avoid deadlock
  - avoid GFP_KERNEL under f2fs_lock_op
 
 And, we've cleaned up sysfs entries to prepare no debugfs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl4zInwACgkQQBSofoJI
 UNL4Tg/+JBbVEFa3IUBGMdbjfgd/g0Jye++iMAYYGRWT6Ll/IGcHRV9NunITjgWU
 mBZqdhI28kXeiGCcewB1ZvivjLx22X4n6yevHk2B5A6PNe9IDCHi0HOAhJJHkjPH
 ecv2L+vX3Oj4y0+H7JNz9Fo3OIPJvMPtCQWlg1z+VQyhB85zNP7fZlvvIY4tG8yw
 ERo0YNotLqwcF1BxCwNbAhV3aJGDxar+MI//yNzpiwDX7IptVpqestfcoIYc9kKL
 4kSWRyEIGwcuIeyoM6aofGS9t4Z/Oe/gdqcxNr6l5n0Q/tMTpb4b/fJFGNr6RRx9
 X9NQo8flkQb2DEIOP0DVpO2aPebzsVtzg3LZUOLA83+wCHfwINtHai2Dy2zDJ2my
 BrVdou8fe2oxoaYihJg/Tz9cd0nA/6mZArtpYvDImAmX/xuGOvVk9zZkXNwc9nVX
 EyVzy0vW4lA6gAIJ95aG6DDhJcAtVoy0MhBRWG92Pufxhn9aW24AV63ChWUf9DRx
 /3RqpMAuQ3UC2gOxXKKnr54lsdhUIMn/y9sjROkVvQ1BvgRVxO8I4GFvMHMKv9pR
 9KXiVRdzyYERyoL4+MF7A2zTnw+RHL4RVILa85p2ALGy2jQ1UuNUQi0BN9x2u1v8
 S1ifNNX8SwOP+83ImFJhhn3HybpFQ45aLO3F7ZjKBQAnufJu+xw=
 =zeoY
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this series, we've implemented transparent compression
  experimentally. It supports LZO and LZ4, but will add more later as we
  investigate in the field more.

  At this point, the feature doesn't expose compressed space to user
  directly in order to guarantee potential data updates later to the
  space. Instead, the main goal is to reduce data writes to flash disk
  as much as possible, resulting in extending disk life time as well as
  relaxing IO congestion.

  Alternatively, we're also considering to add ioctl() to reclaim
  compressed space and show it to user after putting the immutable bit.

  Enhancements:
   - add compression support
   - avoid unnecessary locks in quota ops
   - harden power-cut scenario for zoned block devices
   - use private bio_set to avoid IO congestion
   - replace GC mutex with rwsem to serialize callers

  Bug fixes:
   - fix dentry consistency and memory corruption in rename()'s error case
   - fix wrong swap extent reports
   - fix casefolding bugs
   - change lock coverage to avoid deadlock
   - avoid GFP_KERNEL under f2fs_lock_op

  And, we've cleaned up sysfs entries to prepare no debugfs"

* tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (31 commits)
  f2fs: fix race conditions in ->d_compare() and ->d_hash()
  f2fs: fix dcache lookup of !casefolded directories
  f2fs: Add f2fs stats to sysfs
  f2fs: delete duplicate information on sysfs nodes
  f2fs: change to use rwsem for gc_mutex
  f2fs: update f2fs document regarding to fsync_mode
  f2fs: add a way to turn off ipu bio cache
  f2fs: code cleanup for f2fs_statfs_project()
  f2fs: fix miscounted block limit in f2fs_statfs_project()
  f2fs: show the CP_PAUSE reason in checkpoint traces
  f2fs: fix deadlock allocating bio_post_read_ctx from mempool
  f2fs: remove unneeded check for error allocating bio_post_read_ctx
  f2fs: convert inline_dir early before starting rename
  f2fs: fix memleak of kobject
  f2fs: fix to add swap extent correctly
  f2fs: run fsck when getting bad inode during GC
  f2fs: support data compression
  f2fs: free sysfs kobject
  f2fs: declare nested quota_sem and remove unnecessary sems
  f2fs: don't put new_page twice in f2fs_rename
  ...
2020-01-30 15:39:24 -08:00
Linus Torvalds
c8994374d9 fsverity updates for 5.6
- Optimize fs-verity sequential read performance by implementing
   readahead of Merkle tree pages.  This allows the Merkle tree to be
   read in larger chunks.
 
 - Optimize FS_IOC_ENABLE_VERITY performance in the uncached case by
   implementing readahead of data pages.
 
 - Allocate the hash requests from a mempool in order to eliminate the
   possibility of allocation failures during I/O.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXi+OuhQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK/ZIAP452KKPs6AGXrClZ2l+5nFbkDLN9Or8
 w277B0BeRnu5ogEApmKnYsmRsduLZRJbni7VCpkJLAYI2kmFCwGkFfe3tAQ=
 =svdR
 -----END PGP SIGNATURE-----

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fsverity updates from Eric Biggers:

 - Optimize fs-verity sequential read performance by implementing
   readahead of Merkle tree pages. This allows the Merkle tree to be
   read in larger chunks.

 - Optimize FS_IOC_ENABLE_VERITY performance in the uncached case by
   implementing readahead of data pages.

 - Allocate the hash requests from a mempool in order to eliminate the
   possibility of allocation failures during I/O.

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fs-verity: use u64_to_user_ptr()
  fs-verity: use mempool for hash requests
  fs-verity: implement readahead of Merkle tree pages
  fs-verity: implement readahead for FS_IOC_ENABLE_VERITY
2020-01-28 15:31:03 -08:00
Eric Biggers
80f2388afa f2fs: fix race conditions in ->d_compare() and ->d_hash()
Since ->d_compare() and ->d_hash() can be called in RCU-walk mode,
->d_parent and ->d_inode can be concurrently modified, and in
particular, ->d_inode may be changed to NULL.  For f2fs_d_hash() this
resulted in a reproducible NULL dereference if a lookup is done in a
directory being deleted, e.g. with:

	int main()
	{
		if (fork()) {
			for (;;) {
				mkdir("subdir", 0700);
				rmdir("subdir");
			}
		} else {
			for (;;)
				access("subdir/file", 0);
		}
	}

... or by running the 't_encrypted_d_revalidate' program from xfstests.
Both repros work in any directory on a filesystem with the encoding
feature, even if the directory doesn't actually have the casefold flag.

I couldn't reproduce a crash in f2fs_d_compare(), but it appears that a
similar crash is possible there.

Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and
falling back to the case sensitive behavior if the inode is NULL.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 2c2eb7a300 ("f2fs: Support case-insensitive file name lookups")
Cc: <stable@vger.kernel.org> # v5.4+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-24 10:04:09 -08:00
Eric Biggers
5515eae647 f2fs: fix dcache lookup of !casefolded directories
Do the name comparison for non-casefolded directories correctly.

This is analogous to ext4's commit 66883da1ee ("ext4: fix dcache
lookup of !casefolded directories").

Fixes: 2c2eb7a300 ("f2fs: Support case-insensitive file name lookups")
Cc: <stable@vger.kernel.org> # v5.4+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-24 09:53:02 -08:00
Hridya Valsaraju
fc7100ea2a f2fs: Add f2fs stats to sysfs
Currently f2fs stats are only available from /d/f2fs/status. This patch
adds some of the f2fs stats to sysfs so that they are accessible even
when debugfs is not mounted.

The following sysfs nodes are added:
-/sys/fs/f2fs/<disk>/free_segments
-/sys/fs/f2fs/<disk>/cp_foreground_calls
-/sys/fs/f2fs/<disk>/cp_background_calls
-/sys/fs/f2fs/<disk>/gc_foreground_calls
-/sys/fs/f2fs/<disk>/gc_background_calls
-/sys/fs/f2fs/<disk>/moved_blocks_foreground
-/sys/fs/f2fs/<disk>/moved_blocks_background
-/sys/fs/f2fs/<disk>/avg_vblocks

Signed-off-by: Hridya Valsaraju <hridya@google.com>
[Jaegeuk Kim: allow STAT_FS without DEBUG_FS]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-23 09:24:25 -08:00
Chao Yu
fb24fea75c f2fs: change to use rwsem for gc_mutex
Mutex lock won't serialize callers, in order to avoid starving of unlucky
caller, let's use rwsem lock instead.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:44 -08:00
Jaegeuk Kim
d7b0a23d81 f2fs: update f2fs document regarding to fsync_mode
This patch adds missing fsync_mode entry in f2fs document.

Fixes: 04485987f0 ("f2fs: introduce async IPU policy")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:44 -08:00
Jaegeuk Kim
0e7f41974e f2fs: add a way to turn off ipu bio cache
Setting 0x40 in /sys/fs/f2fs/dev/ipu_policy gives a way to turn off
bio cache, which is useufl to check whether block layer using hardware
encryption engine merges IOs correctly.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:43 -08:00
Chengguang Xu
bf2cbd3c57 f2fs: code cleanup for f2fs_statfs_project()
Calling min_not_zero() to simplify complicated prjquota
limit comparison in f2fs_statfs_project().

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:43 -08:00
Chengguang Xu
acdf217217 f2fs: fix miscounted block limit in f2fs_statfs_project()
statfs calculates Total/Used/Avail disk space in block unit,
so we should translate soft/hard prjquota limit to block unit
as well.

Below testing result shows the block/inode numbers of
Total/Used/Avail from df command are all correct afer
applying this patch.

[root@localhost quota-tools]\# ./repquota -P /dev/sdb1
*** Report for project quotas on device /dev/sdb1
Block grace time: 7days; Inode grace time: 7days
              Block limits                File limits
Project   used soft    hard  grace  used  soft  hard  grace
-----------------------------------------------------------
\#0   --   4       0       0         1     0     0
\#101 --   0       0       0         2     0     0
\#102 --   0   10240       0         2    10     0
\#103 --   0       0   20480         2     0    20
\#104 --   0   10240   20480         2    10    20
\#105 --   0   20480   10240         2    20    10

[root@localhost sdb1]\# lsattr -p t{1,2,3,4,5}
  101 ----------------N-- t1/a1
  102 ----------------N-- t2/a2
  103 ----------------N-- t3/a3
  104 ----------------N-- t4/a4
  105 ----------------N-- t5/a5

[root@localhost sdb1]\# df -hi t{1,2,3,4,5}
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/sdb1        2.4M    21  2.4M    1% /mnt/sdb1
/dev/sdb1          10     2     8   20% /mnt/sdb1
/dev/sdb1          20     2    18   10% /mnt/sdb1
/dev/sdb1          10     2     8   20% /mnt/sdb1
/dev/sdb1          10     2     8   20% /mnt/sdb1

[root@localhost sdb1]\# df -h t{1,2,3,4,5}
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        10G  489M  9.6G   5% /mnt/sdb1
/dev/sdb1        10M     0   10M   0% /mnt/sdb1
/dev/sdb1        20M     0   20M   0% /mnt/sdb1
/dev/sdb1        10M     0   10M   0% /mnt/sdb1
/dev/sdb1        10M     0   10M   0% /mnt/sdb1

Fixes: 909110c060 ("f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()")
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:43 -08:00
Eric Biggers
644c8c92ad f2fs: fix deadlock allocating bio_post_read_ctx from mempool
Without any form of coordination, any case where multiple allocations
from the same mempool are needed at a time to make forward progress can
deadlock under memory pressure.

This is the case for struct bio_post_read_ctx, as one can be allocated
to decrypt a Merkle tree page during fsverity_verify_bio(), which itself
is running from a post-read callback for a data bio which has its own
struct bio_post_read_ctx.

Fix this by freeing first bio_post_read_ctx before calling
fsverity_verify_bio().  This works because verity (if enabled) is always
the last post-read step.

This deadlock can be reproduced by trying to read from an encrypted
verity file after reducing NUM_PREALLOC_POST_READ_CTXS to 1 and patching
mempool_alloc() to pretend that pool->alloc() always fails.

Note that since NUM_PREALLOC_POST_READ_CTXS is actually 128, to actually
hit this bug in practice would require reading from lots of encrypted
verity files at the same time.  But it's theoretically possible, as N
available objects doesn't guarantee forward progress when > N/2 threads
each need 2 objects at a time.

Fixes: 95ae251fe8 ("f2fs: add fs-verity support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:43 -08:00
Eric Biggers
e8ce5749d7 f2fs: remove unneeded check for error allocating bio_post_read_ctx
Since allocating an object from a mempool never fails when
__GFP_DIRECT_RECLAIM (which is included in GFP_NOFS) is set, the check
for failure to allocate a bio_post_read_ctx is unnecessary.  Remove it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:42 -08:00
Jaegeuk Kim
b06af2aff2 f2fs: convert inline_dir early before starting rename
If we hit an error during rename, we'll get two dentries in different
directories.

Chao adds to check the room in inline_dir which can avoid needless
inversion. This should be done by inode_lock(&old_dir).

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:42 -08:00
Chao Yu
fe396ad8e7 f2fs: fix memleak of kobject
If kobject_init_and_add() failed, caller needs to invoke kobject_put()
to release kobject explicitly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:42 -08:00
Chao Yu
3e5e479a39 f2fs: fix to add swap extent correctly
As Youling reported in mailing list:

https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/

https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/

There is a test case can corrupt f2fs image:
- dd if=/dev/zero of=/swapfile bs=1M count=4096
- chmod 600 /swapfile
- mkswap /swapfile
- swapon --discard /swapfile

The root cause is f2fs_swap_activate() intends to return zero value
to setup_swap_extents() to enable SWP_FS mode (swap file goes through
fs), in this flow, setup_swap_extents() setups swap extent with wrong
block address range, result in discard_swap() erasing incorrect address.

Because f2fs_swap_activate() has pinned swapfile, its data block
address will not change, it's safe to let swap to handle IO through
raw device, so we can get rid of SWAP_FS mode and initial swap extents
inside f2fs_swap_activate(), by this way, later discard_swap() can trim
in right address range.

Fixes: 4969c06a0d ("f2fs: support swap file w/ DIO")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:42 -08:00
Jaegeuk Kim
4eea93e3ff f2fs: run fsck when getting bad inode during GC
This is to avoid inifinite GC when trying to disable checkpoint.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:42 -08:00
Chao Yu
4c8ff7095b f2fs: support data compression
This patch tries to support compression in f2fs.

- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.

- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.

- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.

- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext

Compress metadata layout:
                             [Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                           .
       .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+

Changelog:

20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().

20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().

20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().

20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.

20190402
- don't preallocate blocks for compressed file.

- add lz4 compress algorithm
- process multiple post read works in one workqueue
  Now f2fs supports processing post read work in multiple workqueue,
  it shows low performance due to schedule overhead of multiple
  workqueue executing orderly.

20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR

One cluster contain 4 blocks

 before overwrite   after overwrite

- VVVV		->	CVNN
- CVNN		->	VVVV

- CVNN		->	CVNN
- CVNN		->	CVVV

- CVVV		->	CVNN
- CVVV		->	CVVV

20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.

20191101
- apply fixes from Jaegeuk

20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity

20191216
- apply fixes from Jaegeuk

20200117
- fix to avoid NULL pointer dereference

[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks

Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 16:48:07 -08:00
Jaegeuk Kim
820d366736 f2fs: free sysfs kobject
Detected kmemleak.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-15 13:43:49 -08:00