Commit Graph

87676 Commits

Author SHA1 Message Date
Yongpeng Yang
19ec1d31fa f2fs: Add error handling for negative returns from do_garbage_collect
The function do_garbage_collect can return a value less than 0 due
to f2fs_cp_error being true or page allocation failure, as a result
of calling f2fs_get_sum_page. However, f2fs_gc does not account for
such cases, which could potentially lead to an abnormal total_freed
and thus cause subsequent code to behave unexpectedly. Given that
an f2fs_cp_error is irrecoverable, and considering that
do_garbage_collect already retries page allocation errors through
its call to f2fs_get_sum_page->f2fs_get_meta_page_retry, any error
reported by do_garbage_collect should immediately terminate the
current GC.

Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-26 13:06:40 -08:00
Yongpeng Yang
0145eed6ed f2fs: Constrain the modification range of dir_level in the sysfs
The {struct f2fs_sb_info}->dir_level can be modified through the sysfs
interface, but its value range is not limited. If the value exceeds
MAX_DIR_HASH_DEPTH and the mount options include "noinline_dentry",
the following error will occur:
[root@fedora ~]# mount -o noinline_dentry /dev/sdb  /mnt/sdb/
[root@fedora ~]# echo 128 > /sys/fs/f2fs/sdb/dir_level
[root@fedora ~]# cd /mnt/sdb/
[root@fedora sdb]# mkdir test
[root@fedora sdb]# cd test/
[root@fedora test]# mkdir test
mkdir: cannot create directory 'test': Argument list too long

Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-26 13:06:32 -08:00
Kevin Hao
94e7eb4241 f2fs: Use wait_event_freezable_timeout() for freezable kthread
A freezable kernel thread can enter frozen state during freezing by
either calling try_to_freeze() or using wait_event_freezable() and its
variants. So for the following snippet of code in a kernel thread loop:
  wait_event_interruptible_timeout();
  try_to_freeze();

We can change it to a simple wait_event_freezable_timeout() and then
eliminate the function calls to try_to_freeze() and freezing().

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-26 13:05:56 -08:00
Zhiguo Niu
86d7d57a3f f2fs: fix to check return value of f2fs_recover_xattr_data
Should check return value of f2fs_recover_xattr_data in
__f2fs_setxattr rather than doing invalid retry if error happen.

Also just do set_page_dirty in f2fs_recover_xattr_data when
page is changed really.

Fixes: 50a472bbc7 ("f2fs: do not return EFSCORRUPTED, but try to run online repair")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-15 15:09:17 -08:00
Chao Yu
394e7f4dbb f2fs: don't set FI_PREALLOCATED_ALL for partial write
In f2fs_preallocate_blocks(), if it is partial write in 4KB, it's not
necessary to call f2fs_map_blocks() and set FI_PREALLOCATED_ALL flag.

Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 17:46:35 -08:00
Chao Yu
bb34cc6ca8 f2fs: fix to update iostat correctly in f2fs_filemap_fault()
In f2fs_filemap_fault(), it fixes to update iostat info only if
VM_FAULT_LOCKED is tagged in return value of filemap_fault().

Fixes: 8b83ac81f4 ("f2fs: support read iostat")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 14:12:19 -08:00
Chao Yu
fb9b65340c f2fs: fix to check compress file in f2fs_move_file_range()
f2fs_move_file_range() doesn't support migrating compressed cluster
data, let's add the missing check condition and return -EOPNOTSUPP
for the case until we support it.

Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 14:04:38 -08:00
Chao Yu
55fdc1c24a f2fs: fix to wait on block writeback for post_read case
If inode is compressed, but not encrypted, it missed to call
f2fs_wait_on_block_writeback() to wait for GCed page writeback
in IPU write path.

Thread A				GC-Thread
					- f2fs_gc
					 - do_garbage_collect
					  - gc_data_segment
					   - move_data_block
					    - f2fs_submit_page_write
					     migrate normal cluster's block via
					     meta_inode's page cache
- f2fs_write_single_data_page
 - f2fs_do_write_data_page
  - f2fs_inplace_write_data
   - f2fs_submit_page_bio

IRQ
- f2fs_read_end_io
					IRQ
					old data overrides new data due to
					out-of-order GC and common IO.
					- f2fs_read_end_io

Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 14:04:08 -08:00
Chao Yu
4961acdd65 f2fs: fix to tag gcing flag on page during block migration
It needs to add missing gcing flag on page during block migration,
in order to garantee migrated data be persisted during checkpoint,
otherwise out-of-order persistency between data and node may cause
data corruption after SPOR.

Similar issue was fixed by commit 2d1fe8a86b ("f2fs: fix to tag
gcing flag on page during file defragment").

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:38:40 -08:00
Chao Yu
87f3afd366 f2fs: add tracepoint for f2fs_vm_page_mkwrite()
This patch adds to support tracepoint for f2fs_vm_page_mkwrite(),
meanwhile it prints more details for trace_f2fs_filemap_fault().

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:37:53 -08:00
Chao Yu
4e4f1eb994 f2fs: introduce f2fs_invalidate_internal_cache() for cleanup
Just cleanup, no logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:34:55 -08:00
Chao Yu
59d0d4c3ea f2fs: update blkaddr in __set_data_blkaddr() for cleanup
This patch allows caller to pass blkaddr to f2fs_set_data_blkaddr()
and let __set_data_blkaddr() inside f2fs_set_data_blkaddr() to update
dn->data_blkaddr w/ last value of blkaddr.

Just cleanup, no logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:34:04 -08:00
Chao Yu
2020cd48e4 f2fs: introduce get_dnode_addr() to clean up codes
Just cleanup, no logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:32:01 -08:00
Chao Yu
bb6e1c8fa5 f2fs: delete obsolete FI_DROP_CACHE
FI_DROP_CACHE was introduced in commit 1e84371ffe ("f2fs: change
atomic and volatile write policies") for volatile write feature,
after commit 7bc155fec5 ("f2fs: kill volatile write support"),
we won't support volatile write, let's delete related codes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:30:08 -08:00
Chao Yu
a539363613 f2fs: delete obsolete FI_FIRST_BLOCK_WRITTEN
Commit 3c6c2bebef ("f2fs: avoid punch_hole overhead when releasing
volatile data") introduced FI_FIRST_BLOCK_WRITTEN as below reason:

This patch is to avoid some punch_hole overhead when releasing volatile
data. If volatile data was not written yet, we just can make the first
page as zero.

After commit 7bc155fec5 ("f2fs: kill volatile write support"), we
won't support volatile write, but it missed to remove obsolete
FI_FIRST_BLOCK_WRITTEN, delete it in this patch.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:29:34 -08:00
Daniel Rosenberg
a6a010f5de f2fs: Restrict max filesize for 16K f2fs
Blocks are tracked by u32, so the max permitted filesize is
(U32_MAX + 1) * BLOCK_SIZE. Additionally, in order to support crypto
data unit sizes of 4K with a 16K block with IV_INO_LBLK_{32,64}, we must
further restrict max filesize to (U32_MAX + 1) * 4096. This does not
affect 4K blocksize f2fs as the natural limit for files are well below
that.

Fixes: d7e9a9037d ("f2fs: Support Block Size == Page Size")
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-08 12:31:08 -08:00
Jaegeuk Kim
1ccd91963b f2fs: let's finish or reset zones all the time
In order to limit # of open zones, let's finish or reset zones given # of
valid blocks per section and its zone condition.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-05 11:29:05 -08:00
Jaegeuk Kim
aca90eea8a f2fs: check write pointers when checkpoint=disable
Even if f2fs was rebooted as staying checkpoint=disable, let's match the write
pointers all the time.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-04 17:18:03 -08:00
Jaegeuk Kim
9dad4d9642 f2fs: fix write pointers on zoned device after roll forward
1. do roll forward recovery
2. update current segments pointers
3. fix the entire zones' write pointers
4. do checkpoint

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-04 17:18:00 -08:00
Jaegeuk Kim
15a76c8014 f2fs: allocate new section if it's not new
If fsck can allocate a new zone, it'd be better to use that instead of
allocating a new one.

And, it modifies kernel messages.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-04 17:17:56 -08:00
Jaegeuk Kim
29215a7d43 f2fs: allow checkpoint=disable for zoned block device
Let's allow checkpoint=disable back for zoned block device. It's very risky
as the feature relies on fsck or runtime recovery which matches the write
pointers again if the device rebooted while disabling the checkpoint.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-01 10:38:35 -08:00
Chao Yu
d346fa09ab f2fs: sysfs: support discard_io_aware
It gives a way to enable/disable IO aware feature for background
discard, so that we can tune background discard more precisely
based on undiscard condition. e.g. force to disable IO aware if
there are large number of discard extents, and discard IO may
always be interrupted by frequent common IO.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-28 10:59:51 -08:00
Chao Yu
5f23ffdf17 f2fs: introduce tracepoint for f2fs_rename()
This patch adds tracepoints for f2fs_rename().

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-28 10:52:44 -08:00
Chao Yu
53edb54956 f2fs: fix to avoid dirent corruption
As Al reported in link[1]:

f2fs_rename()
...
	if (old_dir != new_dir && !whiteout)
		f2fs_set_link(old_inode, old_dir_entry,
					old_dir_page, new_dir);
	else
		f2fs_put_page(old_dir_page, 0);

You want correct inumber in the ".." link.  And cross-directory
rename does move the source to new parent, even if you'd been asked
to leave a whiteout in the old place.

[1] https://lore.kernel.org/all/20231017055040.GN800259@ZenIV/

With below testcase, it may cause dirent corruption, due to it missed
to call f2fs_set_link() to update ".." link to new directory.
- mkdir -p dir/foo
- renameat2 -w dir/foo bar

[ASSERT] (__chk_dots_dentries:1421)  --> Bad inode number[0x4] for '..', parent parent ino is [0x3]
[FSCK] other corrupted bugs                           [Fail]

Fixes: 7e01e7ad74 ("f2fs: support RENAME_WHITEOUT")
Cc: Jan Kara <jack@suse.cz>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-28 10:52:04 -08:00
Jaegeuk Kim
bbd3efed33 f2fs: skip adding a discard command if exists
When recovering zoned UFS, sometimes we add the same zone to discard multiple
times. Simple workaround is to bypass adding it.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-20 09:00:24 -08:00
Chao Yu
956fa1ddc1 f2fs: fix to check return value of f2fs_reserve_new_block()
Let's check return value of f2fs_reserve_new_block() in do_recover_data()
rather than letting it fails silently.

Also refactoring check condition on return value of f2fs_reserve_new_block()
as below:
- trigger f2fs_bug_on() only for ENOSPC case;
- use do-while statement to avoid redundant codes;

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-17 09:31:27 -08:00
Chao Yu
9458915036 f2fs: use shared inode lock during f2fs_fiemap()
f2fs_fiemap() will only traverse metadata of inode, let's use shared
inode lock for it to avoid unnecessary race on inode lock.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-17 09:31:18 -08:00
Chao Yu
ff6584ac2c f2fs: clean up w/ dotdot_name
Just cleanup, no logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-17 09:31:09 -08:00
Eric Biggers
e26b6d3927 f2fs: explicitly null-terminate the xattr list
When setting an xattr, explicitly null-terminate the xattr list.  This
eliminates the fragile assumption that the unused xattr space is always
zeroed.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-17 09:30:54 -08:00
zhangxirui
36062b9183 f2fs: use inode_lock_shared instead of inode_lock in f2fs_seek_block()
inode_lock_shared() -> down_read(&inode->i_rwsem)
       inode_lock() -> down_write(&inode->i_rwsem)

Inode is not updated in f2fs_seek_block(), so there is no need
to hold write lock, use read lock for more efficiency.

Signed-off-by: zhangxirui <xirui.zhang@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-17 09:30:53 -08:00
Linus Torvalds
6bc40e44f1 overlayfs fixes for 6.7-rc2
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmVXZFQACgkQEVvVyTe/
 1Wr/hQ//YofnLzFuE172QmfiYLQYAnBJONql47Hs32g+9zGjw7ev6tVbwEwuduY9
 23lktlJthJO15+L8mfG3ECqpV7KfdBfuipjI6nO9V/Br7YEdHtDgk7jUqFWoADUA
 tEtXqjk8cqkWc4+6XFKHYeN04Nd0tvRIFmtW90gIxANE/AZxiPcCGKIKfqKgDLZ2
 0IbN7yeJASc7XCtcjl9uldvhgmltpu1xX3IETsKOtLh1H8J3+DSI/5K7kQ4if5q/
 6Hi3+6Qf3aTqyaqG6z8RVhbwvrRWFNvaUWpjW5F1sBpNddtq8ioHmqX4L3Caybsw
 ukitshGj59MfmNnirxryO8MXv4RwqOAZFQc7ZfQhL6RzEO6WiqNybQ112SOh25E+
 NsKSy4vhCiH3ifGQC8LZtdeWmcPS/5vPUMv81w7P6Y/VWZImQQ04kf1akSr9/iBX
 KCLFhYb8lKu+pBHFEZkYrdTDbIby+7QKraIi9hC2RsfFiIfvHn4Y1AtUt9M145va
 vBTF/7y8t5VhftMhP77ZUvREwIMrzcBJtqIH8J5XoT6EkxlGCV5ft9el20VyXYia
 tkWSzW9dQzGG+eGtdSX490MQMlZs7yN0SzyP0rUrZ2LMycwmMX976ssXtnp4NWBM
 sAnHbZMS1eXwK57WP9gOXiKLZ7sMWj03NzYK+cITL6Gttq/bAdw=
 =OtFU
 -----END PGP SIGNATURE-----

Merge tag 'ovl-fixes-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs fixes from Amir Goldstein:
 "A fix to an overlayfs param parsing bug and a misformatted comment"

* tag 'ovl-fixes-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: fix memory leak in ovl_parse_param()
  ovl: fix misformatted comment
2023-11-17 09:18:48 -05:00
Amir Goldstein
37f32f5264 ovl: fix memory leak in ovl_parse_param()
On failure to parse parameters in ovl_parse_param_lowerdir(), it is
necessary to update ctx->nr with the correct nr before using
ovl_reset_lowerdirs() to release l->name.

Reported-and-tested-by: syzbot+26eedf3631650972f17c@syzkaller.appspotmail.com
Fixes: c835110b58 ("ovl: remove unused code in lowerdir param parsing")
Co-authored-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-11-14 08:09:36 +02:00
Amir Goldstein
b28060db71 ovl: fix misformatted comment
Remove misleading /** prefix from a regular comment.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311121628.byHp8tkv-lkp@intel.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-11-14 08:09:36 +02:00
Linus Torvalds
9bacdd8996 for-6.7-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmVSO50ACgkQxWXV+ddt
 WDuiyg/7BZviFAyiQMAzpA319qRJZ+EemfTdF/k69q4axGYuvqVdXKnpOV44AR4I
 dKcHLOPpDZIxsh8lFytkm1UEAHptw1v7A64c+gcdjGK0tAA7aKbw/1nmNysowT23
 L0v2+34hkBUfG8A3uVgOwL1rjItEX5Fl54slpVsazSqlEbKrqC4MGNjmqdp3IOeC
 qfXTgkvkXmm8s8NyoJybKewM9Aw0tmK0jkAFHA+2sgcZPYKXjqWGv9KUOsXnCx5o
 3kPWIRT1sj4q2qzrgP14Q12O6qPLZ2/0oTBhi6nhj8+N1yiH+USS5zBITegF+w2n
 leQeVHtyBYHlPYQSQlCIZy7+10gkePvs+JmoAuL8YFISnGYnvOZqCeArlV7cnNI3
 CQt7ZBER5Dqw78Y756usUhpYrLWa9kOpcPVRmjJ/R62+TY1FkkyY7irETbn5EGjI
 NlhEa4PMYeYpAOccoxWEm9tIiiVD1abURhVBdn3Znfcb1Sv/lrGBlo9DYGFCxbBh
 xU1JP7sly8w0aPLqCbn1X3VY8dXp+CeYz4FQabHjQA/zr9lF08/pRYj3haAbYAyH
 0KphXurwz/YqY+LmRg7SbQ/KMgBAiBV8Qk9JyNvdvaQbnYnq7CWdpoHcpZu3mvpb
 HLGoXew58kZaSfxLHlcT5wwYlbq0rooXRstuFg2+BBcOFOMCQfw=
 =GM+1
 -----END PGP SIGNATURE-----

Merge tag 'for-6.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix potential overflow in returned value from SEARCH_TREE_V2
   ioctl on 32bit architecture

 - zoned mode fixes:

     - drop unnecessary write pointer check for RAID0/RAID1/RAID10
       profiles, now it works because of raid-stripe-tree

     - wait for finishing the zone when direct IO needs a new
       allocation

 - simple quota fixes:

     - pass correct owning root pointer when cleaning up an
       aborted transaction

     - fix leaking some structures when processing delayed refs

     - change key type number of BTRFS_EXTENT_OWNER_REF_KEY,
       reorder it before inline refs that are supposed to be
       sorted, keeping the original number would complicate a lot
       of things; this change needs an updated version of
       btrfs-progs to work and filesystems need to be recreated

 - fix error pointer dereference after failure to allocate fs
   devices

 - fix race between accounting qgroup extents and removing a
   qgroup

* tag 'for-6.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: make OWNER_REF_KEY type value smallest among inline refs
  btrfs: fix qgroup record leaks when using simple quotas
  btrfs: fix race between accounting qgroup extents and removing a qgroup
  btrfs: fix error pointer dereference after failure to allocate fs devices
  btrfs: make found_logical_ret parameter mandatory for function queue_scrub_stripe()
  btrfs: get correct owning_root when dropping snapshot
  btrfs: zoned: wait for data BG to be finished on direct IO allocation
  btrfs: zoned: drop no longer valid write pointer check
  btrfs: directly return 0 on no error code in btrfs_insert_raid_extent()
  btrfs: use u64 for buffer sizes in the tree search ioctls
2023-11-13 09:09:12 -08:00
Linus Torvalds
1b907d0507 Sixteen smb3/cifs client fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVP3VMACgkQiiy9cAdy
 T1HlPAwAoeklucnmjVZJny15qsDHErR9I/CQseMksGHBomAFk2iUjUEL8cjozMMU
 3gZuXnYT07Gd95Tk4oytVqEFn4pXl4rdi1gsppr9ewPu0XYZBL0yt9L9rt7g9lm9
 yWvwa6skIOjJIeLs+Thzz7MBj3T759TT0O4J4Hl2LQ8QnDPvR9Zh0N01B6boRw7i
 qG8jcCgJRRHlj6B/e24wGdu7wTUxDxWCXGkyos30VfojdrQwfWJ45Hhn7MiytRfx
 qeEW2bYdSBZhqpQs6MbpkBz+nUZQf7oxhXbqfxqx8CsjaN7X//qA+mWl47n64t52
 h7A73LP8rDe6JJNZRY/LWGNK4vtqEVw2AvvbETqxiteLA61Xp/+3SBt3upepH6eT
 /EvSXuMmfeHUf/Od2u00ynos4VbFzFelHuzmGatv/s7VeHqRu4ImWHtRhI3BbmjK
 runuFRNcY8YrGfpu+niXIeYDI0a9zIeCKl75GYbf/Ns53EYYwfKJIrG+BX0i2CUO
 g72piup1
 =xjGh
 -----END PGP SIGNATURE-----

Merge tag '6.7-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - ctime caching fix (for setxattr)

 - encryption fix

 - DNS resolver mount fix

 - debugging improvements

 - multichannel fixes including cases where server stops or starts
   supporting multichannel after mount

 - reconnect fix

 - minor cleanups

* tag '6.7-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module version number for cifs.ko
  cifs: handle when server stops supporting multichannel
  cifs: handle when server starts supporting multichannel
  Missing field not being returned in ioctl CIFS_IOC_GET_MNT_INFO
  smb3: allow dumping session and tcon id to improve stats analysis and debugging
  smb: client: fix mount when dns_resolver key is not available
  smb3: fix caching of ctime on setxattr
  smb3: minor cleanup of session handling code
  cifs: reconnect work should have reference on server struct
  cifs: do not pass cifs_sb when trying to add channels
  cifs: account for primary channel in the interface list
  cifs: distribute channels across interfaces based on speed
  cifs: handle cases where a channel is closed
  smb3: more minor cleanups for session handling routines
  smb3: minor RDMA cleanup
  cifs: Fix encryption of cleared, but unset rq_iter data buffers
2023-11-11 17:17:22 -08:00
Linus Torvalds
826c484166 Three ksmbd server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVN1/gACgkQiiy9cAdy
 T1HFlAv9Ejt62uSKquLYpa8OQPUS0L4rxW3RvpfsAdSsnnDWLAeuVPPkTOqTHnw7
 0tifp5h9cS8qYsWvACb5MdWPc71J91QU22tMmd8eeW++9IsBNOX/5Ph635PBppxC
 Q0FL+G2xQ/vlqi0QbkR4SdI7vTvU9LRvxNpqRHgjs4W45r1QC6e9LJ3ncJf1aKfz
 k2v90M0Oo++YXztLLYPapbtlHYc9c/Xufu6HPbfWH/Ryc2N2CxQ6Z3Kp9RGP5PGk
 gT4SBVR69VfWHK1JK8dqPkbiaiEyRUxhPVqTCMdVzFbFAhczQbcqa8Ufz9nVXfnO
 P98cq9c2/NF7JUYQVs6V4kZa74kD8osnuKud6706teM31Zr9OZlq2keyV2Zx1m6I
 Niwwzdn/lXfT8GCIDE20KNCAdV3y1vu5yQkg3Mnv0Yj1VawyHKlAdYeQ0So5s4Sm
 B5bXnXr5wgd4ughGYDhO5gCwSn6L8CGwkzQ+bkN4FdjBWPMKVAyuCQL/0UglJ2LK
 KuQxhCDE
 =YnRz
 -----END PGP SIGNATURE-----

Merge tag '6.7-rc-smb3-server-part2' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - slab out of bounds fix in ACL handling

 - fix malformed request oops

 - minor doc fix

* tag '6.7-rc-smb3-server-part2' of git://git.samba.org/ksmbd:
  ksmbd: handle malformed smb1 message
  ksmbd: fix kernel-doc comment of ksmbd_vfs_kern_path_locked()
  ksmbd: fix slab out of bounds write in smb_inherit_dacl()
2023-11-10 10:23:53 -08:00
Linus Torvalds
e21165bfbc Two items:
- support for idmapped mounts in CephFS (Christian Brauner, Alexander
   Mikhalitsyn).  The series was originally developed by Christian and
   later picked up and brought over the finish line by Alexander, who
   also contributed an enabler on the MDS side (separate owner_{u,g}id
   fields on the wire).  The required exports for mnt_idmap_{get,put}()
   in VFS have been acked by Christian and received no objection from
   Christoph.
 
 - a churny change in CephFS logging to include cluster and client
   identifiers in log and debug messages (Xiubo Li).  This would help
   in scenarios with dozens of CephFS mounts on the same node which are
   getting increasingly common, especially in the Kubernetes world.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmVNDyMTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHziy6xCACmUikhgo+pZDO7oQS7HChdZFz5Q2Jb
 5K9K7sJr7Zb2gFKLAV93cyFrz0JRljmgZuA3DmDTH1omrkrVAJfHJ6Md1UFG3W7o
 LaowP3kECsykOiz+YtVU2957sfoFqds/q6KCXdp1Pc8WfNnL1vysCim6EGYtCqUm
 c6vv0zkvEDQp+3kjTN01LHzHFZdHg8+STNM4BMiB0sO/NqbADnaQBEtDpYrgzwh7
 YPwVIKcXutUmAKlb7vjUF9ptSICYkGV7B+loPS4BDva1I6dT42xqLQOu89tTKU0D
 DiQAfE2oRgU+fl+mMooFJNSqD25Q6IkXrPs0HuiSHS4wtLGqYZAwwfHB
 =F9o1
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.7-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:

 - support for idmapped mounts in CephFS (Christian Brauner, Alexander
   Mikhalitsyn).

   The series was originally developed by Christian and later picked up
   and brought over the finish line by Alexander, who also contributed
   an enabler on the MDS side (separate owner_{u,g}id fields on the
   wire).

   The required exports for mnt_idmap_{get,put}() in VFS have been acked
   by Christian and received no objection from Christoph.

 - a churny change in CephFS logging to include cluster and client
   identifiers in log and debug messages (Xiubo Li).

   This would help in scenarios with dozens of CephFS mounts on the same
   node which are getting increasingly common, especially in the
   Kubernetes world.

* tag 'ceph-for-6.7-rc1' of https://github.com/ceph/ceph-client:
  ceph: allow idmapped mounts
  ceph: allow idmapped atomic_open inode op
  ceph: allow idmapped set_acl inode op
  ceph: allow idmapped setattr inode op
  ceph: pass idmap to __ceph_setattr
  ceph: allow idmapped permission inode op
  ceph: allow idmapped getattr inode op
  ceph: pass an idmapping to mknod/symlink/mkdir
  ceph: add enable_unsafe_idmap module parameter
  ceph: handle idmapped mounts in create_request_message()
  ceph: stash idmapping in mdsc request
  fs: export mnt_idmap_get/mnt_idmap_put
  libceph, ceph: move mdsmap.h to fs/ceph
  ceph: print cluster fsid and client global_id in all debug logs
  ceph: rename _to_client() to _to_fs_client()
  ceph: pass the mdsc to several helpers
  libceph: add doutc and *_client debug macros support
2023-11-10 09:52:56 -08:00
Steve French
fd2bd7c053 cifs: update internal module version number for cifs.ko
From 2.45 to 2.46

Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-10 09:33:26 -06:00
Shyam Prasad N
ee1d21794e cifs: handle when server stops supporting multichannel
When a server stops supporting multichannel, we will
keep attempting reconnects to the secondary channels today.
Avoid this by freeing extra channels when negotiate
returns no multichannel support.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-10 09:33:19 -06:00
Shyam Prasad N
705fc522fe cifs: handle when server starts supporting multichannel
When the user mounts with multichannel option, but the
server does not support it, there can be a time in future
where it can be supported.

With this change, such a case is handled.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
2023-11-10 09:33:15 -06:00
Steve French
784e0e20b4 Missing field not being returned in ioctl CIFS_IOC_GET_MNT_INFO
The tcon_flags field was always being set to zero in the information
about the mount returned by the ioctl CIFS_IOC_GET_MNT_INFO instead
of being set to the value of the Flags field in the tree connection
structure as intended.

Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-10 09:32:04 -06:00
Steve French
de4eceab57 smb3: allow dumping session and tcon id to improve stats analysis and debugging
When multiple mounts are to the same share from the same client it was not
possible to determine which section of /proc/fs/cifs/Stats (and DebugData)
correspond to that mount.  In some recent examples this turned out to  be
a significant problem when trying to analyze performance data - since
there are many cases where unless we know the tree id and session id we
can't figure out which stats (e.g. number of SMB3.1.1 requests by type,
the total time they take, which is slowest, how many fail etc.) apply to
which mount. The only existing loosely related ioctl CIFS_IOC_GET_MNT_INFO
does not return the information needed to uniquely identify which tcon
is which mount although it does return various flags and device info.

Add a cifs.ko ioctl CIFS_IOC_GET_TCON_INFO (0x800ccf0c) to return tid,
session id, tree connect count.

Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-10 02:00:30 -06:00
Paulo Alcantara
5e2fd17f43 smb: client: fix mount when dns_resolver key is not available
There was a wrong assumption that with CONFIG_CIFS_DFS_UPCALL=y there
would always be a dns_resolver key set up so we could unconditionally
upcall to resolve UNC hostname rather than using the value provided by
mount(2).

Only require it when performing automount of junctions within a DFS
share so users that don't have dns_resolver key still can mount their
regular shares with server hostname resolved by mount.cifs(8).

Fixes: 348a04a8d1 ("smb: client: get rid of dfs code dep in namespace.c")
Cc: stable@vger.kernel.org
Tested-by: Eduard Bachmakov <e.bachmakov@gmail.com>
Reported-by: Eduard Bachmakov <e.bachmakov@gmail.com>
Closes: https://lore.kernel.org/all/CADCRUiNvZuiUZ0VGZZO9HRyPyw6x92kiA7o7Q4tsX5FkZqUkKg@mail.gmail.com/
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-09 12:59:36 -06:00
Steve French
5923d6686a smb3: fix caching of ctime on setxattr
Fixes xfstest generic/728 which had been failing due to incorrect
ctime after setxattr and removexattr

Update ctime on successful set of xattr

Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-09 10:25:58 -06:00
Steve French
f72d965076 smb3: minor cleanup of session handling code
Minor cleanup of style issues found by checkpatch

Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-09 10:25:42 -06:00
Shyam Prasad N
19a4b9d6c3 cifs: reconnect work should have reference on server struct
The delayed work for reconnect takes server struct
as a parameter. But it does so without holding a ref
to it. Normally, this may not show a problem as
the reconnect work is only cancelled on umount.

However, since we now plan to support scaling down of
channels, and the scale down can happen from reconnect
work itself, we need to fix it.

This change takes a reference on the server struct
before it is passed to the delayed work. And drops
the reference in the delayed work itself. Or if
the delayed work is successfully cancelled, by the
process that cancels it.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-09 10:25:32 -06:00
Shyam Prasad N
9599d59eb8 cifs: do not pass cifs_sb when trying to add channels
The only reason why cifs_sb gets passed today to cifs_try_adding_channels
is to pass the local_nls field for the new channels and binding session.
However, the ses struct already has local_nls field that is setup during
the first cifs_setup_session. So there is no need to pass cifs_sb.

This change removes cifs_sb from the arg list for this and the functions
that it calls and uses ses->local_nls instead.

Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-09 10:25:21 -06:00
Shyam Prasad N
fa1d0508bd cifs: account for primary channel in the interface list
The refcounting of server interfaces should account
for the primary channel too. Although this is not
strictly necessary, doing so will account for the primary
channel in DebugData.

Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-09 10:25:19 -06:00
Shyam Prasad N
a6d8fb54a5 cifs: distribute channels across interfaces based on speed
Today, if the server interfaces RSS capable, we simply
choose the fastest interface to setup a channel. This is not
a scalable approach, and does not make a lot of attempt to
distribute the connections.

This change does a weighted distribution of channels across
all the available server interfaces, where the weight is
a function of the advertised interface speed.

Also make sure that we don't mix rdma and non-rdma for channels.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-09 10:25:17 -06:00
Shyam Prasad N
0c51cc6f2c cifs: handle cases where a channel is closed
So far, SMB multichannel could only scale up, but not
scale down the number of channels. In this series of
patch, we now allow the client to deal with the case
of multichannel disabled on the server when the share
is mounted. With that change, we now need the ability
to scale down the channels.

This change allows the client to deal with cases of
missing channels more gracefully.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-09 10:25:14 -06:00