Fixes the following W=1 warnings:
fs/btrfs/free-space-cache.c:1317: warning: Function parameter or member 'root' not described in '__btrfs_write_out_cache'
fs/btrfs/free-space-cache.c:1317: warning: Function parameter or member 'inode' not described in '__btrfs_write_out_cache'
fs/btrfs/free-space-cache.c:1317: warning: Function parameter or member 'ctl' not described in '__btrfs_write_out_cache'
fs/btrfs/free-space-cache.c:1317: warning: Function parameter or member 'block_group' not described in '__btrfs_write_out_cache'
fs/btrfs/free-space-cache.c:1317: warning: Function parameter or member 'io_ctl' not described in '__btrfs_write_out_cache'
fs/btrfs/free-space-cache.c:1317: warning: Function parameter or member 'trans' not described in '__btrfs_write_out_cache'
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This fixes the following warnings:
fs/btrfs/delayed-ref.c:80: warning: Function parameter or member 'fs_info' not described in 'btrfs_delayed_refs_rsv_release'
fs/btrfs/delayed-ref.c:80: warning: Function parameter or member 'nr' not described in 'btrfs_delayed_refs_rsv_release'
fs/btrfs/delayed-ref.c:128: warning: Function parameter or member 'fs_info' not described in 'btrfs_migrate_to_delayed_refs_rsv'
fs/btrfs/delayed-ref.c:128: warning: Function parameter or member 'src' not described in 'btrfs_migrate_to_delayed_refs_rsv'
fs/btrfs/delayed-ref.c:128: warning: Function parameter or member 'num_bytes' not described in 'btrfs_migrate_to_delayed_refs_rsv'
fs/btrfs/delayed-ref.c:174: warning: Function parameter or member 'fs_info' not described in 'btrfs_delayed_refs_rsv_refill'
fs/btrfs/delayed-ref.c:174: warning: Function parameter or member 'flush' not described in 'btrfs_delayed_refs_rsv_refill'
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This fixes the following W=1 warnings:
fs/btrfs/file-item.c:27: warning: Cannot understand * @inode: the inode we want to update the disk_i_size for
on line 27 - I thought it was a doc line
fs/btrfs/file-item.c:65: warning: Cannot understand * @inode - the inode we're modifying
on line 65 - I thought it was a doc line
fs/btrfs/file-item.c:91: warning: Cannot understand * @inode - the inode we're modifying
on line 91 - I thought it was a doc line
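For reference, kernel-doc expects the line right after the `/**` opener to name the function, and only then the `@name:` parameter lines. A minimal sketch of a layout that the parser accepts follows; `example_update_size()` and its parameters are hypothetical and are not taken from file-item.c.

```c
#include <assert.h>
#include <stdint.h>

/**
 * example_update_size() - Pick the size to record for a hypothetical inode
 * @cur_size: size currently recorded
 * @new_size: candidate size, or 0 to keep @cur_size
 *
 * Hypothetical helper that exists only to show the comment layout: the
 * first line names the function, then every parameter gets an "@name:" line.
 *
 * Return: the size that should be recorded.
 */
static uint64_t example_update_size(uint64_t cur_size, uint64_t new_size)
{
	return new_size ? new_size : cur_size;
}
```

A comment that starts directly with an `@inode` line, as in the warnings above, has no name line for the parser to anchor on.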
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This fixes the following compiler warnings:
fs/btrfs/extent_map.c:601: warning: Function parameter or member 'fs_info' not described in 'btrfs_add_extent_mapping'
fs/btrfs/extent_map.c:601: warning: Function parameter or member 'em_tree' not described in 'btrfs_add_extent_mapping'
fs/btrfs/extent_map.c:601: warning: Function parameter or member 'em_in' not described in 'btrfs_add_extent_mapping'
fs/btrfs/extent_map.c:601: warning: Function parameter or member 'start' not described in 'btrfs_add_extent_mapping'
fs/btrfs/extent_map.c:601: warning: Function parameter or member 'len' not described in 'btrfs_add_extent_mapping'
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Fixes fs/btrfs/extent_map.c:399: warning: Function parameter or member
'modified' not described in 'add_extent_mapping'
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a long-standing bug in the last parameter of
btrfs_add_ordered_extent(), dating back to commit 771ed689d2 ("Btrfs:
Optimize compressed writeback and reads") from 2008.
In that ancient commit btrfs_add_ordered_extent() expects the @type
parameter to be one of the following:
- BTRFS_ORDERED_REGULAR
- BTRFS_ORDERED_NOCOW
- BTRFS_ORDERED_PREALLOC
- BTRFS_ORDERED_COMPRESSED
But we pass 0 in cow_file_range(), which means BTRFS_ORDERED_IO_DONE.
Ironically, an extra check in __btrfs_add_ordered_extent() skips setting
the bit if it sees (type == IO_DONE || type == IO_COMPLETE), which avoids
any obvious bug.
But this still leads to regular COW ordered extents having no bit to
indicate their type in various trace events, rendering the REGULAR bit
useless.
[FIX]
Change the following aspects to avoid such problems:
- Reorder btrfs_ordered_extent::flags
Now the type bits go first (REGULAR/NOCOW/PREALLOC/COMPRESSED), then
DIRECT bit, finally extra status bits like IO_DONE/COMPLETE/IOERR.
- Add extra ASSERT() for btrfs_add_ordered_extent_*()
- Remove @type parameter for btrfs_add_ordered_extent_compress()
As the only valid @type here is BTRFS_ORDERED_COMPRESSED.
- Remove the unnecessary special check for IO_DONE/COMPLETE in
__btrfs_add_ordered_extent()
The check was only there to make the code work; with the extra ASSERT(),
only a limited set of values can be passed in.
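The reordering and the added assertion can be pictured with the sketch below. The enum and function names are hypothetical stand-ins for the btrfs ones, and the exact bit values are an assumption. What the sketch shows is that type bits form a contiguous leading range, so a single range check can reject status bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the reordered layout: type bits come first. */
enum ordered_flag {
	ORDERED_REGULAR,	/* type bits */
	ORDERED_NOCOW,
	ORDERED_PREALLOC,
	ORDERED_COMPRESSED,
	ORDERED_DIRECT,		/* modifier bit */
	ORDERED_IO_DONE,	/* status bits */
	ORDERED_COMPLETE,
	ORDERED_IOERR,
};

/* True only for values that name an ordered extent type. */
static bool ordered_type_is_valid(int type)
{
	return type >= ORDERED_REGULAR && type <= ORDERED_COMPRESSED;
}

/* Build the initial flags word; status bits are rejected up front. */
static unsigned long ordered_initial_flags(int type)
{
	assert(ordered_type_is_valid(type));
	return 1UL << type;
}
```

With this ordering in the sketch, a literal 0 names the regular type rather than a status bit, so every ordered extent ends up with a type bit set.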
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Fix the following warning reported by coccicheck:
./fs/btrfs/raid56.c:237:2-8: WARNING: NULL check before some freeing
functions is not needed.
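The pattern behind this warning can be shown with a userspace analogue. The C library's free() is defined to do nothing when passed NULL, and kernel freeing helpers such as kfree() and kvfree() likewise accept NULL, so the guard is redundant. The struct and function below are hypothetical and are not the raid56.c code.

```c
#include <assert.h>
#include <stdlib.h>

struct table {
	int *slots;	/* may be NULL if never allocated */
};

/* free(NULL) is a no-op, so no "if (t->slots)" guard is needed. */
static void table_release(struct table *t)
{
	free(t->slots);
	t->slots = NULL;
}
```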
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Yang Li <abaci-bugfix@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zygo reported the following panic when testing my error handling patches
for relocation:
kernel BUG at fs/btrfs/backref.c:2545!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 3 PID: 8472 Comm: btrfs Tainted: G W 14
Hardware name: QEMU Standard PC (i440FX + PIIX,
Call Trace:
btrfs_backref_error_cleanup+0x4df/0x530
build_backref_tree+0x1a5/0x700
? _raw_spin_unlock+0x22/0x30
? release_extent_buffer+0x225/0x280
? free_extent_buffer.part.52+0xd7/0x140
relocate_tree_blocks+0x2a6/0xb60
? kasan_unpoison_shadow+0x35/0x50
? do_relocation+0xc10/0xc10
? kasan_kmalloc+0x9/0x10
? kmem_cache_alloc_trace+0x6a3/0xcb0
? free_extent_buffer.part.52+0xd7/0x140
? rb_insert_color+0x342/0x360
? add_tree_block.isra.36+0x236/0x2b0
relocate_block_group+0x2eb/0x780
? merge_reloc_roots+0x470/0x470
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x18f0
? pvclock_clocksource_read+0xeb/0x190
? btrfs_relocate_chunk+0x120/0x120
? lock_contended+0x620/0x6e0
? do_raw_spin_lock+0x1e0/0x1e0
? do_raw_spin_unlock+0xa8/0x140
btrfs_ioctl_balance+0x1f9/0x460
btrfs_ioctl+0x24c8/0x4380
? __kasan_check_read+0x11/0x20
? check_chain_key+0x1f4/0x2f0
? __asan_loadN+0xf/0x20
? btrfs_ioctl_get_supported_features+0x30/0x30
? kvm_sched_clock_read+0x18/0x30
? check_chain_key+0x1f4/0x2f0
? lock_downgrade+0x3f0/0x3f0
? handle_mm_fault+0xad6/0x2150
? do_vfs_ioctl+0xfc/0x9d0
? ioctl_file_clone+0xe0/0xe0
? check_flags.part.50+0x6c/0x1e0
? check_flags.part.50+0x6c/0x1e0
? check_flags+0x26/0x30
? lock_is_held_type+0xc3/0xf0
? syscall_enter_from_user_mode+0x1b/0x60
? do_syscall_64+0x13/0x80
? rcu_read_lock_sched_held+0xa1/0xd0
? __kasan_check_read+0x11/0x20
? __fget_light+0xae/0x110
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This occurs because of this check
if (RB_EMPTY_NODE(&upper->rb_node))
BUG_ON(!list_empty(&node->upper));
As we are dropping the backref node, if we discover that the upper node
in the edge we just cleaned up isn't linked into the cache, we assume
that we are now done with this node, thus the BUG_ON().
However this is an erroneous assumption, as we will look up all the
references for a node first, and then process the pending edges. All of
the 'upper' nodes in our pending edges won't be in the cache's rb_tree
yet, because they haven't been processed. We could very well have many
edges still left to cleanup on this node.
The fact is we simply do not need this check: we can just process all of
the edges for this node, because below this check we do the
following:
if (list_empty(&upper->lower)) {
list_add_tail(&upper->lower, &cache->leaves);
upper->lowest = 1;
}
If the upper node truly isn't used yet, then we add it to the
cache->leaves list to be cleaned up later. If it is still used then the
last child node that has it linked into its node will add it to the
leaves list and then it will be cleaned up.
Fix this problem by dropping this logic altogether. With this fix I no
longer see the panic when testing with error injection in the backref
code.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While testing the error paths in relocation, I hit the following lockdep
splat:
======================================================
WARNING: possible circular locking dependency detected
5.10.0-rc3+ #206 Not tainted
------------------------------------------------------
btrfs-balance/1571 is trying to acquire lock:
ffff8cdbcc8f77d0 (&head_ref->mutex){+.+.}-{3:3}, at: btrfs_lookup_extent_info+0x156/0x3b0
but task is already holding lock:
ffff8cdbc54adbf8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x100
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (btrfs-tree-00){++++}-{3:3}:
down_write_nested+0x43/0x80
__btrfs_tree_lock+0x27/0x100
btrfs_search_slot+0x248/0x890
relocate_tree_blocks+0x490/0x650
relocate_block_group+0x1ba/0x5d0
kretprobe_trampoline+0x0/0x50
-> #1 (btrfs-csum-01){++++}-{3:3}:
down_read_nested+0x43/0x130
__btrfs_tree_read_lock+0x27/0x100
btrfs_read_lock_root_node+0x31/0x40
btrfs_search_slot+0x5ab/0x890
btrfs_del_csums+0x10b/0x3c0
__btrfs_free_extent+0x49d/0x8e0
__btrfs_run_delayed_refs+0x283/0x11f0
btrfs_run_delayed_refs+0x86/0x220
btrfs_start_dirty_block_groups+0x2ba/0x520
kretprobe_trampoline+0x0/0x50
-> #0 (&head_ref->mutex){+.+.}-{3:3}:
__lock_acquire+0x1167/0x2150
lock_acquire+0x116/0x3e0
__mutex_lock+0x7e/0x7b0
btrfs_lookup_extent_info+0x156/0x3b0
walk_down_proc+0x1c3/0x280
walk_down_tree+0x64/0xe0
btrfs_drop_subtree+0x182/0x260
do_relocation+0x52e/0x660
relocate_tree_blocks+0x2ae/0x650
relocate_block_group+0x1ba/0x5d0
kretprobe_trampoline+0x0/0x50
other info that might help us debug this:
Chain exists of:
&head_ref->mutex --> btrfs-csum-01 --> btrfs-tree-00
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(btrfs-tree-00);
lock(btrfs-csum-01);
lock(btrfs-tree-00);
lock(&head_ref->mutex);
*** DEADLOCK ***
5 locks held by btrfs-balance/1571:
#0: ffff8cdb89749ff8 (&fs_info->delete_unused_bgs_mutex){+.+.}-{3:3}, at: btrfs_balance+0x563/0xf40
#1: ffff8cdb89748838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x156/0x300
#2: ffff8cdbc2c16650 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x413/0x5c0
#3: ffff8cdbc135f538 (btrfs-treloc-01){+.+.}-{3:3}, at: __btrfs_tree_lock+0x27/0x100
#4: ffff8cdbc54adbf8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x100
stack backtrace:
CPU: 1 PID: 1571 Comm: btrfs-balance Not tainted 5.10.0-rc3+ #206
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
dump_stack+0x8b/0xb0
check_noncircular+0xcf/0xf0
? trace_call_bpf+0x139/0x260
__lock_acquire+0x1167/0x2150
lock_acquire+0x116/0x3e0
? btrfs_lookup_extent_info+0x156/0x3b0
__mutex_lock+0x7e/0x7b0
? btrfs_lookup_extent_info+0x156/0x3b0
? btrfs_lookup_extent_info+0x156/0x3b0
? release_extent_buffer+0x124/0x170
? _raw_spin_unlock+0x1f/0x30
? release_extent_buffer+0x124/0x170
btrfs_lookup_extent_info+0x156/0x3b0
walk_down_proc+0x1c3/0x280
walk_down_tree+0x64/0xe0
btrfs_drop_subtree+0x182/0x260
do_relocation+0x52e/0x660
relocate_tree_blocks+0x2ae/0x650
? add_tree_block+0x149/0x1b0
relocate_block_group+0x1ba/0x5d0
elfcorehdr_read+0x40/0x40
? elfcorehdr_read+0x40/0x40
? btrfs_balance+0x796/0xf40
? __kthread_parkme+0x66/0x90
? btrfs_balance+0xf40/0xf40
? balance_kthread+0x37/0x50
? kthread+0x137/0x150
? __kthread_bind_mask+0x60/0x60
? ret_from_fork+0x1f/0x30
As you can see this is bogus, we never take another tree's lock under
the csum lock. This happens because sometimes we have to read tree
blocks from disk without knowing which root they belong to during
relocation. We defaulted to an owner of 0, which translates to an fs
tree. This is fine as all fs trees have the same class, but obviously
isn't fine if the block belongs to a COW only tree.
Thankfully COW only trees only have their owner root as a reference to
them, and since we already look up the extent information during
relocation, go ahead and check and see if this block might belong to a
COW only tree, and if so save the owner in the tree_block struct. This
allows us to read_tree_block with the proper owner, which gets rid of
this lockdep splat.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Extract the code to grab an extent buffer from a page into a helper,
grab_extent_buffer_from_page().
This removes one indent level and provides a place for the later
expansion for subpage support.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The original comment is from the initial merge, which has several
problems:
- No holes check any more
- No inline decision is made
Update the out-of-date comment with a more correct one.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The refactoring involves the following modifications:
- iosize alignment
In fact we don't really need to manually do alignment at all.
All extent maps should already be aligned, thus basic ASSERT() check
would be enough.
- redundant variables
We have extra variables like blocksize/pg_offset/end.
They are all unnecessary.
@blocksize can be replaced by sectorsize directly, and it's only
used to verify the em start/size is aligned.
@pg_offset can be easily calculated using @cur and page_offset(page).
@end is just assigned from @page_end and never modified, use
"start + PAGE_SIZE - 1" directly and remove @page_end.
- remove some BUG_ON()s
The BUG_ON()s check the extent map, for which we already have the
tree-checker validating on-disk extent data items as well as runtime checks.
ASSERT() should be enough.
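A minimal sketch of the arithmetic that replaces the removed variables follows. The helper names and the fixed 4096-byte page size are assumptions made for illustration; the real code uses PAGE_SIZE, page_offset(page) and the filesystem's sectorsize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumed page size */

/* Alignment test for a power-of-two @align. */
static bool is_aligned(uint64_t val, uint64_t align)
{
	return (val & (align - 1)) == 0;
}

/* Offset of file position @cur inside the page that starts at @page_start. */
static uint32_t pg_offset_of(uint64_t cur, uint64_t page_start)
{
	assert(cur >= page_start && cur - page_start < SKETCH_PAGE_SIZE);
	return (uint32_t)(cur - page_start);
}

/* Last byte covered by the page that starts at @start. */
static uint64_t page_last_byte(uint64_t start)
{
	return start + SKETCH_PAGE_SIZE - 1;
}
```

Since each value is derived on demand from @cur and the page start, none of them has to be carried around as a separate local.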
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The parameter offset is confusing, it's supposed to be the disk bytenr
of metadata/data. Rename it to disk_bytenr and update the comment.
Also rename each offset passed to submit_extent_page() as @disk_bytenr
so they're consistent.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The refactoring involves the following modifications:
- Return bool instead of int
- Parameter update for @cached of btrfs_dec_test_first_ordered_pending()
For btrfs_dec_test_first_ordered_pending(), @cached is only used to
return the finished ordered extent.
Rename it to @finished_ret.
- Comment updates
* Change one stale comment
Which still refers to btrfs_dec_test_ordered_pending(), but the
context is calling btrfs_dec_test_first_ordered_pending().
* Follow the common comment style for both functions
Add more detailed descriptions for parameters and the return value
* Move the reason why test_and_set_bit() is used into the call sites
- Change how the return value is calculated
The most anti-human part of the return value is:
if (...)
ret = 1;
...
return ret == 0;
This means, when we set ret to 1, the function returns 0.
Change the local variable name to @finished, and directly return the
value of it.
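The return value change can be sketched side by side as below. The byte counter and both function names are hypothetical simplifications; only the shape of the return logic is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old shape: "ret = 1" means *not* finished, and the result is inverted. */
static int dec_test_pending_old(uint64_t *bytes_left, uint64_t io_size)
{
	int ret = 0;

	*bytes_left -= io_size;
	if (*bytes_left)
		ret = 1;
	return ret == 0;
}

/* New shape: the local says what it means and is returned directly. */
static bool dec_test_pending_new(uint64_t *bytes_left, uint64_t io_size)
{
	bool finished;

	*bytes_left -= io_size;
	finished = (*bytes_left == 0);
	return finished;
}
```

Both variants give the same answer for the same input; only the second can be read without mentally negating the local variable.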
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_dio_private::bytes is only assigned from bio::bi_iter::bi_size,
which is never larger than U32_MAX.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Following the rework in e076ab2a2c ("btrfs: shrink delalloc pages
instead of full inodes") the nr variable is no longer passed by
reference to start_delalloc_inodes, hence it cannot change. Additionally,
it is always guaranteed to be a positive number, hence it's
redundant to have it as a condition in the loop. Simply remove that
usage.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's currently u64 which gets instantly translated either to LONG_MAX
(if U64_MAX is passed) or cast to an unsigned long (which is, in fact,
wrong because writeback_control::nr_to_write is a signed long).
Just convert the function's argument to a long, which obviates the
need to manually convert a u64 value to a long. Adjust all call sites
which pass U64_MAX to pass LONG_MAX. Finally, ensure that in
shrink_delalloc the u64 is converted to a long without overflowing into
a negative number.
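One way to do such a conversion safely is to clamp before casting, as in the sketch below. The helper name is hypothetical, and this is not necessarily how shrink_delalloc performs the conversion.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Clamp before converting so that large values cannot turn negative. */
static long u64_to_long_clamped(uint64_t val)
{
	if (val > (uint64_t)LONG_MAX)
		return LONG_MAX;
	return (long)val;
}
```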
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
After commit 040ee6120c ("Btrfs: send, improve clone range") we no longer
use the data_offset field of struct backref_ctx, as after that we
do all the necessary checks for the data offset of file extent items at
clone_range(). Since there are no more users of data_offset from that
structure, remove it.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of having three 'if' statements to handle a non-zero return value,
consolidate this in one 'if (ret)'. That way the code is more obvious:
- Always drop delete_unused_bgs_mutex if ret is non-zero
- If ret is negative -> goto done
- If it's 1 -> reset ret to 0, release the path and finish the loop.
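The three rules above can be sketched as one branch. The state struct, the enum and the function name are hypothetical; the boolean fields merely stand in for the mutex and the path used by the real loop.

```c
#include <assert.h>
#include <stdbool.h>

struct loop_state {
	bool mutex_held;	/* stands in for delete_unused_bgs_mutex */
	bool path_released;
};

enum loop_action { LOOP_CONTINUE, LOOP_FINISH, LOOP_ERROR };

/*
 * One consolidated branch for a non-zero search result: always drop the
 * mutex, then tell errors (< 0) apart from "nothing more to do" (1).
 */
static enum loop_action handle_search_ret(struct loop_state *st, int *ret)
{
	if (*ret) {
		st->mutex_held = false;
		if (*ret < 0)
			return LOOP_ERROR;	/* the "goto done" case */
		*ret = 0;
		st->path_released = true;
		return LOOP_FINISH;
	}
	return LOOP_CONTINUE;
}
```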
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I noticed that shared ref entries in ref-verify didn't have the proper
owner set, which caused me to think there was something seriously wrong.
However the problem is if we have a parent we simply weren't filling out
the owner part of the reference, even though we have it.
Fix this by making sure we set all the proper fields when we modify a
reference, this way we'll have the proper owner if a problem happens and
we don't waste time thinking we're updating the wrong level.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I noticed that sometimes I would have the wrong level printed out with
ref-verify while testing some error injection related problems. This is
because we only get the level from the main extent item, but our
references could go off the current leaf into another, and at that point
we lose our level.
Fix this by keeping track of the last tree block level that we found,
the same way we keep track of our bytenr and num_bytes, in case we
happen to wander into another leaf while still processing the references
for a bytenr.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I was attempting to reproduce a problem that Zygo hit, but my error
injection wasn't firing for a few of the common calls to
btrfs_should_cancel_balance. This is because the compiler decided to
inline it at these spots. Keep this from happening by explicitly
marking the function as noinline so that error injection will always
work.
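As a rough illustration, the GCC/Clang attribute below keeps a function out of line so that every caller goes through the same symbol. The macro, the flag and the function are hypothetical; the kernel spells the attribute `noinline`.

```c
#include <assert.h>

/* GCC/Clang attribute; the kernel provides this as "noinline". */
#define sketch_noinline __attribute__((noinline))

static int cancel_requested;	/* hypothetical cancellation flag */

/* Kept out of line so the symbol stays a distinct, hookable call target. */
sketch_noinline int sketch_should_cancel(void)
{
	return cancel_requested ? -1 : 0;
}
```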
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The following patches are going to address error handling in relocation,
in order to test those patches I need to be able to inject errors in
btrfs_search_slot and btrfs_cow_block, as we call both of these pretty
often in different cases during relocation.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's no longer used. While at it also remove new_dirid in create_subvol
as it's used in a single place and open code it. No functional changes.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Adjust the way free_objectid is being initialized, it now stores
BTRFS_FIRST_FREE_OBJECTID rather than the, somewhat arbitrary,
BTRFS_FIRST_FREE_OBJECTID - 1. This change also has the added benefit
that now it becomes unnecessary to explicitly initialize free_objectid
for a newly created fs root.
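A minimal sketch of the new convention follows, assuming the allocator hands out the stored value and then advances it. The struct and function names are hypothetical; 256 is the value of BTRFS_FIRST_FREE_OBJECTID.

```c
#include <assert.h>
#include <stdint.h>

#define FIRST_FREE_OBJECTID 256ULL	/* value of BTRFS_FIRST_FREE_OBJECTID */

struct sketch_root {
	uint64_t free_objectid;	/* next objectid to hand out */
};

static void sketch_root_init(struct sketch_root *root)
{
	root->free_objectid = FIRST_FREE_OBJECTID;
}

/* Hand out the stored value, then advance it (post-increment). */
static uint64_t sketch_get_free_objectid(struct sketch_root *root)
{
	return root->free_objectid++;
}
```

Storing the next free value rather than the last used one means the initial state needs no "minus one" adjustment.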
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reflects the true purpose of the member as it's being used solely
in context where a new objectid is being allocated. Future changes will
also change the way it's being used to closely follow these semantics.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This better reflects the semantics of the function, i.e. no search is
performed whatsoever.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function is used to initialize the in-memory
btrfs_root::highest_objectid member, which is used to get an available
objectid. Rename it to better reflect its semantics.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
First replace all inode instances with a pointer to btrfs_inode. This
removes multiple invocations of the BTRFS_I macro. Subsequently remove
2 local variables, as they are used only once, and simply refer to the
underlying values directly.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The return value in __load_free_space_cache is not properly set after
(unlikely) memory allocation failures, and 0 is returned instead.
This is not a problem for the caller load_free_space_cache because only
value 1 is considered as 'cache loaded' but for clarity it's better
to set the errors accordingly.
Fixes: a67509c300 ("Btrfs: add a io_ctl struct and helpers for dealing with the space cache")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While doing error injection I would sometimes get a corrupt file system.
This is because I was injecting errors at btrfs_search_slot, but would
only do it one time per stack. This uncovered a problem in
commit_fs_roots, where if we get an error we would just break. However
we're in a nested loop, the first loop being a loop to find all the
dirty fs roots, and then subsequent root updates would succeed clearing
the error value.
This isn't likely to happen in real scenarios, however we could
potentially get a random ENOMEM once and then not again, and we'd end up
with a corrupted file system. Fix this by moving the error checking
around a bit to the main loop, as this is the only place where something
will fail, and return the error as soon as it occurs.
With this patch my reproducer no longer corrupts the file system.
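The failure mode can be reproduced in miniature. The sketch below is not the commit_fs_roots() code: the flat results array and both function names are hypothetical, and only the control flow mirrors the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Buggy shape: "break" only leaves the inner loop, later batches reset err. */
static int commit_batches_buggy(const int *results, size_t batches,
				size_t per_batch)
{
	int err = 0;

	for (size_t b = 0; b < batches; b++) {
		for (size_t i = 0; i < per_batch; i++) {
			err = results[b * per_batch + i];
			if (err)
				break;
		}
	}
	return err;
}

/* Fixed shape: return the first error as soon as it occurs. */
static int commit_batches_fixed(const int *results, size_t batches,
				size_t per_batch)
{
	for (size_t b = 0; b < batches; b++) {
		for (size_t i = 0; i < per_batch; i++) {
			int err = results[b * per_batch + i];

			if (err)
				return err;
		}
	}
	return 0;
}
```

With one -ENOMEM in the first batch and a clean second batch, the buggy variant reports success while the fixed one reports the error.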
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Merge tag '5.11-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Three small smb3 fixes for stable"
* tag '5.11-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: report error instead of invalid when revalidating a dentry fails
smb3: fix crediting for compounding when only one request in flight
smb3: Fix out-of-bounds bug in SMB2_negotiate()
Merge tag 'io_uring-5.11-2021-02-05' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two small fixes that should go into 5.11:
- task_work resource drop fix (Pavel)
- identity COW fix (Xiaoguang)"
* tag 'io_uring-5.11-2021-02-05' of git://git.kernel.dk/linux-block:
io_uring: drop mm/files between task_work_submit
io_uring: don't modify identity's files unless identity is cowed
Assuming
- //HOST/a is mounted on /mnt
- //HOST/b is mounted on /mnt/b
On a slow connection, running 'df' and killing it while it's
processing /mnt/b can make cifs_get_inode_info() return -ERESTARTSYS.
This triggers the following chain of events:
=> the dentry revalidation fails
=> dentry is put and released
=> superblock associated with the dentry is put
=> /mnt/b is unmounted
This patch makes cifs_d_revalidate() return the error instead of 0
(invalid) when cifs_revalidate_dentry() fails, except for ENOENT (file
deleted) and ESTALE (file recreated).
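The resulting decision can be sketched as a small mapping from the revalidation result to the value handed back to the VFS. The function name is hypothetical, and ERESTARTSYS is defined locally because it is a kernel-internal code (512) that userspace headers do not provide.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_ERESTARTSYS 512	/* kernel-internal code, defined for the sketch */

/*
 * Map the result of a revalidation attempt to a d_revalidate-style answer:
 * 1 = valid, 0 = invalid, negative = error passed back to the caller.
 */
static int sketch_d_revalidate(int rc)
{
	if (rc == 0)
		return 1;

	switch (rc) {
	case -ENOENT:	/* file deleted */
	case -ESTALE:	/* file recreated */
		return 0;
	default:	/* unexpected error: report it as-is */
		return rc;
	}
}
```

An interrupted lookup now surfaces as an error instead of being reported as an invalid dentry, so the chain of events above is not triggered.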
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Suggested-by: Shyam Prasad N <nspmangalore@gmail.com>
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
CC: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
If a new hugetlb page is allocated during fallocate it will not be
marked as active (set_page_huge_active) which will result in a later
isolate_huge_page failure when the page migration code would like to
move that page. Such a failure would be unexpected and wrong.
Only export set_page_huge_active; leave clear_page_huge_active static
because it has no external users.
Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com
Fixes: 70c3547e36 ("hugetlbfs: add hugetlbfs_fallocate()")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we try to guess if a compound request is going to
succeed waiting for credits or not based on the number of
requests in flight. This approach doesn't work correctly
all the time because there may be only one request in
flight which is going to bring multiple credits satisfying
the compound request.
Change the behavior to fail a request only if there are no requests
in flight at all and proceed waiting for credits otherwise.
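The change in policy can be sketched as two predicates. Both functions and their parameters are hypothetical, and the old comparison is a simplification of the guess described above rather than the exact expression used by the client.

```c
#include <assert.h>
#include <stdbool.h>

/* Old heuristic (simplified): guess from the number of requests in flight. */
static bool old_should_wait(unsigned int in_flight, unsigned int needed)
{
	return in_flight >= needed;
}

/* New rule: give up only when nothing is in flight to bring credits back. */
static bool new_should_wait(unsigned int in_flight)
{
	return in_flight > 0;
}
```

With one request in flight and three credits needed, the old guess gives up even though that single response may grant enough credits; the new rule keeps waiting.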
Cc: <stable@vger.kernel.org> # 5.1+
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Since an SQPOLL task can be shared, and so its task_work entries can be a
mix, we need to drop mm and files before trying to issue the next request.
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'ovl-fixes-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
- Fix capability conversion and minor overlayfs bugs that are related
to the unprivileged overlay mounts introduced in this cycle.
- Fix two recent (v5.10) and one old (v4.10) bug.
- Clean up security xattr copy-up (related to a SELinux regression).
* tag 'ovl-fixes-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: implement volatile-specific fsync error behaviour
ovl: skip getxattr of security labels
ovl: fix dentry leak in ovl_get_redirect
ovl: avoid deadlock on directory ioctl
cap: fix conversions on getxattr
ovl: perform vfs_getxattr() with mounter creds
ovl: add warning on user_ns mismatch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.11-rc7, including fixes from bpf and mac80211
trees.
Current release - regressions:
- ip_tunnel: fix mtu calculation
- mlx5: fix function calculation for page trees
Previous releases - regressions:
- vsock: fix the race conditions in multi-transport support
- neighbour: prevent a dead entry from updating gc_list
- dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Previous releases - always broken:
- bpf, cgroup: two copy_{from,to}_user() warn_on_once splats for BPF
cgroup getsockopt infra when user space is trying to race against
optlen, from Loris Reiff.
- bpf: add missing fput() in BPF inode storage map update helper
- udp: ipv4: manipulate network header of NATed UDP GRO fraglist
- mac80211: fix station rate table updates on assoc
- r8169: work around RTL8125 UDP HW bug
- igc: report speed and duplex as unknown when device is runtime
suspended
- rxrpc: fix deadlock around release of dst cached on udp tunnel"
* tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundary
net: ipa: fix two format specifier errors
net: ipa: use the right accessor in ipa_endpoint_status_skip()
net: ipa: be explicit about endianness
net: ipa: add a missing __iomem attribute
net: ipa: pass correct dma_handle to dma_free_coherent()
r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set
net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS
net: mvpp2: TCAM entry enable should be written after SRAM data
net: lapb: Copy the skb before sending a packet
net/mlx5e: Release skb in case of failure in tc update skb
net/mlx5e: Update max_opened_tc also when channels are closed
net/mlx5: Fix leak upon failure of rule creation
net/mlx5: Fix function calculation for page trees
docs: networking: swap words in icmp_errors_use_inbound_ifaddr doc
udp: ipv4: manipulate network header of NATed UDP GRO fraglist
net: ip_tunnel: fix mtu calculation
vsock: fix the race conditions in multi-transport support
net: sched: replaced invalid qdisc tree flush helper in qdisc_replace
ibmvnic: device remove has higher precedence over reset
...
While addressing some warnings generated by -Warray-bounds, I found this
bug that was introduced back in 2017:
CC [M] fs/cifs/smb2pdu.o
fs/cifs/smb2pdu.c: In function ‘SMB2_negotiate’:
fs/cifs/smb2pdu.c:822:16: warning: array subscript 1 is above array bounds
of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
822 | req->Dialects[1] = cpu_to_le16(SMB30_PROT_ID);
| ~~~~~~~~~~~~~^~~
fs/cifs/smb2pdu.c:823:16: warning: array subscript 2 is above array bounds
of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
823 | req->Dialects[2] = cpu_to_le16(SMB302_PROT_ID);
| ~~~~~~~~~~~~~^~~
fs/cifs/smb2pdu.c:824:16: warning: array subscript 3 is above array bounds
of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
824 | req->Dialects[3] = cpu_to_le16(SMB311_PROT_ID);
| ~~~~~~~~~~~~~^~~
fs/cifs/smb2pdu.c:816:16: warning: array subscript 1 is above array bounds
of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
816 | req->Dialects[1] = cpu_to_le16(SMB302_PROT_ID);
| ~~~~~~~~~~~~~^~~
At the time, the size of array _Dialects_ was changed from 1 to 3 in struct
validate_negotiate_info_req, and then in 2019 it was changed from 3 to 4,
but those changes were never made in struct smb2_negotiate_req, which has
led to a three-and-a-half-year-old out-of-bounds bug in function
SMB2_negotiate() (fs/cifs/smb2pdu.c).
Fix this by increasing the size of array _Dialects_ in struct
smb2_negotiate_req to 4.
Fixes: 9764c02fcb ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)")
Fixes: d5c7076b77 ("smb3: add smb3.1.1 to default dialect list")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
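The out-of-bounds writes reported for SMB2_negotiate() above can be illustrated
in miniature. What follows is a minimal userspace sketch, not the kernel's
definition: struct negotiate_req_sketch, fill_dialects() and MAX_DIALECTS are
hypothetical names, the real request structure has many more fields, and the
cpu_to_le16() conversion is left out. Placing SMB 2.1 in slot 0 is an
assumption; the warnings only show slots 1 to 3. The sketch sizes the array for
every dialect that may be written and rejects any request for more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dialect revision codes as defined by the SMB2 protocol. */
#define SMB21_PROT_ID  0x0210
#define SMB30_PROT_ID  0x0300
#define SMB302_PROT_ID 0x0302
#define SMB311_PROT_ID 0x0311

/* Hypothetical constant: the largest number of dialects ever written. */
#define MAX_DIALECTS 4

/* Simplified stand-in for the request; the real struct has more fields. */
struct negotiate_req_sketch {
	uint16_t dialect_count;
	uint16_t dialects[MAX_DIALECTS]; /* a [1] array here is the bug */
};

/* Fail the build if the array ever shrinks below what callers write. */
static_assert(sizeof(((struct negotiate_req_sketch *)0)->dialects) ==
	      MAX_DIALECTS * sizeof(uint16_t),
	      "dialects[] must hold MAX_DIALECTS entries");

/*
 * Copy n dialect IDs into the request. Returns 0 on success, or -1 if the
 * caller asks for more entries than the array can hold.
 */
static int fill_dialects(struct negotiate_req_sketch *req,
			 const uint16_t *ids, size_t n)
{
	size_t capacity = sizeof(req->dialects) / sizeof(req->dialects[0]);
	size_t i;

	if (n > capacity)
		return -1;
	for (i = 0; i < n; i++)
		req->dialects[i] = ids[i];
	req->dialect_count = (uint16_t)n;
	return 0;
}
```

Deriving the capacity from sizeof keeps the bounds check tied to the
declaration, so a later change to the array size cannot silently diverge from
the code that fills it.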
Merge tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
- SUNRPC: Handle 0 length opaque XDR object data properly
- Fix a layout segment leak in pnfs_layout_process()
- pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn
- pNFS/NFSv4: Improve rejection of out-of-order layouts
- pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process()
* tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Handle 0 length opaque XDR object data properly
SUNRPC: Move simple_get_bytes and simple_get_netobj into private header
pNFS/NFSv4: Improve rejection of out-of-order layouts
pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn
pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process()
pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
Merge tag '5.11-rc5-smb3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Four cifs patches found in additional testing of the conversion to the
new mount API: three small option processing ones, and one fixing domain
based DFS referrals"
* tag '5.11-rc5-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix dfs domain referrals
cifs: returning mount parm processing errors correctly
cifs: fix mounts to subdirectories of target
cifs: ignore auto and noauto options if given
AF_RXRPC sockets use UDP ports in encap mode. This causes the socket and dst
from an incoming packet to get stolen and attached to the UDP socket, from
which it is leaked when that socket is closed.
When a network namespace is removed, the wait for dst records to be cleaned
up happens before the cleanup of the rxrpc and UDP socket, meaning that the
wait never finishes.
Fix this by moving the rxrpc (and, by dependence, the afs) private
per-network namespace registrations to the device group rather than subsys
group. This allows cached rxrpc local endpoints to be cleared and their
UDP sockets closed before we try waiting for the dst records.
The symptom is that lines looking like the following:
unregister_netdevice: waiting for lo to become free
get emitted at regular intervals after running something like the
referenced syzbot test.
Thanks to Vadim for tracking this down and working out the fix.
Reported-by: syzbot+df400f2f24a1677cd7e0@syzkaller.appspotmail.com
Reported-by: Vadim Fedorenko <vfedorenko@novek.ru>
Fixes: 5271953cad ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/161196443016.3868642.5577440140646403533.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
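Why the teardown order matters in the rxrpc fix above can be shown with a toy
userspace model. This is only a sketch of the dependency the commit message
describes, not kernel code: struct netns_model, close_rxrpc_sockets(),
wait_for_dst_release() and both teardown helpers are hypothetical, and the
polling loop is bounded so the model terminates. One cached dst holds a
reference on the loopback device, and the wait can only succeed once the
socket holding that dst has been closed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one network namespace; every name here is hypothetical. */
struct netns_model {
	int lo_refs;          /* references pinning the loopback device */
	bool udp_socket_open; /* rxrpc's UDP socket, holding a cached dst */
};

static void netns_model_init(struct netns_model *ns)
{
	ns->lo_refs = 1; /* the dst cached on the UDP socket pins "lo" */
	ns->udp_socket_open = true;
}

/* Closing the socket drops the cached dst and its device reference. */
static void close_rxrpc_sockets(struct netns_model *ns)
{
	if (ns->udp_socket_open) {
		assert(ns->lo_refs > 0);
		ns->udp_socket_open = false;
		ns->lo_refs--;
	}
}

/*
 * Stand-in for the "waiting for lo to become free" loop. The real wait is
 * unbounded; this one gives up after max_polls so the model terminates.
 */
static bool wait_for_dst_release(const struct netns_model *ns, int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++)
		if (ns->lo_refs == 0)
			return true;
	return false;
}

/* Order before the fix: wait for the device first, close sockets after. */
static bool teardown_wait_first(struct netns_model *ns)
{
	bool freed = wait_for_dst_release(ns, 3);

	close_rxrpc_sockets(ns);
	return freed;
}

/* Order after the fix: close sockets first, then wait for the device. */
static bool teardown_close_first(struct netns_model *ns)
{
	close_rxrpc_sockets(ns);
	return wait_for_dst_release(ns, 3);
}
```

In this model teardown_wait_first() reports that the device never became free,
while teardown_close_first() succeeds on the first poll.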
Merge tag 'for-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes for a late rc:
- fix lockdep complaint on 32bit arches and also remove an unsafe
memory use due to device vs filesystem lifetime
- two fixes for free space tree:
* race during log replay and cache rebuild, now more likely to
happen due to changes in this dev cycle
* possible free space tree corruption with online conversion
during initial tree population"
* tag 'for-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix log replay failure due to race with space cache rebuild
btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch
btrfs: fix possible free space tree corruption with online conversion
Merge tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"All over the place fixes for this release:
- blk-cgroup iteration teardown resched fix (Baolin)
- NVMe pull request from Christoph:
  - add another Write Zeroes quirk (Chaitanya Kulkarni)
  - handle a no path available corner case (Daniel Wagner)
  - use the proper RCU aware list_add helper (Chao Leng)
- bcache regression fix (Coly)
- bdev->bd_size_lock IRQ fix. This will be fixed in drivers for 5.12,
but for now, we'll make it IRQ safe (Damien)
- null_blk zoned init fix (Damien)
- add_partition() error handling fix (Dinghao)
- s390 dasd kobject fix (Jan)
- nbd fix for freezing queue while adding connections (Josef)
- tag queueing regression fix (Ming)
- revert of a patch that inadvertently meant that we regressed write
performance on raid (Maxim)"
* tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block:
null_blk: cleanup zoned mode initialization
nvme-core: use list_add_tail_rcu instead of list_add_tail for nvme_init_ns_head
nvme-multipath: Early exit if no path is available
nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a SPCC device
bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES
block: fix bd_size_lock use
blk-cgroup: Use cond_resched() when destroy blkgs
Revert "block: simplify set_init_blocksize" to regain lost performance
nbd: freeze the queue while we're adding connections
s390/dasd: Fix inconsistent kobject removal
block: Fix an error handling in add_partition
blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue
Merge tag 'io_uring-5.11-2021-01-29' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"We got the cancelation story sorted now, so for all intents and
purposes, this should be it for 5.11 outside of any potential little
fixes that may come in. This contains:
- task_work task state fixes (Hao, Pavel)
- Cancelation fixes (me, Pavel)
- Fix for an inflight req patch in this release (Pavel)
- Fix for a lock deadlock issue (Pavel)"
* tag 'io_uring-5.11-2021-01-29' of git://git.kernel.dk/linux-block:
io_uring: reinforce cancel on flush during exit
io_uring: fix sqo ownership false positive warning
io_uring: fix list corruption for splice file_get
io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE
io_uring: fix wqe->lock/completion_lock deadlock
io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE
io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE
io_uring: only call io_cqring_ev_posted() if events were posted
io_uring: if we see flush on exit, cancel related tasks
The new mount API requires additional changes to how DFS
is handled. Additional testing of DFS uncovered problems
with domain-based DFS referrals, which this patch addresses
(a follow-on patch addresses DFS links).
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Merge tag 'ecryptfs-5.11-rc6-setxattr-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fix from Tyler Hicks:
"Fix a regression that resulted in two rounds of UID translations when
setting v3 namespaced file capabilities in some configurations"
* tag 'ecryptfs-5.11-rc6-setxattr-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
ecryptfs: fix uid translation for setxattr on security.capability
What 84965ff8a8 ("io_uring: if we see flush on exit, cancel related tasks")
really wants is to cancel all relevant REQ_F_INFLIGHT requests reliably.
That can be achieved by io_uring_cancel_files(), but we'll miss it when
calling io_uring_cancel_task_requests(files=NULL) from io_uring_flush(),
because it will go through __io_uring_cancel_task_requests().
Just always call io_uring_cancel_files() during cancel; it's good enough
for now.
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>