Commit b7af0635c8 ("btrfs: print transaction aborted messages with an
error level") changed the log level of transaction aborted messages from
a debug level to an error level, so that such messages are always visible
even on production systems where the log level is normally above the debug
level (and also on some syzbot reports).
Later, commit fccf0c842e ("btrfs: move btrfs_abort_transaction to
transaction.c") changed the log level back to debug level when the error
number for a transaction abort should not have a stack trace printed.
This happened for absolutely no reason. It's always useful to print
transaction abort messages with an error level, regardless of whether
the error number should cause a stack trace or not.
So change back the log level to error level.
Fixes: fccf0c842e ("btrfs: move btrfs_abort_transaction to transaction.c")
CC: stable@vger.kernel.org # 6.5+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
The following script would allow invalid mount options to be specified
(although such invalid options would just be ignored):
# mkfs.btrfs -f $dev
# mount $dev $mnt1 <<< Successful mount expected
# mount $dev $mnt2 -o junk <<< Failed mount expected
# echo $?
0
[CAUSE]
For the 2nd mount, since the fs is already mounted, we won't go through
open_ctree() thus no btrfs_parse_options(), but only through
btrfs_parse_subvol_options().
However we do not treat unrecognized options from valid but irrelevant
options, thus those invalid options would just be ignored by
btrfs_parse_subvol_options().
[FIX]
Add the handling for Opt_err to handle invalid options and error out,
while still ignore other valid options inside btrfs_parse_subvol_options().
Reported-by: Anand Jain <anand.jain@oracle.com>
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jens reported the following warnings from -Wmaybe-uninitialized recent
Linus' branch.
In file included from ./include/asm-generic/rwonce.h:26,
from ./arch/arm64/include/asm/rwonce.h:71,
from ./include/linux/compiler.h:246,
from ./include/linux/export.h:5,
from ./include/linux/linkage.h:7,
from ./include/linux/kernel.h:17,
from fs/btrfs/ioctl.c:6:
In function ‘instrument_copy_from_user_before’,
inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3,
inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7,
inlined from ‘btrfs_ioctl_space_info’ at fs/btrfs/ioctl.c:2999:6,
inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4616:10:
./include/linux/kasan-checks.h:38:27: warning: ‘space_args’ may be used
uninitialized [-Wmaybe-uninitialized]
38 | #define kasan_check_write __kasan_check_write
./include/linux/instrumented.h:129:9: note: in expansion of macro
‘kasan_check_write’
129 | kasan_check_write(to, n);
| ^~~~~~~~~~~~~~~~~
./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’:
./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const
volatile void *’ to ‘__kasan_check_write’ declared here
20 | bool __kasan_check_write(const volatile void *p, unsigned int
size);
| ^~~~~~~~~~~~~~~~~~~
fs/btrfs/ioctl.c:2981:39: note: ‘space_args’ declared here
2981 | struct btrfs_ioctl_space_args space_args;
| ^~~~~~~~~~
In function ‘instrument_copy_from_user_before’,
inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3,
inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7,
inlined from ‘_btrfs_ioctl_send’ at fs/btrfs/ioctl.c:4343:9,
inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4658:10:
./include/linux/kasan-checks.h:38:27: warning: ‘args32’ may be used
uninitialized [-Wmaybe-uninitialized]
38 | #define kasan_check_write __kasan_check_write
./include/linux/instrumented.h:129:9: note: in expansion of macro
‘kasan_check_write’
129 | kasan_check_write(to, n);
| ^~~~~~~~~~~~~~~~~
./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’:
./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const
volatile void *’ to ‘__kasan_check_write’ declared here
20 | bool __kasan_check_write(const volatile void *p, unsigned int
size);
| ^~~~~~~~~~~~~~~~~~~
fs/btrfs/ioctl.c:4341:49: note: ‘args32’ declared here
4341 | struct btrfs_ioctl_send_args_32 args32;
| ^~~~~~
This was due to his config options and having KASAN turned on,
which adds some extra checks around copy_from_user(), which then
triggered the -Wmaybe-uninitialized checker for these cases.
Fix the warnings by initializing the different structs we're copying
into.
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A recent ext4 patch posting from Jan Kara reminded me of a
discussion a year ago about fstrim in progress preventing kernels
from suspending. The fix is simple, we should do the same for XFS.
This removes the -ERESTARTSYS error return from this code, replacing
it with either the last error seen or the number of blocks
successfully trimmed up to the point where we detected the stop
condition.
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
fstrim will hold the AGF lock for as long as it takes to walk and
discard all the free space in the AG that meets the userspace trim
criteria. For AGs with lots of free space extents (e.g. millions)
or the underlying device is really slow at processing discard
requests (e.g. Ceph RBD), this means the AGF hold time is often
measured in minutes to hours, not a few milliseconds as we normal
see with non-discard based operations.
This can result in the entire filesystem hanging whilst the
long-running fstrim is in progress. We can have transactions get
stuck waiting for the AGF lock (data or metadata extent allocation
and freeing), and then more transactions get stuck waiting on the
locks those transactions hold. We can get to the point where fstrim
blocks an extent allocation or free operation long enough that it
ends up pinning the tail of the log and the log then runs out of
space. At this point, every modification in the filesystem gets
blocked. This includes read operations, if atime updates need to be
made.
To fix this problem, we need to be able to discard free space
extents safely without holding the AGF lock. Fortunately, we already
do this with online discard via busy extents. We can mark free space
extents as "busy being discarded" under the AGF lock and then unlock
the AGF, knowing that nobody will be able to allocate that free
space extent until we remove it from the busy tree.
Modify xfs_trim_extents to use the same asynchronous discard
mechanism backed by busy extents as is used with online discard.
This results in the AGF only needing to be held for short periods of
time and it is never held while we issue discards. Hence if discard
submission gets throttled because it is slow and/or there are lots
of them, we aren't preventing other operations from being performed
on AGF while we wait for discards to complete...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Because we are going to use the same list-based discard submission
interface for fstrim-based discards, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Stable Fix:
* Revert "SUNRPC dont update timeout value on connection reset"
* NFSv4: Fix a state manager thread deadlock regression
Bugfixes:
* Fix a potential NULL pointer dereference in nfs_inode_remove_request()
* Fix a rare NULL pointer dereference in xs_tcp_tls_setup_socket()
* Fix long delay before failing a TLS mount when server does not support TLS
* Fix various NFS state manager issues
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmUcZYIACgkQ18tUv7Cl
QOtNyBAA0RaAI4D5Em3/JxmdaqoK8LgSlw3gSup0VoWG1/gyKlt0wW6gDPzffIVK
pb5wTMdDVzZmbamll15gvOJYYY4tEVpMGg5unR5LIIhS/5Z5TbGEU3ioO7xsKnTI
0FYtQ0fEXJaPqVLUh3xY/W7Bn2wdehD2bqSfYakddL1C9+cc1XRnxA5BLEL67ZQ4
zlBhI2acv/9eEGZjVYdI8cv+27WvQY+ud21VZrEGPuztZklfwipNBLr3qT2WpwTU
vLqp5Py1PydEV7CHIrPtAKhQIuxlYr4m//YXMD4jU4bQuHql0TmUGzJEvfsOzIZn
LPEJy2kIA37/Rm92lPeZ0m6E/bwXz8iv7HTFpvgIf6ZIY2PSsVNFZYovhSv/yV8N
qoRcpBswi5nYXZ2KoVfjp+J8NQXWjVS/ODoycdCCWVKEjrVh5oFPfYH67jph4o7l
E0QaSNKPhy2OHWVgf8hQNJ+Fi34qvxYG1F2RmaTcJybyco1o4jVQ4WJhNGNJbLWH
dorEko6n82XPBB9+1Qo2nUqLwNgNdyHmmqBO0OG5tJH3mrIrRPJ8DBjAIWggDGSH
pfB2GftuTuXIUvY97PxZlW0Iz1ZKX6NPFS/gtHJizbVGl+E5wttxbPPJNsApA/v1
ZoE2K2cgSsyqTXR6arf55EJZyO4oAnmYEPQLu5jlyV83vQQz/xY=
=qUGw
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.6-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"Stable fixes:
- Revert "SUNRPC dont update timeout value on connection reset"
- NFSv4: Fix a state manager thread deadlock regression
Fixes:
- Fix potential NULL pointer dereference in nfs_inode_remove_request()
- Fix rare NULL pointer dereference in xs_tcp_tls_setup_socket()
- Fix long delay before failing a TLS mount when server does not
support TLS
- Fix various NFS state manager issues"
* tag 'nfs-for-6.6-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs: decrement nrequests counter before releasing the req
SUNRPC/TLS: Lock the lower_xprt during the tls handshake
Revert "SUNRPC dont update timeout value on connection reset"
NFSv4: Fix a state manager thread deadlock regression
NFSv4: Fix a nfs4_state_manager() race
SUNRPC: Fail quickly when server does not recognize TLS
A wrong return value from ovl_check_encode_origin() would cause
ovl_dentry_to_fid() to try to encode fid from NULL upper dentry.
Reported-by: syzbot+2208f82282740c1c8915@syzkaller.appspotmail.com
Fixes: 16aac5ad1f ("ovl: support encoding non-decodable file handles")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
ovl_permission() accesses ->layers[...].mnt; we can't have ->layers
freed without an RCU delay on fs shutdown.
Fortunately, kern_unmount_array() that is used to drop those mounts
does include an RCU delay, so freeing is delayed; unfortunately, the
array passed to kern_unmount_array() is formed by mangling ->layers
contents and that happens without any delays.
The ->layers[...].name string entries are used to store the strings to
display in "lowerdir=..." by ovl_show_options(). Those entries are not
accessed in RCU walk.
Move the name strings into a separate array ofs->config.lowerdirs and
reuse the ofs->config.lowerdirs array as the temporary mount array to
pass to kern_unmount_array().
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20231002023711.GP3389589@ZenIV/
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
d_inode_rcu() is right - we might be in rcu pathwalk;
however, OVL_E() hides plain d_inode() on the same dentry...
Fixes: a6ff2bc0be ("ovl: use OVL_E() and OVL_E_FLAGS() accessors")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
... into ->free_inode(), that is.
Fixes: 0af950f57f "ovl: move ovl_entry into ovl_inode"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Commit 724768a393 ("ovl: fix incorrect fdput() on aio completion")
took a refcount on real file before submitting aio, but forgot to
avoid clearing FDPUT_FPUT from real.flags stack variable.
This can result in a file reference leak.
Fixes: 724768a393 ("ovl: fix incorrect fdput() on aio completion")
Reported-by: Gil Lev <contact@levgil.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
to issues which were introduced after 6.5.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZRmSDAAKCRDdBJ7gKXxA
jlSaAQCe3SnBdjRmuzbp5iIfNJOY7GXLN4NwMsArRUxRGY27IwD+KWhXZP/ydVnt
ZgS4x9rmarHuh5Pxds+6SRGhihRz/Ak=
=sf/5
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Fourteen hotfixes, eleven of which are cc:stable. The remainder
pertain to issues which were introduced after 6.5"
* tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
Crash: add lock to serialize crash hotplug handling
selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
mm, memcg: reconsider kmem.limit_in_bytes deprecation
mm: zswap: fix potential memory corruption on duplicate store
arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
mm: hugetlb: add huge page size param to set_huge_pte_at()
maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
maple_tree: add mas_is_active() to detect in-tree walks
nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
mm: abstract moving to the next PFN
mm: report success more often from filemap_map_folio_range()
fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
- Make sure 32 bit applications using user events have aligned access when
running on a 64 bit kernel.
- Add cond_resched in the loop that handles converting enums in print_fmt
string is trace events.
- Fix premature wake ups of polling processes in the tracing ring buffer. When
a task polls waiting for a percentage of the ring buffer to be filled, the
writer still will wake it up at every event. Add the polling's percentage to
the "shortest_full" list to tell the writer when to wake it up.
- For eventfs dir lookups on dynamic events, an event system's only event could
be removed, leaving its dentry with no children. This is totally legitimate.
But in eventfs_release() it must not access the children array, as it is only
allocated when the dentry has children.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZRiI2xQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qlvoAQDKbevbqA0C8lEV1rbVh4Q9Rnq580rz
EAyEO/RrSOwE9AEA2z+Q597mDjEiqQBvqTjBkS+0xZ7AUQYZRWgTHRIbegg=
=tqOM
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Make sure 32-bit applications using user events have aligned access
when running on a 64-bit kernel.
- Add cond_resched in the loop that handles converting enums in
print_fmt string is trace events.
- Fix premature wake ups of polling processes in the tracing ring
buffer. When a task polls waiting for a percentage of the ring buffer
to be filled, the writer still will wake it up at every event. Add
the polling's percentage to the "shortest_full" list to tell the
writer when to wake it up.
- For eventfs dir lookups on dynamic events, an event system's only
event could be removed, leaving its dentry with no children. This is
totally legitimate. But in eventfs_release() it must not access the
children array, as it is only allocated when the dentry has children.
* tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Test for dentries array allocated in eventfs_release()
tracing/user_events: Align set_bit() address for all archs
tracing: relax trace_event_eval_update() execution with cond_resched()
ring-buffer: Update "shortest_full" in polling
The dcache_dir_open_wrapper() could be called when a dynamic event is
being deleted leaving a dentry with no children. In this case the
dlist->dentries array will never be allocated. This needs to be checked
for in eventfs_release(), otherwise it will trigger a NULL pointer
dereference.
Link: https://lore.kernel.org/linux-trace-kernel/20230930090106.1c3164e9@rorschach.local.home
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: ef36b4f928 ("eventfs: Remember what dentries were created on dir open")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
* Handle a race between writing and shrinking block devices by
returning EIO.
* Fix a typo in a comment.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZRWpqAAKCRBKO3ySh0YR
puinAP46EI8AvxQOid2ukGIEP09ZdhYNcJkWsigZ8k7Z/wcqagD/UVliaDPGC2kk
rlrPN6jmNyqDzAP6muBmqu2v44GVWwY=
=7nql
-----END PGP SIGNATURE-----
Merge tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
- Handle a race between writing and shrinking block devices by
returning EIO
- Fix a typo in a comment
* tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: Spelling s/preceeding/preceding/g
iomap: add a workaround for racy i_size updates on block devices
- Fix NFSv4 READ corner case
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmUYSnEACgkQM2qzM29m
f5dtSxAAk/n3CzeIJFALIcP+JmHW6Zh2KbpygcqhxFoO1FAyRBJgOIEhtdD/c65Q
fzfCQDZUzdTSirklBS5m0lIMhqaYMe1LUC8OIq/hh309vQ6WnlLltIU3fsaZFva5
kAaDeI/oGq2YnbjQENWCGHvKC7RR1jOTWASI/+52oYg/fwRkt14so/wDl54mxhNP
q6Gt0Xw/mXkslAkQNQ+7a5eAa4TtepzkjkNWL4hSqboHQ4QFT2tmXqDV3E+V9tzX
F/tlEN3EJo3nDNcUYAh6ec/YXo0tBh4DsnJxA9bRYloU40EUGP19ewXLEQgXPw9M
IgSNluxTmDTQGz6kSUkQpepevWDUN0+fDt1kVqACIRXuvuMcGuDUCbD4VMa8HNnc
bCMg8R/SkrBnzTamHJpPhbV/F/C0Iwp80RpqDMAnt4splrQUCJeB+Wl1qJAnYUW5
TdJxSJKCvypMIcNCd1ZMSVbeByUgZ/qXKKsS2YNS4l8+DGdnEJQ2mnFKw3Nk5bEF
byWoyW6RMXCIwaqQa8hNsO5kW4MTxURxLXcgqzN7+5L2ht7pAYpNjnOc0vIBPe5t
9z+/dSk8a9QXBwRTxPL4QOgludXPqvokGBtCIV1LD5xQA9OZuXivnBz2DWmCt+IR
GOYfjs3gSlvVrv7+EGffYlOh9f7GN/wlUfUxERKcB6HPweGOtqs=
=aD10
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix NFSv4 READ corner case
* tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the
reference count of bh when the call to nilfs_dat_translate() fails. If
the reference count hits 0 and its owner page gets unlocked, bh may be
freed. However, bh->b_page is dereferenced to put the page after that,
which may result in a use-after-free bug. This patch moves the release
operation after unlocking and putting the page.
NOTE: The function in question is only called in GC, and in combination
with current userland tools, address translation using DAT does not occur
in that function, so the code path that causes this issue will not be
executed. However, it is possible to run that code path by intentionally
modifying the userland GC library or by calling the GC ioctl directly.
[konishi.ryusuke@gmail.com: NOTE added to the commit log]
Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163.com
Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com
Fixes: a3d93f709e ("nilfs2: block cache for garbage collection")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reported-by: Ferry Meng <mengferry@linux.alibaba.com>
Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The elf-fdpic loader hard sets the process personality to either
PER_LINUX_FDPIC for true elf-fdpic binaries or to PER_LINUX for normal ELF
binaries (in this case they would be constant displacement compiled with
-pie for example). The problem with that is that it will lose any other
bits that may be in the ELF header personality (such as the "bug
emulation" bits).
On the ARM architecture the ADDR_LIMIT_32BIT flag is used to signify a
normal 32bit binary - as opposed to a legacy 26bit address binary. This
matters since start_thread() will set the ARM CPSR register as required
based on this flag. If the elf-fdpic loader loses this bit the process
will be mis-configured and crash out pretty quickly.
Modify elf-fdpic loader personality setting so that it preserves the upper
three bytes by using the SET_PERSONALITY macro to set it. This macro in
the generic case sets PER_LINUX and preserves the upper bytes.
Architectures can override this for their specific use case, and ARM does
exactly this.
The problem shows up quite easily running under qemu using the ARM
architecture, but not necessarily on all types of real ARM hardware. If
the underlying ARM processor does not support the legacy 26-bit addressing
mode then everything will work as expected.
Link: https://lkml.kernel.org/r/20230907011808.2985083-1-gerg@kernel.org
Fixes: 1bde925d23 ("fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries")
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUXOMIACgkQiiy9cAdy
T1G+XQv9Fj1kWJRPHih1wTRFHAgysHRtFw04KvW9SLmpGkPLBslJm3Fg1yPiXytc
nfqZNCPS/tuWIdqc9YRJqEWKfCO6X/+0IESnN6Wl4jIqMSviL/Hg3DzXZr5YCsAy
1vJ+DYmQkmWNgZ8grnFjKCSezTrAb+b+VLZqsx7dzT8NhTRxdKoTBDS31jTGMswV
OIQ1b/aLv9hgUS08wqzSKveMq8DkX66UbkSakM+tVImu32eh7u1HG89P7y3e/dr3
lGwd/Pq+IIiyAgZ0uoPdI9hQ2+Md6JfhVFMiMTLUfwh1LCDgrYnYrOkuZWYvDr9z
t2Y0+IwEkljk7HcFaL0NKPJW2beG4eNh/t2t6ff6vK8MhzlXp3KM5yVBlXNYc7hA
JfsMxVIFhUobeRKbQY9S6BstHyo19pdfeHDm/+RicIhRfOo++7kWYzwqKsD0pvLC
wcr3CBLqqsPXamRwUBbxnMASjYVmoz4nSXusXLDxmSWK39NCjEIz3YeFZfcAdoou
jnvMikMA
=jim3
-----END PGP SIGNATURE-----
Merge tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Two SMB3 server fixes for null pointer dereferences:
- invalid SMB3 request case (fixes issue found in testing the read
compound patch)
- iovec error case in response processing"
* tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: check iov vector index in ksmbd_conn_write()
ksmbd: return invalid parameter error response if smb2 request is invalid
marked for stable and two cleanups.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmUW5QwTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHziw7+B/99D/BRIJUaCz8hm2xMZC3Yu6Cvi2de
YlgHZBuHm5lmihzITEdoHTmWlIpgGchqjTaikCvVooKEe1w4sNr7nYFiXUFVw9sf
W/I06dtlJlj2f4oyK91i4sIzpQKbXZznFDpTHThjRJt+uyUp3RYVbrCDMmGAnJv3
foppstycm5fe2Y2e/RgNyYOHY+EAjvS5UrpvT3lAX+iw5KXR1pyMBrFo8iICLPlZ
TIPP4mwlVwb/WB1rGnxaK65RJxGuXLwuMWdLF9kq1ZeCld6owPH3x2RDax2+vMvA
bZxI6gQymU2SquKiNseYF3kQ+2KdC5mfjmkoncOH79Je4JPHf7xa4LYZ
=7/Ez
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A series that fixes an involved 'double watch error' deadlock in RBD
marked for stable and two cleanups"
* tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client:
rbd: take header_rwsem in rbd_dev_refresh() only when updating
rbd: decouple parent info read-in from updating rbd_dev
rbd: decouple header read-in from updating rbd_dev->header
rbd: move rbd_dev_refresh() definition
Revert "ceph: make members in struct ceph_mds_request_args_ext a union"
ceph: remove unnecessary check for NULL in parse_longname()
* Include modifications made to commit "xfs: load uncached unlinked inodes
into memory on demand" (Commit ID: 68b957f64f)
which address review comments provided by Dave Chinner.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZRPtowAKCRAH7y4RirJu
9EcNAQDnuVtf89FL0Qqqtho5TeK2UO4JhEcTWI4Wj1d9w7h4lAEA5ZTYu8oJDg0k
zoTXgr9sbpzcf53fgY0hwqPVjdV8dwU=
=WkWe
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Chandan Babu:
- fix for commit 68b957f64f ("xfs: load uncached unlinked inodes into
memory on demand") which address review comments provided by Dave
Chinner
* tag 'xfs-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix reloading entire unlinked bucket lists
Forget to reset ctx->password to NULL will lead to bug like double free
Cc: stable@vger.kernel.org
Cc: Willy Tarreau <w@1wt.eu>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Quang Le <quanglex97@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix a misspelling of "preceding".
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
nfsd4_encode_readv() uses xdr->buf->page_len as a starting point for
the nfsd_iter_read() sink buffer -- page_len is going to be offset
by the parts of the COMPOUND that have already been encoded into
xdr->buf->pages.
However, that value must be captured /before/
xdr_reserve_space_vec() advances page_len by the expected size of
the read payload. Otherwise, the whole front part of the first
page of the payload in the reply will be uninitialized.
Mantas hit this because sec=krb5i forces RQ_SPLICE_OK off, which
invokes the readv part of the nfsd4_encode_read() path. Also,
older Linux NFS clients appear to send shorter READ requests
for files smaller than a page, whereas newer clients just send
page-sized requests and let the server send as many bytes as
are in the file.
Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/f1d0b234-e650-0f6e-0f5d-126b3d51d1eb@gmail.com/
Fixes: 703d752155 ("NFSD: Hoist rq_vec preparation into nfsd_read() [step two]")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
smatch warn:
fs/ntfs3/fslog.c:2172 last_log_lsn() warn: possible memory leak of 'page_bufs'
Jump to label 'out' to free 'page_bufs' and is more consistent with
other code.
Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Calling stat() from userspace correctly identified junctions in an NTFS
partition as symlinks, but using readdir() and iterating through the
directory containing the same junction did not identify the junction
as a symlink.
When emitting directory contents, check FILE_ATTRIBUTE_REPARSE_POINT
attribute to detect junctions and report them as links.
Signed-off-by: Gabriel Marcano <gabemarcano@yahoo.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Upon investigation of the C reproducer provided by Syzbot, it seemed
the reproducer was trying to mount a corrupted NTFS filesystem, then
issue a rename syscall to some nodes in the filesystem. This can be
shown by modifying the reproducer to only include the mount syscall,
and investigating the filesystem by e.g. `ls` and `rm` commands. As a
result, during the problematic call to `hdr_fine_e`, the `inode` being
supplied did not go through `indx_init`, hence the `cmp` function
pointer was never set.
The fix is simply to check whether `cmp` is not set, and return NULL
if that's the case, in order to be consistent with other error
scenarios of the `hdr_find_e` method. The rationale behind this patch
is that:
- We should prevent crashing the kernel even if the mounted filesystem
is corrupted. Any syscalls made on the filesystem could return
invalid, but the kernel should be able to sustain these calls.
- Only very specific corruption would lead to this bug, so it would be
a pretty rare case in actual usage anyways. Therefore, introducing a
check to specifically protect against this bug seems appropriate.
Because of its rarity, an `unlikely` clause is used to wrap around
this nullity check.
Reported-by: syzbot+60cf892fc31d1f4358fc@syzkaller.appspotmail.com
Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Syzbot was able to create a device which has the last sector of size
512.
After failing to boot from initial sector, reading from boot info from
offset 511 causes OOB read.
To prevent such reports add sanity check to validate if size of buffer_head
if big enough to hold ntfs3 bootinfo
Fixes: 6a4cd3ea7d ("fs/ntfs3: Alternative boot if primary boot is corrupted")
Reported-by: syzbot+53ce40c8c0322c06aea5@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Here is a BUG report about linux-6.1 from syzbot, but it still remains
within upstream:
BUG: KASAN: slab-out-of-bounds in ntfs_list_ea fs/ntfs3/xattr.c:191 [inline]
BUG: KASAN: slab-out-of-bounds in ntfs_listxattr+0x401/0x570 fs/ntfs3/xattr.c:710
Read of size 1 at addr ffff888021acaf3d by task syz-executor128/3632
Call Trace:
kasan_report+0x139/0x170 mm/kasan/report.c:495
ntfs_list_ea fs/ntfs3/xattr.c:191 [inline]
ntfs_listxattr+0x401/0x570 fs/ntfs3/xattr.c:710
vfs_listxattr fs/xattr.c:457 [inline]
listxattr+0x293/0x2d0 fs/xattr.c:804
path_listxattr fs/xattr.c:828 [inline]
__do_sys_llistxattr fs/xattr.c:846 [inline]
Before derefering field members of `ea` in unpacked_ea_size(), we need to
check whether the EA_FULL struct is located in access validate range.
Similarly, when derefering `ea->name` field member, we need to check
whethe the ea->name is located in access validate range, too.
Fixes: be71b5cba2 ("fs/ntfs3: Add attrib operations")
Reported-by: syzbot+9fcea5ef6dc4dc72d334@syzkaller.appspotmail.com
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
[almaz.alexandrovich@paragon-software.com: took the ret variable out of the loop block]
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Commit 4dc73c6791 reintroduces the deadlock that was fixed by commit
aeabb3c961 ("NFSv4: Fix a NFSv4 state manager deadlock") because it
prevents the setup of new threads to handle reboot recovery, while the
older recovery thread is stuck returning delegations.
Fixes: 4dc73c6791 ("NFSv4: keep state manager thread active if swap is enabled")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared
NFS4CLNT_MANAGER_RUNNING, then we might have won the race against
nfs4_schedule_state_manager(), and are responsible for handling the
recovery situation.
Fixes: aeabb3c961 ("NFSv4: Fix a NFSv4 state manager deadlock")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmURvloACgkQxWXV+ddt
WDt+CQ/+NgBtQn7eyABsdHzXWPxpFyGZrdw5ldKnly3G+WDW2GKMaZ6CpDuEZGNQ
vMAkSGX5LIHXvO79pDnGG0i+bRINWrc5HZVZ/p5Da6wplBTgIPlbLmxaZX9MJLbx
j7Oz37GXiQJY8BxnVCnsb+bhhTrTbO9HFUQr/nxefIvu22OBdL1WXYcfuBOeEsFG
qr/aeC52YqCVgXvt+8a5DqAKE0NWc4PFMFUMo4vlf1xuL652fvff7xiup1CAIgBh
qsCa17E7q+qjri2phAhbFNadfpH5wGfyjTWScOlaFuXjRhW2v2oqz3WU5IQj4dmu
PI+k++PLUzIxT0IcjD1YbZzRFaEI6fR2W0GA4LK08fjVehh2ao5jOjtRgLl8HlqG
qC5fslAPzUxRmwMmCjSGfXF14sgtyLy8eVWf69xn06/1cbEmfHDrWNXP1QHuq6eT
Jqy8Ywia3jRzzfZ1utABJPLBW4hFQKkyobtyd67fxslUFmtuLvLqGTiOdmVFiD9K
o+BF2xjEz2n8O1+aRZk5SFNC9zcaASaRg/wQrhvSI9qxM18fh4TXgKQOniLzAK7v
lZc+JkegFW4CVquCUpmbsdZAOpVNRXfPOJIt/w6G+oRbaiTvPUnrH+uyq8IGREbw
E7d8XIP0qlF0DQBGK4Mw/riZz/e5MmEKNjza6M+fj2uglpfWTv4=
=6WEW
-----END PGP SIGNATURE-----
Merge tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- delayed refs fixes:
- fix race when refilling delayed refs block reserve
- prevent transaction block reserve underflow when starting
transaction
- error message and value adjustments
- fix build warnings with CONFIG_CC_OPTIMIZE_FOR_SIZE and
-Wmaybe-uninitialized
- fix for smatch report where uninitialized data from invalid extent
buffer range could be returned to the caller
- fix numeric overflow in statfs when calculating lower threshold
for a full filesystem
* tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: initialize start_slot in btrfs_log_prealloc_extents
btrfs: make sure to initialize start and len in find_free_dev_extent
btrfs: reset destination buffer when read_extent_buffer() gets invalid range
btrfs: properly report 0 avail for very full file systems
btrfs: log message if extent item not found when running delayed extent op
btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref()
btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1
btrfs: prevent transaction block reserve underflow when starting transaction
btrfs: fix race when refilling delayed refs block reserve
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZRKHuAAKCRCRxhvAZXjc
ohOLAQDU9Fxq5UdqCdmsyi/b24XJFZlQhcVIZy2Hrhcor9TiVQEAjuECGlxFPSgj
atVOWLdugDJquiHextqTEMgIecJpNw4=
=uINF
-----END PGP SIGNATURE-----
Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains the usual miscellaneous fixes and cleanups for vfs and
individual fses:
Fixes:
- Revert ki_pos on error from buffered writes for direct io fallback
- Add missing documentation for block device and superblock handling
for changes merged this cycle
- Fix reiserfs flexible array usage
- Ensure that overlayfs sets ctime when setting mtime and atime
- Disable deferred caller completions with overlayfs writes until
proper support exists
Cleanups:
- Remove duplicate initialization in pipe code
- Annotate aio kioctx_table with __counted_by"
* tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
overlayfs: set ctime when setting mtime and atime
ntfs3: put resources during ntfs_fill_super()
ovl: disable IOCB_DIO_CALLER_COMP
porting: document superblock as block device holder
porting: document new block device opening order
fs/pipe: remove duplicate "offset" initializer
fs-writeback: do not requeue a clean inode having skipped pages
aio: Annotate struct kioctx_table with __counted_by
direct_write_fallback(): on error revert the ->ki_pos update from buffered write
reiserfs: Replace 1-element array with C99 style flex-array
A szybot reproducer that does write I/O while truncating the size of a
block device can end up in clean_bdev_aliases, which tries to clean the
bdev aliases that it uses. This is because iomap_to_bh automatically
sets the BH_New flag when outside of i_size. For block devices updates
to i_size are racy and we can hit this case in a tiny race window,
leading to the eventual clean_bdev_aliases call. Fix this by erroring
out of > i_size I/O on block devices.
Reported-by: syzbot+1fa947e7f09e136925b8@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: syzbot+1fa947e7f09e136925b8@syzkaller.appspotmail.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Nathan reported that he was seeing the new warning in
setattr_copy_mgtime pop when starting podman containers. Overlayfs is
trying to set the atime and mtime via notify_change without also
setting the ctime.
POSIX states that when the atime and mtime are updated via utimes() that
we must also update the ctime to the current time. The situation with
overlayfs copy-up is analogies, so add ATTR_CTIME to the bitmask.
notify_change will fill in the value.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230913-ctime-v1-1-c6bc509cbc27@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
During ntfs_fill_super() some resources are allocated that we need to
cleanup in ->put_super() such as additional inodes. When
ntfs_fill_super() fails these resources need to be cleaned up as well.
Reported-by: syzbot+2751da923b5eb8307b0b@syzkaller.appspotmail.com
Fixes: 78a06688a4 ("ntfs3: drop inode references in ntfs_put_super()")
Signed-off-by: Christian Brauner <brauner@kernel.org>
overlayfs copies the kiocb flags when it sets up a new kiocb to handle
a write, but it doesn't properly support dealing with the deferred
caller completions of the kiocb. This means it doesn't get the final
write completion value, and hence will complete the write with '0' as
the result.
We could support the caller completions in overlayfs, but for now let's
just disable them in the generated write kiocb.
Reported-by: Zorro Lang <zlang@redhat.com>
Link: https://lore.kernel.org/io-uring/20230924142754.ejwsjen5pvyc32l4@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/
Fixes: 8c052fb300 ("iomap: support IOCB_DIO_CALLER_COMP")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Message-Id: <71897125-e570-46ce-946a-d4729725e28f@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
During review of the patcheset that provided reloading of the incore
iunlink list, Dave made a few suggestions, and I updated the copy in my
dev tree. Unfortunately, I then got distracted by ... who even knows
what ... and forgot to backport those changes from my dev tree to my
release candidate branch. I then sent multiple pull requests with stale
patches, and that's what was merged into -rc3.
So.
This patch re-adds the use of an unlocked iunlink list check to
determine if we want to allocate the resources to recreate the incore
list. Since lost iunlinked inodes are supposed to be rare, this change
helps us avoid paying the transaction and AGF locking costs every time
we open any inode.
This also re-adds the shutdowns on failure, and re-applies the
restructuring of the inner loop in xfs_inode_reload_unlinked_bucket, and
re-adds a requested comment about the quotachecking code.
Retain the original RVB tag from Dave since there's no code change from
the last submission.
Fixes: 68b957f64f ("xfs: load uncached unlinked inodes into memory on demand")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
- Fix the "bytes" output of the per_cpu stat file
The tracefs/per_cpu/cpu*/stats "bytes" was giving bogus values as the
accounting was not accurate. It is suppose to show how many used bytes are
still in the ring buffer, but even when the ring buffer was empty it would
still show there were bytes used.
- Fix a bug in eventfs where reading a dynamic event directory (open) and then
creating a dynamic event that goes into that diretory screws up the accounting.
On close, the newly created event dentry will get a "dput" without ever having
a "dget" done for it. The fix is to allocate an array on dir open to save what
dentries were actually "dget" on, and what ones to "dput" on close.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZQ9wihQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6quz4AP4vSFohvmAcTzC+sKP7gMLUvEmqL76+
1pixXrQOIP5BrQEApUW3VnjqYgjZJR2ne0N4MvvmYElm/ylBhDd4JRrD3g8=
=X9wd
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix the "bytes" output of the per_cpu stat file
The tracefs/per_cpu/cpu*/stats "bytes" was giving bogus values as the
accounting was not accurate. It is suppose to show how many used
bytes are still in the ring buffer, but even when the ring buffer was
empty it would still show there were bytes used.
- Fix a bug in eventfs where reading a dynamic event directory (open)
and then creating a dynamic event that goes into that diretory screws
up the accounting.
On close, the newly created event dentry will get a "dput" without
ever having a "dget" done for it. The fix is to allocate an array on
dir open to save what dentries were actually "dget" on, and what ones
to "dput" on close.
* tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Remember what dentries were created on dir open
ring-buffer: Fix bytes info in per_cpu buffer stats
cc:stable.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZQ8hRwAKCRDdBJ7gKXxA
jlK9AQDzT/FUQV3kIshsV1IwAKFcg7gtcFSN0vs+pV+e1+4tbQD/Z2OgfGFFsCSP
X6uc2cYHc9DG5/o44iFgadW8byMssQs=
=w+St
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"13 hotfixes, 10 of which pertain to post-6.5 issues. The other three
are cc:stable"
* tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
proc: nommu: fix empty /proc/<pid>/maps
filemap: add filemap_map_order0_folio() to handle order0 folio
proc: nommu: /proc/<pid>/maps: release mmap read lock
mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement
pidfd: prevent a kernel-doc warning
argv_split: fix kernel-doc warnings
scatterlist: add missing function params to kernel-doc
selftests/proc: fixup proc-empty-vm test after KSM changes
revert "scripts/gdb/symbols: add specific ko module load command"
selftests: link libasan statically for tests with -fsanitize=address
task_work: add kerneldoc annotation for 'data' argument
mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUOSHkACgkQiiy9cAdy
T1HAswwAmCPPUWgIiR4XqWiXOmWh60+4Q7xsmSVvRAixvTf79Tkif/1AmVYhnYls
QvnbT5TIFPtFbnWHOIYSPxkjBlVRNTgSviSdOebZAv4aP3DJ+HhWqnv39ti7DFMt
qbNn9j53czq5eLChkIiCfqbeg4I8ra+vqZeeDs5TaRFqMLZ5KlJGlVJjnf88/o8m
cCjifqq53Bq1pZeaR1Q9aiwuzsSKazs4odEeFvwFSw0tsdjEj4Rk5vJqDcMnt460
yIXzoUv3iGJ2oz7qi5uysGPOpLAbDqwbJ5+127qja3cbgHg3MJqwotheRuULYX8G
Dz9hhgu5oGlvSO+fZXnsT2bQL+oqYUyql9EU5NBTF2gPf5M3E1vQaMNPj1cueNAg
dfa3x3Ez/M1XuH2aptK3ePuPIQIZgMjpMC7BPBKaIvxtGcNxEIuL0s+TEQbjJ8R1
/ybJv2i+LYfCrnjTDvH0Y4lTS40AHIcOwywQd6rbMuStK71+B7/bNYJkdKKK9PEF
L39ZwXDj
=Q4Rt
-----END PGP SIGNATURE-----
Merge tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Six smb3 client fixes, including three for stable, from the SMB
plugfest (testing event) this week:
- Reparse point handling fix (found when investigating dir
enumeration when fifo in dir)
- Fix excessive thread creation for dir lease cleanup
- UAF fix in negotiate path
- remove duplicate error message mapping and fix confusing warning
message
- add dynamic trace point to improve debugging RDMA connection
attempts"
* tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix confusing debug message
smb: client: handle STATUS_IO_REPARSE_TAG_NOT_HANDLED
smb3: remove duplicate error mapping
cifs: Fix UAF in cifs_demultiplex_thread()
smb3: do not start laundromat thread when dir leases disabled
smb3: Add dynamic trace points for RDMA (smbdirect) reconnect
* Return EIO on bad inputs to iomap_to_bh instead of BUGging, to deal
less poorly with block device io racing with block device resizing.
* Fix a stale page data exposure bug introduced in 6.6-rc1 when
unsharing a file range that is not in the page cache.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQsCFAAKCRBKO3ySh0YR
pkhZAP9VHpjBn95MEai0dxAVjAi8IDcfwdzBuifBWlkQwnt6MAEAiHfHEDfN23o9
4Xg9EDqa8IOSYwxphJYnYG73Luvi5QQ=
=Xtnv
-----END PGP SIGNATURE-----
Merge tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
- Return EIO on bad inputs to iomap_to_bh instead of BUGging, to deal
less poorly with block device io racing with block device resizing
- Fix a stale page data exposure bug introduced in 6.6-rc1 when
unsharing a file range that is not in the page cache
* tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: convert iomap_unshare_iter to use large folios
iomap: don't skip reading in !uptodate folios when unsharing a range
iomap: handle error conditions more gracefully in iomap_to_bh
* Fix an integer overflow bug when processing an fsmap call.
* Fix crash due to CPU hot remove event racing with filesystem mount
operation.
* During read-only mount, XFS does not allow the contents of the log to be
recovered when there are one or more unrecognized rcompat features in the
primary superblock, since the log might have intent items which the kernel
does not know how to process.
* During recovery of log intent items, XFS now reserves log space sufficient
for one cycle of a permanent transaction to execute. Otherwise, this could
lead to livelocks due to non-availability of log space.
* On an fs which has an ondisk unlinked inode list, trying to delete a file
or allocating an O_TMPFILE file can cause the fs to the shutdown if the
first inode in the ondisk inode list is not present in the inode cache.
The bug is solved by explicitly loading the first inode in the ondisk
unlinked inode list into the inode cache if it is not already cached.
A similar problem arises when the uncached inode is present in the middle
of the ondisk unlinked inode list. This second bug is triggered when
executing operations like quotacheck and bulkstat. In this case, XFS now
reads in the entire ondisk unlinked inode list.
* Enable LARP mode only on recent v5 filesystems.
* Fix a out of bounds memory access in scrub.
* Fix a performance bug when locating the tail of the log during mounting a
filesystem.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZQkx4QAKCRAH7y4RirJu
9HrTAQD6QhvHkS43vueGOb4WISZPG/jMKJ/FjvwLZrIZ0erbJwEAtRWhClwFv3NZ
exJFtsmxrKC6Vifuo0pvfoCiK5mUvQ8=
=SrJR
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Fix an integer overflow bug when processing an fsmap call
- Fix crash due to CPU hot remove event racing with filesystem mount
operation
- During read-only mount, XFS does not allow the contents of the log to
be recovered when there are one or more unrecognized rcompat features
in the primary superblock, since the log might have intent items
which the kernel does not know how to process
- During recovery of log intent items, XFS now reserves log space
sufficient for one cycle of a permanent transaction to execute.
Otherwise, this could lead to livelocks due to non-availability of
log space
- On an fs which has an ondisk unlinked inode list, trying to delete a
file or allocating an O_TMPFILE file can cause the fs to the shutdown
if the first inode in the ondisk inode list is not present in the
inode cache. The bug is solved by explicitly loading the first inode
in the ondisk unlinked inode list into the inode cache if it is not
already cached
A similar problem arises when the uncached inode is present in the
middle of the ondisk unlinked inode list. This second bug is
triggered when executing operations like quotacheck and bulkstat. In
this case, XFS now reads in the entire ondisk unlinked inode list
- Enable LARP mode only on recent v5 filesystems
- Fix a out of bounds memory access in scrub
- Fix a performance bug when locating the tail of the log during
mounting a filesystem
* tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail
xfs: only call xchk_stats_merge after validating scrub inputs
xfs: require a relatively recent V5 filesystem for LARP mode
xfs: make inode unlinked bucket recovery work with quotacheck
xfs: load uncached unlinked inodes into memory on demand
xfs: reserve less log space when recovering log intent items
xfs: fix log recovery when unknown rocompat bits are set
xfs: reload entire unlinked bucket lists
xfs: allow inode inactivation during a ro mount log recovery
xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
xfs: remove CPU hotplug infrastructure
xfs: remove the all-mounts list
xfs: use per-mount cpumask to track nonempty percpu inodegc lists
xfs: fix an agbno overflow in __xfs_getfsmap_datadev
xfs: fix per-cpu CIL structure aggregation racing with dying cpus
xfs: fix select in config XFS_ONLINE_SCRUB_STATS
Using the following code with libtracefs:
int dfd;
// create the directory events/kprobes/kp1
tracefs_kprobe_raw(NULL, "kp1", "schedule_timeout", "time=$arg1");
// Open the kprobes directory
dfd = tracefs_instance_file_open(NULL, "events/kprobes", O_RDONLY);
// Do a lookup of the kprobes/kp1 directory (by looking at enable)
tracefs_file_exists(NULL, "events/kprobes/kp1/enable");
// Now create a new entry in the kprobes directory
tracefs_kprobe_raw(NULL, "kp2", "schedule_hrtimeout", "expires=$arg1");
// Do another lookup to create the dentries
tracefs_file_exists(NULL, "events/kprobes/kp2/enable"))
// Close the directory
close(dfd);
What happened above, the first open (dfd) will call
dcache_dir_open_wrapper() that will create the dentries and up their ref
counts.
Now the creation of "kp2" will add another dentry within the kprobes
directory.
Upon the close of dfd, eventfs_release() will now do a dput for all the
entries in kprobes. But this is where the problem lies. The open only
upped the dentry of kp1 and not kp2. Now the close is decrementing both
kp1 and kp2, which causes kp2 to get a negative count.
Doing a "trace-cmd reset" which deletes all the kprobes cause the kernel
to crash! (due to the messed up accounting of the ref counts).
To solve this, save all the dentries that are opened in the
dcache_dir_open_wrapper() into an array, and use this array to know what
dentries to do a dput on in eventfs_release().
Since the dcache_dir_open_wrapper() calls dcache_dir_open() which uses the
file->private_data, we need to also add a wrapper around dcache_readdir()
that uses the cursor assigned to the file->private_data. This is because
the dentries need to also be saved in the file->private_data. To do this
create the structure:
struct dentry_list {
void *cursor;
struct dentry **dentries;
};
Which will hold both the cursor and the dentries. Some shuffling around is
needed to make sure that dcache_dir_open() and dcache_readdir() only see
the cursor.
Link: https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.local.home
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ajay Kaher <akaher@vmware.com>
Fixes: 6394044955 ("eventfs: Implement eventfs lookup, read, open functions")
Reported-by: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
If ->iov_idx is zero, This means that the iov vector for the response
was not added during the request process. In other words, it means that
there is a problem in generating a response, So this patch return as
an error to avoid NULL pointer dereferencing problem.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZQsZLQAKCRCRxhvAZXjc
op0vAP96hkSUnmXmxTr8GHId3yfElN8ZZ3aSfePeBdljjKEZVAEA2+cbHLy4GqRi
TpjP1HNIdmtbVSC2ZnrgqkbwGageQgg=
=s92y
-----END PGP SIGNATURE-----
Merge tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull finegrained timestamp reverts from Christian Brauner:
"Earlier this week we sent a few minor fixes for the multi-grained
timestamp work in [1]. While we were polishing those up after Linus
realized that there might be a nicer way to fix them we received a
regression report in [2] that fine grained timestamps break gnulib
tests and thus possibly other tools.
The kernel will elide fine-grain timestamp updates when no one is
actively querying for them to avoid performance impacts. So a sequence
like write(f1) stat(f2) write(f2) stat(f2) write(f1) stat(f1) may
result in timestamp f1 to be older than the final f2 timestamp even
though f1 was last written too but the second write didn't update the
timestamp.
Such plotholes can lead to subtle bugs when programs compare
timestamps. For example, the nap() function in [2] will estimate that
it needs to wait one ns on a fine-grain timestamp enabled filesytem
between subsequent calls to observe a timestamp change. But in general
we don't update timestamps with more than one jiffie if we think that
no one is actively querying for fine-grain timestamps to avoid
performance impacts.
While discussing various fixes the decision was to go back to the
drawing board and ultimately to explore a solution that involves only
exposing such fine-grained timestamps to nfs internally and never to
userspace.
As there are multiple solutions discussed the honest thing to do here
is not to fix this up or disable it but to cleanly revert. The general
infrastructure will probably come back but there is no reason to keep
this code in mainline.
The general changes to timestamp handling are valid and a good cleanup
that will stay. The revert is fully bisectable"
Link: https://lore.kernel.org/all/20230918-hirte-neuzugang-4c2324e7bae3@brauner [1]
Link: https://lore.kernel.org/all/bf0524debb976627693e12ad23690094e4514303.camel@linuxfromscratch.org [2]
* tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
Revert "fs: add infrastructure for multigrain timestamps"
Revert "btrfs: convert to multigrain timestamps"
Revert "ext4: switch to multigrain timestamps"
Revert "xfs: switch to multigrain timestamps"
Revert "tmpfs: add support for multigrain timestamps"
Jens reported a compiler warning when using
CONFIG_CC_OPTIMIZE_FOR_SIZE=y that looks like this
fs/btrfs/tree-log.c: In function ‘btrfs_log_prealloc_extents’:
fs/btrfs/tree-log.c:4828:23: warning: ‘start_slot’ may be used
uninitialized [-Wmaybe-uninitialized]
4828 | ret = copy_items(trans, inode, dst_path, path,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4829 | start_slot, ins_nr, 1, 0);
| ~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/tree-log.c:4725:13: note: ‘start_slot’ was declared here
4725 | int start_slot;
| ^~~~~~~~~~
The compiler is incorrect, as we only use this code when ins_len > 0,
and when ins_len > 0 we have start_slot properly initialized. However
we generally find the -Wmaybe-uninitialized warnings valuable, so
initialize start_slot to get rid of the warning.
Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jens reported a compiler error when using CONFIG_CC_OPTIMIZE_FOR_SIZE=y
that looks like this
In function ‘gather_device_info’,
inlined from ‘btrfs_create_chunk’ at fs/btrfs/volumes.c:5507:8:
fs/btrfs/volumes.c:5245:48: warning: ‘dev_offset’ may be used uninitialized [-Wmaybe-uninitialized]
5245 | devices_info[ndevs].dev_offset = dev_offset;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
fs/btrfs/volumes.c: In function ‘btrfs_create_chunk’:
fs/btrfs/volumes.c:5196:13: note: ‘dev_offset’ was declared here
5196 | u64 dev_offset;
This occurs because find_free_dev_extent is responsible for setting
dev_offset, however if we get an -ENOMEM at the top of the function
we'll return without setting the value.
This isn't actually a problem because we will see the -ENOMEM in
gather_device_info() and return and not use the uninitialized value,
however we also just don't want the compiler warning so rework the code
slightly in find_free_dev_extent() to make sure it's always setting
*start and *len to avoid the compiler warning.
Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The message said it was an invalid mode, when it was intentionally
not set. Fix confusing message logged to dmesg.
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix missing set of cifs_open_info_data::reparse_point when SMB2_CREATE
request fails with STATUS_IO_REPARSE_TAG_NOT_HANDLED.
Fixes: 5f71ebc412 ("smb: client: parse reparse point flag in create response")
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In status_to_posix_error STATUS_IO_REPARSE_TAG_NOT_HANDLED was mapped
to both -EOPNOTSUPP and also to -EIO but the later one (-EIO) is
ignored. Remove the duplicate.
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Commit f98b6215d7 ("btrfs: extent_io: do extra check for extent buffer
read write functions") changed how we handle invalid extent buffer range
for read_extent_buffer().
Previously if the range is invalid we just set the destination to zero,
but after the patch we do nothing and error out.
This can lead to smatch static checker errors like:
fs/btrfs/print-tree.c:186 print_uuid_item() error: uninitialized symbol 'subvol_id'.
fs/btrfs/tests/extent-io-tests.c:338 check_eb_bitmap() error: uninitialized symbol 'has'.
fs/btrfs/tests/extent-io-tests.c:353 check_eb_bitmap() error: uninitialized symbol 'has'.
fs/btrfs/uuid-tree.c:203 btrfs_uuid_tree_remove() error: uninitialized symbol 'read_subid'.
fs/btrfs/uuid-tree.c:353 btrfs_uuid_tree_iterate() error: uninitialized symbol 'subid_le'.
fs/btrfs/uuid-tree.c:72 btrfs_uuid_tree_lookup() error: uninitialized symbol 'data'.
fs/btrfs/volumes.c:7415 btrfs_dev_stats_value() error: uninitialized symbol 'val'.
Fix those warnings by reverting back to the old memset() behavior.
By this we keep the static checker happy and would still make a lot of
noise when such invalid ranges are passed in.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: f98b6215d7 ("btrfs: extent_io: do extra check for extent buffer read write functions")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A user reported some issues with smaller file systems that get very
full. While investigating this issue I noticed that df wasn't showing
100% full, despite having 0 chunk space and having < 1MiB of available
metadata space.
This turns out to be an overflow issue, we're doing:
total_available_metadata_space - SZ_4M < global_block_rsv_size
to determine if there's not enough space to make metadata allocations,
which overflows if total_available_metadata_space is < 4M. Fix this by
checking to see if our available space is greater than the 4M threshold.
This makes df properly report 100% usage on the file system.
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When running a delayed extent operation, if we don't find the extent item
in the extent tree we just return -EIO without any logged message. This
indicates some bug or possibly a memory or fs corruption, so the return
value should not be -EIO but -EUCLEAN instead, and since it's not expected
to ever happen, print an informative error message so that if it happens
we have some idea of what went wrong, where to look at.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At __btrfs_inc_extent_ref() we are doing a BUG_ON() if we are dealing with
a tree block reference that has a reference count that is different from 1,
but we have already dealt with this case at run_delayed_tree_ref(), making
it useless. So remove the BUG_ON().
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When running a delayed tree reference, if we find a ref count different
from 1, we return -EIO. This isn't an IO error, as it indicates either a
bug in the delayed refs code or a memory corruption, so change the error
code from -EIO to -EUCLEAN. Also tag the branch as 'unlikely' as this is
not expected to ever happen, and change the error message to print the
tree block's bytenr without the parenthesis (and there was a missing space
between the 'block' word and the opening parenthesis), for consistency as
that's the style we used everywhere else.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When starting a transaction, with a non-zero number of items, we reserve
metadata space for that number of items and for delayed refs by doing a
call to btrfs_block_rsv_add(), with the transaction block reserve passed
as the block reserve argument. This reserves metadata space and adds it
to the transaction block reserve. Later we migrate the space we reserved
for delayed references from the transaction block reserve into the delayed
refs block reserve, by calling btrfs_migrate_to_delayed_refs_rsv().
btrfs_migrate_to_delayed_refs_rsv() decrements the number of bytes to
migrate from the source block reserve, and this however may result in an
underflow in case the space added to the transaction block reserve ended
up being used by another task that has not reserved enough space for its
own use - examples are tasks doing reflinks or hole punching because they
end up calling btrfs_replace_file_extents() -> btrfs_drop_extents() and
may need to modify/COW a variable number of leaves/paths, so they keep
trying to use space from the transaction block reserve when they need to
COW an extent buffer, and may end up trying to use more space then they
have reserved (1 unit/path only for removing file extent items).
This can be avoided by simply reserving space first without adding it to
the transaction block reserve, then add the space for delayed refs to the
delayed refs block reserve and finally add the remaining reserved space
to the transaction block reserve. This also makes the code a bit shorter
and simpler. So just do that.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we have two (or more) tasks attempting to refill the delayed refs block
reserve we can end up with the delayed block reserve being over reserved,
that is, with a reserved space greater than its size. If this happens, we
are holding to more reserved space than necessary for a while.
The race happens like this:
1) The delayed refs block reserve has a size of 8M and a reserved space of
6M for example;
2) Task A calls btrfs_delayed_refs_rsv_refill();
3) Task B also calls btrfs_delayed_refs_rsv_refill();
4) Task A sees there's a 2M difference between the size and the reserved
space of the delayed refs rsv, so it will reserve 2M of space by
calling btrfs_reserve_metadata_bytes();
5) Task B also sees that 2M difference, and like task A, it reserves
another 2M of metadata space;
6) Both task A and task B increase the reserved space of block reserve
by 2M, by calling btrfs_block_rsv_add_bytes(), so the block reserve
ends up with a size of 8M and a reserved space of 10M;
7) The extra, over reserved space will eventually be freed by some task
calling btrfs_delayed_refs_rsv_release() -> btrfs_block_rsv_release()
-> block_rsv_release_bytes(), as there we will detect the over reserve
and release that space.
So fix this by checking if we still need to add space to the delayed refs
block reserve after reserving the metadata space, and if we don't, just
release that space immediately.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmULIZUACgkQxWXV+ddt
WDv77Q//ZiKpmevPQmQfUtmV8WwMfD2a9zRlKBpGggwtrD4mf3CYRLnOpTm81MPO
vFIuYacBn+9UXqp2j/IbvNWfQAPQNVDxSPXx66uba93RJc+bB1J3TydxcEyJ7fr4
dwhLLk01jttfk0+rnjF34fmXiHSTtI6D2WeaLCzUbaPLw4SZ+ul+GAdeF3P174iO
OMNBUln7hK00Q7j8kFf4j6SW1yIIKMTl6MfOFJYanIqzx51PYFFVtKwoCr0Vt53v
ZHbgrK582ZJO6pKF9kJF/1tqrY9/Df8jzgSypK8pew/SukMOrf7iVwrmhietuhKA
92j5sxKhCRyq6Qg6ZwC0jyk+oMqrT8r+q3r38a5qDJx/9Q279vkXBqQnACfLjmnH
6+sNdkY5/uBWnDMh/+d6yBtfbdW5DtuET4McYpJt1Nk2St/f3UzPaL4LcNkDXNPk
t1Q4W4v0KS1V8TbsLfdD629CMghxQNKVs1XqyCAbUq9ub4LE2CtL3lDm730qZoZt
+LM7+sAxEOJC6yqYfdEbcIc8l27Hl5nZEzamcvMrRz61N85/8Jx4Sq2b6VSE9TCE
hNEWAL5sOjhuhmUPhatYC+KO1P6NDP+Yg99yZCZIT9s/P1oK5H+aETshWX+lvJ+Q
Ai+qzKvp2ERHFcE+R5qIXs/uX7azpzjqsRZxY2/zdp70ugQDSXE=
=0eEg
-----END PGP SIGNATURE-----
Merge tag 'for-6.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more followup fixes to the directory listing.
People have noticed different behaviour compared to other filesystems
after changes in 6.5. This is now unified to more "logical" and
expected behaviour while still within POSIX. And a few more fixes for
stable.
- change behaviour of readdir()/rewinddir() when new directory
entries are created after opendir(), properly tracking the last
entry
- fix race in readdir when multiple threads can set the last entry
index for a directory
Additionally:
- use exclusive lock when direct io might need to drop privs and call
notify_change()
- don't clear uptodate bit on page after an error, this may lead to a
deadlock in subpage mode
- fix waiting pattern when multiple readers block on Merkle tree
data, switch to folios"
* tag 'for-6.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix race between reading a directory and adding entries to it
btrfs: refresh dir last index during a rewinddir(3) call
btrfs: set last dir index to the current last index when opening dir
btrfs: don't clear uptodate on write errors
btrfs: file_remove_privs needs an exclusive lock in direct io write
btrfs: convert btrfs_read_merkle_tree_page() to use a folio
This reverts commit ffb6cf19e0.
Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.
Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
This reverts commit 50e9ceef1d.
Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.
Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
This reverts commit 0269b58586.
Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.
Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
This reverts commit e44df26647.
Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.
Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
This code duplication was introduced by commit a194dfe6e6 ("pipe:
Rearrange sequence in pipe_write() to preallocate slot"), but since
the pipe's mutex is locked, nobody else can modify the value
meanwhile.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230919074045.1066796-1-max.kellermann@ionos.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
When writing back an inode and performing an fsync on it concurrently, a
deadlock issue may arise as shown below. In each writeback iteration, a
clean inode is requeued to the wb->b_dirty queue due to non-zero
pages_skipped, without anything actually being written. This causes an
infinite loop and prevents the plug from being flushed, resulting in a
deadlock. We now avoid requeuing the clean inode to prevent this issue.
wb_writeback fsync (inode-Y)
blk_start_plug(&plug)
for (;;) {
iter i-1: some reqs with page-X added into plug->mq_list // f2fs node page-X with PG_writeback
filemap_fdatawrite
__filemap_fdatawrite_range // write inode-Y with sync_mode WB_SYNC_ALL
do_writepages
f2fs_write_data_pages
__f2fs_write_data_pages // wb_sync_req[DATA]++ for WB_SYNC_ALL
f2fs_write_cache_pages
f2fs_write_single_data_page
f2fs_do_write_data_page
f2fs_outplace_write_data
f2fs_update_data_blkaddr
f2fs_wait_on_page_writeback
wait_on_page_writeback // wait for f2fs node page-X
iter i:
progress = __writeback_inodes_wb(wb, work)
. writeback_sb_inodes
. __writeback_single_inode // write inode-Y with sync_mode WB_SYNC_NONE
. . do_writepages
. . f2fs_write_data_pages
. . . __f2fs_write_data_pages // skip writepages due to (wb_sync_req[DATA]>0)
. . . wbc->pages_skipped += get_dirty_pages(inode) // wbc->pages_skipped = 1
. if (!(inode->i_state & I_DIRTY_ALL)) // i_state = I_SYNC | I_SYNC_QUEUED
. total_wrote++; // total_wrote = 1
. requeue_inode // requeue inode-Y to wb->b_dirty queue due to non-zero pages_skipped
if (progress) // progress = 1
continue;
iter i+1:
queue_io
// similar process with iter i, infinite for-loop !
}
blk_finish_plug(&plug) // flush plug won't be called
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230916045131.957929-1-guochunhai@vivo.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct kioctx_table.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-aio@kvack.org
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Message-Id: <20230915201413.never.881-kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
If we fail filemap_write_and_wait_range() on the range the buffered write went
into, we only report the "number of bytes which we direct-written", to quote
the comment in there. Which is fine, but buffered write has already advanced
iocb->ki_pos, so we need to roll that back. Otherwise we end up with e.g.
write(2) advancing position by more than the amount it reports having written.
Fixes: 182c25e9c1 "filemap: update ki_pos in generic_perform_write"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Message-Id: <20230827214518.GU3390869@ZenIV>
Signed-off-by: Christian Brauner <brauner@kernel.org>
There is a UAF when xfstests on cifs:
BUG: KASAN: use-after-free in smb2_is_network_name_deleted+0x27/0x160
Read of size 4 at addr ffff88810103fc08 by task cifsd/923
CPU: 1 PID: 923 Comm: cifsd Not tainted 6.1.0-rc4+ #45
...
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report+0x171/0x472
kasan_report+0xad/0x130
kasan_check_range+0x145/0x1a0
smb2_is_network_name_deleted+0x27/0x160
cifs_demultiplex_thread.cold+0x172/0x5a4
kthread+0x165/0x1a0
ret_from_fork+0x1f/0x30
</TASK>
Allocated by task 923:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
__kasan_slab_alloc+0x54/0x60
kmem_cache_alloc+0x147/0x320
mempool_alloc+0xe1/0x260
cifs_small_buf_get+0x24/0x60
allocate_buffers+0xa1/0x1c0
cifs_demultiplex_thread+0x199/0x10d0
kthread+0x165/0x1a0
ret_from_fork+0x1f/0x30
Freed by task 921:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x40
____kasan_slab_free+0x143/0x1b0
kmem_cache_free+0xe3/0x4d0
cifs_small_buf_release+0x29/0x90
SMB2_negotiate+0x8b7/0x1c60
smb2_negotiate+0x51/0x70
cifs_negotiate_protocol+0xf0/0x160
cifs_get_smb_ses+0x5fa/0x13c0
mount_get_conns+0x7a/0x750
cifs_mount+0x103/0xd00
cifs_smb3_do_mount+0x1dd/0xcb0
smb3_get_tree+0x1d5/0x300
vfs_get_tree+0x41/0xf0
path_mount+0x9b3/0xdd0
__x64_sys_mount+0x190/0x1d0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
The UAF is because:
mount(pid: 921) | cifsd(pid: 923)
-------------------------------|-------------------------------
| cifs_demultiplex_thread
SMB2_negotiate |
cifs_send_recv |
compound_send_recv |
smb_send_rqst |
wait_for_response |
wait_event_state [1] |
| standard_receive3
| cifs_handle_standard
| handle_mid
| mid->resp_buf = buf; [2]
| dequeue_mid [3]
KILL the process [4] |
resp_iov[i].iov_base = buf |
free_rsp_buf [5] |
| is_network_name_deleted [6]
| callback
1. After send request to server, wait the response until
mid->mid_state != SUBMITTED;
2. Receive response from server, and set it to mid;
3. Set the mid state to RECEIVED;
4. Kill the process, the mid state already RECEIVED, get 0;
5. Handle and release the negotiate response;
6. UAF.
It can be easily reproduce with add some delay in [3] - [6].
Only sync call has the problem since async call's callback is
executed in cifsd process.
Add an extra state to mark the mid state to READY before wakeup the
waitter, then it can get the resp safely.
Fixes: ec637e3ffb ("[CIFS] Avoid extra large buffer allocation (and memcpy) in cifs_readpages")
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
On no-MMU, /proc/<pid>/maps reads as an empty file. This happens because
find_vma(mm, 0) always returns NULL (assuming no vma actually contains the
zero address, which is normally the case).
To fix this bug and improve the maintainability in the future, this patch
makes the no-MMU implementation as similar as possible to the MMU
implementation.
The only remaining differences are the lack of hold/release_task_mempolicy
and the extra code to shoehorn the gate vma into the iterator.
This has been tested on top of 6.5.3 on an STM32F746.
Link: https://lkml.kernel.org/r/20230915160055.971059-2-ben.wolsieffer@hefring.com
Fixes: 0c563f1480 ("proc: remove VMA rbtree use from nommu")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Giulio Benetti <giulio.benetti@benettiengineering.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The no-MMU implementation of /proc/<pid>/map doesn't normally release
the mmap read lock, because it uses !IS_ERR_OR_NULL(_vml) to determine
whether to release the lock. Since _vml is NULL when the end of the
mappings is reached, the lock is not released.
Reading /proc/1/maps twice doesn't cause a hang because it only
takes the read lock, which can be taken multiple times and therefore
doesn't show any problem if the lock isn't released. Instead, you need
to perform some operation that attempts to take the write lock after
reading /proc/<pid>/maps. To actually reproduce the bug, compile the
following code as 'proc_maps_bug':
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
int main(int argc, char *argv[]) {
void *buf;
sleep(1);
buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
puts("mmap returned");
return 0;
}
Then, run:
./proc_maps_bug &; cat /proc/$!/maps; fg
Without this patch, mmap() will hang and the command will never
complete.
This code was incorrectly adapted from the MMU implementation, which at
the time released the lock in m_next() before returning the last entry.
The MMU implementation has diverged further from the no-MMU version since
then, so this patch brings their locking and error handling into sync,
fixing the bug and hopefully avoiding similar issues in the future.
Link: https://lkml.kernel.org/r/20230914163019.4050530-2-ben.wolsieffer@hefring.com
Fixes: 47fecca15c ("fs/proc/task_nommu.c: don't use priv->task->mm")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Giulio Benetti <giulio.benetti@benettiengineering.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
disabled
When no directory lease support, or for IPC shares where directories
can not be opened, do not start an unneeded laundromat thread for
that mount (it wastes resources).
Fixes: d14de8067e ("cifs: Add a laundromat thread for cached directories")
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Convert iomap_unshare_iter to create large folios if possible, since the
write and zeroing paths already do that. I think this got missed in the
conversion of the write paths that landed in 6.6-rc1.
Cc: ritesh.list@gmail.com, willy@infradead.org
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
smb3_smbd_connect_done and smb3_smbd_connect_err
To improve debugging of RDMA issues add those two. We already
had dynamic tracepoints for non-RDMA connect done and error cases.
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Prior to commit a01b8f2252, we would always read in the contents of a
!uptodate folio prior to writing userspace data into the folio,
allocated a folio state object, etc. Ritesh introduced an optimization
that skips all of that if the write would cover the entire folio.
Unfortunately, the optimization misses the unshare case, where we always
have to read in the folio contents since there isn't a data buffer
supplied by userspace. This can result in stale kernel memory exposure
if userspace issues a FALLOC_FL_UNSHARE_RANGE call on part of a shared
file that isn't already cached.
This was caught by observing fstests regressions in the "unshare around"
mechanism that is used for unaligned writes to a reflinked realtime
volume when the realtime extent size is larger than 1FSB, though I think
it applies to any shared file.
Cc: ritesh.list@gmail.com, willy@infradead.org
Fixes: a01b8f2252 ("iomap: Allocate ifs in ->write_begin() early")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Bugfixes:
* Various O_DIRECT related fixes from Trond
* Error handling
* Locking issues
* Use the correct commit infor for joining page groups
* Fixes for rescheduling IO
* Sunrpc bad verifier fixes
* Report EINVAL errors from connect()
* Revalidate creds that the server has rejected
* Revert "SUNRPC: Fail faster on bad verifier"
* Fix pNFS session trunking when MDS=DS
* Fix zero-value filehandles for post-open getattr operations
* Fix compiler warning about tautological comparisons
* Revert "SUNRPC: clean up integer overflow check" before Trond's fix
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmUIjesACgkQ18tUv7Cl
QOv/MBAAxxZrW45qlTsEMvhnFXu1cvt08IdqivWURuSxK6oV31tARMqw00MxzhsF
ubNX4hynDNmsWYJLryHTTGwUJzaNQOhAGjU46AhwFgS3Lj2wefeuR3IYuEDgQcHF
8uXrrT/GWRIG4a/Uu1IxodfgMvgUPV+HGFc5BQFGkXGqPQdgbdxBDtOtyU1QlmaV
rqKnxPgUibEZwVoz8+PNsN+pwNjKSaSR3pH53KKS6HUBAM69sT4jJiJDO/UGSlzb
F9U1DHPeyltHXVAaQXUjHlsjh49NbIjzD5/T8xKWscvS7rbQFCcJkUl3MBHyvSEr
eixx7muqOMCEm4lNFJqHIY7XrEN+50wi92tKj05Bz9wrnp4L84cxSV/KNZrKNA3T
LCeIm4DGynPcLPopEUm9lqZMrabwvdV4wZSiybB6p/F/5ksVyaMnwTLzZZ1+NhkB
OglNhF4L6m18/Ai/bY6XOixzgzWfcmYfXRhq7YoeO/JSvixG9Sk2crC6ryjWadgo
xbitjjCYXHl3MUkH2TcaQoMGFhmvSShXg//5YXpxm3/0C3EVPa44T7/aFJqelL2p
kkVsLcjcevAOwBHxehYLofL6c3GhLqRnwmTV7yqgJqfZf8uGxeQhXCMqOmv57/c0
3iJJM4Llb14B+t3xl3T8MotT4WejzlHDnJKrscipfXJwRT7UDIM=
=M6aI
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.6-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"Various O_DIRECT related fixes from Trond:
- Error handling
- Locking issues
- Use the correct commit info for joining page groups
- Fixes for rescheduling IO
Sunrpc bad verifier fixes:
- Report EINVAL errors from connect()
- Revalidate creds that the server has rejected
- Revert "SUNRPC: Fail faster on bad verifier"
Misc:
- Fix pNFS session trunking when MDS=DS
- Fix zero-value filehandles for post-open getattr operations
- Fix compiler warning about tautological comparisons
- Revert 'SUNRPC: clean up integer overflow check' before Trond's fix"
* tag 'nfs-for-6.6-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Silence compiler complaints about tautological comparisons
Revert "SUNRPC: clean up integer overflow check"
NFSv4.1: fix zero value filehandle in post open getattr
NFSv4.1: fix pnfs MDS=DS session trunking
Revert "SUNRPC: Fail faster on bad verifier"
SUNRPC: Mark the cred for revalidation if the server rejects it
NFS/pNFS: Report EINVAL errors from connect() to the server
NFS: More fixes for nfs_direct_write_reschedule_io()
NFS: Use the correct commit info in nfs_join_page_group()
NFS: More O_DIRECT accounting fixes for error paths
NFS: Fix O_DIRECT locking issues
NFS: Fix error handling for O_DIRECT write scheduling
If a network filesystem using netfs implements a clamp_length()
function, it can set subrequest lengths smaller than a page size.
When we loop through the folios in netfs_rreq_unlock_folios() to
set any folios to be written back, we need to make sure we only
call folio_start_fscache() once for each folio.
Otherwise, this simple testcase:
mount -o fsc,rsize=1024,wsize=1024 127.0.0.1:/export /mnt/nfs
dd if=/dev/zero of=/mnt/nfs/file.bin bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0126359 s, 324 kB/s
echo 3 > /proc/sys/vm/drop_caches
cat /mnt/nfs/file.bin > /dev/null
will trigger an oops similar to the following:
page dumped because: VM_BUG_ON_FOLIO(folio_test_private_2(folio))
------------[ cut here ]------------
kernel BUG at include/linux/netfs.h:44!
...
CPU: 5 PID: 134 Comm: kworker/u16:5 Kdump: loaded Not tainted 6.4.0-rc5
...
RIP: 0010:netfs_rreq_unlock_folios+0x68e/0x730 [netfs]
...
Call Trace:
netfs_rreq_assess+0x497/0x660 [netfs]
netfs_subreq_terminated+0x32b/0x610 [netfs]
nfs_netfs_read_completion+0x14e/0x1a0 [nfs]
nfs_read_completion+0x2f9/0x330 [nfs]
rpc_free_task+0x72/0xa0 [sunrpc]
rpc_async_release+0x46/0x70 [sunrpc]
process_one_work+0x3bd/0x710
worker_thread+0x89/0x610
kthread+0x181/0x1c0
ret_from_fork+0x29/0x50
Fixes: 3d3c950467 ("netfs: Provide readahead and readpage netfs helpers"
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2210612
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230608214137.856006-1-dwysocha@redhat.com/ # v1
Link: https://lore.kernel.org/r/20230915185704.1082982-1-dwysocha@redhat.com/ # v2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch eef46ab713 introduced a new gfs2 quota=quiet mount option.
Checks for the new option were added to quota.c, but a check in
gfs2_quota_lock_check() was overlooked. This patch adds the missing
check.
Fixes: eef46ab713 ("gfs2: Introduce new quota=quiet mount option")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, function gfs2_scan_glock_lru would only try to free
glocks that had a reference count of 0. But if the reference count ever
got to 0, the glock should have already been freed.
Shrinker function gfs2_dispose_glock_lru checks whether glocks on the
LRU are demote_ok, and if so, tries to demote them. But that's only
possible if the reference count is at least 1.
This patch changes gfs2_scan_glock_lru so it will try to demote and/or
dispose of glocks that have a reference count of 1 and which are either
demotable, or are already unlocked.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
On a thawed filesystem, the freeze glock is held in shared mode. In
order to initiate a cluster-wide freeze, the node initiating the freeze
drops the freeze glock and grabs it in exclusive mode. The other nodes
recognize this as contention on the freeze glock; function
freeze_go_callback is invoked. This indicates to them that they must
freeze the filesystem locally, drop the freeze glock, and then
re-acquire it in shared mode before being able to unfreeze the
filesystem locally.
While a node is trying to re-acquire the freeze glock in shared mode,
additional contention can occur. In that case, the node must behave in
the same way as above.
Unfortunately, freeze_go_callback() contains a check that causes it to
bail out when the freeze glock isn't held in shared mode. Fix that to
allow the glock to be unlocked or held in shared mode.
In addition, update a reference to trylock_super() which has been
renamed to super_trylock_shared() in the meantime.
Fixes: b77b4a4815 ("gfs2: Rework freeze / thaw logic")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function ceph_get_inode() never returns NULL; instead it returns an
ERR_PTR() if something fails. Thus, the check for NULL in parse_longname()
is useless and can be dropped. Instead, move there the debug code that
does the error checking so that it's only executed if ceph_get_inode() is
called.
Fixes: dd66df0053 ("ceph: add support for encrypted snapshot names")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>