-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTD6IQAKCRCRxhvAZXjc
opXLAQC9X+ECnGUAOy/kvOrEBkBb7G4BuZ8XsrnL976riVNp0gEA85LaJV9Ow7Xk
51k/1ujhYkglQbCsa0zo+mI4ueE3wAQ=
=Dqrj
-----END PGP SIGNATURE-----
Merge tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fix from Christian Brauner:
"An openat() call from io_uring triggering an audit call can apparently
cause the refcount of struct filename to be incremented from multiple
threads concurrently during async execution, triggering a refcount
underflow and hitting a BUG_ON(). That bug has been lurking around
since at least v5.16 apparently.
Switch to an atomic counter to fix that. The underflow check is
downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily
remove that check altogether tbh"
* tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
audit,io_uring: io_uring openat triggers audit reference count underflow
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmUwyyAACgkQqbAzH4Mk
B7aQsg/9FohvuET2YOqYlmtgbq1YjFlWkVxoNB2Be4gxKEqRcFN4V/O8PrOD0ZBX
KH8ASWGp9EKQlVKucAPUhSaZMv4u2HaKTVyRnH5KpUmNG7rC7EzRgYDY4xA5RfjL
aIxgVOhaSibFBGfys2KoS8MHFgZ9mcCWZZ1154ueSkZfX9ghytZcql5i4Ahi2OGQ
u+aa/tkhoIS9ZKWp/uIz4R+fDClrG8UoiWmJ1LoOHQ5YM3enboOld6Zz1CjbC9yN
ldInX7klL9aHRf3SoAeRVCyL23e4GB5R4+QmBhZQemhwy0jF0/7E0kCmCK0wYFWn
3t1M/+i0S7o7rI+utTA2mTuZzxnNPMXJsn2TnpH9vv1o633WOlL0mrvsM7Z/NJDz
VShPinQ9eJGvZQ9kYuMfZODtK0BP7OV8mRi7aXinzHucG+ISB2fBGxsnA0lX9EfT
fHix1oNI33lfazVzaStJ3hs/n9KXwP/9P49rBcLh75O2t3TBKc2T84qpywNx+W6S
9GDMV5V2FNWf/hMCL8pcTtab3MvpbeU5DqKnxW1D1vu2vdGiHVum/Aq26aE5W7nl
EcE5pjKppdNZgk3Qwj190vym7d8lbK4Esua6ieNM/qvWb7hAGoZO7Tjcypvj+YMY
W4aiJTW/jULPT2tkUvLvHg/r5DFHi30v6eCeTCrPfivxFRIAzBQ=
=ScrJ
-----END PGP SIGNATURE-----
Merge tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 fixes from Konstantin Komarov:
- memory leak
- some logic errors, NULL dereferences
- some code was refactored
- more sanity checks
* tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Avoid possible memory leak
fs/ntfs3: Fix directory element type detection
fs/ntfs3: Fix possible null-pointer dereference in hdr_find_e()
fs/ntfs3: Fix OOB read in ntfs_init_from_boot
fs/ntfs3: fix panic about slab-out-of-bounds caused by ntfs_list_ea()
fs/ntfs3: Fix NULL pointer dereference on error in attr_allocate_frame()
fs/ntfs3: Fix possible NULL-ptr-deref in ni_readpage_cmpr()
fs/ntfs3: Do not allow to change label if volume is read-only
fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo
fs/ntfs3: Refactoring and comments
fs/ntfs3: Fix alternative boot searching
fs/ntfs3: Allow repeated call to ntfs3_put_sbi
fs/ntfs3: Use inode_set_ctime_to_ts instead of inode_set_ctime
fs/ntfs3: Fix shift-out-of-bounds in ntfs_fill_super
fs/ntfs3: fix deadlock in mark_as_free_ex
fs/ntfs3: Add more attributes checks in mi_enum_attr()
fs/ntfs3: Use kvmalloc instead of kmalloc(... __GFP_NOWARN)
fs/ntfs3: Write immediately updated ntfs state
fs/ntfs3: Add ckeck in ni_update_parent()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmUuntEACgkQxWXV+ddt
WDssdQ/9Fo6tN+MCH5ISAcvsW6WvBUWT62MrDnzawh96QhUf3NYf9sjME7QqwHHv
w60SDiqRlAd5UzxdIPC4Qa/6GVZZh2yFLzew3l8Fh6anxhjO5argdsfx1Wv4ADk/
FHI8zs6EZiTlk0JmEnHNclliZfaDutBRQiPL+HZx4+FCrJweS5U/4Jpg7vdfp/tp
eWdJ51pDM8iyqGTsP7a7/VaL5wLoJhbdD9wYgupZUhvY6g2tCZ71/hNiWdbKtCK8
EyQxXiAlc+k1UflOx6Xip1HLIh6HmKwxntXxRy+yj4IvJ3PhI+KS5Nqdl35TszN9
6y9MRo3oCU+2y89Yay4HZZb6DLxcAi6VwpyswnntodFQ+ICXEw7ZaNi3rSO+FCO8
KxfhLniMD5gflRP4gy+o9iZxgVQ75nmiPgBt53r+sAKZ7lv86x84DJ/ZUqL8EV0e
OJhxdzhoT0Ks8OstIuE87fgzUCjqMcgAavxcn1psKBC6/JY9v6OneA8qauSswkKs
P+diJIqZHHOBQVKFedqdIrDU6AstivSBq0ToPBslbBlcy97EO4IRoiMIw+QgHPYn
CHsPHtooBmxPyw+4HTFuzY1NIrSeUFYxTDAs9p5kMPmltkVAlLPcrpGZVya9tjds
l/YuwY2f0C9Q1pjcAc9FcN8Y5kLRCYNEWMl0M1VpC22KgjRN6r0=
=GrNu
-----END PGP SIGNATURE-----
Merge tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"Fix a bug in chunk size decision that could lead to suboptimal
placement and filling patterns"
* tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix stripe length calculation for non-zoned data chunk allocation
Commit f6fca3917b "btrfs: store chunk size in space-info struct"
broke data chunk allocations on non-zoned multi-device filesystems when
using default chunk_size. Commit 5da431b71d "btrfs: fix the max chunk
size and stripe length calculation" partially fixed that, and this patch
completes the fix for that case.
After commit f6fca3917b and 5da431b71d, the sequence of events for
a data chunk allocation on a non-zoned filesystem is:
1. btrfs_create_chunk calls init_alloc_chunk_ctl, which copies
space_info->chunk_size (default 10 GiB) to ctl->max_stripe_len
unmodified. Before f6fca3917b, ctl->max_stripe_len value was
1 GiB for non-zoned data chunks and not configurable.
2. btrfs_create_chunk calls gather_device_info which consumes
and produces more fields of chunk_ctl.
3. gather_device_info multiplies ctl->max_stripe_len by
ctl->dev_stripes (which is 1 in all cases except dup)
and calls find_free_dev_extent with that number as num_bytes.
4. find_free_dev_extent locates the first dev_extent hole on
a device which is at least as large as num_bytes. With default
max_chunk_size from f6fca3917b, it finds the first hole which is
longer than 10 GiB, or the largest hole if that hole is shorter
than 10 GiB. This is different from the pre-f6fca3917b4d
behavior, where num_bytes is 1 GiB, and find_free_dev_extent
may choose a different hole.
5. gather_device_info repeats step 4 with all devices to find
the first or largest dev_extent hole that can be allocated on
each device.
6. gather_device_info sorts the device list by the hole size
on each device, using total unallocated space on each device to
break ties, then returns to btrfs_create_chunk with the list.
7. btrfs_create_chunk calls decide_stripe_size_regular.
8. decide_stripe_size_regular finds the largest stripe_len that
fits across the first nr_devs device dev_extent holes that were
found by gather_device_info (and satisfies other constraints
on stripe_len that are not relevant here).
9. decide_stripe_size_regular caps the length of the stripe it
computed at 1 GiB. This cap appeared in 5da431b71d to correct
one of the other regressions introduced in f6fca3917b.
10. btrfs_create_chunk creates a new chunk with the above
computed size and number of devices.
At step 4, gather_device_info() has found a location where stripe up to
10 GiB in length could be allocated on several devices, and selected
which devices should have a dev_extent allocated on them, but at step
9, only 1 GiB of the space that was found on each device can be used.
This mismatch causes new suboptimal chunk allocation cases that did not
occur in pre-f6fca3917b4d kernels.
Consider a filesystem using raid1 profile with 3 devices. After some
balances, device 1 has 10x 1 GiB unallocated space, while devices 2
and 3 have 1x 10 GiB unallocated space, i.e. the same total amount of
space, but distributed across different numbers of dev_extent holes.
For visualization, let's ignore all the chunks that were allocated before
this point, and focus on the remaining holes:
Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10x 1 GiB unallocated)
Device 2: [__________] (10 GiB contig unallocated)
Device 3: [__________] (10 GiB contig unallocated)
Before f6fca3917b, the allocator would fill these optimally by
allocating chunks with dev_extents on devices 1 and 2 ([12]), 1 and 3
([13]), or 2 and 3 ([23]):
[after 0 chunk allocations]
Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
Device 2: [__________] (10 GiB)
Device 3: [__________] (10 GiB)
[after 1 chunk allocation]
Device 1: [12] [_] [_] [_] [_] [_] [_] [_] [_] [_]
Device 2: [12] [_________] (9 GiB)
Device 3: [__________] (10 GiB)
[after 2 chunk allocations]
Device 1: [12] [13] [_] [_] [_] [_] [_] [_] [_] [_] (8 GiB)
Device 2: [12] [_________] (9 GiB)
Device 3: [13] [_________] (9 GiB)
[after 3 chunk allocations]
Device 1: [12] [13] [12] [_] [_] [_] [_] [_] [_] [_] (7 GiB)
Device 2: [12] [12] [________] (8 GiB)
Device 3: [13] [_________] (9 GiB)
[...]
[after 12 chunk allocations]
Device 1: [12] [13] [12] [13] [12] [13] [12] [13] [_] [_] (2 GiB)
Device 2: [12] [12] [23] [23] [12] [12] [23] [23] [__] (2 GiB)
Device 3: [13] [13] [23] [23] [13] [23] [13] [23] [__] (2 GiB)
[after 13 chunk allocations]
Device 1: [12] [13] [12] [13] [12] [13] [12] [13] [12] [_] (1 GiB)
Device 2: [12] [12] [23] [23] [12] [12] [23] [23] [12] [_] (1 GiB)
Device 3: [13] [13] [23] [23] [13] [23] [13] [23] [__] (2 GiB)
[after 14 chunk allocations]
Device 1: [12] [13] [12] [13] [12] [13] [12] [13] [12] [13] (full)
Device 2: [12] [12] [23] [23] [12] [12] [23] [23] [12] [_] (1 GiB)
Device 3: [13] [13] [23] [23] [13] [23] [13] [23] [13] [_] (1 GiB)
[after 15 chunk allocations]
Device 1: [12] [13] [12] [13] [12] [13] [12] [13] [12] [13] (full)
Device 2: [12] [12] [23] [23] [12] [12] [23] [23] [12] [23] (full)
Device 3: [13] [13] [23] [23] [13] [23] [13] [23] [13] [23] (full)
This allocates all of the space with no waste. The sorting function used
by gather_device_info considers free space holes above 1 GiB in length
to be equal to 1 GiB, so once find_free_dev_extent locates a sufficiently
long hole on each device, all the holes appear equal in the sort, and the
comparison falls back to sorting devices by total free space. This keeps
usable space on each device equal so they can all be filled completely.
After f6fca3917b, the allocator prefers the devices with larger holes
over the devices with more free space, so it makes bad allocation choices:
[after 1 chunk allocation]
Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
Device 2: [23] [_________] (9 GiB)
Device 3: [23] [_________] (9 GiB)
[after 2 chunk allocations]
Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
Device 2: [23] [23] [________] (8 GiB)
Device 3: [23] [23] [________] (8 GiB)
[after 3 chunk allocations]
Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
Device 2: [23] [23] [23] [_______] (7 GiB)
Device 3: [23] [23] [23] [_______] (7 GiB)
[...]
[after 9 chunk allocations]
Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
Device 2: [23] [23] [23] [23] [23] [23] [23] [23] [23] [_] (1 GiB)
Device 3: [23] [23] [23] [23] [23] [23] [23] [23] [23] [_] (1 GiB)
[after 10 chunk allocations]
Device 1: [12] [_] [_] [_] [_] [_] [_] [_] [_] [_] (9 GiB)
Device 2: [23] [23] [23] [23] [23] [23] [23] [23] [12] (full)
Device 3: [23] [23] [23] [23] [23] [23] [23] [23] [_] (1 GiB)
[after 11 chunk allocations]
Device 1: [12] [13] [_] [_] [_] [_] [_] [_] [_] [_] (8 GiB)
Device 2: [23] [23] [23] [23] [23] [23] [23] [23] [12] (full)
Device 3: [23] [23] [23] [23] [23] [23] [23] [23] [13] (full)
No further allocations are possible, with 8 GiB wasted (4 GiB of data
space). The sort in gather_device_info now considers free space in
holes longer than 1 GiB to be distinct, so it will prefer devices 2 and
3 over device 1 until all but 1 GiB is allocated on devices 2 and 3.
At that point, with only 1 GiB unallocated on every device, the largest
hole length on each device is equal at 1 GiB, so the sort finally moves
to ordering the devices with the most free space, but by this time it
is too late to make use of the free space on device 1.
Note that it's possible to contrive a case where the pre-f6fca3917b4d
allocator fails the same way, but these cases generally have extensive
dev_extent fragmentation as a precondition (e.g. many holes of 768M
in length on one device, and few holes 1 GiB in length on the others).
With the regression in f6fca3917b, bad chunk allocation can occur even
under optimal conditions, when all dev_extent holes are exact multiples
of stripe_len in length, as in the example above.
Also note that post-f6fca3917b4d kernels do treat dev_extent holes
larger than 10 GiB as equal, so the bad behavior won't show up on a
freshly formatted filesystem; however, as the filesystem ages and fills
up, and holes ranging from 1 GiB to 10 GiB in size appear, the problem
can show up as a failure to balance after adding or removing devices,
or an unexpected shortfall in available space due to unequal allocation.
To fix the regression and make data chunk allocation work
again, set ctl->max_stripe_len back to the original SZ_1G, or
space_info->chunk_size if that's smaller (the latter can happen if the
user set space_info->chunk_size to less than 1 GiB via sysfs, or it's
a 32 MiB system chunk with a hardcoded chunk_size and stripe_len).
While researching the background of the earlier commits, I found that an
identical fix was already proposed at:
https://lore.kernel.org/linux-btrfs/de83ac46-a4a3-88d3-85ce-255b7abc5249@gmx.com/
The previous review missed one detail: ctl->max_stripe_len is used
before decide_stripe_size_regular() is called, when it is too late for
the changes in that function to have any effect. ctl->max_stripe_len is
not used directly by decide_stripe_size_regular(), but the parameter
does heavily influence the per-device free space data presented to
the function.
Fixes: f6fca3917b ("btrfs: store chunk size in space-info struct")
CC: stable@vger.kernel.org # 6.1+
Link: https://lore.kernel.org/linux-btrfs/20231007051421.19657-1-ce3g8jdj@umail.furryterror.org/
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: David Sterba <dsterba@suse.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmUrsD8ACgkQEVvVyTe/
1WpCmg//XfJm6TdeFFuoEoYezytHXnEHnIM+kHMOqmtY1AoaW4884dvevKOCBsUF
BXXfnIsJc1GmKBDCOxaWFXTRN9iLHTLww8dZzJFX1vDYsqo96t7TbbssyOuOy9vp
K4uYX/pZzuggyBCUZvz/UdjZPEAwPVmsU/uXe/gkHQLS+wpOH0e7hOB7p91I0iZx
0JdleyoDyw2AtqWLscGrTYqozW+lNMl0smADWaGfLcn29rRkR10mhq5heYa+4b2C
TiG29rpIDgXyhz9w+Tq4f3oxJDrE1qlQGX7G8tJjISS9UDHxrZqFun0+nr15Y8Ge
3YstSaMlarAx+UGtuEX80QcYHlYDAWnCKwkRAD8wBtKLeO0pG6BSmCCVi34CAoN+
NBy5KQeoDV96vqoIc8lDmXWOgOkzOogJzzspOWT3H7jmonTjkYL/rYGuV1V6lpuk
Ngopc7HxjiZeGb9Zchr1KOT6xJzjYm74/Ph8I9ECPAamgO6UxlOcsBF2uc7OZIlL
Jzv5Hl+ITiXzxa/KQonx2nEtdO8INf5wxHjy4nlnnY0dcibGCA1wi/3/8/vqJBIf
tuRoQJ/eHruTf23dwPRFwo4JoOViSv3Up9GFyEodllBA6DSAHK96low8eeEZAw9e
MeyWy45dewgH1jvx3bb8Sd29CVxUXvMEl33hYngu09/cv7XulMw=
=9LTH
-----END PGP SIGNATURE-----
Merge tag 'ovl-fixes-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fixes from Amir Goldstein:
- Various fixes for regressions due to conversion to new mount
api in v6.5
- Disable a new mount option syntax (append lowerdir) that was
added in v6.5 because we plan to add a different lowerdir
append syntax in v6.7
* tag 'ovl-fixes-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: temporarily disable appending lowedirs
ovl: fix regression in showing lowerdir mount option
ovl: fix regression in parsing of mount options with escaped comma
fs: factor out vfs_parse_monolithic_sep() helper
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUrPnIACgkQiiy9cAdy
T1FTewv/ZgF5CX5M8EpLvUENcdzJQIh2JolP2PeSbqVjf/pI/LKrkGP6fHEo8+S2
Mnntxik3aNZ3hQ755SpGw+mT4hgr2umsDDrZxLVjvDDiLxqW9zlr55JdBS5xJvxN
enKJ8wDbP+Usn4Gb0TfY9xrPWgHyYSn9+dYoPYuh1Z1zqmhFfIRpGSBHbwUM6Ssa
vONpZUdnzdRnuHKyU3+xgU4Pr6KZLnriM/iGgjncCHJRCem0f27W50xsXkkpoVIg
GIhNRe//YL1dlqbpQpXY+4+KI/3d2JRZeVpnzCJ7ucuwyjq5KNJSnTI7Jsgqnf3V
ADe9m/HnknOG3lkQrzojxTNGLmXqlvsxUUNVtjGRccAHaOJDCsKA5Wv5L2jJahdP
ynuXz5iwtQqaHPkfIL5D48RYiMTVemmsHdP+cdnleXkcU8GpN7cd2Gnt591Wo8by
lcRS01pMlRfSh6SyKcDghEHbb2BDKyRSbIlvsy+85CBdKAqOpJXl/96p67UPcQjI
/K9g3cY6
=X4B/
-----END PGP SIGNATURE-----
Merge tag '6.6-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix for possible double free in RPC read
- Add additional check to clarify smb2_open path and quiet Coverity
- Fix incorrect error rsp in a compounding path
- Fix to properly fail open of file with pending delete on close
* tag '6.6-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix potential double free on smb2_read_pipe() error path
ksmbd: fix Null pointer dereferences in ksmbd_update_fstate()
ksmbd: fix wrong error response status by using set_smb2_rsp_status()
ksmbd: not allow to open file if delelete on close bit is set
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUrPK0ACgkQiiy9cAdy
T1FULQv+IFTddGvRcreJFFeq5pndSGNuN4gs2dUkc+wQaS80wH1TI/tmmOB8DNx7
amXhPv4bo9inbus0ZAZLlOY6Vr7bAhQTGjXLJEVwes+yYxpxbJVmNt9aPCA6yiaP
OAC5jKUEIdIgQkcwpSaUYkiCayB0i38R300HSFLFk2XqPOt5QRCIxrBm5Ssq7yN2
OckglUl0p+FY54kI4kL6sfXi7tRV6rqjzofIxJJuiUqU3IUJk+G1b/pNouWLhEpi
wQPmpdcRvV70QGinzdsuVWNQV8HEZ+zvQQkdiIqKPqEl1hMzndFYMfSNnvDcDN+W
xMmcnIW0ei/3qjYDZ3wjnMEH/978nBrQEtUL/dpAXOmVr+U72NKQE22IlWEICSo8
GZ7JJxcB20WHNHzSFH1+Y+7kgNTgwLd59uK27xqm1/8+qcvhf/MMnjA/ym9F4pyq
4IEHPy7keQ5apSdBhtItJxg5etjpjujQomk1yRxrIU1JAnF0LpL/BFX85aT19QxX
hbbqWeyd
=ZlF9
-----END PGP SIGNATURE-----
Merge tag '6.6-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- fix caching race with open_cached_dir and laundromat cleanup of
cached dirs (addresses a problem spotted with xfstest run with
directory leases enabled)
- reduce excessive resource usage of laundromat threads
* tag '6.6-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: prevent new fids from being removed by laundromat
smb: client: make laundromat a delayed worker
Kernel v6.5 converted overlayfs to new mount api.
As an added bonus, it also added a feature to allow appending lowerdirs
using lowerdir=:/lower2,lowerdir=::/data3 syntax.
This new syntax has raised some concerns regarding escaping of colons.
We decided to try and disable this syntax, which hasn't been in the wild
for so long and introduce it again in 6.7 using explicit mount options
lowerdir+=/lower2,datadir+=/data3.
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Link: https://lore.kernel.org/r/CAJfpegsr3A4YgF2YBevWa6n3=AcP7hNndG6EPMu3ncvV-AM71A@mail.gmail.com/
Fixes: b36a5780cb ("ovl: modify layer parameter parsing")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
* Fix calculation of offset of AG's last block and its length.
* Update incore AG block count when shrinking an AG.
* Process free extents to busy list in FIFO order.
* Make XFS report its i_version as the STATX_CHANGE_COOKIE.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZSjJjQAKCRAH7y4RirJu
9HK3AQDLHHw5nlrVShiOtps543OOJ1uBA2PO/lxvvM3X6GPbbwEA8uYLqNPThpF2
+a5h8y9LV6uTEDsdpSoEPBmCgXKhYQg=
=RmDo
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.6-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Fix calculation of offset of AG's last block and its length
- Update incore AG block count when shrinking an AG
- Process free extents to busy list in FIFO order
- Make XFS report its i_version as the STATX_CHANGE_COOKIE
* tag 'xfs-6.6-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reinstate the old i_version counter as STATX_CHANGE_COOKIE
xfs: Remove duplicate include
xfs: correct calculation for agend and blockcount
xfs: process free extents to busy list in FIFO order
xfs: adjust the incore perag block_count when shrinking
Before commit b36a5780cb ("ovl: modify layer parameter parsing"),
spaces and commas in lowerdir mount option value used to be escaped using
seq_show_option().
In current upstream, when lowerdir value has a space, it is not escaped
in /proc/mounts, e.g.:
none /mnt overlay rw,relatime,lowerdir=l l,upperdir=u,workdir=w 0 0
which results in broken output of the mount utility:
none on /mnt type overlay (rw,relatime,lowerdir=l)
Store the original lowerdir mount options before unescaping and show
them using the same escaping used for seq_show_option() in addition to
escaping the colon separator character.
Fixes: b36a5780cb ("ovl: modify layer parameter parsing")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
kernel_connect() which recently grown protection against someone using
BPF to rewrite the address. All but one marked for stable.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmUpYoQTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHziwvmCACK13dkAaupcHyteYPloBgtJLNixR3X
6++nHCOXGtE7cK6n1snobFQgp/d5BqSKAeymyLqDjJOJVqG/5n8FR1gwcuY/Ogdj
Aju2Mkt7R/R/V6kvmCbqGSwiOvxZP1gBFkKcluRNQkFNP3boKw4vmJGq29Rabbyr
66NfZSETsR/H4JhAWvUyVmffrvxIx11THnvmrAnprGVKoK72HwOQFx0H4KLD2Hio
aN/yiiR15yDS6hL8dfilt+QGO6o+ZWgvRl1GOzqjwISgfeUUYVknMJXrEf+20kHs
kX3tWaLWVZsU1FyVFRO5HPYLAREjALXstFO4K1OHPERM7EipWNinIHoS
=wyYP
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-6.6-rc6' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Fixes for an overreaching WARN_ON, two error paths and a switch to
kernel_connect() which recently grown protection against someone using
BPF to rewrite the address.
All but one marked for stable"
* tag 'ceph-for-6.6-rc6' of https://github.com/ceph/ceph-client:
ceph: fix type promotion bug on 32bit systems
libceph: use kernel_connect()
ceph: remove unnecessary IS_ERR() check in ceph_fname_to_usr()
ceph: fix incorrect revoked caps assert in ceph_fill_file_size()
An io_uring openat operation can update an audit reference count
from multiple threads resulting in the call trace below.
A call to io_uring_submit() with a single openat op with a flag of
IOSQE_ASYNC results in the following reference count updates.
These first part of the system call performs two increments that do not race.
do_syscall_64()
__do_sys_io_uring_enter()
io_submit_sqes()
io_openat_prep()
__io_openat_prep()
getname()
getname_flags() /* update 1 (increment) */
__audit_getname() /* update 2 (increment) */
The openat op is queued to an io_uring worker thread which starts the
opportunity for a race. The system call exit performs one decrement.
do_syscall_64()
syscall_exit_to_user_mode()
syscall_exit_to_user_mode_prepare()
__audit_syscall_exit()
audit_reset_context()
putname() /* update 3 (decrement) */
The io_uring worker thread performs one increment and two decrements.
These updates can race with the system call decrement.
io_wqe_worker()
io_worker_handle_work()
io_wq_submit_work()
io_issue_sqe()
io_openat()
io_openat2()
do_filp_open()
path_openat()
__audit_inode() /* update 4 (increment) */
putname() /* update 5 (decrement) */
__audit_uring_exit()
audit_reset_context()
putname() /* update 6 (decrement) */
The fix is to change the refcnt member of struct audit_names
from int to atomic_t.
kernel BUG at fs/namei.c:262!
Call Trace:
...
? putname+0x68/0x70
audit_reset_context.part.0.constprop.0+0xe1/0x300
__audit_uring_exit+0xda/0x1c0
io_issue_sqe+0x1f3/0x450
? lock_timer_base+0x3b/0xd0
io_wq_submit_work+0x8d/0x2b0
? __try_to_del_timer_sync+0x67/0xa0
io_worker_handle_work+0x17c/0x2b0
io_wqe_worker+0x10a/0x350
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@MW2PR2101MB1033.namprd21.prod.outlook.com/
Fixes: 5bd2182d58 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Dan Clash <daclash@linux.microsoft.com>
Link: https://lore.kernel.org/r/20231012215518.GA4048@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Fix new smatch warnings:
fs/smb/server/smb2pdu.c:6131 smb2_read_pipe() error: double free of 'rpc_resp'
Fixes: e2b76ab8b5 ("ksmbd: add support for read compound")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Coverity Scan report the following one. This report is a false alarm.
Because fp is never NULL when rc is zero. This patch add null check for fp
in ksmbd_update_fstate to make alarm silence.
*** CID 1568583: Null pointer dereferences (FORWARD_NULL)
/fs/smb/server/smb2pdu.c: 3408 in smb2_open()
3402 path_put(&path);
3403 path_put(&parent_path);
3404 }
3405 ksmbd_revert_fsids(work);
3406 err_out1:
3407 if (!rc) {
>>> CID 1568583: Null pointer dereferences (FORWARD_NULL)
>>> Passing null pointer "fp" to "ksmbd_update_fstate", which dereferences it.
3408 ksmbd_update_fstate(&work->sess->file_table, fp, FP_INITED);
3409 rc = ksmbd_iov_pin_rsp(work, (void *)rsp, iov_len);
3410 }
3411 if (rc) {
3412 if (rc == -EINVAL)
3413 rsp->hdr.Status = STATUS_INVALID_PARAMETER;
Fixes: e2b76ab8b5 ("ksmbd: add support for read compound")
Reported-by: Coverity Scan <scan-admin@coverity.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
set_smb2_rsp_status() after __process_request() sets the wrong error
status. This patch resets all iov vectors and sets the error status
on clean one.
Fixes: e2b76ab8b5 ("ksmbd: add support for read compound")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Cthon test fail with the following error.
check for proper open/unlink operation
nfsjunk files before unlink:
-rwxr-xr-x 1 root root 0 9월 25 11:03 ./nfs2y8Jm9
./nfs2y8Jm9 open; unlink ret = 0
nfsjunk files after unlink:
-rwxr-xr-x 1 root root 0 9월 25 11:03 ./nfs2y8Jm9
data compare ok
nfsjunk files after close:
ls: cannot access './nfs2y8Jm9': No such file or directory
special tests failed
Cthon expect to second unlink failure when file is already unlinked.
ksmbd can not allow to open file if flags of ksmbd inode is set with
S_DEL_ON_CLS flags.
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ever since commit 91c7794713 ("ovl: allow filenames with comma"), the
following example was legit overlayfs mount options:
mount -t overlay overlay -o 'lowerdir=/tmp/a\,b/lower' /mnt
The conversion to new mount api moved to using the common helper
generic_parse_monolithic() and discarded the specialized ovl_next_opt()
option separator.
Bring back ovl_next_opt() and use vfs_parse_monolithic_sep() to fix the
regression.
Reported-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
Closes: https://lore.kernel.org/r/8da307fb-9318-cf78-8a27-ba5c5a0aef6d@alum.mit.edu/
Fixes: 1784fbc2ed ("ovl: port to new mount api")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Factor out vfs_parse_monolithic_sep() from generic_parse_monolithic(),
so filesystems could use it with a custom option separator callback.
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Check if @cfid->time is set in laundromat so we guarantee that only
fully cached fids will be selected for removal. While we're at it,
add missing locks to protect access of @cfid fields in order to avoid
races with open_cached_dir() and cfids_laundromat_worker(),
respectively.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
By having laundromat kthread processing cached directories on every
second turned out to be overkill, especially when having multiple SMB
mounts.
Relax it by using a delayed worker instead that gets scheduled on
every @dir_cache_timeout (default=30) seconds per tcon.
This also fixes the 1s delay when tearing down tcon.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The handling of STATX_CHANGE_COOKIE was moved into generic_fillattr in
commit 0d72b92883 (fs: pass the request_mask to generic_fillattr), but
we didn't account for the fact that xfs doesn't call generic_fillattr at
all.
Make XFS report its i_version as the STATX_CHANGE_COOKIE.
Fixes: 0d72b92883 (fs: pass the request_mask to generic_fillattr)
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
./fs/xfs/scrub/xfile.c: xfs_format.h is included more than once.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6209
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
The agend should be "start + length - 1", then, blockcount should be
"end + 1 - start". Correct 2 calculation mistakes.
Also, rename "agend" to "range_agend" because it's not the end of the AG
per se; it's the end of the dead region within an AG's agblock space.
Fixes: 5cf32f63b0 ("xfs: fix the calculation for "end" and "length"")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmUmbQMACgkQxWXV+ddt
WDtBshAAqOwMrqRwOKOze/LQ4Kl9A8p0l+XxYdt7nRSY7n15xpN6uLVsc0gTwO5n
HOquDe2ivrpdOXI6ArcujTTFHaBGX+mmubU/yi54MH0iwuCR32dYhj3j7mDUIf6F
GpTEjgxIdE4AMUw7e7Rzqbdcmq//+H+bBdm+2YkNNEBmPP06483GYthjKJ7zWdrn
pPksR9f611aHU4jZnKZJeHgZh4iVrIszIxkjeMD5NJ6KUb8LJmISLOOJzowkmugt
JH8bd1F/+/53MmpntWGnHnURI9J6UxBL0cNnYW26FjY21N3RGR2BumotW73hYaD7
6fwuxs4ZWlLqHUtIOaAVUUSfEVse7k/i7m4+sDB1JLh26alqUHunqCFV+3ROTnOY
jHwWW+qyQhxJnfgtHyDrwcybfW0V41hhmDIhoeezkSDtbnacNTMfwzXS2ELcp0KJ
/13TCruweFN0g4lBR8HfbKJCCzPayxCirtubx1nIMRysHfo10aDWz1MSvr3mkOyo
gwif/j9BMKN0+fg6l9eZNHWHfQ8qfL3dvSRBlvJcP5mnG5ZuVkxJUFH0m/UfdFbZ
sbeJHSP9wex5tJKmG3kJPAuZWwGLHCiMMCnsWoq+02KV8IXrw3Ji5z/8Hhsb51Ps
r7BGRO2A2rD9XLJtc9BCiwiV177/WknmTUtRpOyxHFfb37bKmHg=
=Wz/9
-----END PGP SIGNATURE-----
Merge tag 'for-6.6-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A revert of recent mount option parsing fix, this breaks mounts with
security options.
The second patch is a flexible array annotation"
* tag 'for-6.6-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: add __counted_by for struct btrfs_delayed_item and use struct_size()
Revert "btrfs: reject unknown mount options early"
When we're adding extents to the busy discard list, add them to the tail
of the list so that we get FIFO order. For FITRIM commands, this means
that we send discard bios sorted in order from longest to shortest, like
we did before commit 89cfa89960.
For transactions that are freeing extents, this puts them in the
transaction's busy list in FIFO order as well, which shouldn't make any
noticeable difference.
Fixes: 89cfa89960 ("xfs: reduce AGF hold times during fstrim operations")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
If we reduce the number of blocks in an AG, we must update the incore
geometry values as well.
Fixes: 0800169e3e ("xfs: Pre-calculate per-AG agbno geometry")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
While there, use struct_size() helper, instead of the open-coded
version, to calculate the size for the allocation of the whole
flexible structure, including of course, the flexible-array member.
This code was found with the help of Coccinelle, and audited and
fixed manually.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 5f521494cc.
The patch breaks mounts with security mount options like
$ mount -o context=system_u:object_r:root_t:s0 /dev/sdX /mn
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdX, missing codepage or helper program, ...
We cannot reject all unknown options in btrfs_parse_subvol_options() as
intended, the security options can be present at this point and it's not
possible to enumerate them in a future proof way. This means unknown
mount options are silently accepted like before when the filesystem is
mounted with either -o subvol=/path or as followup mounts of the same
device.
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com
Signed-off-by: David Sterba <dsterba@suse.com>
In this code "ret" is type long and "src_objlen" is unsigned int. The
problem is that on 32bit systems, when we do the comparison signed longs
are type promoted to unsigned int. So negative error codes from
do_splice_direct() are treated as success instead of failure.
Cc: stable@vger.kernel.org
Fixes: 1b0c3b9f91 ("ceph: re-org copy_file_range and fix some error paths")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Before returning, function ceph_fname_to_usr() does a final IS_ERR() check
in 'dir':
if ((dir != fname->dir) && !IS_ERR(dir)) {...}
This check is unnecessary because, if the 'dir' variable has changed to
something other than 'fname->dir' (it's initial value), that error check has
been performed already and, if there was indeed an error, it would have
been returned immediately.
Besides, this useless IS_ERR() is also confusing static analysis tools.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202309282202.xZxGdvS3-lkp@intel.com/
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When truncating the inode the MDS will acquire the xlock for the
ifile Locker, which will revoke the 'Frwsxl' caps from the clients.
But when the client just releases and flushes the 'Fw' caps to MDS,
for exmaple, and once the MDS receives the caps flushing msg it
just thought the revocation has finished. Then the MDS will continue
truncating the inode and then issued the truncate notification to
all the clients. While just before the clients receives the cap
flushing ack they receive the truncation notification, the clients
will detecte that the 'issued | dirty' is still holding the 'Fw'
caps.
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/56693
Fixes: b0d7c22310 ("ceph: introduce i_truncate_mutex")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUiD7MACgkQiiy9cAdy
T1Eh8gv+JyoDuNQnEZ+/Db/dYXEb6HfZD0bIajUu9uJ47q6EauT3LUrEsolBem8L
dy4GeEwdO3Ue2tcTY1z0wEyFp6nGqnIOOdLPqtNd4/NxXizzbH8dM8/WbGTbLz3H
LqR9Ssy/+4tE/XkiQ1L4klqa1M+9buch7o/VPgQqLSNAFw5VjUdZDkaP7GYN7Voj
p9KqYspi0gD6/0HkyBo55i4CDk6o0g3kukVPjMeoht/9IwVKraZMdCLs8BnVN8aQ
r4tU1F6GhvvdBHMekzQwOxQcxIgsXMPMNE4Daz8n7OVRJYXaRT8OlGxNrH7NhXVo
0BuG7psIXeNjBGUcjUFbXhwfNv7iNdXFlP2/kpBTQFjm2KsJDJTF3VnO08WgWS+i
1YJmGIxuCZSDC39Gp70M04HwWVIiMReoaQl4PHI5u6FnD4jTkC6EA051tm/D5c+o
aDAxGa8E/B8A7cQmI2xZ2i91g8ALemybRzk4Ta/cwbO5DwBdIA+bGEo4Y4JiAZOj
7q7/Nk1p
=Sm4J
-----END PGP SIGNATURE-----
Merge tag '6.6-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Six SMB3 server fixes for various races found by RO0T Lab of Huawei:
- Fix oops when racing between oplock break ack and freeing file
- Simultaneous request fixes for parallel logoffs, and for parallel
lock requests
- Fixes for tree disconnect race, session expire race, and close/open
race"
* tag '6.6-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix race condition between tree conn lookup and disconnect
ksmbd: fix race condition from parallel smb2 lock requests
ksmbd: fix race condition from parallel smb2 logoff requests
ksmbd: fix uaf in smb20_oplock_break_ack
ksmbd: fix race condition with fp
ksmbd: fix race condition between session lookup and expire
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUgfesACgkQiiy9cAdy
T1G4Owv+PWBc6OqHR3VzXHvkw5A6I9f1neVcYtN+8NyxVreaMWmj/xSe1IshBNxm
4vkkkdEMr8E3FQQ4tuciFrUMpPhlYy8mXa2QqUuHv0gKe/etcnecp6cHtMyiCDDs
Z1PSGzFlQ/1f/O5yGmUy4l3Mn6HPh2InHoPrw7GKbMitxwaoAhVgHEb+5mljM+7n
fMcEuNWRkIFTJc6F8/VOHkFR/+vLGpTOCW+QwN/WrD9GgxI+Y07BhAWBNjju1tRI
ao5KYQYhOu/SsZ51QwUCb5uxk7IoCIBKeR6YTCHsbXavUTzKNcmNTYnAVia03ebp
0+DLOtZquEYdvKKa5vDWEJK5+W1Em6RKHF8rfhn7DKHeJTbyznzcpwOLU8AbwyTE
u7WXw1kyhGgEur4GJHYBs8pXcaBnP2GXf/njW0SAQQP0fu5vgwqFo8xWm077K8jZ
6NQ85qR+3AlytXP8Qz7rSb0dZMuOSfc7KflMlgN4PVj9kcHj3ku3YXnHBxSti5hG
BetEOa4m
=vjil
-----END PGP SIGNATURE-----
Merge tag '6.6-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- protect cifs/smb3 socket connect from BPF address overwrite
- fix case when directory leases disabled but wasting resources with
unneeded thread on each mount
* tag '6.6-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: do not start laundromat thread on nohandlecache
smb: use kernel_connect() and kernel_bind()
* Prevent filesystem hang when executing fstrim operations on large and slow
storage.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZR5OZgAKCRAH7y4RirJu
9PC3AQDm/LTG1QYfytvl2EpmCQKDiCzT/RyCDe2xfLqULE24uwEA6pL9wYaoMTAg
MK1k2W/fMBlcijrLNdCzRaAqDL5l+AQ=
=+bWV
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.6-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Prevent filesystem hang when executing fstrim operations on large and
slow storage
* tag 'xfs-6.6-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: abort fstrim if kernel is suspending
xfs: reduce AGF hold times during fstrim operations
xfs: move log discard work to xfs_discard.c
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmUe+t0ACgkQxWXV+ddt
WDv6MA/7B31L45dH+qHM3XFUygJuTBk44OynDSRD/JrPS6ruycu3QpWCZ82+ozUz
v8ULN3xJV4j2EWWa7w20CNfMITqEdOAvHHX6GAuXwTfLwy3ov+/L8tOt2OAQ44go
kr6jiQULdBwfMxEp+6a5kMw0enVuEz3H+P8gWWUfQHuse+Cgk1TIdvLL8YuaoL0x
mEphDtNLFh7UcsKxxVwgNXWowPxIO62xW/11hJKrF9ZpyFfER1TzfaO9kZStH2oe
ylHYkWsVf6GdHtXlsVnvDSNdj+GW/KLRLWKouQNjbInSjmZzEBliBbVbXLCI1fvO
/LpN1uu8T1XezBvxoEFw2JenkmFqMDg+ocl81owoG/IdJLOqPWCerUGb7VPtooT3
dLx3buXXVBhx70qRdCgg5SwsjNTSElV5Ub9AnYGP5oux5of8oLOb9dSpQsxcE7iE
yJEltu6+A1X+uVFHiDI8IIGghyZRq2UXc6zVdE3cHFfjwwB22aOtcRKZDw4O3Qzn
DMuACRWZk8WL9gpQZEPa07JmSS3VPN6iY1gq3CYeZpoHOW6BMMDYb2p5/f+yNbWW
a2JkDW+BnorEqqssMUyB2tf5k3fbOn1M15LSAH5oVXKA/F7dlxnSQksa7AI/pfFK
InAmPLWQhzcIuNhpUs/+FwZ2csc0mbAWroX+fIRF3S99GR2e9ag=
=/WDi
-----END PGP SIGNATURE-----
Merge tag 'for-6.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- reject unknown mount options
- adjust transaction abort error message level
- fix one more build warning with -Wmaybe-uninitialized
- proper error handling in several COW-related cases
* tag 'for-6.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: error out when reallocating block for defrag using a stale transaction
btrfs: error when COWing block from a root that is being deleted
btrfs: error out when COWing block using a stale transaction
btrfs: always print transaction aborted messages with an error level
btrfs: reject unknown mount options early
btrfs: fix some -Wmaybe-uninitialized warnings in ioctl.c
Eric has reported that commit dabc8b2075 ("quota: fix dqput() to
follow the guarantees dquot_srcu should provide") heavily increases
runtime of generic/270 xfstest for ext4 in nojournal mode. The reason
for this is that ext4 in nojournal mode leaves dquots dirty until the last
dqput() and thus the cleanup done in quota_release_workfn() has to write
them all. Due to the way quota_release_workfn() is written this results
in synchronize_srcu() call for each dirty dquot which makes the dquot
cleanup when turning quotas off extremely slow.
To be able to avoid synchronize_srcu() for each dirty dquot we need to
rework how we track dquots to be cleaned up. Instead of keeping the last
dquot reference while it is on releasing_dquots list, we drop it right
away and mark the dquot with new DQ_RELEASING_B bit instead. This way we
can we can remove dquot from releasing_dquots list when new reference to
it is acquired and thus there's no need to call synchronize_srcu() each
time we drop dq_list_lock.
References: https://lore.kernel.org/all/ZRytn6CxFK2oECUt@debian-BULLSEYE-live-builder-AMD64
Reported-by: Eric Whitney <enwlinux@gmail.com>
Fixes: dabc8b2075 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
- Fix a memory leak issue when using LZMA global compressed
deduplication;
- Fix empty device tags in flatdev mode;
- Update documentation for recent new features.
-----BEGIN PGP SIGNATURE-----
iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCZR7+cREceGlhbmdAa2Vy
bmVsLm9yZwAKCRA5NzHcH7XmBEHRAQCYKy+4zs4J2AavzVaxRPsbgun3VfVb4pxe
yqxR+vDOLgD/RAQoadAMcgKYnnDZrziLtb0myOO2SFGEIMTkI7FzEQE=
=+SF9
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.6-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Fix a memory leak issue when using LZMA global compressed
deduplication
- Fix empty device tags in flatdev mode
- Update documentation for recent new features
* tag 'erofs-for-6.6-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: update documentation
erofs: allow empty device tags in flatdev mode
erofs: fix memory leak of LZMA global compressed deduplication
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmUeSFQACgkQEVvVyTe/
1WrKdxAAoMtgK5/GcNO3BijxtU1M7onu70Jm4froOFR0R2hdDFG76vfHrPHpavha
gnSOAOpDZYNkkJTKNmN2b047dRvrtgYiZ1rrSxlLjXY+2ojVoyAFXddVPNpT4nSy
2xka8KqLeU8Mtzt0GQq/PVe0JLgjbEEejn4wFdoilSBPZhf4sik24ReQVyxhYG3B
aILG83GZAuqokzO3HkhYaMo+qgm0BVLcHwZ0zd5UQ404srB7nmE4JxS4+sh0mrsc
LQmZdlXnzEj46nJ9pluWvUH2bGAA1+rtSg182mbcrwnXOlct40x0TI8MzzFcnmaj
mue37V+iLHLRr6BeIzNghbTflF0SeUH8Qbn7mjYkv4+VTfomYU0OzV3Wy54yUqA9
VrugqHAVpgv6r4gNPgkvNpp4YKTeFUd14FI4sO+Ny/f0CHCA+TeEP5g3IH5QguxF
vf23Uww9QyNgRp/JaV0kQaJfRqbMtUylMOiTo7Kaou+XV3A6AfnHhyyeS2qVmYI8
bsXjG4QzYrlxmvcFuVBXrLuiigVEI8DcTHSwXZ5NqpqIb/dCmXexZ48loL73f0Qj
Us8U1w7JymQDscug74F4S47CE3JoUH4hsu25StQ1LTvfIc+leQy1GBU3tVJv/d7U
cgD7OkktwjuQ5ClS+8gdP9fQ9AzI9/vGEhQL9+peKW6FK5EfWPc=
=r3VL
-----END PGP SIGNATURE-----
Merge tag 'ovl-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fixes from Amir Goldstein:
- Fix for file reference leak regression
- Fix for NULL pointer deref regression
- Fixes for RCU-walk race regressions:
Two of the fixes were taken from Al's RCU pathwalk race fixes series
with his consent [1].
Note that unlike most of Al's series, these two patches are not about
racing with ->kill_sb() and they are also very recent regressions
from v6.5, so I think it's worth getting them into v6.5.y.
There is also a fix for an RCU pathwalk race with ->kill_sb(), which
may have been solved in vfs generic code as you suggested, but it
also rids overlayfs from a nasty hack, so I think it's worth anyway.
Link: https://lore.kernel.org/linux-fsdevel/20231003204749.GA800259@ZenIV/ [1]
* tag 'ovl-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fix NULL pointer defer when encoding non-decodable lower fid
ovl: make use of ->layers safe in rcu pathwalk
ovl: fetch inode once in ovl_dentry_revalidate_common()
ovl: move freeing ovl_entry past rcu delay
ovl: fix file reference leak when submitting aio
if thread A in smb2_write is using work-tcon, other thread B use
smb2_tree_disconnect free the tcon, then thread A will use free'd tcon.
Time
+
Thread A | Thread A
smb2_write | smb2_tree_disconnect
|
|
| kfree(tree_conn)
|
// UAF! |
work->tcon->share_conf |
+
This patch add state, reference count and lock for tree conn to fix race
condition issue.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
There is a race condition issue between parallel smb2 lock request.
Time
+
Thread A | Thread A
smb2_lock | smb2_lock
|
insert smb_lock to lock_list |
spin_unlock(&work->conn->llist_lock) |
|
| spin_lock(&conn->llist_lock);
| kfree(cmp_lock);
|
// UAF! |
list_add(&smb_lock->llist, &rollback_list) +
This patch swaps the line for adding the smb lock to the rollback list and
adding the lock list of connection to fix the race issue.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
If parallel smb2 logoff requests come in before closing door, running
request count becomes more than 1 even though connection status is set to
KSMBD_SESS_NEED_RECONNECT. It can't get condition true, and sleep forever.
This patch fix race condition problem by returning error if connection
status was already set to KSMBD_SESS_NEED_RECONNECT.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
drop reference after use opinfo.
Signed-off-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
fp can used in each command. If smb2_close command is coming at the
same time, UAF issue can happen by race condition.
Time
+
Thread A | Thread B1 B2 .... B5
smb2_open | smb2_close
|
__open_id |
insert fp to file_table |
|
| atomic_dec_and_test(&fp->refcount)
| if fp->refcount == 0, free fp by kfree.
// UAF! |
use fp |
+
This patch add f_state not to use freed fp is used and not to free fp in
use.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Honor 'nohandlecache' mount option by not starting laundromat thread
even when SMB server supports directory leases. Do not waste system
resources by having laundromat thread running with no directory
caching at all.
Fixes: 2da338ff75 ("smb3: do not start laundromat thread when dir leases disabled")
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Recent changes to kernel_connect() and kernel_bind() ensure that
callers are insulated from changes to the address parameter made by BPF
SOCK_ADDR hooks. This patch wraps direct calls to ops->connect() and
ops->bind() with kernel_connect() and kernel_bind() to ensure that SMB
mounts do not see their mount address overwritten in such cases.
Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.camel@redhat.com/
Cc: <stable@vger.kernel.org> # 6.0+
Signed-off-by: Jordan Rife <jrife@google.com>
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
At btrfs_realloc_node() we have these checks to verify we are not using a
stale transaction (a past transaction with an unblocked state or higher),
and the only thing we do is to trigger two WARN_ON(). This however is a
critical problem, highly unexpected and if it happens it's most likely due
to a bug, so we should error out and turn the fs into error state so that
such issue is much more easily noticed if it's triggered.
The problem is critical because in btrfs_realloc_node() we COW tree blocks,
and using such stale transaction will lead to not persisting the extent
buffers used for the COW operations, as allocating tree block adds the
range of the respective extent buffers to the ->dirty_pages iotree of the
transaction, and a stale transaction, in the unlocked state or higher,
will not flush dirty extent buffers anymore, therefore resulting in not
persisting the tree block and resource leaks (not cleaning the dirty_pages
iotree for example).
So do the following changes:
1) Return -EUCLEAN if we find a stale transaction;
2) Turn the fs into error state, with error -EUCLEAN, so that no
transaction can be committed, and generate a stack trace;
3) Combine both conditions into a single if statement, as both are related
and have the same error message;
4) Mark the check as unlikely, since this is not expected to ever happen.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At btrfs_cow_block() we check if the block being COWed belongs to a root
that is being deleted and if so we log an error message. However this is
an unexpected case and it indicates a bug somewhere, so we should return
an error and abort the transaction. So change this in the following ways:
1) Abort the transaction with -EUCLEAN, so that if the issue ever happens
it can easily be noticed;
2) Change the logged message level from error to critical, and change the
message itself to print the block's logical address and the ID of the
root;
3) Return -EUCLEAN to the caller;
4) As this is an unexpected scenario, that should never happen, mark the
check as unlikely, allowing the compiler to potentially generate better
code.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At btrfs_cow_block() we have these checks to verify we are not using a
stale transaction (a past transaction with an unblocked state or higher),
and the only thing we do is to trigger a WARN with a message and a stack
trace. This however is a critical problem, highly unexpected and if it
happens it's most likely due to a bug, so we should error out and turn the
fs into error state so that such issue is much more easily noticed if it's
triggered.
The problem is critical because using such stale transaction will lead to
not persisting the extent buffer used for the COW operation, as allocating
a tree block adds the range of the respective extent buffer to the
->dirty_pages iotree of the transaction, and a stale transaction, in the
unlocked state or higher, will not flush dirty extent buffers anymore,
therefore resulting in not persisting the tree block and resource leaks
(not cleaning the dirty_pages iotree for example).
So do the following changes:
1) Return -EUCLEAN if we find a stale transaction;
2) Turn the fs into error state, with error -EUCLEAN, so that no
transaction can be committed, and generate a stack trace;
3) Combine both conditions into a single if statement, as both are related
and have the same error message;
4) Mark the check as unlikely, since this is not expected to ever happen.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>