- Fix fallocate so that it drops all file privileges when files are
modified instead of open-coding that incompletely.
- Fix fallocate to flush the log if the caller wanted synchronous file
updates.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmH7/7wACgkQ+H93GTRK
tOut/BAAl2aLNDdBjjp9rBbNdnZtXSQ3JKBo/6FrGX+7M2rr5/Ppskzfm2FE6HQj
TS4CNAg1Gid/gbK956iOU+E3LvGR1lUA3yhtm0OZ7yffQ+1zGA2RUCTy39SZEUTI
PTpzF6gH/oZAKrfv2r9yLEFJdPyBoo8BBBHBRshw5TiDqMG1pJ3toTySehLXHsRf
Bf4pXGyOy0+q9zkIprUQg5l3cEEdoOW5mLwv/1soiLx+mz6Go/MHFaAO/U3m2KBq
mwNUZcXoRSihyd7xKEqLQfYuy7abrtICsupoMGvCzFI1ox1ZzPN5/aNAI1LBXz8+
tK+gRaJg7stGVuI28zIEw4mmcLWG+h46fyuunhY6EIcoXZzpcXL9gxQzle6ypE8e
ScRtRnjUd4CsDyRNzDez082K1T6M6pD3QXMWPn29/WnjKElbEKwqDRjri4HOqwyq
A3buwhvuzK2ZS8f2k1/YLrNqZLCZwd9LRsz085GI7HUp7cardiTFERNzt4Nt935k
T6lDFNKXVW9MZvSXGWNGmwHAQRvmkW+i7pyhE1LCsTwF0Q6YNwbhZAl9SJK5S5Zk
TJEam4d6xY86Ppu7kXcYWtv8gJxMJyiGbywI8xZDz1VrkwNbfB0ZK/8A9KEMGc1r
fuUxTZF12BY5Sm5YaxlmrDnkkvn3K9CHQ3DI7BiP+IldwDdhuOU=
=dqTg
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"I was auditing operations in XFS that clear file privileges, and
realized that XFS' fallocate implementation drops suid/sgid but
doesn't clear file capabilities the same way that file writes and
reflink do.
There are VFS helpers that do it correctly, so refactor XFS to use
them. I also noticed that we weren't flushing the log at the correct
point in the fallocate operation, so that's fixed too.
Summary:
- Fix fallocate so that it drops all file privileges when files are
modified instead of open-coding that incompletely.
- Fix fallocate to flush the log if the caller wanted synchronous
file updates"
* tag 'xfs-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: ensure log flush at the end of a synchronous fallocate call
xfs: move xfs_update_prealloc_flags() to xfs_pnfs.c
xfs: set prealloc flag in xfs_alloc_file_space()
xfs: fallocate() should call file_modified()
xfs: remove XFS_PREALLOC_SYNC
xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP*
- Fix a bug where callers of ->sync_fs (e.g. sync_filesystem and
syncfs(2)) ignore the return value.
- Fix a bug where callers of sync_filesystem (e.g. fs freeze) ignore
the return value.
- Fix a bug in XFS where xfs_fs_sync_fs never passed back error
returns.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmH2xBcACgkQ+H93GTRK
tOvcFQ//Wo3mMeyte2IkA4Km6tKxB5TMk3oEAYRcp5H2w4tmAnqrgukFRHR3kee8
b7/qBqAeI66FsneT74Rvb5gR+X2gPgiwKg88ZKmCnufJruL2YNL5D0u7e8sTaqO6
DLIf8EXw2l9KZOsMjPy2wBbflvufhM1LciXR25YwslYXUySaIDqNJ6R58Xyx7xni
WNNxkM4rHry2AoLBGnyLti88xguLCSGdS5vEyiNj/dKHZn47/J8xXPppFyDyzGA2
2oVfGCIgUZGL/Hy99iM8kTIN50GS0Eq1tiM6qUnad2U8KUWj1g2l9yqL6pCC3iuT
PLn4iB7IDV6zs+FOLdgEYLp7iw/2BxUzocBTaZQlZm0VMJ8UYQEvOeecnOqnly+9
tt12ZLk3iFVSTUejw3PofqrHc2nTUJr8ojrzKrwIc0Pmur6YlIlO99RHUxy+Li4o
IVh+D5cqOiwCul5cdGk9120dBACEICI+Q70vjYpnwrgjozHg1tML8d1VtPI9+fs3
fSzjkLlIc+vgLYgsxvmZg8kSEd2hDb3lq2aSirx42aaEvX94QDly/SPNz78FdehZ
JWrSewp8rBAW8+yvfBKKcpyhCyULLCGIQjKOziuwLpjjEHMOd6CpJ5w/2c/fKxzy
B775+6lSb+P/FG3ikW7PKYNa5wWfgp0/Maz79XiuAcUJI5FK9R8=
=BdrI
-----END PGP SIGNATURE-----
Merge tag 'vfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull vfs fixes from Darrick Wong:
"I was auditing the sync_fs code paths recently and noticed that most
callers of ->sync_fs ignore its return value (and many implementations
never return nonzero even if the fs is broken!), which means that
internal fs errors and corruption are not passed up to userspace
callers of syncfs(2) or FIFREEZE. Hence fixing the common code and
XFS, and I'll start working on the ext4/btrfs folks if this is merged.
Summary:
- Fix a bug where callers of ->sync_fs (e.g. sync_filesystem and
syncfs(2)) ignore the return value.
- Fix a bug where callers of sync_filesystem (e.g. fs freeze) ignore
the return value.
- Fix a bug in XFS where xfs_fs_sync_fs never passed back error
returns"
* tag 'vfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: return errors in xfs_fs_sync_fs
quota: make dquot_quota_sync return errors from ->sync_fs
vfs: make sync_filesystem return errors from ->sync_fs
vfs: make freeze_super abort when sync_filesystem returns error
- Limit the length of ioend chains in writeback so that we don't trip
the softlockup watchdog and to limit long tail latency on clearing
PageWriteback.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHxg5QACgkQ+H93GTRK
tOv/5g/9H1grcwR5RuUm0Xae5R9BMeu3GTOwgN7GIeo7YkQHm5F0O2o0jTXOZAuP
1LZ00CrBHPgHf/sRi0MkGJnRMWQEtrlsexVfDQn05yAB159n68qAwrHSUH5MHHtV
vz2S+X6hBdbPvie22HKKwr+rqEz3oe0vdJ94jRDHAamKL3PM1WbK3R48hJFN2Wua
EGWxid0d6HGSpUVObSRnJyKh4MhKniPWMJTDIAthmEQIUPBJa4aIju38jq7skiep
TLYz9GnnrUyDYlmNSgmvG/a/BcAz2GJQJ1i6FlkuAL8Wjlh+ni30j5P3Bifb8cKf
i1N7YzgaV/5frx1d70ovswblgijF/4iz22CQfX5I1NBYy9rCuVe1N+GltWX/wYG7
ShqP4cjzgrzeEir0ELCN8ubU15SrQ4XDCgj+wHO1f0hD7ldRfKOQObFhdR/k5J+T
MqhjSg5x3N0Y39Gp27T+cDvKM2tYZjb8aqrfU6hk8FUUwgXSovJbx97B6jr4xGHH
/W/7MREuY65Lz+KQ2FOWJJk/AGnzxtyINFUSzDGV5IlR/eQDNkbOJBanRx34iZX9
j3pVEXwWvjRkKm+fhAHxXG0mBNhYwzwuiZgLuL7MQKfal8LShCKnc9PKfAV9PgNI
NcEnmhV6wzC4gJidsFTqsUZqQcURFGUDX0inpaswgr3MEXRUe9s=
=a5K9
-----END PGP SIGNATURE-----
Merge tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fix from Darrick Wong:
"A single bugfix for iomap.
The fix should eliminate occasional complaints about stall warnings
when a lot of writeback IO completes all at once and we have to then
go clearing status on a large number of folios.
Summary:
- Limit the length of ioend chains in writeback so that we don't trip
the softlockup watchdog and to limit long tail latency on clearing
PageWriteback"
* tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs, iomap: limit individual ioend chain lengths in writeback
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmH9eUcACgkQxWXV+ddt
WDvCvQ//bANu7air/Og5r2Mn0ZYyrcQl+yDYE75UC/tzTZNNtP8guwGllwlcsA0v
RQPiuFFtvjKMgKP6Eo1mVeUPkpX83VQkT+sqFRsFEFxazIXnSvEJ+iHVcuiZvgj1
VkTjdt7/mLb573zSA0MLhJqK1BBuFhUTCCHFHlCLoiYeekPAui1pbUC4LAE/+ksU
veCn9YS+NGkDpIv/b9mcALVBe+XkZlmw1LON8ArEbpY4ToafRk0qZfhV7CvyRbSP
Y1zLUScNLHGoR2WA1WhwlwuMePdhgX/8neGNiXsiw3WnmZhFoUVX7oUa6IWnKkKk
dD+x5Z3Z2xBQGK8djyqxzUFJ2VAvz15xGIM452ofGa1BJFZgV9hjPA6Y4RFdWx63
4AZ6OJwhrXhgMtWBhRtM6mGeje56ljwaxku9qhe585z8H5V8ezUNwWVkjY0bLKsd
iT3bUHEReoYRWuyszI1ZSm1DbyzNY2943ly97p/j8qKp4SHX39/QYAKmnuuHdIup
TnTBJOh38rj4S8BfF873aKAo7EfLJcDbTcZ1ivbuX5FeByRuQB4F0c1RRi4usMLc
DL5mhDhT71U1l/LF3IANQ4ieUfZbeFHd+dAVkYsGkYzzaWL8E03L582l/fqaVGsp
RaVpiuKnh2cyDXUxob8IYT5mZ/saa96xBSK8VEqnwjNEQCzKEeU=
=5MJl
-----END PGP SIGNATURE-----
Merge tag 'for-5.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few fixes and error handling improvements:
- fix deadlock between quota disable and qgroup rescan worker
- fix use-after-free after failure to create a snapshot
- skip warning on unmount after log cleanup failure
- don't start transaction for scrub if the fs is mounted read-only
- tree checker verifies item sizes"
* tag 'for-5.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: skip reserved bytes warning on unmount after log cleanup failure
btrfs: fix use of uninitialized variable at rm device ioctl
btrfs: fix use-after-free after failure to create a snapshot
btrfs: tree-checker: check item_size for dev_item
btrfs: tree-checker: check item_size for inode_item
btrfs: fix deadlock between quota disable and qgroup rescan worker
btrfs: don't start transaction for scrub if the fs is mounted read-only
- fix fsdax partition offset misbehavior;
- clean up z_erofs_decompressqueue_work() declaration;
- fix up EOF lcluster inlining, especially for small compressed data.
-----BEGIN PGP SIGNATURE-----
iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYf0fBhEceGlhbmdAa2Vy
bmVsLm9yZwAKCRA5NzHcH7XmBCuaAP9GtcEDQ38ozhTgiD7ae8mPX808my3WDs+0
lyySofcEeAD+IxKNv0Gx7j7TgeaFWQZc7r16JeOPI9TKAe4BZaa1Tg0=
=J/Hq
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-5.17-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Two fixes related to fsdax cleanup in this cycle and ztailpacking to
fix small compressed data inlining. There is also a trivial cleanup to
rearrange code for better reading.
Summary:
- fix fsdax partition offset misbehavior
- clean up z_erofs_decompressqueue_work() declaration
- fix up EOF lcluster inlining, especially for small compressed data"
* tag 'erofs-for-5.17-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix small compressed files inlining
erofs: avoid unnecessary z_erofs_decompressqueue_work() declaration
erofs: fix fsdax partition offset handling
the 9p 'walk' operation requires fid arguments to not originate from
an open or create call and we've missed that for a while as the
servers regularly running tests with don't enforce the check and
no active reviewer knew about the rule.
Both reporters confirmed reverting this patch fixes things for them
and looking at it further wasn't actually required...
Will take more time for follow up and enforcing the rule more
thoroughly later.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmH9A04ACgkQq06b7GqY
5nBsmQ/9G4MTSusDbUL80d/VMXmUWQ4wfiQ9/t8ikH/QQL3nNf5az5WJxp38McHQ
SJrZdpSbfy44DRwoAtkO964Qrvn+Yxm+H7aVfXWwThW5l6bF2/tox7VgSLnQmQoE
96V4zHMDmP1ibmAf3Eh0SfmQQnN7eDec9AXD6sf7sI0pKoRCUD2R8oHVrMLSdJKr
tRWpFP0C0HLrB6ptkLYOEt0IrQxMfdN7qlWNE7016REJgtNXPrN6U2DN3iXITAHp
f1pSm+BKrzPcDpT5fGzPfhDvrVtEoep8+vTaszSEedg+Va9sJp8fc3Ha+YWNdZGv
7ZM6uMTubNOFaSEn33HqsJ7EkUuihckdbkg3FwQJNK6wOHslzWBLeQFj2vNDw8ji
jBU8owqAbWOlQ2wciHPXrb83mV97nnEMQVge4+9LQQaxLJaoIPTwohKUQxbjiRBU
b6St4Xs1Npq/h818aycH3pMsoA5FwDrny5OR++E4RbbgiffejmjkxjXG8BPx/mWt
UjT5BQQ4tD/JQGbB6OCpDkuCnMpHApNIvEodN8P72uZCU9nK6ARXse1tYcR6JqNf
otNcJBrGTd4o3r/Fgb4wi8D0bLMcVNvXYRQyV/ebRC+NJkUZDIjmfFT8joKnxuIf
w54gPfPRroxTnvwlniyxsT7HPAS1F6Ds5ve3wTiUNqiGWX6JtsQ=
=afc6
-----END PGP SIGNATURE-----
Merge tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux
Pull 9p fix from Dominique Martinet:
"Fix 'cannot walk open fid' rule
The 9p 'walk' operation requires fid arguments to not originate from
an open or create call and we've missed that for a while as the
servers regularly running tests with don't enforce the check and no
active reviewer knew about the rule.
Both reporters confirmed reverting this patch fixes things for them
and looking at it further wasn't actually required... Will take more
time for follow up and enforcing the rule more thoroughly later"
* tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux:
Revert "fs/9p: search open fids first"
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmH8v0oACgkQiiy9cAdy
T1F8CQwAjxP339Ln4AQMiuYNWxnzhHgbKKgPD+PZxvS/lvBzIcyTz0aHqRTIGwLr
8+NH39nzQCDgXRh8qMPc/RPj+qDUMeEV7mEwv3pj9R13tUDJB41bi2nLEIcOoISq
uJ66aiJ+Nm6e7dU8rbHq7fpvdfnaQgOK64hMk4WWJsiByehUT/fGMs0GTMQKry5i
t0xYfyfX/ty8/8yrlHTAHFrq60468VHLsaX3Bkxm3Mxff1pziAtwjiqhkP6U1jNQ
7/qaxHHwgY6Yogoy9Oe9MsFHwhx9dFi3g3j+BLbZn4rpffvU6mJF28iFx9Q42P+K
BDXLzYhnczkgyTnyASkoZt6s5YVWMLR7K2WT7obLwAmS2UbOkvGiuZCs7xp6AOFz
9AQCKJqM2V5tWs9doNGubE3pIRjOOFi5ohXq6+CUZv5kqEzyfy+xfEr557JJxSAw
/PVb1dW7CbDWK+BW1AtfcQo9NA6FQmkXTy+/QdzeEoIxm3szuHOJ+/lY7RP9QTt0
Fdc+8Y1o
=2fkO
-----END PGP SIGNATURE-----
Merge tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"SMB3 client fixes including:
- multiple fscache related fixes, reenabling ability to read/write to
cached files for cifs.ko (that was temporarily disabled for cifs.ko
a few weeks ago due to the recent fscache changes)
- also includes a new fscache helper function ("query_occupancy")
used by above
- fix for multiuser mounts and NTLMSSP auth (workstation name) for
stable
- fix locking ordering problem in multichannel code
- trivial malformed comment fix"
* tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix workstation_name for multiuser mounts
Invalidate fscache cookie only when inode attributes are changed.
cifs: Fix the readahead conversion to manage the batch when reading from cache
cifs: Implement cache I/O by accessing the cache directly
netfs, cachefiles: Add a method to query presence of data in the cache
cifs: Transition from ->readpages() to ->readahead()
cifs: unlock chan_lock before calling cifs_put_tcp_session
Fix a warning about a malformed kernel doc comment in cifs
Prior to ztailpacking feature, it's enough that each lcluster has
two pclusters at most, and the last pcluster should be turned into
an uncompressed pcluster when necessary. For example,
_________________________________________________
|_ pcluster n-2 _|_ pcluster n-1 _|____ EOFed ____|
which should be converted into:
_________________________________________________
|_ pcluster n-2 _|_ pcluster n-1 (uncompressed)' _|
That is fine since either pcluster n-1 or (uncompressed)' takes one
physical block.
However, after ztailpacking was supported, the game is changed since
the last pcluster can be inlined now. And such case above is quite
common for inlining small files. Therefore, in order to inline more
effectively, special EOF lclusters are now supported which can have
three parts at most, as illustrated below:
_________________________________________________
|_ pcluster n-2 _|_ pcluster n-1 _|____ EOFed ____|
^ i_size
Actually similar code exists in Yue Hu's original patchset [1], but I
removed this part on purpose. After evaluating more real cases with
small files, I've changed my mind.
[1] https://lore.kernel.org/r/20211215094449.15162-1-huyue2@yulong.com
Link: https://lore.kernel.org/r/20220203190203.30794-1-xiang@kernel.org
Fixes: ab92184ff8 ("erofs: add on-disk compressed tail-packing inline support")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Set workstation_name from the master_tcon for multiuser mounts.
Just in case, protect size_of_ntlmssp_blob against a NULL workstation_name.
Fixes: 49bd49f983 ("cifs: send workstation name during ntlmssp session setup")
Cc: stable@vger.kernel.org # 5.16
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Ryan Bair <ryandbair@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
For example if mtime or size has changed.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
- Ensure SM_NOTIFY doesn't crash the NFS server host
- Ensure NLM locks are cleaned up after client reboot
- Fix a leak of internal NFSv4 lease information
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmH1em4ACgkQM2qzM29m
f5drhRAAq8uU+tgABqZNj4aLivUOAionkSiV6Blk1V44DO00yhY2y3dAsOu8bO0k
Kh1Yu0QSZeaYDSi2Ak9qCKAl8eNg8lvlxWJ5pQ+GERVJiZj3JJRPSUJI+5r/aQMi
k774Y+DzLwPn6/r5iTyymm3vx1wcas+Y/v2nvmHob/G74UKngbhOhP05XS/1MDlM
fdTtXVKqLx92grDljTXWCtT5q5mpOc+OFufo2a5+b1aJjUWiU/rraT1mArNlEC7F
IMw/eZn6ZnZv+ywbVJFGeRib/Xa7jNeKA+4CQMH+quk/s8rHEaUJqeM5439HLBYk
E0KrFAdn+VDV5A6I9TIB1vtykl0KzC/r2u8G4vbA++rfpuxW36lGS95JFnDctGG+
uwk/f4p2+D7oSGt7gLXt8LTOAx0/NeT+OTtUqZRPcoKO7uXvkkCCu2irD9VpGSpD
A83Qq0ewT9ntNy0Feik3FgmRSmPTgvywE78MeRFoundd3QhtghUunfY1N2soDt7t
0hyqBhcH8ypWjFoKmv+wAHLPcGcdeg+8T0w3hFPcyTrrdYo/OJl4MNgrIczA2z8O
nWCZ+lOZq3QtAkd0eGSFPhnTVebCP5n6yvIfDN4rZc+ASNAqXCR5e1yCDE1gfO+E
I1uCcxzewWPe3DsuYWQznEx5u4Rpiml5JF1q5uKFwTNj4UTBFKQ=
=IC/r
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Notable bug fixes:
- Ensure SM_NOTIFY doesn't crash the NFS server host
- Ensure NLM locks are cleaned up after client reboot
- Fix a leak of internal NFSv4 lease information"
* tag 'nfsd-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
lockd: fix failure to cleanup client locks
lockd: fix server crash on reboot of client holding lock
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmH6gtsACgkQnJ2qBz9k
QNkHTggAvFfCBZ11AM3MIQifVrN06q0Aq/xTCR56lbLoqjVbLx+oUAPYk45AgzBJ
9NhYFwPbevLs+c1JrigmTW/i2juZghdfV3CBu6uWvnIDgZbjDwt1RFZ/TMuzG488
Orr12n34J+kaM89BxBPxecFXqGW1bqtIeIUkM6M4OefagVvueRP591GEHRPGk60S
nz90LIZN2fsXrDq6K0EC4LVnMF8VWe7lpW8vHORc/O83KasHFGv1xXJ/Z1ovq9ln
N17pbjFvECyKwIsvQCuKoxa/iutKqiUSQiVyFyN9IryBE5bMKSuv87EP+dKPcP/O
Vmkg2AZRAbL9+M/rhgu4mF0bfhnuJQ==
=uGov
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_for_v5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify fix from Jan Kara:
"Fix stale file descriptor in copy_event_to_user"
* tag 'fsnotify_for_v5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: Fix stale file descriptor in copy_event_to_user()
Since we've started treating fallocate more like a file write, we
should flush the log to disk if the user has asked for synchronous
writes either by setting it via fcntl flags, or inode flags, or with
the sync mount option. We've already got a helper for this, so use
it.
[The original patch by Darrick was massaged by Dave to fit this patchset]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
The operations that xfs_update_prealloc_flags() perform are now
unique to xfs_fs_map_blocks(), so move xfs_update_prealloc_flags()
to be a static function in xfs_pnfs.c and cut out all the
other functionality that is doesn't use anymore.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Now that we only call xfs_update_prealloc_flags() from
xfs_file_fallocate() in the case where we need to set the
preallocation flag, do this in xfs_alloc_file_space() where we
already have the inode joined into a transaction and get
rid of the call to xfs_update_prealloc_flags() from the fallocate
code.
This also means that we now correctly avoid setting the
XFS_DIFLAG_PREALLOC flag when xfs_is_always_cow_inode() is true, as
these inodes will never have preallocated extents.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
In XFS, we always update the inode change and modification time when
any fallocate() operation succeeds. Furthermore, as various
fallocate modes can change the file contents (extending EOF,
punching holes, zeroing things, shifting extents), we should drop
file privileges like suid just like we do for a regular write().
There's already a VFS helper that figures all this out for us, so
use that.
The net effect of this is that we no longer drop suid/sgid if the
caller is root, but we also now drop file capabilities.
We also move the xfs_update_prealloc_flags() function so that it now
is only called by the scope that needs to set the the prealloc flag.
Based on a patch from Darrick Wong.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Callers can acheive the same thing by calling xfs_log_force_inode()
after making their modifications. There is no need for
xfs_update_prealloc_flags() to do this.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYfkylwAKCRDh3BK/laaZ
PLY6AP9rui/UD3RE/koFfRVTTPuZkv7I14mHmCzDloYcmPDJCgEAwebzCiOo22kQ
Jn3V/B8mmG2vBv+qu+iM3/WbkPqCtgg=
=ajWT
-----END PGP SIGNATURE-----
Merge tag 'ovl-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Fix a regression introduced in v5.15, affecting copy up of files with
'noatime' or 'sync' attributes to a tmpfs upper layer"
* tag 'ovl-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: don't fail copy up if no fileattr support on upper
ovl: fix NULL pointer dereference in copy up warning
previous CONFIG_UNICODE. It is -rc material since we don't want to
expose the former symbol on 5.17.
This has been living on linux-next for the past week.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8jAUPq50yNjPBCi4QEuZqsMcppQFAmH4lC0ACgkQQEuZqsMc
ppRl1Q/+Lyba+DORs26C4p1GDS5ezHOCdbBUE8RFwWjIl+h5ckQ/8kndaXPRLorZ
1S9E6h5RfqhekGKOhMTXyfzqcW8qMzUy4i3J2lmJpDwATqLt+4Wu/M2BBH2CaIIL
EhhW8D+WduAEM/TFYihH9LJ0RopvIsqcy8qdu+oSBGfPAdxJ0f2+Yx0pNTRfqVmi
8+Dry0nRhP12o9wXElpZ0/BYEZTlY+Zo6L/heT6/GKDLpz/YmZp18GAc/0TWb3LL
ASujr+anU2LxSFskkyuMu+rbFE8eDshvHEuBZLxlD2o+tG6lAi4mNWZYc0/+jPMw
8TdJ5MEX3IlljXLRKuYctoCdsFQKLxH5IN5wLkiLvM5fBpeb/sWqNolx8f2s/f9R
TaUdjwiqFnML4VnlEH3hd3/hUUVbnE+xJo6g1iRGgJY3eecimvwl8P5H7k9Sn3OS
4zh0bHT9pfg+vUR0BVnfdWi4OpPxSrdqCgFhHsmKaGMvTApm0qMKK1Cg4OPNtYwr
d1RMqsqEBSJTHzr0nHoiWLhkIo8npRPy+LMK51D8j6wg0kOj4GGYerWm1MD9ZlbI
rhPy7nDgdcH48Gk1m6o7dROZKCvkZK+/QDPelBgZHGcGB94lUugYVJQrlBjI+2+7
Wx5oQLgQgeabeMtDZ/YNy5Dsre20vas2oLj5cs6uuoWNOcBO6Ew=
=YVNN
-----END PGP SIGNATURE-----
Merge tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode cleanup from Gabriel Krisman Bertazi:
"A fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into
the previous CONFIG_UNICODE. It is -rc material since we don't want to
expose the former symbol on 5.17.
This has been living on linux-next for the past week"
* tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
unicode: clean up the Kconfig symbol confusion
Fix the readahead conversion to correctly manage the last batch skipping
when reading from cache. This involves a readahead batch of one page or
one folio, so set the batch size according to the number of constituent
pages (should be 1 for a filesystem that doesn't do multipage folios yet).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
Reviewed-by: Rohith Surabattula <rohiths.msft@gmail.com>
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Move cifs to using fscache DIO API instead of the old upstream I/O API as
that has been removed. This is a stopgap solution as the intention is that
at sometime in the future, the cache will move to using larger blocks and
won't be able to store individual pages in order to deal with the potential
for data corruption due to the backing filesystem being able insert/remove
bridging blocks of zeros into its extent list[1].
cifs then reads and writes cache pages synchronously and one page at a time.
The preferred change would be to use the netfs lib, but the new I/O API can
be used directly. It's just that as the cache now needs to track data for
itself, caching blocks may exceed page size...
This code is somewhat borrowed from my "fallback I/O" patchset[2].
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/YO17ZNOcq+9PajfQ@mit.edu [1]
Link: https://lore.kernel.org/r/202112100957.2oEDT20W-lkp@intel.com/ [2]
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add a netfs_cache_ops method by which a network filesystem can ask the
cache about what data it has available and where so that it can make a
multipage read more efficient.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Rohith Surabattula <rohiths@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Transition the cifs filesystem from using the old ->readpages() method to
using the new ->readahead() method.
For the moment, this removes any invocation of fscache to read data from
the local cache, leaving that to another patch.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Reviewed-by: Rohith Surabattula <rohiths@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
This code calls fd_install() which gives the userspace access to the fd.
Then if copy_info_records_to_user() fails it calls put_unused_fd(fd) but
that will not release it and leads to a stale entry in the file
descriptor table.
Generally you can't trust the fd after a call to fd_install(). The fix
is to delay the fd_install() until everything else has succeeded.
Fortunately it requires CAP_SYS_ADMIN to reach this code so the security
impact is less.
Fixes: f644bc449b ("fanotify: fix copy_event_to_user() fid error clean up")
Link: https://lore.kernel.org/r/20220128195656.GA26981@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Syzbot tripped over the following complaint from the kernel:
WARNING: CPU: 2 PID: 15402 at mm/util.c:597 kvmalloc_node+0x11e/0x125 mm/util.c:597
While trying to run XFS_IOC_GETBMAP against the following structure:
struct getbmap fubar = {
.bmv_count = 0x22dae649,
};
Obviously, this is a crazy huge value since the next thing that the
ioctl would do is allocate 37GB of memory. This is enough to make
kvmalloc mad, but isn't large enough to trip the validation functions.
In other words, I'm fussing with checks that were **already sufficient**
because that's easier than dealing with 644 internal bug reports. Yes,
that's right, six hundred and forty-four.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
After the recent changes made by commit c2e3930529 ("btrfs: clear
extent buffer uptodate when we fail to write it") and its followup fix,
commit 651740a502 ("btrfs: check WRITE_ERR when trying to read an
extent buffer"), we can now end up not cleaning up space reservations of
log tree extent buffers after a transaction abort happens, as well as not
cleaning up still dirty extent buffers.
This happens because if writeback for a log tree extent buffer failed,
then we have cleared the bit EXTENT_BUFFER_UPTODATE from the extent buffer
and we have also set the bit EXTENT_BUFFER_WRITE_ERR on it. Later on,
when trying to free the log tree with free_log_tree(), which iterates
over the tree, we can end up getting an -EIO error when trying to read
a node or a leaf, since read_extent_buffer_pages() returns -EIO if an
extent buffer does not have EXTENT_BUFFER_UPTODATE set and has the
EXTENT_BUFFER_WRITE_ERR bit set. Getting that -EIO means that we return
immediately as we can not iterate over the entire tree.
In that case we never update the reserved space for an extent buffer in
the respective block group and space_info object.
When this happens we get the following traces when unmounting the fs:
[174957.284509] BTRFS: error (device dm-0) in cleanup_transaction:1913: errno=-5 IO failure
[174957.286497] BTRFS: error (device dm-0) in free_log_tree:3420: errno=-5 IO failure
[174957.399379] ------------[ cut here ]------------
[174957.402497] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:127 btrfs_put_block_group+0x77/0xb0 [btrfs]
[174957.407523] Modules linked in: btrfs overlay dm_zero (...)
[174957.424917] CPU: 2 PID: 3206883 Comm: umount Tainted: G W 5.16.0-rc5-btrfs-next-109 #1
[174957.426689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[174957.428716] RIP: 0010:btrfs_put_block_group+0x77/0xb0 [btrfs]
[174957.429717] Code: 21 48 8b bd (...)
[174957.432867] RSP: 0018:ffffb70d41cffdd0 EFLAGS: 00010206
[174957.433632] RAX: 0000000000000001 RBX: ffff8b09c3848000 RCX: ffff8b0758edd1c8
[174957.434689] RDX: 0000000000000001 RSI: ffffffffc0b467e7 RDI: ffff8b0758edd000
[174957.436068] RBP: ffff8b0758edd000 R08: 0000000000000000 R09: 0000000000000000
[174957.437114] R10: 0000000000000246 R11: 0000000000000000 R12: ffff8b09c3848148
[174957.438140] R13: ffff8b09c3848198 R14: ffff8b0758edd188 R15: dead000000000100
[174957.439317] FS: 00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000
[174957.440402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[174957.441164] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0
[174957.442117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[174957.443076] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[174957.443948] Call Trace:
[174957.444264] <TASK>
[174957.444538] btrfs_free_block_groups+0x255/0x3c0 [btrfs]
[174957.445238] close_ctree+0x301/0x357 [btrfs]
[174957.445803] ? call_rcu+0x16c/0x290
[174957.446250] generic_shutdown_super+0x74/0x120
[174957.446832] kill_anon_super+0x14/0x30
[174957.447305] btrfs_kill_super+0x12/0x20 [btrfs]
[174957.447890] deactivate_locked_super+0x31/0xa0
[174957.448440] cleanup_mnt+0x147/0x1c0
[174957.448888] task_work_run+0x5c/0xa0
[174957.449336] exit_to_user_mode_prepare+0x1e5/0x1f0
[174957.449934] syscall_exit_to_user_mode+0x16/0x40
[174957.450512] do_syscall_64+0x48/0xc0
[174957.450980] entry_SYSCALL_64_after_hwframe+0x44/0xae
[174957.451605] RIP: 0033:0x7f328fdc4a97
[174957.452059] Code: 03 0c 00 f7 (...)
[174957.454320] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[174957.455262] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97
[174957.456131] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0
[174957.457118] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40
[174957.458005] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000
[174957.459113] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000
[174957.460193] </TASK>
[174957.460534] irq event stamp: 0
[174957.461003] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[174957.461947] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.463147] softirqs last enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.465116] softirqs last disabled at (0): [<0000000000000000>] 0x0
[174957.466323] ---[ end trace bc7ee0c490bce3af ]---
[174957.467282] ------------[ cut here ]------------
[174957.468184] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:3976 btrfs_free_block_groups+0x330/0x3c0 [btrfs]
[174957.470066] Modules linked in: btrfs overlay dm_zero (...)
[174957.483137] CPU: 2 PID: 3206883 Comm: umount Tainted: G W 5.16.0-rc5-btrfs-next-109 #1
[174957.484691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[174957.486853] RIP: 0010:btrfs_free_block_groups+0x330/0x3c0 [btrfs]
[174957.488050] Code: 00 00 00 ad de (...)
[174957.491479] RSP: 0018:ffffb70d41cffde0 EFLAGS: 00010206
[174957.492520] RAX: ffff8b08d79310b0 RBX: ffff8b09c3848000 RCX: 0000000000000000
[174957.493868] RDX: 0000000000000001 RSI: fffff443055ee600 RDI: ffffffffb1131846
[174957.495183] RBP: ffff8b08d79310b0 R08: 0000000000000000 R09: 0000000000000000
[174957.496580] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8b08d7931000
[174957.498027] R13: ffff8b09c38492b0 R14: dead000000000122 R15: dead000000000100
[174957.499438] FS: 00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000
[174957.500990] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[174957.502117] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0
[174957.503513] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[174957.504864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[174957.506167] Call Trace:
[174957.506654] <TASK>
[174957.507047] close_ctree+0x301/0x357 [btrfs]
[174957.507867] ? call_rcu+0x16c/0x290
[174957.508567] generic_shutdown_super+0x74/0x120
[174957.509447] kill_anon_super+0x14/0x30
[174957.510194] btrfs_kill_super+0x12/0x20 [btrfs]
[174957.511123] deactivate_locked_super+0x31/0xa0
[174957.511976] cleanup_mnt+0x147/0x1c0
[174957.512610] task_work_run+0x5c/0xa0
[174957.513309] exit_to_user_mode_prepare+0x1e5/0x1f0
[174957.514231] syscall_exit_to_user_mode+0x16/0x40
[174957.515069] do_syscall_64+0x48/0xc0
[174957.515718] entry_SYSCALL_64_after_hwframe+0x44/0xae
[174957.516688] RIP: 0033:0x7f328fdc4a97
[174957.517413] Code: 03 0c 00 f7 d8 (...)
[174957.521052] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[174957.522514] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97
[174957.523950] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0
[174957.525375] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40
[174957.526763] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000
[174957.528058] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000
[174957.529404] </TASK>
[174957.529843] irq event stamp: 0
[174957.530256] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[174957.531061] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.532075] softirqs last enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.533083] softirqs last disabled at (0): [<0000000000000000>] 0x0
[174957.533865] ---[ end trace bc7ee0c490bce3b0 ]---
[174957.534452] BTRFS info (device dm-0): space_info 4 has 1070841856 free, is not full
[174957.535404] BTRFS info (device dm-0): space_info total=1073741824, used=2785280, pinned=0, reserved=49152, may_use=0, readonly=65536 zone_unusable=0
[174957.537029] BTRFS info (device dm-0): global_block_rsv: size 0 reserved 0
[174957.537859] BTRFS info (device dm-0): trans_block_rsv: size 0 reserved 0
[174957.538697] BTRFS info (device dm-0): chunk_block_rsv: size 0 reserved 0
[174957.539552] BTRFS info (device dm-0): delayed_block_rsv: size 0 reserved 0
[174957.540403] BTRFS info (device dm-0): delayed_refs_rsv: size 0 reserved 0
This also means that in case we have log tree extent buffers that are
still dirty, we can end up not cleaning them up in case we find an
extent buffer with EXTENT_BUFFER_WRITE_ERR set on it, as in that case
we have no way for iterating over the rest of the tree.
This issue is very often triggered with test cases generic/475 and
generic/648 from fstests.
The issue could almost be fixed by iterating over the io tree attached to
each log root which keeps tracks of the range of allocated extent buffers,
log_root->dirty_log_pages, however that does not work and has some
inconveniences:
1) After we sync the log, we clear the range of the extent buffers from
the io tree, so we can't find them after writeback. We could keep the
ranges in the io tree, with a separate bit to signal they represent
extent buffers already written, but that means we need to hold into
more memory until the transaction commits.
How much more memory is used depends a lot on whether we are able to
allocate contiguous extent buffers on disk (and how often) for a log
tree - if we are able to, then a single extent state record can
represent multiple extent buffers, otherwise we need multiple extent
state record structures to track each extent buffer.
In fact, my earlier approach did that:
https://lore.kernel.org/linux-btrfs/3aae7c6728257c7ce2279d6660ee2797e5e34bbd.1641300250.git.fdmanana@suse.com/
However that can cause a very significant negative impact on
performance, not only due to the extra memory usage but also because
we get a larger and deeper dirty_log_pages io tree.
We got a report that, on beefy machines at least, we can get such
performance drop with fsmark for example:
https://lore.kernel.org/linux-btrfs/20220117082426.GE32491@xsang-OptiPlex-9020/
2) We would be doing it only to deal with an unexpected and exceptional
case, which is basically failure to read an extent buffer from disk
due to IO failures. On a healthy system we don't expect transaction
aborts to happen after all;
3) Instead of relying on iterating the log tree or tracking the ranges
of extent buffers in the dirty_log_pages io tree, using the radix
tree that tracks extent buffers (fs_info->buffer_radix) to find all
log tree extent buffers is not reliable either, because after writeback
of an extent buffer it can be evicted from memory by the release page
callback of the btree inode (btree_releasepage()).
Since there's no way to be able to properly cleanup a log tree without
being able to read its extent buffers from disk and without using more
memory to track the logical ranges of the allocated extent buffers do
the following:
1) When we fail to cleanup a log tree, setup a flag that indicates that
failure;
2) Trigger writeback of all log tree extent buffers that are still dirty,
and wait for the writeback to complete. This is just to cleanup their
state, page states, page leaks, etc;
3) When unmounting the fs, ignore if the number of bytes reserved in a
block group and in a space_info is not 0 if, and only if, we failed to
cleanup a log tree. Also ignore only for metadata block groups and the
metadata space_info object.
This is far from a perfect solution, but it serves to silence test
failures such as those from generic/475 and generic/648. However having
a non-zero value for the reserved bytes counters on unmount after a
transaction abort, is not such a terrible thing and it's completely
harmless, it does not affect the filesystem integrity in any way.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Clang static analysis reports this problem
ioctl.c:3333:8: warning: 3rd function call argument is an
uninitialized value
ret = exclop_start_or_cancel_reloc(fs_info,
cancel is only set in one branch of an if-check and is always used. So
initialize to false.
Fixes: 1a15eb724a ("btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At ioctl.c:create_snapshot(), we allocate a pending snapshot structure and
then attach it to the transaction's list of pending snapshots. After that
we call btrfs_commit_transaction(), and if that returns an error we jump
to 'fail' label, where we kfree() the pending snapshot structure. This can
result in a later use-after-free of the pending snapshot:
1) We allocated the pending snapshot and added it to the transaction's
list of pending snapshots;
2) We call btrfs_commit_transaction(), and it fails either at the first
call to btrfs_run_delayed_refs() or btrfs_start_dirty_block_groups().
In both cases, we don't abort the transaction and we release our
transaction handle. We jump to the 'fail' label and free the pending
snapshot structure. We return with the pending snapshot still in the
transaction's list;
3) Another task commits the transaction. This time there's no error at
all, and then during the transaction commit it accesses a pointer
to the pending snapshot structure that the snapshot creation task
has already freed, resulting in a user-after-free.
This issue could actually be detected by smatch, which produced the
following warning:
fs/btrfs/ioctl.c:843 create_snapshot() warn: '&pending_snapshot->list' not removed from list
So fix this by not having the snapshot creation ioctl directly add the
pending snapshot to the transaction's list. Instead add the pending
snapshot to the transaction handle, and then at btrfs_commit_transaction()
we add the snapshot to the list only when we can guarantee that any error
returned after that point will result in a transaction abort, in which
case the ioctl code can safely free the pending snapshot and no one can
access it anymore.
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Check item size before accessing the device item to avoid out of bound
access, similar to inode_item check.
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
The following super simple script would crash btrfs at unmount time, if
CONFIG_BTRFS_ASSERT() is set.
mkfs.btrfs -f $dev
mount $dev $mnt
xfs_io -f -c "pwrite 0 4k" $mnt/file
umount $mnt
mount -r ro $dev $mnt
btrfs scrub start -Br $mnt
umount $mnt
This will trigger the following ASSERT() introduced by commit
0a31daa4b6 ("btrfs: add assertion for empty list of transactions at
late stage of umount").
That patch is definitely not the cause, it just makes enough noise for
developers.
[CAUSE]
We will start transaction for the following call chain during scrub:
scrub_enumerate_chunks()
|- btrfs_inc_block_group_ro()
|- btrfs_join_transaction()
However for RO mount, there is no running transaction at all, thus
btrfs_join_transaction() will start a new transaction.
Furthermore, since it's read-only mount, btrfs_sync_fs() will not call
btrfs_commit_super() to commit the new but empty transaction.
And leads to the ASSERT().
The bug has been there for a long time. Only the new ASSERT() makes it
noisy enough to be noticed.
[FIX]
For read-only scrub on read-only mount, there is no need to start a
transaction nor to allocate new chunks in btrfs_inc_block_group_ro().
Just do extra read-only mount check in btrfs_inc_block_group_ro(), and
if it's read-only, skip all chunk allocation and go inc_block_group_ro()
directly.
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that the VFS will do something with the return values from
->sync_fs, make ours pass on error codes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Strangely, dquot_quota_sync ignores the return code from the ->sync_fs
call, which means that quotacalls like Q_SYNC never see the error. This
doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Strangely, sync_filesystem ignores the return code from the ->sync_fs
call, which means that syscalls like syncfs(2) never see the error.
This doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
If we fail to synchronize the filesystem while preparing to freeze the
fs, abort the freeze.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
This reverts commit 478ba09edc.
That commit was meant as a fix for setattrs with by fd (e.g. ftruncate)
to use an open fid instead of the first fid it found on lookup.
The proper fix for that is to use the fid associated with the open file
struct, available in iattr->ia_file for such operations, and was
actually done just before in 6624664160 ("9p: retrieve fid from file
when file instance exist.")
As such, this commit is no longer required.
Furthermore, changing lookup to return open fids first had unwanted side
effects, as it turns out the protocol forbids the use of open fids for
further walks (e.g. clone_fid) and we broke mounts for some servers
enforcing this rule.
Note this only reverts to the old working behaviour, but it's still
possible for lookup to return open fids if dentry->d_fsdata is not set,
so more work is needed to make sure we respect this rule in the future,
for example by adding a flag to the lookup functions to only match
certain fid open modes depending on caller requirements.
Link: https://lkml.kernel.org/r/20220130130651.712293-1-asmadeus@codewreck.org
Fixes: 478ba09edc ("fs/9p: search open fids first")
Cc: stable@vger.kernel.org # v5.11+
Reported-by: ron minnich <rminnich@gmail.com>
Reported-by: ng@0x80.stream
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
commit 6f1b228529 introduces a regression which can deadlock as
follows:
Task1: Task2:
jbd2_journal_commit_transaction ocfs2_test_bg_bit_allocatable
spin_lock(&jh->b_state_lock) jbd_lock_bh_journal_head
__jbd2_journal_remove_checkpoint spin_lock(&jh->b_state_lock)
jbd2_journal_put_journal_head
jbd_lock_bh_journal_head
Task1 and Task2 lock bh->b_state and jh->b_state_lock in different
order, which finally result in a deadlock.
So use jbd2_journal_[grab|put]_journal_head instead in
ocfs2_test_bg_bit_allocatable() to fix it.
Link: https://lkml.kernel.org/r/20220121071205.100648-3-joseph.qi@linux.alibaba.com
Fixes: 6f1b228529 ("ocfs2: fix race between searching chunks and release journal_head from buffer_head")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Tested-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reported-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "ocfs2: fix a deadlock case".
This fixes a deadlock case in ocfs2. We firstly export jbd2 symbols
jbd2_journal_[grab|put]_journal_head as preparation and later use them
in ocfs2 insread of jbd_[lock|unlock]_bh_journal_head to fix the
deadlock.
This patch (of 2):
This exports symbols jbd2_journal_[grab|put]_journal_head, which will be
used outside modules, e.g. ocfs2.
Link: https://lkml.kernel.org/r/20220121071205.100648-2-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Cc: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should unregister the table upon module unload otherwise something
horrible will happen when we load binfmt_misc module again. Also note
that we should keep value returned by register_sysctl_mount_point() and
release it later, otherwise it will leak.
Also, per Christian's comment, to fully restore the old behavior that
won't break userspace the check(binfmt_misc_header) should be
eliminated.
To reproduce:
modprobe binfmt_misc
modprobe -r binfmt_misc
modprobe binfmt_misc
modprobe -r binfmt_misc
modprobe binfmt_misc
resulting in
modprobe: can't load module binfmt_misc (kernel/fs/binfmt_misc.ko): Cannot allocate memory
and an unhappy kernel:
binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
BUG: unable to handle page fault for address: fffffbfff8004802
Call Trace:
init_misc_binfmt+0x2d/0x1000 [binfmt_misc]
Link: https://lkml.kernel.org/r/20220124181812.1869535-2-ztong0001@gmail.com
Fixes: 3ba442d533 ("fs: move binfmt_misc sysctl to its own file")
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Co-developed-by: Christian Brauner<brauner@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While removing an smb session, we need to free up the
tcp session for each channel for that session. We were
doing this with chan_lock held. This results in a
cyclic dependency with cifs_tcp_ses_lock.
For now, unlock the chan_lock temporarily before calling
cifs_put_tcp_session. This should not cause any problem
for now, since we do not remove channels anywhere else.
And this code segment will not be called by two threads.
When we do implement the code for removing channels, we
will need to execute proper ref counting here.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmH0Z+0QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpgisD/99DrERqFZpgyWybHamp+ehzZLiAjJUKskQ
ihRuQ5lULQvtKTHpJm6uDw2SbwA/mJNxT2UIdQtMlqbtLOdkLnE5CWr587rbI88i
HqW2FYiGJOHbU5abEbCno47OsaHiwtgcANFQn/RzQgpWQWvYhQac0I9CnEvkkpzI
wBlgVdSZ5rc5u309fulK3SPPtiaZ6ThwBxmgAqirW/1c4n6dLFp6Eu/ORAdRoFRE
fzokylWi9sGSPH5ansvoN/PtiDdn4RW43hc/Jwf9F09MSFpsEm/NCE2HYp96T1U6
dn5O9xxN7dGe9+IsB1VreQSaO+ELbZRzlFSSvmZLPf03LX3O6ZiyIphABCThVSp2
u3FdVHfZFRfS208krqFmjPEtxFfRNLJTwdUN4o7gsd7hBXyj1aR2/kySyg7dAV5D
cBxJrRZp3+He4BJN4R/HDKt1tMWFroLT30IOHvVb/DHMo6Ow7yvtS65Iu43QD7G2
G9WeeVv1pXU+jwSTfhK45oiAyvIO1ud9DqDbWiorOKQwQz8P5JUUmy8392FDIcuJ
f6+PgJlcJzlzoC7kwxw0QEarkySYxQ5ezAcppa99p/ylqosr6M61/QrAp5xdyziH
RzlmmIQ5VCgnDvdnmCtCtnyaptbAiJjozcXHe7SaAI2CK091AhZqUnN8eUkGOJaG
8eYcyVT2Dg==
=G140
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Just two small fixes this time:
- Fix a bug that can lead to node registration taking 1 second, when
it should finish much quicker (Dylan)
- Remove an unused argument from a function (Usama)"
* tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block:
io_uring: remove unused argument from io_rsrc_node_alloc
io_uring: fix bug in slow unregistering of nodes
creates interacting with pool namespace-constrained OSD permissions
from Jeff (marked for stable).
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmH0FAkTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzixpkB/4ssxwtq8aP82Kh/WuS9qdtofWtZvOB
6BPJdxseWvVMk9Bw/bO6BrxHpHJJKbhPPxKkhJGliSi19kWJKC7FYbuDRp7ffj4Y
3UDiRaD5aQyjFi1rIrBjDb3dgA1dxdXH4EVAPf44dlRD7HjqaglVldFWSHxqH1RQ
v4eB9RYHzNZKuAFsXF4+bzwZXMWgzaMXDNA5AzgRb8Zb3I2K2xSsIxDHD8MHlw8v
BFX7QNobsxi9vjmENOmH8WcjWx8abtdNl0ndNmHcc3u2QyNEfRcdx8DIytQCVRSw
gCyQVLgZIiedeZ84D6jmQu+RR668ztH+X6/QWWj5eHh6YlvtyqTPnuHe
=VuFn
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A ZERO_SIZE_PTR dereference fix from Xiubo and two fixes for async
creates interacting with pool namespace-constrained OSD permissions
from Jeff (marked for stable)"
* tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client:
ceph: set pool_ns in new inode layout for async creates
ceph: properly put ceph_string reference after async create attempt
ceph: put the requests/sessions when it fails to alloc memory
Fix by removing the extra asterisk.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Rohith Surabattula <rohiths@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The kernel test robot reports that commit c42ff46f97 ("ocfs2: simplify
subdirectory registration with register_sysctl()") is broken, and
results in kernel warning messages like
sysctl table check failed: fs/ocfs2/nm Not a file
sysctl table check failed: fs/ocfs2/nm No proc_handler
sysctl table check failed: fs/ocfs2/nm bogus .mode 0555
and in fact this was already reported back in linux-next, but nobody
seems to have reacted to that report. Possibly that original report
only ever made it to the lkp list.
The problem seems to be that the simplification didn't actually go far
enough, and should have converted the whole directory path to the final
sysctl file, rather than just the two first components.
So take that last step.
Fixes: c42ff46f97 ("ocfs2: simplify subdirectory registration with register_sysctl()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/20220128065310.GF8421@xsang-OptiPlex-9020/
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/KQ2F6TPJWMDVEXJM4WTUC4DU3EH3YJVT/
Tested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmHz0QsACgkQnJ2qBz9k
QNkN+AgA6XqWHKYyElfgJFt1UqaoNMz/Faz9H/+PKiBNSTf6/+67D+V7DFz6jJrv
dDwHNzfDg9kR+pbAAPwhl2jfnQoUlsr019Hrqa5HpWlj5geVpbdunYUzS2WOkwqD
/m+brcLgPdKb2uIysj6wOh9B7wa8V9ksl3EjQvvwaHaU0p1YLUqidVXucYvs8DUo
bgXNaj9GmeysxnmU+aILotWuuXH2vOP4Q2Uk4qz3rN6xW9eEXtpQ4y7gWBp/GA8y
Ia8FtFdQdvlSDOJYMdPOTBu5RB7gY9ElrapvVaWNYtCWI/jRv666nZsLaERYNhLN
uUsG4MWjYbOqW5XqFDbSOwbDqvMh5Q==
=mEFA
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify fixes from Jan Kara:
"Fixes for userspace breakage caused by fsnotify changes ~3 years ago
and one fanotify cleanup"
* tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix fsnotify hooks in pseudo filesystems
fsnotify: invalidate dcache before IN_DELETE event
fanotify: remove variable set but not used
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmHz0I4ACgkQnJ2qBz9k
QNkvkAf/fD0swq13rFWiMDo+Azj9BwdjTeK7kEhGBCw9rL3BchH+OTMgPmcdplTJ
uzXVyWZ98hsCBY4Aclu3Grcy1Cj9A3+JW9FqrWbwWy8t583DuaV31UL99AXb/4sH
xYAmMm9V9gO5ckIAffHzHcEyN+dHMj89zl8vAtxGnlwR6gZLE1wjB3AmWBlctJpE
bWQWCpuJsW+7o7ky5hIQ2hQWLxb5OejdYtLu6Su4tM86xfMwi3pB0tzilJ6jorAH
eH5LBt5pVDI+gYQY1WYoZTy4eQKM0lhVW3WplhbNJwkJUupNS5xCXFfwB8bUppwc
+hyGn5G3GJZ2Ya+xALO3GfS8fX9ESQ==
=dQRa
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull udf and quota fixes from Jan Kara:
"Fixes for crashes in UDF when inode expansion fails and one quota
cleanup"
* tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: cleanup double word in comment
udf: Restore i_lenAlloc when inode expansion fails
udf: Fix NULL ptr deref when converting from inline format
From RFC 7530 Section 16.34.5:
o The server has not recorded an unconfirmed { v, x, c, *, * } and
has recorded a confirmed { v, x, c, *, s }. If the principals of
the record and of SETCLIENTID_CONFIRM do not match, the server
returns NFS4ERR_CLID_INUSE without removing any relevant leased
client state, and without changing recorded callback and
callback_ident values for client { x }.
The current code intends to do what the spec describes above but
it forgot to set 'old' to NULL resulting to the confirmed client
to be expired.
Fixes: 2b63482185 ("nfsd: fix clid_inuse on mount with security change")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Bruce Fields <bfields@fieldses.org>
In my testing, we're sometimes hitting the request->fl_flags & FL_EXISTS
case in posix_lock_inode, presumably just by random luck since we're not
actually initializing fl_flags here.
This probably didn't matter before commit 7f024fcd5c ("Keep read and
write fds with each nlm_file") since we wouldn't previously unlock
unless we knew there were locks.
But now it causes lockd to give up on removing more locks.
We could just initialize fl_flags, but really it seems dubious to be
calling vfs_lock_file with random values in some of the fields.
Fixes: 7f024fcd5c ("Keep read and write fds with each nlm_file")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[ cel: fixed checkpatch.pl nit ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>