* Enable FITRIM on the realtime device.
* Introduce byte-based grant head log reservation tracking instead of
physical log location tracking.
This allows the grant head to track a full 64-bit byte space and hence
overcome the limit of 4GB indexing that has been present until now.
* Fixes
- xfs_flush_unmap_range() and xfs_prepare_shift() should consider RT extents
in the flush unmap range.
- Implement bounds check when traversing log operations during log replay.
- Prevent out of bounds access when traversing a directory data block.
- Prevent incorrect ENOSPC when concurrently performing file creation and
file writes.
- Fix rtalloc rotoring when delalloc is in use
* Cleanups
- Clean up I/O path inode locking helpers and the page fault handler.
- xfs: hoist inode operations to libxfs in anticipation of the metadata
inode directory feature, which maintains a directory tree of metadata
inodes. This will be necessary for further enhancements to the realtime
feature, subvolume support.
- Clean up some warts in the extent freeing log intent code.
- Clean up the refcount and rmap intent code before adding support for
realtime devices.
- Provide the correct email address for sysfs ABI documentation.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZo9pkwAKCRAH7y4RirJu
9EV+AQDBlX2AxTzKPsfb74qKaFgDpTdud8b1U779tijs4a6ZbwD8CvS40NXAjqmq
R2j3wWQP3rkRxBusnStQ/9El20Q+WAI=
=BcGP
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.11-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Chandan Babu:
"Major changes in this release are limited to enabling FITRIM on
realtime devices and byte-based grant head log reservation tracking.
The remaining changes are limited to fixes and cleanups included in
this pull request.
Core:
- Enable FITRIM on the realtime device
- Introduce byte-based grant head log reservation tracking instead of
physical log location tracking.
This allows the grant head to track a full 64-bit byte space and hence
overcome the limit of 4GB indexing that has been present until now
Fixes:
- xfs_flush_unmap_range() and xfs_prepare_shift() should consider RT
extents in the flush unmap range
- Implement bounds check when traversing log operations during log
replay
- Prevent out of bounds access when traversing a directory data block
- Prevent incorrect ENOSPC when concurrently performing file creation
and file writes
- Fix rtalloc rotoring when delalloc is in use
Cleanups:
- Clean up I/O path inode locking helpers and the page fault handler
- xfs: hoist inode operations to libxfs in anticipation of the
metadata inode directory feature, which maintains a directory tree
of metadata inodes. This will be necessary for further enhancements
to the realtime feature, subvolume support
- Clean up some warts in the extent freeing log intent code
- Clean up the refcount and rmap intent code before adding support
for realtime devices
- Provide the correct email address for sysfs ABI documentation"
* tag 'xfs-6.11-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (80 commits)
xfs: fix rtalloc rotoring when delalloc is in use
xfs: get rid of xfs_ag_resv_rmapbt_alloc
xfs: skip flushing log items during push
xfs: grant heads track byte counts, not LSNs
xfs: pass the full grant head to accounting functions
xfs: track log space pinned by the AIL
xfs: collapse xlog_state_set_callback in caller
xfs: l_last_sync_lsn is really AIL state
xfs: ensure log tail is always up to date
xfs: background AIL push should target physical space
xfs: AIL doesn't need manual pushing
xfs: move and rename xfs_trans_committed_bulk
xfs: fix the contact address for the sysfs ABI documentation
xfs: Avoid races with cnt_btree lastrec updates
xfs: move xfs_refcount_update_defer_add to xfs_refcount_item.c
xfs: simplify usage of the rcur local variable in xfs_refcount_finish_one
xfs: don't bother calling xfs_refcount_finish_one_cleanup in xfs_refcount_finish_one
xfs: reuse xfs_refcount_update_cancel_item
xfs: add a ci_entry helper
xfs: remove xfs_trans_set_refcount_flags
...
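As a rough illustration of the grant head change described in the pull
message above, the sketch below contrasts a head that packs a cycle
number and a 32-bit byte offset into one 64-bit word with a head that is
a plain 64-bit count of reserved bytes. All names here are hypothetical
and the space calculation is deliberately simplified (it ignores space
pinned by the AIL); it is a model of the idea, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model, not the kernel implementation. A head that packs a
 * cycle number and a byte offset into one 64-bit word leaves only
 * 32 bits for the offset, which caps indexing at 4GB.
 */
static inline uint64_t old_grant_pack(uint32_t cycle, uint32_t offset)
{
	return ((uint64_t)cycle << 32) | offset;
}

static inline uint32_t old_grant_cycle(uint64_t head)
{
	return (uint32_t)(head >> 32);
}

static inline uint32_t old_grant_offset(uint64_t head)
{
	return (uint32_t)(head & 0xffffffffu);
}

/* Byte-based model: the head is just a count of reserved bytes. */
struct byte_grant_head {
	uint64_t reserved_bytes;
};

static inline void grant_add(struct byte_grant_head *gh, uint64_t bytes)
{
	gh->reserved_bytes += bytes;
}

static inline void grant_sub(struct byte_grant_head *gh, uint64_t bytes)
{
	/* A caller cannot give back more than it reserved. */
	assert(bytes <= gh->reserved_bytes);
	gh->reserved_bytes -= bytes;
}

/* Assumption: free space is the log size minus the reserved bytes. */
static inline uint64_t grant_space_left(const struct byte_grant_head *gh,
					uint64_t log_size)
{
	return log_size > gh->reserved_bytes ?
		log_size - gh->reserved_bytes : 0;
}
```

With a plain byte counter, a reservation larger than 4GB is just a
bigger number and needs no cycle/offset arithmetic.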
INIT_XATTRS is overloaded here -- it's set during the creat process when
we think that we're immediately going to set some ACL xattrs to save
time. However, it's also used by the parent pointers code to enable the
attr fork in preparation to receive pptr xattrs. This results in
xfs_has_parent() branches scattered around the codebase to turn on
INIT_XATTRS.
Linkable files are created far more commonly than unlinkable temporary
files or directory tree roots, so we should centralize this logic in
xfs_inode_init. For the three callers that don't want parent pointers
(online repair tempfiles, unlinkable tempfiles, rootdir creation) we
provide an UNLINKABLE flag to skip attr fork initialization.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
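The centralized decision described in the INIT_XATTRS/UNLINKABLE commit
above could look roughly like the following. The flag names and the
helper are hypothetical stand-ins; only the logic (initialize the attr
fork for linkable files when parent pointers are enabled, or whenever
the caller asks for it) is taken from the commit message.

```c
#include <stdbool.h>

/* Hypothetical flag names, for illustration only. */
#define ICREATE_INIT_XATTRS	(1u << 0)	/* caller expects to set xattrs right away */
#define ICREATE_UNLINKABLE	(1u << 1)	/* file will never be linked into a directory */

/*
 * Decide in one place whether a new inode needs an attr fork, instead
 * of having every caller test for parent pointer support itself.
 */
static inline bool want_attr_fork(unsigned int flags, bool fs_has_parent)
{
	if (flags & ICREATE_INIT_XATTRS)
		return true;
	/* Linkable files on a parent pointer filesystem will receive a pptr xattr. */
	return fs_has_parent && !(flags & ICREATE_UNLINKABLE);
}
```

Under this sketch the three callers that do not want parent pointers
pass the UNLINKABLE flag, and everyone else passes nothing extra.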
Move the initialization of the xfs_icreate_args structure out of
xfs_create and xfs_create_tempfile into their callers so that we can set
the new inode's attributes in one place and pass that through instead of
open coding the collection of attributes all over the code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the extent size helpers to xfs_bmap.c in libxfs since they're used
there already.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When truncating a big realtime file down to an unaligned size,
xfs_truncate_page() only zeroes out the tail of the EOF block.
__xfs_bunmapi() should split the tail written extent and convert the
part beyond the EOF block to unwritten, but this does not work as
expected now because the reserved block count is zero in
xfs_setattr_size(). This can expose stale data since commit
943bc0882ceb ("iomap: don't increase i_size if it's not a write
operation").
If we truncate a file that contains a large enough written extent:
     |<    rtext    >|<    rtext    >|
  ...WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
        ^ (new EOF)                  ^ old EOF
Since we only zero out the tail of the EOF block, and
xfs_itruncate_extents()->..->__xfs_bunmapi() unmaps the whole aligned
extents, it becomes this state:
     |<    rtext    >|
  ...WWWzWWWWWWWWWWWWW
        ^ new EOF
Then if we do an extending write like this, the blocks in the previous
tail extent become stale:
     |<    rtext    >|
  ...WWWzSSSSSSSSSSSSS..........WWWWWWWWWWWWWWWWW
        ^ old EOF               ^ append start  ^ new EOF
Fix this by reserving XFS_DIOSTRAT_SPACE_RES blocks for big realtime
inodes.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240618142112.1315279-2-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add parent pointer attribute during xfs_create, and subroutines to
initialize attributes. Note that the xfs_attr_intent object contains a
pointer to the caller's xfs_da_args object, so the latter must persist
until transaction commit.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: shorten names, adjust to new format, set init_xattrs for parent
pointers]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tmp files are used as part of rename operations and will need attr forks
initialized for parent pointers. Expose the init_xattrs parameter to
the calling function to initialize the fork.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Parent pointers match attrs on name+value, unlike everything else which
matches on only the name. Therefore, we cannot keep using the heuristic
that !value means remove. Make this an explicit operation code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
This field only ever contains XATTR_{CREATE,REPLACE}, and it only goes
as deep as xfs_attr_set. Remove the field from the structure and
replace it with an enum specifying exactly what kind of change we want
to make to the xattr structure. Upsert is the name that we'll give to
the flags==0 operation, because we're either updating an existing value
or inserting it, and the caller doesn't care.
Note: The "UPSERTR" name created here is to make userspace porting
easier. It will be removed in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
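To make the two xattr commits above concrete, here is a minimal sketch
of replacing the "flags plus !value" heuristic with an explicit
operation code. The enum and helper names are hypothetical and are not
the names used in the patches; the two flag constants are local copies
that mirror the values of XATTR_CREATE and XATTR_REPLACE.

```c
#include <stddef.h>

/* Local copies mirroring XATTR_CREATE and XATTR_REPLACE. */
#define SKETCH_XATTR_CREATE	0x1
#define SKETCH_XATTR_REPLACE	0x2

/* Hypothetical names; the enum in the actual patches may differ. */
enum sketch_attr_op {
	SKETCH_ATTR_UPSERT,	/* flags == 0: update or insert, caller doesn't care */
	SKETCH_ATTR_CREATE,	/* fail if the name already exists */
	SKETCH_ATTR_REPLACE,	/* fail if the name does not exist */
	SKETCH_ATTR_REMOVE,	/* explicit removal, no guessing from !value */
};

/*
 * Translate legacy setxattr-style arguments into an explicit op once,
 * at the boundary. Code below this point no longer needs to infer
 * "remove" from a missing value.
 */
static inline enum sketch_attr_op
sketch_attr_op_from_legacy(int flags, const void *value)
{
	if (value == NULL)
		return SKETCH_ATTR_REMOVE;
	if (flags & SKETCH_XATTR_CREATE)
		return SKETCH_ATTR_CREATE;
	if (flags & SKETCH_XATTR_REPLACE)
		return SKETCH_ATTR_REPLACE;
	return SKETCH_ATTR_UPSERT;
}
```

A parent pointer caller, which matches on name and value, would pass
SKETCH_ATTR_REMOVE directly together with the value to match, which the
old heuristic could not express.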
Commit aff3a9edb7 ("xfs: Use preallocation for inodes with extsz
hints") disabled delayed allocation for all inodes with extent size
hints due to a data exposure problem. It turns out we have since fixed
this data exposure problem by always creating unwritten extents for
delalloc conversions (a change made because of further data exposure
problems), but the writeback path doesn't actually support extent size
hints when converting delalloc these days, which probably isn't a
problem given that people using the hints know what they get.
However, due to the way xfs_get_extsz_hint is implemented, it
always claims an extent size hint for RT inodes even if the RT
extent size is a single FSB. Due to that the above commit effectively
disabled delalloc support for RT inodes.
Switch xfs_get_extsz_hint to return 0 for this case and work around
that in a few places to reinstate delalloc support for RT inodes on
file systems with an sb_rextsize of 1.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
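The behaviour change to xfs_get_extsz_hint described in the commit above
can be modelled as follows. The structure and function names are
hypothetical, reduced stand-ins; the point is only that a realtime
extent size of a single FSB no longer counts as a hint, so delalloc is
not ruled out for such inodes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the inode and superblock state. */
struct sketch_inode {
	bool		is_realtime;	/* data lives on the RT device */
	uint32_t	extsize_hint;	/* per-inode hint in FSBs, 0 if unset */
	uint32_t	sb_rextsize;	/* RT extent size in FSBs */
};

/* Returns the extent size hint in FSBs, or 0 when there is none. */
static inline uint32_t sketch_get_extsz_hint(const struct sketch_inode *ip)
{
	if (ip->extsize_hint)
		return ip->extsize_hint;
	/* A single-FSB RT extent is not a hint, so report none. */
	if (ip->is_realtime && ip->sb_rextsize > 1)
		return ip->sb_rextsize;
	return 0;
}

/* In this model, delalloc is only usable when no hint applies. */
static inline bool sketch_can_delalloc(const struct sketch_inode *ip)
{
	return sketch_get_extsz_hint(ip) == 0;
}
```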
We're about to start adding functionality that uses internal inodes that
are private to XFS. What this means is that userspace should never be
able to access any information about these files, and should not be able
to open these files by handle.
To prevent users from ever finding the file or mis-interactions with the
security apparatus, set S_PRIVATE on the inode. Don't allow bulkstat,
open-by-handle, or linking of S_PRIVATE files into the directory tree.
This should keep private inodes actually private.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Move the two public symbols in xfs_file.c to xfs_file.h. We're about to
add more public symbols in that source file, so let's finally create the
header file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
mrlock was an rwsem wrapper that also recorded whether the lock was
held for read or write. Now that we can ask the generic code whether
the lock is held for read or write, we can remove this wrapper and use
an rwsem directly.
As the comment says, we can't use lockdep to assert that the ILOCK is
held for write, because we might be in a workqueue, and we aren't able
to tell lockdep that we do in fact own the lock.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
To use the new rwsem_assert_held()/rwsem_assert_held_write(), we can't
use the existing ASSERT macro. Add a new xfs_assert_ilocked() and
convert all the callers.
Fix an apparent bug in xfs_isilocked(): If the caller specifies
XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL, xfs_assert_ilocked() will check both
the IOLOCK and the ILOCK are held for write. xfs_isilocked() only
checked that the ILOCK was held for write.
xfs_assert_ilocked() is always on, even if DEBUG or XFS_WARN aren't
defined. It's a cheap check, so I don't think it's worth defining
it away.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
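The xfs_isilocked() issue mentioned in the commit above comes down to
returning after the first lock has been examined. A minimal model, with
hypothetical flag values and a hypothetical two-boolean lock state in
place of real rwsems, shows the difference between the two checks:

```c
#include <stdbool.h>

/* Hypothetical flag values and lock state, for illustration only. */
#define SK_IOLOCK_EXCL	(1u << 0)
#define SK_ILOCK_EXCL	(1u << 1)

struct sk_inode {
	bool iolock_held_write;
	bool ilock_held_write;
};

/* Old-style check: returns as soon as the ILOCK has been examined. */
static inline bool sk_isilocked_old(const struct sk_inode *ip,
				    unsigned int flags)
{
	if (flags & SK_ILOCK_EXCL)
		return ip->ilock_held_write;
	if (flags & SK_IOLOCK_EXCL)
		return ip->iolock_held_write;
	return false;
}

/* New-style check: every lock named in flags must be held for write. */
static inline bool sk_ilocked_all(const struct sk_inode *ip,
				  unsigned int flags)
{
	if ((flags & SK_ILOCK_EXCL) && !ip->ilock_held_write)
		return false;
	if ((flags & SK_IOLOCK_EXCL) && !ip->iolock_held_write)
		return false;
	return true;
}
```

With only the ILOCK held, the old-style check accepts a request for both
locks while the new-style check rejects it.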
The remaining callers of kmem_free() are freeing heap memory, so
we can convert them directly to kfree() and get rid of kmem_free()
altogether.
This conversion was done with:
$ for f in `git grep -l kmem_free fs/xfs`; do
> sed -i s/kmem_free/kfree/ $f
> done
$
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Update the per-folio stable writes flag depending on which device an
inode resides on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231025141020.192413-5-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppYgAKCRCRxhvAZXjc
okIHAP9anLz1QDyMLH12ASuHjgBc0Of3jcB6NB97IWGpL4O21gEA46ohaD+vcJuC
YkBLU3lXqQ87nfu28ExFAzh10hG2jwM=
=m4pB
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode time accessor updates from Christian Brauner:
"This finishes the conversion of all inode time fields to accessor
functions as discussed on list. Changing timestamps manually as we
used to do before is error prone. Using accessor functions makes this
robust.
It does not contain the switch of the time fields to discrete 64 bit
integers to replace struct timespec and free up space in struct inode.
But after this, the switch can be trivially made and the patch should
only affect the vfs if we decide to do it"
* tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits)
fs: rename inode i_atime and i_mtime fields
security: convert to new timestamp accessors
selinux: convert to new timestamp accessors
apparmor: convert to new timestamp accessors
sunrpc: convert to new timestamp accessors
mm: convert to new timestamp accessors
bpf: convert to new timestamp accessors
ipc: convert to new timestamp accessors
linux: convert to new timestamp accessors
zonefs: convert to new timestamp accessors
xfs: convert to new timestamp accessors
vboxsf: convert to new timestamp accessors
ufs: convert to new timestamp accessors
udf: convert to new timestamp accessors
ubifs: convert to new timestamp accessors
tracefs: convert to new timestamp accessors
sysv: convert to new timestamp accessors
squashfs: convert to new timestamp accessors
server: convert to new timestamp accessors
client: convert to new timestamp accessors
...
The handling of STATX_CHANGE_COOKIE was moved into generic_fillattr in
commit 0d72b92883 (fs: pass the request_mask to generic_fillattr), but
we didn't account for the fact that xfs doesn't call generic_fillattr at
all.
Make XFS report its i_version as the STATX_CHANGE_COOKIE.
Fixes: 0d72b92883 (fs: pass the request_mask to generic_fillattr)
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
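As a sketch of what the STATX_CHANGE_COOKIE fix above amounts to for a
filesystem that fills in its own stat data, the helper below reports the
change cookie only when it was requested. The structure, the bit value
and the function name are all hypothetical; they only model the
request-mask/result-mask handshake.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value standing in for the change cookie request bit. */
#define SK_STATX_CHANGE_COOKIE	(1u << 30)

/* Hypothetical, reduced stand-in for the stat result structure. */
struct sk_kstat {
	uint32_t result_mask;
	uint64_t change_cookie;
};

/* Report i_version as the change cookie only when the caller asked for it. */
static inline void sk_fill_change_cookie(struct sk_kstat *stat,
					 uint32_t request_mask,
					 bool has_iversion,
					 uint64_t i_version)
{
	if ((request_mask & SK_STATX_CHANGE_COOKIE) && has_iversion) {
		stat->change_cookie = i_version;
		stat->result_mask |= SK_STATX_CHANGE_COOKIE;
	}
}
```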
This reverts commit e44df26647.
Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board, revert the infrastructure
changes. If it isn't code that's here to stay, make it go away.
Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.
Also, anytime the mtime changes, the ctime must also change, and those
are now the only two options for xfs_trans_ichgtime. Have that function
unconditionally bump the ctime, and ASSERT that XFS_ICHGTIME_CHG is
always set.
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230807-mgctime-v7-11-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Now that all of the update_time operations are prepared for it, we can
drop the timespec64 argument from the update_time operation. Do that and
remove it from some associated functions like inode_update_time and
inode_needs_update_time.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230807-mgctime-v7-8-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
In later patches we're going to drop the "now" parameter from the
update_time operation. Prepare XFS for this by reworking how it fetches
timestamps and sets them in the inode. Ensure that we update the ctime
even if only S_MTIME is set.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230807-mgctime-v7-7-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
In future patches we're going to change how the ctime is updated
to keep track of when it has been queried. The way that the update_time
operation works (and a lot of its callers) makes this difficult, since
they grab a timestamp early and then pass it down to eventually be
copied into the inode.
All of the existing update_time callers pass in the result of
current_time() in some fashion. Drop the "time" parameter from
generic_update_time, and rework it to fetch its own timestamp.
This change means that an update_time could fetch a different timestamp
than was seen in inode_needs_update_time. update_time is only ever
called with one of two flag combinations: Either S_ATIME is set, or
S_MTIME|S_CTIME|S_VERSION are set.
With this change we now treat the flags argument as an indicator that
some value needed to be updated when last checked, rather than an
indication to update specific timestamps.
Rework the logic for updating the timestamps and put it in a new
inode_update_timestamps helper that other update_time routines can use.
S_ATIME is treated as it always has been, but if any of the other three
are set, then we attempt to update all three.
Also, some callers of generic_update_time need to know what timestamps
were actually updated. Change it to return an S_* flag mask to indicate
that and rework the callers to expect it.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230807-mgctime-v7-3-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
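The flag handling described in the generic_update_time commit above can
be sketched like this. The flag values, the structure and the function
are hypothetical stand-ins for the kernel's S_* flags and inode fields,
and the version bump is a simplification; the sketch only shows that
S_ATIME is handled on its own, that any of the other three flags
triggers an attempt on all three, and that the return value is a mask of
what actually changed.

```c
#include <stdint.h>

/* Hypothetical flag values standing in for the kernel's S_* flags. */
#define SK_S_ATIME	(1u << 0)
#define SK_S_MTIME	(1u << 1)
#define SK_S_CTIME	(1u << 2)
#define SK_S_VERSION	(1u << 3)

/* Hypothetical, reduced stand-in for the timestamp fields of an inode. */
struct sk_times {
	int64_t atime, mtime, ctime;
	uint64_t version;
};

/*
 * Apply "now" according to flags and return a mask of what changed.
 * Flags mean "something needed updating when last checked", so any of
 * MTIME/CTIME/VERSION triggers an attempt on all three.
 */
static inline unsigned int sk_update_timestamps(struct sk_times *t,
						unsigned int flags,
						int64_t now)
{
	unsigned int updated = 0;

	if (flags & (SK_S_MTIME | SK_S_CTIME | SK_S_VERSION)) {
		if (t->ctime != now) {
			t->ctime = now;
			updated |= SK_S_CTIME;
		}
		if (t->mtime != now) {
			t->mtime = now;
			updated |= SK_S_MTIME;
		}
		/* Simplification: bump the version only if a time changed. */
		if (updated) {
			t->version++;
			updated |= SK_S_VERSION;
		}
	}
	if ((flags & SK_S_ATIME) && t->atime != now) {
		t->atime = now;
		updated |= SK_S_ATIME;
	}
	return updated;
}
```

A caller that needs to know whether anything was dirtied can test the
returned mask instead of assuming the requested flags were all applied.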
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-80-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Remove legacy file_mnt_user_ns() and mnt_user_ns().
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
- More userfaultfd work from Peter Xu.
- Several convert-to-folios series from Sidhartha Kumar and Huang Ying.
- Some filemap cleanups from Vishal Moola.
- David Hildenbrand added the ability to selftest anon memory COW handling.
- Some cpuset simplifications from Liu Shixin.
- Addition of vmalloc tracing support by Uladzislau Rezki.
- Some pagecache folioifications and simplifications from Matthew Wilcox.
- A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it.
- Miguel Ojeda contributed some cleanups for our use of the
__no_sanitize_thread__ gcc keyword. This series should have been in the
non-MM tree, my bad.
- Naoya Horiguchi improved the interaction between memory poisoning and
memory section removal for huge pages.
- DAMON cleanups and tuneups from SeongJae Park.
- Tony Luck fixed the handling of COW faults against poisoned pages.
- Peter Xu utilized the PTE marker code for handling swapin errors.
- Hugh Dickins reworked compound page mapcount handling, simplifying it
and making it more efficient.
- Removal of the autonuma savedwrite infrastructure from Nadav Amit and
David Hildenbrand.
- zram support for multiple compression streams from Sergey Senozhatsky.
- David Hildenbrand reworked the GUP code's R/O long-term pinning so
that drivers no longer need to use the FOLL_FORCE workaround which
didn't work very well anyway.
- Mel Gorman altered the page allocator so that local IRQs can remain
enabled during per-cpu page allocations.
- Vishal Moola removed the try_to_release_page() wrapper.
- Stefan Roesch added some per-BDI sysfs tunables which are used to
prevent network block devices from dirtying excessive amounts of
pagecache.
- David Hildenbrand did some cleanup and repair work on KSM COW
breaking.
- Nhat Pham and Johannes Weiner have implemented writeback in zswap's
zsmalloc backend.
- Brian Foster has fixed a longstanding corner-case oddity in
file[map]_write_and_wait_range().
- sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
Chen.
- Shiyang Ruan has done some work on fsdax, to make its reflink mode
work better under xfstests. Better, but still not perfect.
- Christoph Hellwig has removed the .writepage() method from several
filesystems. They only need .writepages().
- Yosry Ahmed wrote a series which fixes the memcg reclaim target
beancounting.
- David Hildenbrand has fixed some of our MM selftests for 32-bit
machines.
- Many singleton patches, as usual.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5j6ZwAKCRDdBJ7gKXxA
jkDYAP9qNeVqp9iuHjZNTqzMXkfmJPsw2kmy2P+VdzYVuQRcJgEAgoV9d7oMq4ml
CodAgiA51qwzId3GRytIo/tfWZSezgA=
=d19R
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- More userfaultfd work from Peter Xu
- Several convert-to-folios series from Sidhartha Kumar and Huang Ying
- Some filemap cleanups from Vishal Moola
- David Hildenbrand added the ability to selftest anon memory COW
handling
- Some cpuset simplifications from Liu Shixin
- Addition of vmalloc tracing support by Uladzislau Rezki
- Some pagecache folioifications and simplifications from Matthew
Wilcox
- A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
it
- Miguel Ojeda contributed some cleanups for our use of the
__no_sanitize_thread__ gcc keyword.
This series should have been in the non-MM tree, my bad
- Naoya Horiguchi improved the interaction between memory poisoning and
memory section removal for huge pages
- DAMON cleanups and tuneups from SeongJae Park
- Tony Luck fixed the handling of COW faults against poisoned pages
- Peter Xu utilized the PTE marker code for handling swapin errors
- Hugh Dickins reworked compound page mapcount handling, simplifying it
and making it more efficient
- Removal of the autonuma savedwrite infrastructure from Nadav Amit and
David Hildenbrand
- zram support for multiple compression streams from Sergey Senozhatsky
- David Hildenbrand reworked the GUP code's R/O long-term pinning so
that drivers no longer need to use the FOLL_FORCE workaround which
didn't work very well anyway
- Mel Gorman altered the page allocator so that local IRQs can remain
enabled during per-cpu page allocations
- Vishal Moola removed the try_to_release_page() wrapper
- Stefan Roesch added some per-BDI sysfs tunables which are used to
prevent network block devices from dirtying excessive amounts of
pagecache
- David Hildenbrand did some cleanup and repair work on KSM COW
breaking
- Nhat Pham and Johannes Weiner have implemented writeback in zswap's
zsmalloc backend
- Brian Foster has fixed a longstanding corner-case oddity in
file[map]_write_and_wait_range()
- sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
Chen
- Shiyang Ruan has done some work on fsdax, to make its reflink mode
work better under xfstests. Better, but still not perfect
- Christoph Hellwig has removed the .writepage() method from several
filesystems. They only need .writepages()
- Yosry Ahmed wrote a series which fixes the memcg reclaim target
beancounting
- David Hildenbrand has fixed some of our MM selftests for 32-bit
machines
- Many singleton patches, as usual
* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
mm: mmu_gather: allow more than one batch of delayed rmaps
mm: fix typo in struct pglist_data code comment
kmsan: fix memcpy tests
mm: add cond_resched() in swapin_walk_pmd_entry()
mm: do not show fs mm pc for VM_LOCKONFAULT pages
selftests/vm: ksm_functional_tests: fixes for 32bit
selftests/vm: cow: fix compile warning on 32bit
selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
mm,thp,rmap: fix races between updates of subpages_mapcount
mm: memcg: fix swapcached stat accounting
mm: add nodes= arg to memory.reclaim
mm: disable top-tier fallback to reclaim on proactive reclaim
selftests: cgroup: make sure reclaim target memcg is unprotected
selftests: cgroup: refactor proactive reclaim code to reclaim_until()
mm: memcg: fix stale protection of reclaim target memcg
mm/mmap: properly unaccount memory on mas_preallocate() failure
omfs: remove ->writepage
jfs: remove ->writepage
...
Since the basic functionality for fsdax and reflink has been implemented,
remove the restrictions on them so that they can be tested widely.
Link: https://lkml.kernel.org/r/1669908773-207-1-git-send-email-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing them or reporting them to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in its current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
The current inode operation for getting posix acls takes an inode
argument but various filesystems (e.g., 9p, cifs, overlayfs) need access
to the dentry. In contrast to the ->set_acl() inode operation we cannot
simply extend ->get_acl() to take a dentry argument. The ->get_acl()
inode operation is called from:
acl_permission_check()
-> check_acl()
-> get_acl()
which is part of generic_permission() which in turn is part of
inode_permission(). Both generic_permission() and inode_permission() are
called in the ->permission() handler of various filesystems (e.g.,
overlayfs). So simply passing a dentry argument to ->get_acl() would
amount to also having to pass a dentry argument to ->permission(). We
should avoid this unnecessary change.
So instead of extending the existing inode operation, rename it from
->get_acl() to ->get_inode_acl() and add a ->get_acl() method later that
passes a dentry argument and which filesystems that need access to the
dentry can implement instead of ->get_inode_acl(). Filesystems like cifs,
which allow setting and getting posix acls but do not use them for
permission checking during lookup, can simply leave ->get_inode_acl()
unimplemented.
This is intended to be a non-functional change.
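The split between the two methods can be sketched as follows. This is a
simplified standalone model, not the kernel's definitions: the struct
layouts, the two-argument signatures, and the fallback order in acl_get()
are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; fields are illustrative. */
struct posix_acl { int mode; };
struct inode_ops;
struct inode  { const struct inode_ops *i_op; struct posix_acl *cached_acl; };
struct dentry { struct inode *d_inode; const char *name; };

struct inode_ops {
	/* Used on the permission-checking path, where only the inode is known. */
	struct posix_acl *(*get_inode_acl)(struct inode *inode, int type);
	/* Used by explicit ACL getters, where a dentry is available. */
	struct posix_acl *(*get_acl)(struct dentry *dentry, int type);
};

/* Permission-check path: a filesystem without ->get_inode_acl() is skipped. */
struct posix_acl *check_acl_lookup(struct inode *inode, int type)
{
	assert(inode && inode->i_op);
	if (!inode->i_op->get_inode_acl)
		return NULL;
	return inode->i_op->get_inode_acl(inode, type);
}

/* Explicit getter path: prefer the dentry-based method when implemented. */
struct posix_acl *acl_get(struct dentry *dentry, int type)
{
	struct inode *inode = dentry->d_inode;

	if (inode->i_op->get_acl)
		return inode->i_op->get_acl(dentry, type);
	return check_acl_lookup(inode, type);
}

/* Methods for two hypothetical filesystems. */
struct posix_acl *local_get_inode_acl(struct inode *inode, int type)
{
	(void)type;
	return inode->cached_acl;
}

struct posix_acl *remote_get_acl(struct dentry *dentry, int type)
{
	(void)type;
	/* A real network filesystem would query the server by path here. */
	return dentry->d_inode->cached_acl;
}
```

In this model the permission-check path never needs a dentry, which is why
->permission() can keep its signature.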
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing them or reporting them to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in its current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
Since some filesystems rely on the dentry being available to them when
setting posix acls (e.g., 9p and cifs), they cannot rely on the ->set_acl()
inode operation. But since ->set_acl() is required in order to use the
generic posix acl xattr handlers, filesystems that do not implement this
inode operation cannot use the handler and need to implement their own
dedicated posix acl handlers.
Update the ->set_acl() inode method to take a dentry argument. This
allows all filesystems to rely on ->set_acl().
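A minimal sketch of why the dentry matters for a path-based filesystem is
shown here. Everything in it is hypothetical: netfs_set_acl(), struct
wire_request and the struct layouts are illustrative and do not match the
kernel's real ->set_acl() signature.

```c
#include <assert.h>
#include <string.h>

struct posix_acl { int mode; };
struct inode  { struct posix_acl acl; };
struct dentry { struct inode *d_inode; char path[64]; };

/* Hypothetical record of the request a network filesystem would send. */
struct wire_request { char path[64]; int mode; };

/* With a dentry argument, a path-based filesystem can build its request;
 * with only an inode it would have no name to send to the server. */
int netfs_set_acl(struct dentry *dentry, const struct posix_acl *acl,
		  struct wire_request *out)
{
	assert(dentry && dentry->d_inode && acl && out);
	strncpy(out->path, dentry->path, sizeof(out->path) - 1);
	out->path[sizeof(out->path) - 1] = '\0';
	out->mode = acl->mode;
	/* A purely local filesystem only needs this inode update. */
	dentry->d_inode->acl = *acl;
	return 0;
}
```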
As far as I can tell all codepaths can be switched to rely on the dentry
instead of just the inode. Note that the original motivation for passing
the dentry separately from the inode, instead of just the dentry, in the
xattr handlers was security modules that call
security_d_instantiate(). This hook is called during
d_instantiate_new(), d_add(), __d_instantiate_anon(), and
d_splice_alias() to initialize the inode's security context and possibly
to set security.* xattrs. Since this only affects security.* xattrs this
is completely irrelevant for posix acls.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
This update contains:
- fixes for filesystem shutdown procedure during a DAX memory
failure notification
- bug fixes
- logic cleanups
- log message cleanups
- updates to use vfs{g,u}id_t helpers where appropriate
Signed-off-by: Dave Chinner <david@fromorbit.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmNEjOoUHGRhdmlkQGZy
b21vcmJpdC5jb20ACgkQregpR/R1+h2UNg/+Ib1V1XSL6g+sidIPlm5/J3U2sWTh
gRPgd5f5U25T50TEuor93RcOBMXTEww5tsRkQLmekzzgRiCcXu24VyzfCsbx9u4o
JrWt7po+NXPtJW8VedNdHVlOiMBQsf1u3ZY54nmv63EW69J/BEK9jTUeGy3rK0DY
+A/wVvVDipp8VZZ5zh/SwQh1pp3CSSElwuVdlcRl5cJiKiD2vg+Z/NvHnrp+1u+9
F6rOW6RFjU9PqfNGhx9RjC+pYVlmVrDUwHj680ReDsdgDOWzbnW05ft74JpRdGfC
tEy9vxjQ8/3/7vTHspXCI4RIn9LrBjNke2eRgsdRqVcjHa2KQ+hUvG6v8sxV+Wms
7N1oIS2IKtLhUGOZyCwgUCLKFQ1blfkF/XyKx9DFumsnMLzmG2ret9DuOBPccr+c
o7e1ArIlgnJpre2nvPhF5EcM+dArVMuZGPG03vL1iS7A79Ak0/e8Jivee1ScdupP
yNJZzBYYbCkpIUVD9wzNeziSwwSXgW9j1nr6HEOLpBAa0/v0OS5iZfjWVvSLK24Q
OcstRkY7cV5LkU2weIK6UQ6KZF4lJxUQ9j50OeeTuy3fou3utUt0vgVEGIiZ/edp
H0gnQ15wT/RFjOL3i9zMhh24Yy25E/df1ugYmrOwszOQxU21KuZhvzhhehD5pC+r
3MyQQ/e99VoDzEw=
=2o9v
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.1-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Dave Chinner:
"There are relatively few updates this cycle; half the cycle was eaten
by a grue, the other half was eaten by a tricky data corruption issue
that I still haven't entirely solved.
Hence there are no major changes in this cycle and it's largely just
minor cleanups and small bug fixes:
- fixes for filesystem shutdown procedure during a DAX memory failure
notification
- bug fixes
- logic cleanups
- log message cleanups
- updates to use vfs{g,u}id_t helpers where appropriate"
* tag 'xfs-6.1-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: on memory failure, only shut down fs after scanning all mappings
xfs: rearrange the logic and remove the broken comment for xfs_dir2_isxx
xfs: trim the mapp array accordingly in xfs_da_grow_inode_int
xfs: do not need to check return value of xlog_kvmalloc()
xfs: port to vfs{g,u}id_t and associated helpers
xfs: remove xfs_setattr_time() declaration
xfs: Remove the unneeded result variable
xfs: missing space in xfs trace log
xfs: simplify if-else condition in xfs_reflink_trim_around_shared
xfs: simplify if-else condition in xfs_validate_new_dalign
xfs: replace unnecessary seq_printf with seq_puts
xfs: clean up "%Ld/%Lu" which doesn't meet C standard
xfs: remove redundant else for clean code
xfs: remove the redundant word in comment
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY0DP2AAKCRBZ7Krx/gZQ
6/+qAQCEGQWpcC5MB17zylaX7gqzhgAsDrwtpevlno3aIv/1pQD/YWr/E8tf7WTW
ERXRXMRx1cAzBJhUhVgIY+3ANfU2Rg4=
=cko4
-----END PGP SIGNATURE-----
Merge tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs tmpfile updates from Al Viro:
"Miklos' ->tmpfile() signature change; pass an unopened struct file to
it, let it open the damn thing. This allows adding tmpfile support to FUSE"
* tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fuse: implement ->tmpfile()
vfs: open inside ->tmpfile()
vfs: move open right after ->tmpfile()
vfs: make vfs_tmpfile() static
ovl: use vfs_tmpfile_open() helper
cachefiles: use vfs_tmpfile_open() helper
cachefiles: only pass inode to *mark_inode_inuse() helpers
cachefiles: tmpfile error handling cleanup
hugetlbfs: cleanup mknod and tmpfile
vfs: add vfs_tmpfile_open() helper
This is in preparation for adding tmpfile support to fuse, which requires
that the tmpfile creation and opening are done as a single operation.
Replace the 'struct dentry *' argument of i_op->tmpfile with
'struct file *'.
Call finish_open_simple() as the last thing in ->tmpfile() instances (may
be omitted in the error case).
Change d_tmpfile() argument to 'struct file *' as well to make callers more
readable.
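The new calling convention can be sketched with mock types. This is an
illustration only: mock_d_tmpfile() and mock_finish_open_simple() imitate
the roles of d_tmpfile() and finish_open_simple() under simplifying
assumptions, and examplefs_tmpfile() is a hypothetical instance with a
shortened argument list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mock types; only the fields this sketch needs. */
struct inode  { int nlink; };
struct dentry { struct inode *d_inode; };
struct file   { struct dentry *dentry; int opened; };

/* Mock of d_tmpfile(): attach the new inode to the file's dentry. */
void mock_d_tmpfile(struct file *file, struct inode *inode)
{
	inode->nlink = 0;	/* a tmpfile has no directory entry */
	file->dentry->d_inode = inode;
}

/* Mock of finish_open_simple(): open only if creation succeeded. */
int mock_finish_open_simple(struct file *file, int error)
{
	if (error)
		return error;
	file->opened = 1;
	return 0;
}

/* A ->tmpfile() instance in the new style: it receives the unopened file,
 * creates the inode, and opens the file as its last step. */
int examplefs_tmpfile(struct file *file, struct inode *new_inode)
{
	assert(file && file->dentry);
	if (!new_inode)
		return -ENOMEM;	/* error case: the open step is skipped */
	mock_d_tmpfile(file, new_inode);
	return mock_finish_open_simple(file, 0);
}
```

Because creation and open happen inside one method, a filesystem such as
fuse can implement both as a single operation.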
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
A while ago we introduced a dedicated vfs{g,u}id_t type in commit
1e5267cd08 ("mnt_idmapping: add vfs{g,u}id_t"). We already switched
over a good part of the VFS. Ultimately we will remove all legacy
idmapped mount helpers that operate only on k{g,u}id_t in favor of the
new type safe helpers that operate on vfs{g,u}id_t.
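The idea behind the type-safe helpers can be shown with a small standalone
sketch. The demo_ names and the offset-based mapping are hypothetical; they
only model how wrapping the same integer in two distinct struct types forces
conversions and comparisons through explicit helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Two distinct wrapper structs around the same integer: the compiler
 * rejects passing one where the other is expected. */
typedef struct { unsigned int val; } demo_kuid_t;	/* id as the filesystem stores it */
typedef struct { unsigned int val; } demo_vfsuid_t;	/* id as seen through the mount */

/* Hypothetical mapping step; the offset stands in for a real idmapping. */
demo_vfsuid_t demo_make_vfsuid(unsigned int mnt_offset, demo_kuid_t kuid)
{
	demo_vfsuid_t v;

	assert(kuid.val <= UINT_MAX - mnt_offset);
	v.val = kuid.val + mnt_offset;
	return v;
}

/* Comparisons across the two types go through an explicit helper. */
bool demo_vfsuid_eq_kuid(demo_vfsuid_t vfsuid, demo_kuid_t kuid)
{
	return vfsuid.val == kuid.val;
}
```

Writing "vfsuid == kuid" directly does not compile for struct types, so an
accidental comparison of a mapped id with an unmapped one is caught early.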
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Add support for STATX_DIOALIGN to xfs, so that direct I/O alignment
restrictions are exposed to userspace in a generic way.
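From userspace, the alignment values can be read with statx(). The helper
below is a sketch: query_dio_align() and its return convention are made up
for this example, and it assumes headers recent enough to define
STATX_DIOALIGN (otherwise it reports ENOSYS).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>	/* AT_FDCWD */
#include <sys/stat.h>	/* statx(), struct statx */

/*
 * Query direct I/O alignment for a path.
 * Returns 0 and fills both outputs if the filesystem reported them,
 * 1 if the call worked but STATX_DIOALIGN was not reported,
 * -1 on error (errno set).
 */
int query_dio_align(const char *path, unsigned int *mem_align,
		    unsigned int *offset_align)
{
	assert(path && mem_align && offset_align);
#ifdef STATX_DIOALIGN
	struct statx stx;

	if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx) != 0)
		return -1;
	if (!(stx.stx_mask & STATX_DIOALIGN))
		return 1;
	*mem_align = stx.stx_dio_mem_align;
	*offset_align = stx.stx_dio_offset_align;
	return 0;
#else
	/* Headers predate STATX_DIOALIGN; treat as unsupported. */
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would typically use the two values to size and align its O_DIRECT
buffers and file offsets; a reported value of zero means direct I/O is not
supported on that file.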
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220827065851.135710-9-ebiggers@kernel.org
- Improve scalability of the XFS log by removing spinlocks and global
synchronization points.
- Add security labels to whiteout inodes to match the other filesystems.
- Clean up per-ag pointer passing to simplify call sites.
- Reduce verifier overhead by precalculating more AG geometry.
- Implement fast-path lockless lookups in the buffer cache to reduce
spinlock hammering.
- Make attr forks a permanent part of the inode structure to fix a UAF
bug and because most files these days tend to have security labels and
soon will have parent pointers too.
- Clean up XFS_IFORK_Q usage and give it a better name.
- Fix more UAF bugs in the xattr code.
- SOB my tags.
- Fix some typos in the timestamp range documentation.
- Fix a few more memory leaks.
- Code cleanups and typo fixes.
- Fix an unlocked inode fork pointer access in getbmap.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmLmrLkACgkQ+H93GTRK
tOviexAAo7mJ03hCCWnnkcEYbVQNMH4WRuCpR45D8lz4PU/s6yL7/uxuyodc0dMm
/ZUWjCas1GMZmbOkCkL9eeatrZmgT5SeDbYc4EtHicHYi4sTgCB7ymx0soCUHXYi
7c0kdz+eQ/oY4QvY6JZwbFkRENDL2pkxM9itGHZT0OXHmAnGcIYvzP5Vuc2GtelL
0VWCcpusG0uck3+P1qa8e+TtkR2HU5PVGgAU7OhmAIs07aE3AheVEsPydgGKSIS9
PICnMg1oIgly4VQi28cp/5hU+Au6yBMGogxW8ultPFlM5RWKFt8MKUUhclzS+hZL
9dGSZ3JjpZrdmuUa9mdPnr1MsgrTF6CWHAeUsblSXUzjRT8S3Yz8I3gUMJAA/H17
ZGBu55+TlZtE4ZsK3q/4pqZXfylaaumbEqEi5lJX+7/IYh/WLAgxJihWSpSK2B4a
VBqi12EvMlrjZ4vrD2hqVEJAlguoWiqxgv2gXEZ5wy9dfvzGgysXwAigj0YQeJNQ
J++AYwdYs0pCK0O4eTGZsvp+6o9wj92irtrxwiucuKreDZTOlpCBOAXVTxqom1nX
1NS1YmKvC/RM1na6tiOIundwypgSXUe32qdan34xEWBVPY0mnSpX0N9Lcyoc0xbg
kajAKK9TIy968su/eoBuTQf2AIu1jbWMBNZSg9oELZjfrm0CkWM=
=fNjj
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.20-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"The biggest changes for this release are the log scalability
improvements, lockless lookups for the buffer cache, and making the
attr fork a permanent part of the incore inode in preparation for
directory parent pointers.
There's also a bunch of bug fixes that have accumulated since -rc5. I
might send you a second pull request with some more bug fixes that I'm
still working on.
Once the merge window ends, I will hand maintainership back to Dave
Chinner until the 6.1-rc1 release so that I can conduct the design
review for the online fsck feature, and try to get it merged.
Summary:
- Improve scalability of the XFS log by removing spinlocks and global
synchronization points.
- Add security labels to whiteout inodes to match the other
filesystems.
- Clean up per-ag pointer passing to simplify call sites.
- Reduce verifier overhead by precalculating more AG geometry.
- Implement fast-path lockless lookups in the buffer cache to reduce
spinlock hammering.
- Make attr forks a permanent part of the inode structure to fix a
UAF bug and because most files these days tend to have security
labels and soon will have parent pointers too.
- Clean up XFS_IFORK_Q usage and give it a better name.
- Fix more UAF bugs in the xattr code.
- SOB my tags.
- Fix some typos in the timestamp range documentation.
- Fix a few more memory leaks.
- Code cleanups and typo fixes.
- Fix an unlocked inode fork pointer access in getbmap"
* tag 'xfs-5.20-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (61 commits)
xfs: delete extra space and tab in blank line
xfs: fix NULL pointer dereference in xfs_getbmap()
xfs: Fix typo 'the the' in comment
xfs: Fix comment typo
xfs: don't leak memory when attr fork loading fails
xfs: fix for variable set but not used warning
xfs: xfs_buf cache destroy isn't RCU safe
xfs: delete unnecessary NULL checks
xfs: fix comment for start time value of inode with bigtime enabled
xfs: fix use-after-free in xattr node block inactivation
xfs: lockless buffer lookup
xfs: remove a superflous hash lookup when inserting new buffers
xfs: reduce the number of atomic when locking a buffer after lookup
xfs: merge xfs_buf_find() and xfs_buf_get_map()
xfs: break up xfs_buf_find() into individual pieces
xfs: add in-memory iunlink log item
xfs: add log item precommit operation
xfs: combine iunlink inode update functions
xfs: clean up xfs_iunlink_update_inode()
xfs: double link the unlinked inode list
...
Replace this shouty macro with a real C function that has a more
descriptive name.
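The before/after shape of such a change can be sketched as follows. The
demo_ names and the one-field struct are hypothetical; the macro is
presumably the XFS_IFORK_Q cleanup listed in the pull summary, but the
message itself does not name it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for the inode; only the field used here. */
struct demo_inode { unsigned char forkoff; };	/* nonzero when an attr fork exists */

/* Before: a terse upper-case macro that hides what is being asked. */
#define DEMO_IFORK_Q(ip)	((ip)->forkoff != 0)

/* After: a typed inline function whose name states the question. */
static inline bool demo_inode_has_attr_fork(const struct demo_inode *ip)
{
	assert(ip != NULL);
	return ip->forkoff > 0;
}
```

The function form gets argument type checking and reads as a question at
each call site, which the macro does not.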
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
We got a report that "renameat2() with flags=RENAME_WHITEOUT doesn't
apply an SELinux label on xfs" as it does on other filesystems
(for example, ext4 and tmpfs). While I'm not quite sure how labels
may interact with whiteout files, leaving them unlabeled seems
inconsistent at best. Now that xfs_init_security is not static,
rename it to xfs_inode_init_security per dchinner's suggestion.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>