2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-29 07:34:06 +08:00
Commit Graph

983029 Commits

Author SHA1 Message Date
Darrick J. Wong
b8055ed677 xfs: reduce quota reservation when doing a dax unwritten extent conversion
In commit 3b0fe47805, we reduced the free space requirement to
perform a pre-write unwritten extent conversion on an S_DAX file.  Since
we're not actually allocating any space, the logic goes, we only need
enough reservation to handle shape changes in the bmbt.

The same logic should have been applied to quota -- we're not allocating
any space, so we only need to reserve enough quota to handle the bmbt
shape changes.

Fixes: 3b0fe47805 ("xfs: Don't use reserved blocks for data blocks with DAX")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:48 -08:00
Darrick J. Wong
1aecf3734a xfs: fix chown leaking delalloc quota blocks when fssetxattr fails
While refactoring the quota code to create a function to allocate inode
change transactions, I noticed that xfs_qm_vop_chown_reserve does more
than just make reservations: it also *modifies* the incore counts
directly to handle the owner id change for the delalloc blocks.

I then observed that the fssetxattr code continues validating input
arguments after making the quota reservation but before dirtying the
transaction.  If the routine decides to error out, it fails to undo the
accounting switch!  This leads to incorrect quota reservation and
failure down the line.

We can fix this by making the reservation function do only that -- for
the new dquot, it reserves ondisk and delalloc blocks to the
transaction, and the old dquot hangs on to its incore reservation for
now.  Once we actually switch the dquots, we can then update the incore
reservations because we've dirtied the transaction and it's too late to
turn back now.

No fixes tag because this has been broken since the start of git.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:17:47 -08:00
Chandan Babu R
560ab6c0d1 xfs: Fix 'set but not used' warning in xfs_bmap_compute_alignments()
With both CONFIG_XFS_DEBUG and CONFIG_XFS_WARN disabled, the only reference to
local variable "error" in xfs_bmap_compute_alignments() gets eliminated during
pre-processing stage of the compilation process. This causes the compiler to
generate a "set but not used" warning.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-01 09:44:24 -08:00
Brian Foster
4533fc6315 xfs: fix unused log variable in xfs_log_cover()
The log variable is only used in kernels with asserts enabled.
Remove it and open code the dereference to avoid unused variable
warnings.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-02-01 09:44:23 -08:00
Christoph Hellwig
ae29e4220f xfs: reduce ilock acquisitions in xfs_file_fsync
If the inode is not pinned by the time fsync is called we don't need the
ilock to protect against concurrent clearing of ili_fsync_fields as the
inode won't need a log flush or clearing of these fields.  Not taking
the iolock allows for full concurrency of fsync and thus O_DSYNC
completions with io_uring/aio write submissions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-01-22 16:54:52 -08:00
Christoph Hellwig
f22c7f8777 xfs: refactor xfs_file_fsync
Factor out the log syncing logic into two helpers to make the code easier
to read and more maintainable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-01-22 16:54:52 -08:00
Brian Foster
5b0ad7c2a5 xfs: cover the log on freeze instead of cleaning it
Filesystem freeze cleans the log and immediately redirties it so log
recovery runs if a crash occurs after the filesystem is frozen. Now
that log quiesce covers the log, there is no need to clean the log and
redirty it to trigger log recovery because covering has the same
effect. Update xfs_fs_freeze() to quiesce (and thus cover) the log.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-01-22 16:54:52 -08:00
Brian Foster
ea2064da45 xfs: remove xfs_quiesce_attr()
xfs_quiesce_attr() is now a wrapper for xfs_log_clean(). Remove it
and call xfs_log_clean() directly.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-01-22 16:54:51 -08:00
Brian Foster
5232b93150 xfs: remove duplicate wq cancel and log force from attr quiesce
These two calls are repeated at the beginning of xfs_log_quiesce().
Drop them from xfs_quiesce_attr().

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-01-22 16:54:51 -08:00
Brian Foster
f46e5a1746 xfs: fold sbcount quiesce logging into log covering
xfs_log_sbcount() calls xfs_sync_sb() to sync superblock counters to
disk when lazy superblock accounting is enabled. This occurs on
unmount, freeze, and read-only (re)mount and ensures the final
values are calculated and persisted to disk before each form of
quiesce completes.

Now that log covering occurs in all of these contexts and uses the
same xfs_sync_sb() mechanism to update log state, there is no need
to log the superblock separately for any reason. Update the log
quiesce path to sync the superblock at least once for any mount
where lazy superblock accounting is enabled. If the log is already
covered, it will remain in the covered state. Otherwise, the next
sync as part of the normal covering sequence will carry the
associated superblock update with it. Remove xfs_log_sbcount() now
that it is no longer needed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-01-22 16:54:51 -08:00
Brian Foster
b0eb9e1182 xfs: don't reset log idle state on covering checkpoints
Now that log covering occurs on quiesce, we'd like to reuse the
underlying superblock sync for final superblock updates. This
includes things like lazy superblock counter updates, log feature
incompat bits in the future, etc. One quirk to this approach is that
once the log is in the IDLE (i.e. already covered) state, any
subsequent log write resets the state back to NEED. This means that
a final superblock sync to an already covered log requires two more
sb syncs to return the log back to IDLE again.

For example, if a lazy superblock enabled filesystem is mount cycled
without any modifications, the unmount path syncs the superblock
once and writes an unmount record. With the desired log quiesce
covering behavior, we sync the superblock three times at unmount
time: once for the lazy superblock counter update and twice more to
cover the log. By contrast, if the log is active or only partially
covered at unmount time, a final superblock sync would doubly serve
as the one or two remaining syncs required to cover the log.

This duplicate covering sequence is unnecessary because the
filesystem remains consistent if a crash occurs at any point. The
superblock will either be recovered in the event of a crash or
written back before the log is quiesced and potentially cleaned with
an unmount record.

Update the log covering state machine to remain in the IDLE state if
additional covering checkpoints pass through the log. This
facilitates final superblock updates (such as lazy superblock
counters) via a single sb sync without losing covered status. This
provides some consistency with the active and partially covered
cases and also avoids harmless, but spurious checkpoints when
quiescing the log.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-01-22 16:54:51 -08:00
Brian Foster
303591a0a9 xfs: cover the log during log quiesce
The log quiesce mechanism historically terminates by marking the log
clean with an unmount record. The primary objective is to indicate
that log recovery is no longer required after the quiesce has
flushed all in-core changes and written back filesystem metadata.
While this is perfectly fine, it is somewhat hacky as currently used
in certain contexts. For example, filesystem freeze quiesces (i.e.
cleans) the log and immediately redirties it with a dummy superblock
transaction to ensure that log recovery runs in the event of a
crash.

While this functions correctly, cleaning the log from freeze context
is clearly superfluous given the current redirtying behavior.
Instead, the desired behavior can be achieved by simply covering the
log. This effectively retires all on-disk log items from the active
range of the log by issuing two synchronous and sequential dummy
superblock update transactions that serve to update the on-disk log
head and tail. The subtle difference is that the log technically
remains dirty due to the lack of an unmount record, though recovery
is effectively a no-op due to the content of the checkpoints being
clean (i.e. the unmodified on-disk superblock).

Log covering currently runs in the background and only triggers once
the filesystem and log has idled. The purpose of the background
mechanism is to prevent log recovery from replaying the most
recently logged items long after those items may have been written
back. In the quiesce path, the log has been deliberately idled by
forcing the log and pushing the AIL until empty in a context where
no further mutable filesystem operations are allowed. Therefore, we
can cover the log as the final step in the log quiesce codepath to
reflect that all previously active items have been successfully
written back.

This facilitates selective log covering from certain contexts (i.e.
freeze) that only seek to quiesce, but not necessarily clean the
log. Note that as a side effect of this change, log covering now
occurs when cleaning the log as well. This is harmless, facilitates
subsequent cleanups, and is mostly temporary as various operations
switch to use explicit log covering.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-01-22 16:54:51 -08:00
Brian Foster
9e54ee0fc9 xfs: separate log cleaning from log quiesce
Log quiesce is currently associated with cleaning the log, which is
accomplished by writing an unmount record as the last step of the
quiesce sequence. The quiesce codepath is a bit convoluted in this
regard due to how it is reused from various contexts. In preparation
to create separate log cleaning and log covering interfaces, lift
the write of the unmount record into a new cleaning helper and call
that wherever xfs_log_quiesce() is currently invoked. No functional
changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-01-22 16:54:51 -08:00
Brian Foster
37444fc4cc xfs: lift writable fs check up into log worker task
The log covering helper checks whether the filesystem is writable to
determine whether to cover the log. The helper is currently only
called from the background log worker. In preparation to reuse the
helper from freezing contexts, lift the check into xfs_log_worker().

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-01-22 16:54:50 -08:00
Brian Foster
50d25484be xfs: sync lazy sb accounting on quiesce of read-only mounts
xfs_log_sbcount() syncs the superblock specifically to accumulate
the in-core percpu superblock counters and commit them to disk. This
is required to maintain filesystem consistency across quiesce
(freeze, read-only mount/remount) or unmount when lazy superblock
accounting is enabled because individual transactions do not update
the superblock directly.

This mechanism works as expected for writable mounts, but
xfs_log_sbcount() skips the update for read-only mounts. Read-only
mounts otherwise still allow log recovery and write out an unmount
record during log quiesce. If a read-only mount performs log
recovery, it can modify the in-core superblock counters and write an
unmount record when the filesystem unmounts without ever syncing the
in-core counters. This leaves the filesystem with a clean log but in
an inconsistent state with regard to lazy sb counters.

Update xfs_log_sbcount() to use the same logic
xfs_log_unmount_write() uses to determine when to write an unmount
record. This ensures that lazy accounting is always synced before
the log is cleaned. Refactor this logic into a new helper to
distinguish between a writable filesystem and a writable log.
Specifically, the log is writable unless the filesystem is mounted
with the norecovery mount option, the underlying log device is
read-only, or the filesystem is shutdown. Drop the freeze state
check because the update is already allowed during the freezing
process and no context calls this function on an already frozen fs.
Also, retain the shutdown check in xfs_log_unmount_write() to catch
the case where the preceding log force might have triggered a
shutdown.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-01-22 16:54:50 -08:00
Jeffrey Mitchell
8aa921a953 xfs: set inode size after creating symlink
When XFS creates a new symlink, it writes its size to disk but not to the
VFS inode. This causes i_size_read() to return 0 for that symlink until
it is re-read from disk, for example when the system is rebooted.

I found this inconsistency while protecting directories with eCryptFS.
The command "stat path/to/symlink/in/ecryptfs" will report "Size: 0" if
the symlink was created after the last reboot on an XFS root.

Call i_size_write() in xfs_symlink()

Signed-off-by: Jeffrey Mitchell <jeffrey.mitchell@starlab.io>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-01-22 16:54:50 -08:00
Brian Foster
8321ddb2fa xfs: don't drain buffer lru on freeze and read-only remount
xfs_buftarg_drain() is called from xfs_log_quiesce() to ensure the
buffer cache is reclaimed during unmount. xfs_log_quiesce() is also
called from xfs_quiesce_attr(), however, which means that cache
state is completely drained for filesystem freeze and read-only
remount. While technically harmless, this is unnecessarily
heavyweight. Both freeze and read-only mounts allow reads and thus
allow population of the buffer cache. Therefore, the transitional
sequence in either case really only needs to quiesce outstanding
writes to return the filesystem in a generally read-only state.

Additionally, some users have reported that attempts to freeze a
filesystem concurrent with a read-heavy workload causes the freeze
process to stall for a significant amount of time. This occurs
because, as mentioned above, the read workload repopulates the
buffer LRU while the freeze task attempts to drain it.

To improve this situation, replace the drain in xfs_log_quiesce()
with a buffer I/O quiesce and lift the drain into the unmount path.
This removes buffer LRU reclaim from freeze and read-only [re]mount,
but ensures the LRU is still drained before the filesystem unmounts.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-01-22 16:54:50 -08:00
Brian Foster
10fb9ac125 xfs: rename xfs_wait_buftarg() to xfs_buftarg_drain()
xfs_wait_buftarg() is vaguely named and somewhat overloaded. Its
primary purpose is to reclaim all buffers from the provided buffer
target LRU. In preparation to refactor xfs_wait_buftarg() into
serialization and LRU draining components, rename the function and
associated helpers to something more descriptive. This patch has no
functional changes with the minor exception of renaming a
tracepoint.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-01-22 16:54:50 -08:00
Yumei Huang
88a9e03bee xfs: Fix assert failure in xfs_setattr_size()
An assert failure is triggered by syzkaller test due to
ATTR_KILL_PRIV is not cleared before xfs_setattr_size.
As ATTR_KILL_PRIV is not checked/used by xfs_setattr_size,
just remove it from the assert.

Signed-off-by: Yumei Huang <yuhuang@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-01-22 16:54:50 -08:00
Christoph Hellwig
01ea173e10 xfs: fix up non-directory creation in SGID directories
XFS always inherits the SGID bit if it is set on the parent inode, while
the generic inode_init_owner does not do this in a few cases where it can
create a possible security problem, see commit 0fa3ecd878
("Fix up non-directory creation in SGID directories") for details.

Switch XFS to use the generic helper for the normal path to fix this,
just keeping the simple field inheritance open coded for the case of the
non-sgid case with the bsdgrpid mount option.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-01-22 16:54:49 -08:00
Eric Biggers
eaf92540a9 xfs: remove a stale comment from xfs_file_aio_write_checks()
The comment in xfs_file_aio_write_checks() about calling file_modified()
after dropping the ilock doesn't make sense, because the code that
unconditionally acquires and drops the ilock was removed by
commit 467f78992a ("xfs: reduce ilock hold times in
xfs_file_aio_write_checks").

Remove this outdated comment.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R
3015196746 xfs: Introduce error injection to allocate only minlen size extents for files
This commit adds XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag which
helps userspace test programs to get xfs_bmap_btalloc() to always
allocate minlen sized extents.

This is required for test programs which need a guarantee that minlen
extents allocated for a file do not get merged with their existing
neighbours in the inode's BMBT. "Inode fork extent overflow check" for
Directories, Xattrs and extension of realtime inodes need this since the
file offset at which the extents are being allocated cannot be
explicitly controlled from userspace.

One way to use this error tag is to,
1. Consume all of the free space by sequentially writing to a file.
2. Punch alternate blocks of the file. This causes CNTBT to contain
   sufficient number of one block sized extent records.
3. Inject XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag.
After step 3, xfs_bmap_btalloc() will issue space allocation
requests for minlen sized extents only.

ENOSPC error code is returned to userspace when there aren't any "one
block sized" extents left in any of the AGs.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R
07c72e5562 xfs: Process allocated extent in a separate function
This commit moves over the code in xfs_bmap_btalloc() which is
responsible for processing an allocated extent to a new function. Apart
from xfs_bmap_btalloc(), the new function will be invoked by another
function introduced in a future commit.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R
0961fddfdd xfs: Compute bmap extent alignments in a separate function
This commit moves over the code which computes stripe alignment and
extent size hint alignment into a separate function. Apart from
xfs_bmap_btalloc(), the new function will be used by another function
introduced in a future commit.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R
aff4db57d5 xfs: Remove duplicate assert statement in xfs_bmap_btalloc()
The check for verifying if the allocated extent is from an AG whose
index is greater than or equal to that of tp->t_firstblock is already
done a couple of statements earlier in the same function. Hence this
commit removes the redundant assert statement.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R
f9fa87169d xfs: Introduce error injection to reduce maximum inode fork extent count
This commit adds XFS_ERRTAG_REDUCE_MAX_IEXTENTS error tag which enables
userspace programs to test "Inode fork extent count overflow detection"
by reducing maximum possible inode fork extent count to 10.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:48 -08:00
Chandan Babu R
bcc561f21f xfs: Check for extent overflow when swapping extents
Removing an initial range of source/donor file's extent and adding a new
extent (from donor/source file) in its place will cause extent count to
increase by 1.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:48 -08:00
Chandan Babu R
ee898d78c3 xfs: Check for extent overflow when remapping an extent
Remapping an extent involves unmapping the existing extent and mapping
in the new extent. When unmapping, an extent containing the entire unmap
range can be split into two extents,
i.e. | Old extent | hole | Old extent |
Hence extent count increases by 1.

Mapping in the new extent into the destination file can increase the
extent count by 1.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:48 -08:00
Chandan Babu R
5f1d5bbfb2 xfs: Check for extent overflow when moving extent from cow to data fork
Moving an extent to data fork can cause a sub-interval of an existing
extent to be unmapped. This will increase extent count by 1. Mapping in
the new extent can increase the extent count by 1 again i.e.
 | Old extent | New extent | Old extent |
Hence number of extents increases by 2.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:48 -08:00
Chandan Babu R
c442f3086d xfs: Check for extent overflow when writing to unwritten extent
A write to a sub-interval of an existing unwritten extent causes
the original extent to be split into 3 extents
i.e. | Unwritten | Real | Unwritten |
Hence extent count can increase by 2.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:48 -08:00
Chandan Babu R
3a19bb147c xfs: Check for extent overflow when adding/removing xattrs
Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to be
added. One extra extent for dabtree in case a local attr is large enough
to cause a double split.  It can also cause extent count to increase
proportional to the size of a remote xattr's value.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:48 -08:00
Chandan Babu R
02092a2f03 xfs: Check for extent overflow when renaming dir entries
A rename operation is essentially a directory entry remove operation
from the perspective of parent directory (i.e. src_dp) of rename's
source. Hence the only place where we check for extent count overflow
for src_dp is in xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real()
returns -ENOSPC when it detects a possible extent count overflow and in
response, the higher layers of directory handling code do the following:
1. Data/Free blocks: XFS lets these blocks linger until a future remove
   operation removes them.
2. Dabtree blocks: XFS swaps the blocks with the last block in the Leaf
   space and unmaps the last block.

For target_dp, there are two cases depending on whether the destination
directory entry exists or not.

When destination directory entry does not exist (i.e. target_ip ==
NULL), extent count overflow check is performed only when transaction
has a non-zero sized space reservation associated with it.  With a
zero-sized space reservation, XFS allows a rename operation to continue
only when the directory has sufficient free space in its data/leaf/free
space blocks to hold the new entry.

When destination directory entry exists (i.e. target_ip != NULL), all
we need to do is change the inode number associated with the already
existing entry. Hence there is no need to perform an extent count
overflow check.

Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Chandan Babu R
0dbc5cb1a9 xfs: Check for extent overflow when removing dir entries
Directory entry removal must always succeed; Hence XFS does the
following during low disk space scenario:
1. Data/Free blocks linger until a future remove operation.
2. Dabtree blocks would be swapped with the last block in the leaf space
   and then the new last block will be unmapped.

This facility is reused during low inode extent count scenario i.e. this
commit causes xfs_bmap_del_extent_real() to return -ENOSPC error code so
that the above mentioned behaviour is exercised causing no change to the
directory's extent count.

Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Chandan Babu R
f5d9274919 xfs: Check for extent overflow when adding dir entries
Directory entry addition can cause the following,
1. Data block can be added/removed.
   A new extent can cause extent count to increase by 1.
2. Free disk block can be added/removed.
   Same behaviour as described above for Data block.
3. Dabtree blocks.
   XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these
   can be new extents. Hence extent count can increase by
   XFS_DA_NODE_MAXDEPTH.

Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Chandan Babu R
85ef08b5a6 xfs: Check for extent overflow when punching a hole
The extent mapping the file offset at which a hole has to be
inserted will be split into two extents causing extent count to
increase by 1.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Chandan Babu R
727e1acd29 xfs: Check for extent overflow when trivally adding a new extent
When adding a new data extent (without modifying an inode's existing
extents) the extent count increases only by 1. This commit checks for
extent count overflow in such cases.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Chandan Babu R
b9b7e1dc56 xfs: Add helper for checking per-inode extent count overflow
XFS does not check for possible overflow of per-inode extent counter
fields when adding extents to either data or attr fork.

For e.g.
1. Insert 5 million xattrs (each having a value size of 255 bytes) and
   then delete 50% of them in an alternating manner.

2. On a 4k block sized XFS filesystem instance, the above causes 98511
   extents to be created in the attr fork of the inode.

   xfsaild/loop0  2008 [003]  1475.127209: probe:xfs_inode_to_disk: (ffffffffa43fb6b0) if_nextents=98511 i_ino=131

3. The incore inode fork extent counter is a signed 32-bit
   quantity. However the on-disk extent counter is an unsigned 16-bit
   quantity and hence cannot hold 98511 extents.

4. The following incorrect value is stored in the attr extent counter,
   # xfs_db -f -c 'inode 131' -c 'print core.naextents' /dev/loop0
   core.naextents = -32561

This commit adds a new helper function (i.e.
xfs_iext_count_may_overflow()) to check for overflow of the per-inode
data and xattr extent counters. Future patches will use this function to
make sure that an FS operation won't cause the extent counter to
overflow.

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Darrick J. Wong
6da1b4b1ab xfs: fix an ABBA deadlock in xfs_rename
When overlayfs is running on top of xfs and the user unlinks a file in
the overlay, overlayfs will create a whiteout inode and ask xfs to
"rename" the whiteout file atop the one being unlinked.  If the file
being unlinked loses its one nlink, we then have to put the inode on the
unlinked list.

This requires us to grab the AGI buffer of the whiteout inode to take it
off the unlinked list (which is where whiteouts are created) and to grab
the AGI buffer of the file being deleted.  If the whiteout was created
in a higher numbered AG than the file being deleted, we'll lock the AGIs
in the wrong order and deadlock.

Therefore, grab all the AGI locks we think we'll need ahead of time, and
in order of increasing AG number per the locking rules.

Reported-by: wenli xie <wlxie7296@gmail.com>
Fixes: 93597ae8da ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-01-22 16:54:43 -08:00
Linus Torvalds
19c329f680 Linux 5.11-rc4 2021-01-17 16:37:05 -08:00
Linus Torvalds
e2da783614 perf tools fixes for 5.11:
- Fix 'CPU too large' error in Intel PT.
 
 - Correct event attribute sizes in 'perf inject'.
 
 - Sync build_bug.h and kvm.h kernel copies.
 
 - Fix bpf.h header include directive in 5sec.c 'perf trace' bpf example.
 
 - libbpf tests fixes.
 
 - Fix shadow stat 'perf test' for non-bash shells.
 
 - Take cgroups into account for shadow stats in 'perf stat'.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 
 Test results:
 
 The first ones are container based builds of tools/perf with and without libelf
 support.  Where clang is available, it is also used to build perf with/without
 libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang
 when clang and its devel libraries are installed.
 
 The objtool and samples/bpf/ builds are disabled now that I'm switching from
 using the sources in a local volume to fetching them from a http server to
 build it inside the container, to make it easier to build in a container cluster.
 Those will come back later.
 
 Several are cross builds, the ones with -x-ARCH and the android one, and those
 may not have all the features built, due to lack of multi-arch devel packages,
 available and being used so far on just a few, like
 debian:experimental-x-{arm64,mipsel}.
 
 The 'perf test' one will perform a variety of tests exercising
 tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
 with a variety of command line event specifications to then intercept the
 sys_perf_event syscall to check that the perf_event_attr fields are set up as
 expected, among a variety of other unit tests.
 
 Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/
 with a variety of feature sets, exercising the build with an incomplete set of
 features as well as with a complete one. It is planned to have it run on each
 of the containers mentioned above, using some container orchestration
 infrastructure. Get in contact if interested in helping having this in place.
 
   $ grep "model name" -m1 /proc/cpuinfo
   model name: AMD Ryzen 9 3900X 12-Core Processor
   # export PERF_TARBALL=http://192.168.86.5/perf/perf-5.11.0-rc3.tar.xz
   # dm
    1    66.93 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
    2    68.65 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
    3    73.00 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
    4    79.04 alpine:3.7                    : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
    5    79.71 alpine:3.8                    : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
    6    82.51 alpine:3.9                    : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
    7   103.45 alpine:3.10                   : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
    8   113.86 alpine:3.11                   : Ok   gcc (Alpine 9.3.0) 9.3.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0)
    9   109.31 alpine:3.12                   : Ok   gcc (Alpine 9.3.0) 9.3.0, Alpine clang version 10.0.0 (https://gitlab.alpinelinux.org/alpine/aports.git 7445adce501f8473efdb93b17b5eaf2f1445ed4c)
   10   113.90 alpine:edge                   : Ok   gcc (Alpine 10.2.0) 10.2.0, Alpine clang version 10.0.1
   11    66.76 alt:p8                        : Ok   x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final)
   12    83.71 alt:p9                        : Ok   x86_64-alt-linux-gcc (GCC) 8.4.1 20200305 (ALT p9 8.4.1-alt0.p9.1), clang version 10.0.0
   13    80.70 alt:sisyphus                  : Ok   x86_64-alt-linux-gcc (GCC) 9.3.1 20200518 (ALT Sisyphus 9.3.1-alt1), clang version 10.0.1
   14    62.75 amazonlinux:1                 : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
   15    97.65 amazonlinux:2                 : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
   16    21.18 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   17    21.07 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   18    25.83 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
   19    30.65 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
   20    93.44 centos:8                      : Ok   gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), clang version 10.0.1 (Red Hat 10.0.1-1.module_el8.3.0+467+cb298d5b)
   21    60.64 clearlinux:latest             : Ok   gcc (Clear Linux OS for Intel Architecture) 10.2.1 20201217 releases/gcc-10.2.0-643-g7cbb07d2fc, clang version 10.0.1
   22    74.57 debian:8                      : Ok   gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
   23    75.40 debian:9                      : Ok   gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final)
   24    72.75 debian:10                     : Ok   gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8+deb10u2 (tags/RELEASE_701/final)
   25    72.36 debian:experimental           : Ok   gcc (Debian 10.2.1-6) 10.2.1 20210110, Debian clang version 11.0.1-2
   26    32.35 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110
   27    28.65 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 10.2.1-3) 10.2.1 20201224
   28    13.79 debian:experimental-x-mipsel  : FAIL mipsel-linux-gnu-gcc (Debian 10.2.1-3) 10.2.1 20201224
 
       CC       /tmp/build/perf/util/map.o
     util/map.c: In function 'map__new':
     util/map.c:109:5: error: '%s' directive output may be truncated writing between 1 and 2147483645 bytes into a region of size 4096 [-Werror=format-truncation=]
       109 |    "%s/platforms/%s/arch-%s/usr/lib/%s",
           |     ^~
     In file included from /usr/mipsel-linux-gnu/include/stdio.h:867,
                      from util/symbol.h:11,
                      from util/map.c:2:
     /usr/mipsel-linux-gnu/include/bits/stdio2.h:67:10: note: '__builtin___snprintf_chk' output 32 or more bytes (assuming 4294967321) into a destination of size 4096
        67 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
           |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        68 |        __bos (__s), __fmt, __va_arg_pack ());
           |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
   29    29.14 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
   30    30.66 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final)
   31    66.33 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final)
   32    77.51 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final)
   33    25.23 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
   34    79.68 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final)
   35    93.09 fedora:26                     : Ok   gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final)
   36    94.12 fedora:27                     : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final)
   37   101.97 fedora:28                     : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final)
   38   107.51 fedora:29                     : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29)
   39   111.24 fedora:30                     : Ok   gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc30)
   40    25.85 fedora:30-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
   41   110.61 fedora:31                     : Ok   gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), clang version 9.0.1 (Fedora 9.0.1-4.fc31)
   42    93.78 fedora:32                     : Ok   gcc (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6), clang version 10.0.1 (Fedora 10.0.1-3.fc32)
   43    91.51 fedora:33                     : Ok   gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9), clang version 11.0.0 (Fedora 11.0.0-2.fc33)
   44    92.75 fedora:34                     : Ok   gcc (GCC) 11.0.0 20210113 (Red Hat 11.0.0-0), clang version 11.0.1 (Fedora 11.0.1-4.fc34)
   45    92.33 fedora:rawhide                : Ok   gcc (GCC) 11.0.0 20210109 (Red Hat 11.0.0-0), clang version 11.0.1 (Fedora 11.0.1-4.fc34)
   46    33.58 gentoo-stage3-amd64:latest    : Ok   gcc (Gentoo 9.3.0-r1 p3) 9.3.0
   47    66.03 mageia:5                      : Ok   gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final)
   48    84.73 mageia:6                      : Ok   gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final)
   49    98.35 manjaro:latest                : Ok   gcc (GCC) 10.2.0, clang version 10.0.1
   50   223.15 openmandriva:cooker           : Ok   gcc (GCC) 10.2.0 20200723 (OpenMandriva), OpenMandriva 11.0.0-1 clang version 11.0.0 (/builddir/build/BUILD/llvm-project-llvmorg-11.0.0/clang 63e22714ac938c6b537bd958f70680d3331a2030)
   51   117.30 opensuse:15.0                 : Ok   gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 5.0.1 (tags/RELEASE_501/final 312548)
   52   124.82 opensuse:15.1                 : Ok   gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238)
   53   113.33 opensuse:15.2                 : Ok   gcc (SUSE Linux) 7.5.0, clang version 9.0.1
   54   106.17 opensuse:42.3                 : Ok   gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553)
   55   108.15 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 10.2.1 20200825 [revision c0746a1beb1ba073c7981eb09f55b3d993b32e5c], clang version 10.0.1
   56    25.57 oraclelinux:6                 : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
   57    30.86 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44.0.3)
   58    91.75 oraclelinux:8                 : Ok   gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5.0.1), clang version 10.0.1 (Red Hat 10.0.1-1.0.1.module+el8.3.0+7827+89335dbf)
   59    27.64 ubuntu:12.04                  : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
   60    29.65 ubuntu:14.04                  : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
   61    75.65 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
   62    25.57 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   63    25.52 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   64    25.01 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   65    25.51 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   66    25.70 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   67    24.95 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   68    87.96 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
   69    27.40 ubuntu:18.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
   70    27.14 ubuntu:18.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
   71    22.68 ubuntu:18.04-x-m68k           : Ok   m68k-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   72    26.52 ubuntu:18.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   73    28.97 ubuntu:18.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   74    28.54 ubuntu:18.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   75   163.57 ubuntu:18.04-x-riscv64        : Ok   riscv64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   76    24.07 ubuntu:18.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   77    26.77 ubuntu:18.04-x-sh4            : Ok   sh4-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   78    24.00 ubuntu:18.04-x-sparc64        : Ok   sparc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   79    69.36 ubuntu:19.10                  : Ok   gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 8.0.1-3build1 (tags/RELEASE_801/final)
   80    27.07 ubuntu:19.10-x-alpha          : Ok   alpha-linux-gnu-gcc (Ubuntu 9.2.1-9ubuntu1) 9.2.1 20191008
   81    24.29 ubuntu:19.10-x-hppa           : Ok   hppa-linux-gnu-gcc (Ubuntu 9.2.1-9ubuntu1) 9.2.1 20191008
   82    74.99 ubuntu:20.04                  : Ok   gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, clang version 10.0.0-4ubuntu1
   83    30.49 ubuntu:20.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
   84    73.54 ubuntu:20.10                  : Ok   gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0, Ubuntu clang version 11.0.0-2
   $
 
   # uname -a
   Linux quaco 5.10.7-100.fc32.x86_64 #1 SMP Tue Jan 12 20:25:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
   # git log --oneline -1
   648b054a46 perf inject: Correct event attribute sizes
   # perf version --build-options
   perf version 5.11.rc3.g648b054a4647
                    dwarf: [ on  ]  # HAVE_DWARF_SUPPORT
       dwarf_getlocations: [ on  ]  # HAVE_DWARF_GETLOCATIONS_SUPPORT
                    glibc: [ on  ]  # HAVE_GLIBC_SUPPORT
            syscall_table: [ on  ]  # HAVE_SYSCALL_TABLE_SUPPORT
                   libbfd: [ on  ]  # HAVE_LIBBFD_SUPPORT
                   libelf: [ on  ]  # HAVE_LIBELF_SUPPORT
                  libnuma: [ on  ]  # HAVE_LIBNUMA_SUPPORT
   numa_num_possible_cpus: [ on  ]  # HAVE_LIBNUMA_SUPPORT
                  libperl: [ on  ]  # HAVE_LIBPERL_SUPPORT
                libpython: [ on  ]  # HAVE_LIBPYTHON_SUPPORT
                 libslang: [ on  ]  # HAVE_SLANG_SUPPORT
                libcrypto: [ on  ]  # HAVE_LIBCRYPTO_SUPPORT
                libunwind: [ on  ]  # HAVE_LIBUNWIND_SUPPORT
       libdw-dwarf-unwind: [ on  ]  # HAVE_DWARF_SUPPORT
                     zlib: [ on  ]  # HAVE_ZLIB_SUPPORT
                     lzma: [ on  ]  # HAVE_LZMA_SUPPORT
                get_cpuid: [ on  ]  # HAVE_AUXTRACE_SUPPORT
                      bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT
                      aio: [ on  ]  # HAVE_AIO_SUPPORT
                     zstd: [ on  ]  # HAVE_ZSTD_SUPPORT
                  libpfm4: [ OFF ]  # HAVE_LIBPFM
   # perf test
    1: vmlinux symtab matches kallsyms                                 : Ok
    2: Detect openat syscall event                                     : Ok
    3: Detect openat syscall event on all cpus                         : Ok
    4: Read samples using the mmap interface                           : Ok
    5: Test data source output                                         : Ok
    6: Parse event definition strings                                  : Ok
    7: Simple expression parser                                        : Ok
    8: PERF_RECORD_* events & perf_sample fields                       : Ok
    9: Parse perf pmu format                                           : Ok
   10: PMU events                                                      :
   10.1: PMU event table sanity                                        : Ok
   10.2: PMU event map aliases                                         : Ok
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
   11: DSO data read                                                   : Ok
   12: DSO data cache                                                  : Ok
   13: DSO data reopen                                                 : Ok
   14: Roundtrip evsel->name                                           : Ok
   15: Parse sched tracepoints fields                                  : Ok
   16: syscalls:sys_enter_openat event fields                          : Ok
   17: Setup struct perf_event_attr                                    : Ok
   18: Match and link multiple hists                                   : Ok
   19: 'import perf' in python                                         : Ok
   20: Breakpoint overflow signal handler                              : Ok
   21: Breakpoint overflow sampling                                    : Ok
   22: Breakpoint accounting                                           : Ok
   23: Watchpoint                                                      :
   23.1: Read Only Watchpoint                                          : Skip (missing hardware support)
   23.2: Write Only Watchpoint                                         : Ok
   23.3: Read / Write Watchpoint                                       : Ok
   23.4: Modify Watchpoint                                             : Ok
   24: Number of exit events of a simple workload                      : Ok
   25: Software clock events period values                             : Ok
   26: Object code reading                                             : Ok
   27: Sample parsing                                                  : Ok
   28: Use a dummy software event to keep tracking                     : Ok
   29: Parse with no sample_id_all bit set                             : Ok
   30: Filter hist entries                                             : Ok
   31: Lookup mmap thread                                              : Ok
   32: Share thread maps                                               : Ok
   33: Sort output of hist entries                                     : Ok
   34: Cumulate child hist entries                                     : Ok
   35: Track with sched_switch                                         : Ok
   36: Filter fds with revents mask in a fdarray                       : Ok
   37: Add fd to a fdarray, making it autogrow                         : Ok
   38: kmod_path__parse                                                : Ok
   39: Thread map                                                      : Ok
   40: LLVM search and compile                                         :
   40.1: Basic BPF llvm compile                                        : Ok
   40.2: kbuild searching                                              : Ok
   40.3: Compile source for BPF prologue generation                    : Ok
   40.4: Compile source for BPF relocation                             : Ok
   41: Session topology                                                : Ok
   42: BPF filter                                                      :
   42.1: Basic BPF filtering                                           : Ok
   42.2: BPF pinning                                                   : Ok
   42.3: BPF prologue generation                                       : Ok
   42.4: BPF relocation checker                                        : Ok
   43: Synthesize thread map                                           : Ok
   44: Remove thread map                                               : Ok
   45: Synthesize cpu map                                              : Ok
   46: Synthesize stat config                                          : Ok
   47: Synthesize stat                                                 : Ok
   48: Synthesize stat round                                           : Ok
   49: Synthesize attr update                                          : Ok
   50: Event times                                                     : Ok
   51: Read backward ring buffer                                       : Ok
   52: Print cpu map                                                   : Ok
   53: Merge cpu map                                                   : Ok
   54: Probe SDT events                                                : Ok
   55: is_printable_array                                              : Ok
   56: Print bitmap                                                    : Ok
   57: perf hooks                                                      : Ok
   58: builtin clang support                                           : Skip (not compiled in)
   59: unit_number__scnprintf                                          : Ok
   60: mem2node                                                        : Ok
   61: time utils                                                      : Ok
   62: Test jit_write_elf                                              : Ok
   63: Test libpfm4 support                                            : Skip (not compiled in)
   64: Test api io                                                     : Ok
   65: maps__merge_in                                                  : Ok
   66: Demangle Java                                                   : Ok
   67: Parse and process metrics                                       : Ok
   68: PE file support                                                 : Ok
   69: Event expansion for cgroups                                     : Ok
   70: Convert perf time to TSC                                        : Ok
   71: x86 rdpmc                                                       : Ok
   72: DWARF unwind                                                    : Ok
   73: x86 instruction decoder - new instructions                      : Ok
   74: Intel PT packet decoder                                         : Ok
   75: x86 bp modify                                                   : Ok
   76: probe libc's inet_pton & backtrace it with ping                 : Ok
   77: Use vfs_getname probe to get syscall args filenames             : Ok
   78: Check Arm CoreSight trace data recording and synthesized samples: Skip
   79: perf stat metrics (shadow stat) test                            : Ok
   80: build id cache operations                                       : Ok
   81: Add vfs_getname probe to get syscall args filenames             : Ok
   82: Check open filename arg using perf trace + vfs_getname          : Ok
   83: Zstd perf.data compression/decompression                        : Ok
 
   $ make -C tools/perf build-test
   make: Entering directory '/home/acme/git/perf/tools/perf'
   - tarpkg: ./tests/perf-targz-src-pkg .
            make_no_libpython_O: make NO_LIBPYTHON=1
                  make_no_sdt_O: make NO_SDT=1
                    make_tags_O: make tags
                 make_install_O: make install
             make_install_bin_O: make install-bin
                   make_debug_O: make DEBUG=1
   make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
               make_no_libelf_O: make NO_LIBELF=1
                  make_cscope_O: make cscope
            make_no_backtrace_O: make NO_BACKTRACE=1
              make_no_libnuma_O: make NO_LIBNUMA=1
                   make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
                 make_no_newt_O: make NO_NEWT=1
         make_with_babeltrace_O: make LIBBABELTRACE=1
        make_util_pmu_bison_o_O: make util/pmu-bison.o
            make_no_libunwind_O: make NO_LIBUNWIND=1
         make_no_libbpf_DEBUG_O: make NO_LIBBPF=1 DEBUG=1
                     make_doc_O: make doc
                  make_perf_o_O: make perf.o
                 make_no_gtk2_O: make NO_GTK2=1
          make_with_clangllvm_O: make LIBCLANGLLVM=1
               make_clean_all_O: make clean all
             make_no_demangle_O: make NO_DEMANGLE=1
               make_with_gtk2_O: make GTK2=1
              make_util_map_o_O: make util/map.o
                    make_pure_O: make
            make_no_libbionic_O: make NO_LIBBIONIC=1
             make_no_libaudit_O: make NO_LIBAUDIT=1
               make_no_libbpf_O: make NO_LIBBPF=1
    make_install_prefix_slash_O: make install prefix=/tmp/krava/
                    make_help_O: make help
          make_no_syscall_tbl_O: make NO_SYSCALL_TABLE=1
              make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
                 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 NO_SYSCALL_TABLE=1
            make_no_libcrypto_O: make NO_LIBCRYPTO=1
                  make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1
          make_install_prefix_O: make install prefix=/tmp/krava
             make_no_auxtrace_O: make NO_AUXTRACE=1
            make_with_libpfm4_O: make LIBPFM4=1
              make_no_libperl_O: make NO_LIBPERL=1
                make_no_slang_O: make NO_SLANG=1
   OK
   make: Leaving directory '/home/acme/git/perf/tools/perf'
   $
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYASljgAKCRCyPKLppCJ+
 J/E/AQCOGFqF7UmEzuuTecWeeBNCwVyD3woHLU13ll/e5VLNggD/YD9t8CZS+vwy
 21yL4/yXZloLFE48OCLRNWeq91FL/gs=
 =uZDD
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-2021-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix 'CPU too large' error in Intel PT

 - Correct event attribute sizes in 'perf inject'

 - Sync build_bug.h and kvm.h kernel copies

 - Fix bpf.h header include directive in 5sec.c 'perf trace' bpf example

 - libbpf tests fixes

 - Fix shadow stat 'perf test' for non-bash shells

 - Take cgroups into account for shadow stats in 'perf stat'

* tag 'perf-tools-fixes-2021-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf inject: Correct event attribute sizes
  perf intel-pt: Fix 'CPU too large' error
  perf stat: Take cgroups into account for shadow stats
  perf stat: Introduce struct runtime_stat_data
  libperf tests: Fail when failing to get a tracepoint id
  libperf tests: If a test fails return non-zero
  libperf tests: Avoid uninitialized variable warning
  perf test: Fix shadow stat test for non-bash shells
  tools headers: Syncronize linux/build_bug.h with the kernel sources
  tools headers UAPI: Sync kvm.h headers with the kernel sources
  perf bpf examples: Fix bpf.h header include directive in 5sec.c example
2021-01-17 13:14:46 -08:00
Linus Torvalds
a1339d6355 powerpc fixes for 5.11 #4
One fix for a lack of alignment in our linker script, that can lead to crashes
 depending on configuration etc.
 
 One fix for the 32-bit VDSO after the C VDSO conversion.
 
 Thanks to:
   Andreas Schwab, Ariel Marcovitch, Christophe Leroy.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmAEDicTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgAMOEAC2jmfH0UwkbDWsKtolps7Gy4hzr1HH
 PzauZgWHA+eq+j6I7oDsACsVOsVnUGCBsEkOfFYTIlBroVgnTdXlRU3WSsisnTfW
 sjaQguv3nP01P82CicIVCJJJJFpJENuXcs4Dr02OYP9VMFytWiAr6RvxxCOqozVo
 dcCg7/04za+v5mR3KRdw2Jf5mlox5kN7wFCFMLlSzadAdUneP+Qt583shEx0KejH
 IXQOXTp191Q0luFh/2TLz+gzai/A2v16Bk/Q7h3VQ/EQ3V0jpEil6bQXX2UI6on8
 dRngTQ4j7gZ5b7QcpqvO2t2otWthGO0YQ/rfI3p1XdpWZNQKFA2I3cXblSqFEhp1
 /qI2K5zUiLbRSW4NrgxZ6zIt0PYuxYnrIt7Wwj7nV+79RP+9o9t1VcvUMAaSL5C+
 DfQq8GJdsUUUifFzNzq9EeuL2T0RHFooK0xNd00hc43NJjmnhni3TY20UI4r7b8k
 PmeKJg94Pc4a6PmtGUsOgG53CGENVDTDPCSY7e9XSIAMT0jV0Cbo4+0uwk4s/J/b
 1oEROtc8TTq6I47ARc6GZgQ9Wui4C/34uxIuhF7uTTGrWYlMgFcMOkRGUt8CuBrD
 DLhjA37uqgf+bK2g2heCOQXIjh9JCGc3V7BEB0d545xxv0vjpIPmk9mXwGyxth0N
 /lCUHrl64VtI/g==
 =jpfR
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "One fix for a lack of alignment in our linker script, that can lead to
  crashes depending on configuration etc.

  One fix for the 32-bit VDSO after the C VDSO conversion.

  Thanks to Andreas Schwab, Ariel Marcovitch, and Christophe Leroy"

* tag 'powerpc-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/vdso: Fix clock_gettime_fallback for vdso32
  powerpc: Fix alignment bug within the init sections
2021-01-17 12:28:58 -08:00
Linus Torvalds
a527a2b32d Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs fixes from Al Viro:
 "Several assorted fixes.

  I still think that audit ->d_name race is better fixed this way for
  the benefit of backports, with any possibly fancier variants done on
  top of it"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  dump_common_audit_data(): fix racy accesses to ->d_name
  iov_iter: fix the uaccess area in copy_compat_iovec_from_user
  umount(2): move the flag validity checks first
2021-01-17 12:16:47 -08:00
Linus Torvalds
feb889fb40 mm: don't put pinned pages into the swap cache
So technically there is nothing wrong with adding a pinned page to the
swap cache, but the pinning obviously means that the page can't actually
be free'd right now anyway, so it's a bit pointless.

However, the real problem is not with it being a bit pointless: the real
issue is that after we've added it to the swap cache, we'll try to unmap
the page.  That will succeed, because the code in mm/rmap.c doesn't know
or care about pinned pages.

Even the unmapping isn't fatal per se, since the page will stay around
in memory due to the pinning, and we do hold the connection to it using
the swap cache.  But when we then touch it next and take a page fault,
the logic in do_swap_page() will map it back into the process as a
possibly read-only page, and we'll then break the page association on
the next COW fault.

Honestly, this issue could have been fixed in any of those other places:
(a) we could refuse to unmap a pinned page (which makes conceptual
sense), or (b) we could make sure to re-map a pinned page writably in
do_swap_page(), or (c) we could just make do_wp_page() not COW the
pinned page (which was what we historically did before that "mm:
do_wp_page() simplification" commit).

But while all of them are equally valid models for breaking this chain,
not putting pinned pages into the swap cache in the first place is the
simplest one by far.

It's also the safest one: the reason why do_wp_page() was changed in the
first place was that getting the "can I re-use this page" wrong is so
fraught with errors.  If you do it wrong, you end up with an incorrectly
shared page.

As a result, using "page_maybe_dma_pinned()" in either do_wp_page() or
do_swap_page() would be a serious bug since it is only a (very good)
heuristic.  Re-using the page requires a hard black-and-white rule with
no room for ambiguity.

In contrast, saying "this page is very likely dma pinned, so let's not
add it to the swap cache and try to unmap it" is an obviously safe thing
to do, and if the heuristic might very rarely be a false positive, no
harm is done.

Fixes: 09854ba94c ("mm: do_wp_page() simplification")
Reported-and-tested-by: Martin Raiber <martin@urbackup.org>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-17 12:08:04 -08:00
Linus Torvalds
0da0a8a0a0 SCSI fixes on 20210116
Nine minor fixes, 7 in drivers and 2 in the core SCSI disk driver (sd)
 which should be harmless involving removing an unused variable and
 quietening a spurious warning.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYANKJiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishVSyAP9a4xdK
 9A4seh2/LW3GwPsoQUJQINe4yok/lGVXSQk3XQD/QkkYbzeqVGN4BHK1LQX2R9Sw
 wy+E4ENjdOY+9p+KMxo=
 =m0jh
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Nine minor fixes, seven in drivers and two in the core SCSI disk
  driver (sd) which should be harmless involving removing an unused
  variable and quietening a spurious warning"

Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Remove obsolete variable in sd_remove()
  scsi: sd: Suppress spurious errors when WRITE SAME is being disabled
  scsi: scsi_debug: Fix memleak in scsi_debug_init()
  scsi: mpt3sas: Fix spelling mistake in Kconfig "compatiblity" -> "compatibility"
  scsi: qedi: Correct max length of CHAP secret
  scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
  scsi: ufs: Relocate flush of exceptional event
  scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
  scsi: ufs: Fix possible power drain during system suspend
2021-01-16 12:25:40 -08:00
Al Viro
d36a1dd9f7 dump_common_audit_data(): fix racy accesses to ->d_name
We are not guaranteed the locking environment that would prevent
dentry getting renamed right under us.  And it's possible for
old long name to be freed after rename, leading to UAF here.

Cc: stable@kernel.org # v2.6.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-16 15:11:35 -05:00
Linus Torvalds
54c6247d06 block-5.11-2021-01-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmADKbkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgrWD/9gmAaxW3cF5K9UPcfT9VRz50PRfa5cGNwu
 eQ3ETIkqQtmizibj9BXBYwPWn6FK6gSeI8LjP0SXOUxkG1O23Hwwb6ieQ2tRg+CX
 Wmb+S7EUwpkHBWWmulfsX9PXQsyG2D4oz8Obxwp0jvI3XzuEeMQ5gYTpO+0YTxi1
 VgdeOZfV5Xben0kvpwlUfowdOD6CK8LSPJ4KjuynkvgtfKuipsCmJqMqRZqFMu3f
 F4p+ngVYuqEcKddh+h5pFifUy2bo076eYe4kE6vGlmdySjaVyHiLJcBmkMDxb/cp
 daWTaBGCsvZfP0uwLgATD0P0UqJn44Vx15FlN50uSlrfaEiglO6aBuWwbQpuPpq/
 FF/hhUscs9cmZhYGO8TNnOZHfzNmYmPx9dH/D+Fo6mqCaGvsOyGGUeKuG86hfLlM
 AEVGFiVdlQNdLq7f2HHFKfWL1XvHbvGrgQ5d4dnJrD9a/1TgXTiluleC1+vvLfoP
 cRTATNtfMTcc88QF+L1BxtHXkUsf6dyqPt1AalmKqwHDv9d1C/B0aZt6a/JFTXp4
 utsxDi+WOb4HVT4/ojl35HML6CWATqVgHH1rSG2Q8UPWvLX+O8QalRIBhtNaRt9k
 hr3joKlmeyCjJdrgE+07xlwfiqTmmhmTkGjJw39KyQTiOyyEPey+HWwJEhO1QaFR
 hgoQvLA4mw==
 =CYyj
 -----END PGP SIGNATURE-----

Merge tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Just an nvme pull request via Christoph:

   - don't initialize hwmon for discover controllers (Sagi Grimberg)

   - fix iov_iter handling in nvme-tcp (Sagi Grimberg)

   - fix a preempt warning in nvme-tcp (Sagi Grimberg)

   - fix a possible NULL pointer dereference in nvme (Israel Rukshin)"

* tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
  nvme: don't intialize hwmon for discovery controllers
  nvme-tcp: fix possible data corruption with bio merges
  nvme-tcp: Fix warning with CONFIG_DEBUG_PREEMPT
  nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY
2021-01-16 11:39:58 -08:00
Linus Torvalds
11c0239ae2 io_uring-5.11-2021-01-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmADKaIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnjVD/402IBhG40EoLMdcztTbOF7S5JZLbECbWam
 HPN99D0YyjRQNwsd7CPzeO+7W4zO4P0T/NcnXLF1TmPnkRynDNsT4oygcLrJoH8n
 xZoAXi6ykM/4eE+4NSaNfWF/f3DtGepsIHMpSNNyzE+qxz4I36/EPTu7M13ga1A4
 H6pAbvU1DrDZ+z62rhWcb+4ouUV7QDS02UTNo9cW4ueeMNLyW/Zxp9lBzNuAkbzF
 gRlsxkhXLgcqtA36EdRWKs9fOwcdTyiMGzS5DRRmmE/O/a3//yl0ENFTPmWd6+jf
 daZcAYW6BdB+0va1sYGbECl9zOkMlhcm/hNFnl3ZmFIW5a4QiRy2WLoWuiNwVSYM
 nGCz0l5P2pP32tt0fQlSQh8KdI+py+c7MhJ1IaRhJi8X4t1MHoOR3vlilw0UvrdQ
 zpm8H+hMmL3g9/6pZLwxBvd31N1Y/A8ghh6qTdQ2EwdPyjImR3Xv/wYUqdbfvBo5
 VcB6ccoFwuP5KzQcyI2j2WRcx7AtHZuiVGvoqkA8QL9Onq3GWoY8tnkf/sP8RxlQ
 wL91wVkT46UMkCLeY7QCGdx4nnmSahCEa8SQEYxibnh++9jh+c6Y5FtZc3YNjnED
 GRFT9vJmqNmrvKCLo78iTf/F8vJllGXQRKDeMuU9DVj3ylhZOBSEhpmMIZwdXNQa
 CqMCqia4Xg==
 =XL14
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "We still have a pending fix for a cancelation issue, but it's still
  being investigated. In the meantime:

   - Dead mm handling fix (Pavel)

   - SQPOLL setup error handling (Pavel)

   - Flush timeout sequence fix (Marcelo)

   - Missing finish_wait() for one exit case"

* tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
  io_uring: ensure finish_wait() is always called in __io_uring_task_cancel()
  io_uring: flush timeouts that should already have expired
  io_uring: do sqo disable on install_fd error
  io_uring: fix null-deref in io_disable_sqo_submit
  io_uring: don't take files/mm for a dead task
  io_uring: drop mm and files after task_work_run
2021-01-16 11:12:02 -08:00
Linus Torvalds
acda701bf1 RISC-V Fixes for 5.11-rc4
There are a few more fixes than a normal rc4, largely due to the bubble
 introduced by the holiday break:
 
 * A fix to return -ENOSYS for syscall number -1, which previously
   returned an uninitialized value.
 * A fix to time_init() to ensure of_clk_init() has been called, without
   which clock drivers may not be initialized.
 * A fix to the sifive,uart0 driver to properly display the baud rate.  A
   fix to initialize MPIE that allows interrupts to be processed during
   system calls.
 * A fix to avoid erronously begin tracing IRQs when interrupts are
   disabled, which at least triggers suprious lockdep failures.
 * A workaround for a warning related to calling smp_processor_id() while
   preemptible.  The warning itself is suprious on currently availiable
   systems.
 * A fix to properly include the generic time VDSO calls.  A fix to our
   kasan address mapping.  A fix to the HiFive Unleashed device tree,
   which allows the Ethernet PHY to be properly initialized by Linux (as
   opposed to relying on the bootloader).
 * A defconfig update to include SiFive's GPIO driver, which is present
   on the HiFive Unleashed and necessary to initialize the PHY.
 * A fix to avoid allocating memory while initializing reserved memory.
 * A fix to avoid allocating the last 4K of memory, as pointers there
   alias with syscall errors.
 
 There are also two cleanups that should have no functional effect but do
 fix build warnings:
 
 * A cleanup to drop a duplicated definition of PAGE_KERNEL_EXEC.
 * A cleanup to properly declare the asm register SP shim.
 * A cleanup to the rv32 memory size Kconfig entry, to reflect the actual
   size of memory availiable.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmACfN8THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiWkQD/40g0dX0dX7fkGHk/zbwlkJffA/VCF7
 +dtqfJSjpRkfPCBEOCJxfCYdHM0adrH2FACwN2JhQmOzOpbccO6Xhr/9jgqms43x
 WB5J0WNKrIT2ZGpB6Hp5htD2lo2OuCLNtdnMK0I3NJMBhJHUKb4HGMwFXKBg+RK1
 LtOQMKEJSgkwT6Mhe5K8+tz5VLu6XQUJMCAzxoeFoTDi513HDDcAoOcvfKt0VujD
 1g2HDtIhYlIYUiGliBcFh88slycWh8Im41Hi2Dh5Nfe9PaiexskpL1+g+CCnZUpa
 yIi72AOirjNxxvfPfmAcTXRx4lHB/YQnB9EUnI6W16D7XniXedC7StR+zhzRjkuy
 lePMKJE6x8mTMF5zv2GLP1FalBeJ2SFwLtboT7zfRlhI+IucMqCaSsE06RHg5/52
 YL9ScVi0xS423f3ctGNUrlSKQKdV1s5D7pm6Qf2WZlqUvwKTspOvkzDn8hNpINCe
 0D3EvPDWVoRSWx4u0Ao/Ig2+xwumtun1IPmF7WHqW+aSxVTs0g0XiEiA8ubaU9bD
 JpqfCL9ftzD5S85igHp8+a3zuFmK+sgR/7JCdp7TqBoHmLoQjg8c5L56m1cZy8Mn
 Wu6E1X0n0IyDDjDqWbGTkfkKC6CvbUy6wRW3LdEnhufon3ldiPMTB66gLs4qfcqF
 Yq5WkrJyGK98Ww==
 =guLu
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "There are a few more fixes than a normal rc4, largely due to the
  bubble introduced by the holiday break:

   - return -ENOSYS for syscall number -1, which previously returned an
     uninitialized value.

   - ensure of_clk_init() has been called in time_init(), without which
     clock drivers may not be initialized.

   - fix sifive,uart0 driver to properly display the baud rate. A fix to
     initialize MPIE that allows interrupts to be processed during
     system calls.

   - avoid erronously begin tracing IRQs when interrupts are disabled,
     which at least triggers suprious lockdep failures.

   - workaround for a warning related to calling smp_processor_id()
     while preemptible. The warning itself is suprious on currently
     availiable systems.

   - properly include the generic time VDSO calls. A fix to our kasan
     address mapping. A fix to the HiFive Unleashed device tree, which
     allows the Ethernet PHY to be properly initialized by Linux (as
     opposed to relying on the bootloader).

   - defconfig update to include SiFive's GPIO driver, which is present
     on the HiFive Unleashed and necessary to initialize the PHY.

   - avoid allocating memory while initializing reserved memory.

   - avoid allocating the last 4K of memory, as pointers there alias
     with syscall errors.

  There are also two cleanups that should have no functional effect but
  do fix build warnings:

   - drop a duplicated definition of PAGE_KERNEL_EXEC.

   - properly declare the asm register SP shim.

   - cleanup the rv32 memory size Kconfig entry, to reflect the actual
     size of memory availiable"

* tag 'riscv-for-linus-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Fix maximum allowed phsyical memory for RV32
  RISC-V: Set current memblock limit
  RISC-V: Do not allocate memblock while iterating reserved memblocks
  riscv: stacktrace: Move register keyword to beginning of declaration
  riscv: defconfig: enable gpio support for HiFive Unleashed
  dts: phy: add GPIO number and active state used for phy reset
  dts: phy: fix missing mdio device and probe failure of vsc8541-01 device
  riscv: Fix KASAN memory mapping.
  riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL
  riscv: cacheinfo: Fix using smp_processor_id() in preemptible
  riscv: Trace irq on only interrupt is enabled
  riscv: Drop a duplicated PAGE_KERNEL_EXEC
  riscv: Enable interrupts during syscalls with M-Mode
  riscv: Fix sifive serial driver
  riscv: Fix kernel time_init()
  riscv: return -ENOSYS for syscall -1
2021-01-16 11:00:08 -08:00
Linus Torvalds
9348b73c2e mm: don't play games with pinned pages in clear_page_refs
Turning a pinned page read-only breaks the pinning after COW.  Don't do it.

The whole "track page soft dirty" state doesn't work with pinned pages
anyway, since the page might be dirtied by the pinning entity without
ever being noticed in the page tables.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-16 10:51:26 -08:00
Linus Torvalds
29a951dfb3 mm: fix clear_refs_write locking
Turning page table entries read-only requires the mmap_sem held for
writing.

So stop doing the odd games with turning things from read locks to write
locks and back.  Just get the write lock.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-16 10:46:39 -08:00