mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2024-09-22 22:11:38 +08:00
8d97039646
37734 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Frederic Weisbecker
|
8d97039646 |
rcu/nocb: Prepare nocb_cb_wait() to start with a non-offloaded rdp
In order to be able to toggle the offloaded state from cpusets, a nocb kthread will need to be created for all possible CPUs whenever either of the "rcu_nocbs=" or "nohz_full=" parameters are specified. Therefore, the nocb_cb_wait() kthread must be prepared to start running on a de-offloaded rdp. To accomplish this, simply move the sleeping condition to the beginning of the nocb_cb_wait() function, which prevents this kthread from attempting to invoke callbacks before the corresponding CPU is offloaded. Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Frederic Weisbecker
|
2ebc45c44c |
rcu/nocb: Remove rcu_node structure from nocb list when de-offloaded
The nocb_gp_wait() function iterates over all CPUs in its group, including even those CPUs that have been de-offloaded. This is of course suboptimal, especially if none of the CPUs within the group are currently offloaded. This will become even more of a problem once a nocb kthread is created for all possible CPUs. Therefore use a standard double linked list to link all the offloaded rcu_data structures and safely add or delete these structure as we offload or de-offload them, respectively. Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Frederic Weisbecker
|
0598a4d442 |
rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread
rcu_core() tries to ensure that its self-invocation in case of callbacks overload only happen in softirq/rcuc mode. Indeed it doesn't make sense to trigger local RCU core from nocb_cb kthread since it can execute on a CPU different from the target rdp. Also in case of overload, the nocb_cb kthread simply iterates a new loop of callbacks processing. However the "offloaded" check that aims at preventing misplaced rcu_core() invocations is wrong. First of all that state is volatile and second: softirq/rcuc can execute while the target rdp is offloaded. As a result rcu_core() can be invoked on the wrong CPU while in the process of (de-)offloading. Fix that with moving the rcu_core() self-invocation to rcu_core() itself, irrespective of the rdp offloaded state. Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Frederic Weisbecker
|
a554ba2888 |
rcu: Apply callbacks processing time limit only on softirq
Time limit only makes sense when callbacks are serviced in softirq mode because: _ In case we need to get back to the scheduler, cond_resched_tasks_rcu_qs() is called after each callback. _ In case some other softirq vector needs the CPU, the call to local_bh_enable() before cond_resched_tasks_rcu_qs() takes care about them via a call to do_softirq(). Therefore, make sure the time limit only applies to softirq mode. Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Frederic Weisbecker
|
3e61e95e2d |
rcu: Fix callbacks processing time limit retaining cond_resched()
The callbacks processing time limit makes sure we are not exceeding a given amount of time executing the queue. However its "continue" clause bypasses the cond_resched() call on rcuc and NOCB kthreads, delaying it until we reach the limit, which can be very long... Make sure the scheduler has a higher priority than the time limit. Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Frederic Weisbecker
|
78ad37a2c5 |
rcu/nocb: Limit number of softirq callbacks only on softirq
The current condition to limit the number of callbacks executed in a row checks the offloaded state of the rdp. Not only is it volatile but it is also misleading: the rcu_core() may well be executing callbacks concurrently with NOCB kthreads, and the offloaded state would then be verified on both cases. As a result the limit would spuriously not apply anymore on softirq while in the middle of (de-)offloading process. Fix and clarify the condition with those constraints in mind: _ If callbacks are processed either by rcuc or NOCB kthread, the call to cond_resched_tasks_rcu_qs() is enough to take care of the overload. _ If instead callbacks are processed by softirqs: * If need_resched(), exit the callbacks processing * Otherwise if CPU is idle we can continue * Otherwise exit because a softirq shouldn't interrupt a task for too long nor deprive other pending softirq vectors of the CPU. Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Frederic Weisbecker
|
7b65dfa32d |
rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
Instead of hardcoding IRQ save and nocb lock, use the consolidated API (and fix a comment as per Valentin Schneider's suggestion). Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Frederic Weisbecker
|
344e219d7d |
rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check
It's not entirely obvious why rdp->qlen_last_fqs_check is updated before processing the queue only on offloaded rdp. There can be different effect to that, either in favour of triggering the force quiescent state path or not. For example: 1) If the number of callbacks has decreased since the last rdp->qlen_last_fqs_check update (because we recently called rcu_do_batch() and we executed below qhimark callbacks) and the number of processed callbacks on a subsequent do_batch() arranges for exceeding qhimark on non-offloaded but not on offloaded setup, then we may spare a later run to the force quiescent state slow path on __call_rcu_nocb_wake(), as compared to the non-offloaded counterpart scenario. Here is such an offloaded scenario instance: qhimark = 1000 rdp->last_qlen_last_fqs_check = 3000 rcu_segcblist_n_cbs(rdp) = 2000 rcu_do_batch() { if (offloaded) rdp->last_qlen_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000 // run 1000 callback rcu_segcblist_n_cbs(rdp) = 1000 // Not updating rdp->qlen_last_fqs_check if (count < rdp->qlen_last_fqs_check - qhimark) rdp->qlen_last_fqs_check = count; } call_rcu() * 1001 { __call_rcu_nocb_wake() { // not taking the fqs slowpath: // rcu_segcblist_n_cbs(rdp) == 2001 // rdp->qlen_last_fqs_check == 2000 // qhimark == 1000 if (len > rdp->qlen_last_fqs_check + qhimark) ... } In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check would be 1000 and the fqs slowpath would have executed. 2) If the number of callbacks has increased since the last rdp->qlen_last_fqs_check update (because we recently queued below qhimark callbacks) and the number of callbacks executed in rcu_do_batch() doesn't exceed qhimark for either offloaded or non-offloaded setup, then it's possible that the offloaded scenario later run the force quiescent state slow path on __call_rcu_nocb_wake() while the non-offloaded doesn't. qhimark = 1000 rdp->last_qlen_last_fqs_check = 3000 rcu_segcblist_n_cbs(rdp) = 2000 rcu_do_batch() { if (offloaded) rdp->last_qlen_last_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000 // run 100 callbacks // concurrent queued 100 rcu_segcblist_n_cbs(rdp) = 2000 // Not updating rdp->qlen_last_fqs_check if (count < rdp->qlen_last_fqs_check - qhimark) rdp->qlen_last_fqs_check = count; } call_rcu() * 1001 { __call_rcu_nocb_wake() { // Taking the fqs slowpath: // rcu_segcblist_n_cbs(rdp) == 3001 // rdp->qlen_last_fqs_check == 2000 // qhimark == 1000 if (len > rdp->qlen_last_fqs_check + qhimark) ... } In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check would be 3000 and the fqs slowpath would have executed. The reason for updating rdp->qlen_last_fqs_check when invoking callbacks for offloaded CPUs is that there is usually no point in waking up either the rcuog or rcuoc kthreads while in this state. After all, both threads are prohibited from indefinite sleeps. The exception is when some huge number of callbacks are enqueued while rcu_do_batch() is in the midst of invoking, in which case interrupting the rcuog kthread's timed sleep might get more callbacks set up for the next grace period. Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Original-patch-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Frederic Weisbecker
|
b3bb02fe5a |
rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
When callbacks are offloaded, the NOCB kthreads handle the callbacks progression on behalf of rcu_core(). However during the (de-)offloading process, the kthread may not be entirely up to the task. As a result some callbacks grace period sequence number may remain stale for a while because rcu_core() won't take care of them either. Fix this with forcing callbacks acceleration from rcu_core() as long as the offloading process isn't complete. Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Thomas Gleixner
|
24ee940d89 |
rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
While reporting a quiescent state for a given CPU, rcu_core() takes advantage of the freshly loaded grace period sequence number and the locked rnp to accelerate the callbacks whose sequence number have been assigned a stale value. This action is only necessary when the rdp isn't offloaded, otherwise the NOCB kthreads already take care of the callbacks progression. However the check for the offloaded state is volatile because it is performed outside the IRQs disabled section. It's possible for the offloading process to preempt rcu_core() at that point on PREEMPT_RT. This is dangerous because rcu_core() may end up accelerating callbacks concurrently with NOCB kthreads without appropriate locking. Fix this with moving the offloaded check inside the rnp locking section. Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Frederic Weisbecker
|
fbb94cbd70 |
rcu/nocb: Invoke rcu_core() at the start of deoffloading
On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process, some work, such as callbacks acceleration and invocation, may be left unattended due to the volatile checks on the offloaded state. In the worst case this work is postponed until the next rcu_pending() check that can take a jiffy to reach, which can be a problem in case of callbacks flooding. Solve that with invoking rcu_core() early in the de-offloading process. This way any work dismissed by an ongoing rcu_core() call fooled by a preempting deoffloading process will be caught up by a nearby future recall to rcu_core(), this time fully aware of the de-offloading state. Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Frederic Weisbecker
|
213d56bf33 |
rcu/nocb: Prepare state machine for a new step
Currently SEGCBLIST_SOFTIRQ_ONLY is a bit of an exception among the segcblist flags because it is an exclusive state that doesn't mix up with the other flags. Remove it in favour of: _ A flag specifying that rcu_core() needs to perform callbacks execution and acceleration and _ A flag specifying we want the nocb lock to be held in any needed circumstances This clarifies the code and is more flexible: It allows to have a state where rcu_core() runs with locking while offloading hasn't started yet. This is a necessary step to prepare for triggering rcu_core() at the very beginning of the de-offloading process so that rcu_core() won't dismiss work while being preempted by the de-offloading process, at least not without a pending subsequent rcu_core() that will quickly catch up. Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Frederic Weisbecker
|
118e0d4a1b |
rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading
rcu_nocb_lock_irqsave() can be preempted between the call to rcu_segcblist_is_offloaded() and the actual locking. This matters now that rcu_core() is preemptible on PREEMPT_RT and the (de-)offloading process can interrupt the softirq or the rcuc kthread. As a result we may locklessly call into code that requires nocb locking. In practice this is a problem while we accelerate callbacks on rcu_core(). Simply disabling interrupts before (instead of after) checking the NOCB offload state fixes the issue. Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Paul E. McKenney
|
614ddad17f |
rcu: Tighten rcu_advance_cbs_nowake() checks
Currently, rcu_advance_cbs_nowake() checks that a grace period is in progress, however, that grace period could end just after the check. This commit rechecks that a grace period is still in progress while holding the rcu_node structure's lock. The grace period cannot end while the current CPU's rcu_node structure's ->lock is held, thus avoiding false positives from the WARN_ON_ONCE(). As Daniel Vacek noted, it is not necessary for the rcu_node structure to have a CPU that has not yet passed through its quiescent state. Tested-by: Guillaume Morin <guillaume@morinfr.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Linus Torvalds
|
622c72b651 |
A single fix for POSIX CPU timers to address a problem where POSIX CPU
timer delivery stops working for a new child task because copy_process() copies state information which is only valid for the parent task. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDVUTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYocOFD/42NOdli73N+Jdq7APHUIHXzu+6DVT6 CI5toLQw+0KPoF0s1wg4+J0YCDt2k0Pu4lOabF3Ze/+c6RlR5zfCXESqsXdHaCjh E91Vs57u0ataRMEHo6KB6eBIutuF8hyxfY6vVXfkTRNAreUIWiwWYrlB0G64JVOG +/l1W7adovjLcLwcW+ArrnLJwkBKtXunK6PVv2IrdRHwpMHbwoNRCCCFvzkqnWmQ 4Yy2/NaB/PEBK5kezP1/j9EMcGCTWk1JJIm+l/PEwCCcbIgIdUahpW3XHAaqms6R oukqCvE5ukfmVzBFYBhCamhF8heyEeBVRqGU+Yyk48+I+DQFBCqaqa1NKSuEUdNL Nycy6Rp1yn7CHVSB461shMS6NJGOSNDBjv7vxer3WjV3HPJu7y0RrN7jXbkSfQnm hVKjkmbDEYwylgzFE5+T857NqD5MEXeuIBtTO08hNRnpd61aB3x+qq+8ElE6ST8Y pm6rMzw0AZ5buPK8QdGVDk0dD4WKObj1LzmRZvBtYeWynO6sxyKUl6B2CgAxrvn5 D1Li2/arkJMCVeIuIL5uE6DPoxSh8J7OuEC4KeWX8M8xQSEDImqfZ+tDL2Esv6jv xDmymq584hiCBc1CJjCOA9kZYe6KNXC7lkVOns6GaKKzLhkrcvUR3dUGhMyzxAMO t9QIAinR6JwRRA== =EBbc -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix for POSIX CPU timers to address a problem where POSIX CPU timer delivery stops working for a new child task because copy_process() copies state information which is only valid for the parent task" * tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Clear task::posix_cputimers_work in copy_process() |
||
Linus Torvalds
|
c36e33e2f4 |
A set of fixes for the interrupt subsystem:
- Core code: A regression fix for the Open Firmware interrupt mapping code where a interrupt controller property in a node caused a map property in the same node to be ignored. - Interrupt chip drivers: - Workaround a limitation in SiFive PLIC interrupt chip which silently ignores an EOI when the interrupt line is masked. - Provide the missing mask/unmask implementation for the CSKY MP interrupt controller. - PCI/MSI: - Prevent a use after free when PCI/MSI interrupts are released by destroying the sysfs entries before freeing the memory which is accessed in the sysfs show() function. - Implement a mask quirk for the Nvidia ION AHCI chip which does not advertise masking capability despite implementing it. Even worse the chip comes out of reset with all MSI entries masked, which due to the missing masking capability never get unmasked. - Move the check which prevents accessing the MSI[X] masking for XEN back into the low level accessors. The recent consolidation missed that these accessors can be invoked from places which do not have that check which broke XEN. Move them back to he original place instead of sprinkling tons of these checks all over the code. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDCsTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoTL5D/4n7CUudohHPckr0Rl3LbnSUfyY9g3H irTKur71AT392YerJtQp+WBp3AKYMDD8wPTgydfpWe95ouIjx5jhb/co7uSifG6k ZssXYS10bkvjqyS8E2s5FnA5xbnagunK/R981qju14Ec39xqx1JzlUnO/Pra0Kcr 5rBV7br9jJMBleBI4OFuS9fS8dVL1MH/yushkuDNfIKEnaElnaxaYUk/ZdzkMMAW lt1B+dPhK24t1hXQvZKp/iVQUGrJWdzzy9aDiUYPv1IZP+V5nbLMgmFvEv8jNdNa 6kkfp0l30nXM9rgvcp2KkasVUPVhurVEwitzz9+tT6LRA+/kSwi2yx8/FwCVUcL6 xD0AgKQgxOj/WwGJTZswvPu3afsLuw3rGmx5uH1IV40P9mPX0AiHWgvoaInHjzlJ QKFQ7mJEuUcC6cJ36RGqX9njhKvPIcUENGCTjGSffcXsWltPrOCg2mQFcsDa9fSH qPfXDVv4YINI+0MAlOULh6TLWQ07xy37HiskJu/AgILOfipoDi8pXdqNJRfvxB1S D3O8vB+SH3lPj69w4dtj7539SdNZn8CCyN3RbNlstl2vHV5Bus3cVk0CcOhG8qNW KwK/tSH8O0ZYHAsUu8OqBipXy6qOPi/10MJQn3NOpvvOmS4oDd+82bq+jp5qJpsG 42WNuzEoBdaUiA== =LBQL -----END PGP SIGNATURE----- Merge tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for the interrupt subsystem Core code: - A regression fix for the Open Firmware interrupt mapping code where a interrupt controller property in a node caused a map property in the same node to be ignored. Interrupt chip drivers: - Workaround a limitation in SiFive PLIC interrupt chip which silently ignores an EOI when the interrupt line is masked. - Provide the missing mask/unmask implementation for the CSKY MP interrupt controller. PCI/MSI: - Prevent a use after free when PCI/MSI interrupts are released by destroying the sysfs entries before freeing the memory which is accessed in the sysfs show() function. - Implement a mask quirk for the Nvidia ION AHCI chip which does not advertise masking capability despite implementing it. Even worse the chip comes out of reset with all MSI entries masked, which due to the missing masking capability never get unmasked. - Move the check which prevents accessing the MSI[X] masking for XEN back into the low level accessors. The recent consolidation missed that these accessors can be invoked from places which do not have that check which broke XEN. Move them back to he original place instead of sprinkling tons of these checks all over the code" * tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: of/irq: Don't ignore interrupt-controller when interrupt-map failed irqchip/sifive-plic: Fixup EOI failed when masked irqchip/csky-mpintc: Fixup mask/unmask implementation PCI/MSI: Destroy sysfs before freeing entries PCI: Add MSI masking quirk for Nvidia ION AHCI PCI/MSI: Deal with devices lying about their MSI mask capability PCI/MSI: Move non-mask check back into low level accessors |
||
Linus Torvalds
|
fc661f2dcb |
- Avoid touching ~100 config files in order to be able to select
the preemption model - clear cluster CPU masks too, on the CPU unplug path - prevent use-after-free in cfs - Prevent a race condition when updating CPU cache domains - Factor out common shared part of smp_prepare_cpus() into a common helper which can be called by both baremetal and Xen, in order to fix a booting of Xen PV guests -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ8HcACgkQEsHwGGHe VUouoA//WAZ/dZu7IiM06JhZWswa2yNsdU8qQHys81lEqstaBqiWuZdg1qJTVIir 2d0aN0keiPcsLyAsp1UJ2g/K/7D5vSJWDzsHKfEAToiAm8Tntai2LlSocWWfeSQm 10grDHWpEHbj0hTHTA6HYOr2WbY4/LnR4cdL0WobIzivIrRTx49d0XUOUfWLP5KX 60uM6dSjwpJrQUnvzk+bhGiHVmutFrEJy+UU/0o+nxkdhwraNiSbLi0007BGRCof 6dokRRvLLR09dl1LMG51gVjQch4j/lCx6EWWUhYOFeV3I3gibSCNkmu7dpmMCBTR QWO01cR9gyFN4xQ2is4I36M5L0/8T+sbGvvXIXNDT/XWr0/p+g6p2mx0cd2XiYIr ZthGRcxxV/KGmxfPaygKS9tpQseMEIrdd6VjAnGfZ3OS6CtUvYt8d0B2Soj8FALQ N9fMXDIEP3uUZim8UvCT6HBKlj9LR5uI5n+dAQ6uzsenO9WqeGeldc/N26/+osdN vo4lNYTqiXJPhJvunYW5t4j5JnUa3grDHioAPWaQRJlWtEZBGKs9SXTcweg/KURb mNfe1RfSlGJt28RD3E18gXeSS7xWdKgpcVX1rmW/9tUjX04NNDWjq4sAzOj7c+Ir 4sr78XgCY0pUxFaFYxvQWFUy7wcm0zAczo1RGUhcDTf1edDEvjo= =s2MX -----END PGP SIGNATURE----- Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Avoid touching ~100 config files in order to be able to select the preemption model - clear cluster CPU masks too, on the CPU unplug path - prevent use-after-free in cfs - Prevent a race condition when updating CPU cache domains - Factor out common shared part of smp_prepare_cpus() into a common helper which can be called by both baremetal and Xen, in order to fix a booting of Xen PV guests * tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: preempt: Restore preemption model selection configs arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology() sched/fair: Prevent dead task groups from regaining cfs_rq's sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain() x86/smp: Factor out parts of native_smp_prepare_cpus() |
||
Linus Torvalds
|
f7018be292 |
- Prevent unintentional page sharing by checking whether a page
reference to a PMU samples page has been acquired properly before that - Make sure the LBR_SELECT MSR is saved/restored too - Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any residual data left -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ5z8ACgkQEsHwGGHe VUqqdQ/+JIV6t0yIj7aNADaakwAe+i9zFzzUuvb5KT0zPzZirswkz6xeZ4g8S8PZ lSjqKk8M2Yt3SJiqi/s3KNIOev52wtKGmeOFz1I+DUNpgk0wGHkRtVHV/iSptB61 Kp/fJvOVppY5grs5B0fRYkM5e477RPyZo+E0COKnff1bQ+k+z2ItMLCVxFCxQS6k HmgPW7CBye811YcEg28lSwgS1OXiMZ19gACIsqnQ6kQP2Puo8+HT1/V1n+0grejb OeYxURuYSRPd6Ft76qz0YlRIe1dgKllUBr7b0AaM11ADBMtWBTxqJcQvq/mOIHmT 9to0dVB/xFySR57iaL7BRuZFOrt8MRqJniEedMO99Dm9sxEVfHs1iXC9r7wZxQAf /HcvVkcyOJD92Kv+4LS5tKjowCByOYEJW2YQIgXEbA6oIhRuM9/fdxEW6lHwgdwc BPnOR6rtYuq+I+merBIIijAuf8OsIGY7ap2B+f7DkiOtA9+SHZsrU22J8T7CED/w gmrAC3+3KGt7YDs6WZTbvkXminZQyu5WpHe+2K6dlCIPmJLqEsYUx8TeXa/okyvb 8ZXy/CfJNbHUrk6GZw7RFoeannwSPv9ZJO3Mfy5PDvwDk0Fj0J+/G92mR2Zucxpo siNyBCivPY5vBPqk+x6eUPev/C3wPS+dNrs4HOyr1N2gZwgTk40= =Ciqw -----END PGP SIGNATURE----- Merge tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Prevent unintentional page sharing by checking whether a page reference to a PMU samples page has been acquired properly before that - Make sure the LBR_SELECT MSR is saved/restored too - Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any residual data left * tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Avoid put_page() when GUP fails perf/x86/vlbr: Add c->flags to vlbr event constraints perf/x86/lbr: Reset LBR_SELECT during vlbr reset |
||
Linus Torvalds
|
7c3737c706 |
Three tracing fixes:
- Make local osnoise_instances static - Copy just actual size of histogram strings - Properly check missing operands in histogram expressions -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYY++DxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qn93AQD9sBFtm7D/90P8KMp/yl75OTd1InGm uZPOioR/itFXBwD6A4Y4xbpN0aWByM4P31pqFjZRxY0wmInHw3fkd8EjmQM= =LgAs -----END PGP SIGNATURE----- Merge tag 'trace-v5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Three tracing fixes: - Make local osnoise_instances static - Copy just actual size of histogram strings - Properly check missing operands in histogram expressions" * tag 'trace-v5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/histogram: Fix check for missing operands in an expression tracing/histogram: Do not copy the fixed-size char array field over the field size tracing/osnoise: Make osnoise_instances static |
||
Kalesh Singh
|
1cab6bce42 |
tracing/histogram: Fix check for missing operands in an expression
If a binary operation is detected while parsing an expression string,
the operand strings are deduced by splitting the experssion string at
the position of the detected binary operator. Both operand strings are
sub-strings (can be empty string) of the expression string but will
never be NULL.
Currently a NULL check is used for missing operands, fix this by
checking for empty strings instead.
Link: https://lkml.kernel.org/r/20211112191324.1302505-1-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Fixes:
|
||
Masami Hiramatsu
|
63f84ae6b8 |
tracing/histogram: Do not copy the fixed-size char array field over the field size
Do not copy the fixed-size char array field of the events over
the field size. The histogram treats char array as a string and
there are 2 types of char array in the event, fixed-size and
dynamic string. The dynamic string (__data_loc) field must be
null terminated, but the fixed-size char array field may not
be null terminated (not a string, but just a data).
In that case, histogram can copy the data after the field.
This uses the original field size for fixed-size char array
field to restrict the histogram not to access over the original
field size.
Link: https://lkml.kernel.org/r/163673292822.195747.3696966210526410250.stgit@devnote2
Fixes:
|
||
Linus Torvalds
|
f78e9de80f |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov: "Just one new driver (Cypress StreetFighter touchkey), and no input core changes this time. Plus various fixes and enhancements to existing drivers" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (54 commits) Input: iforce - fix control-message timeout Input: wacom_i2c - use macros for the bit masks Input: ili210x - reduce sample period to 15ms Input: ili210x - improve polled sample spacing Input: ili210x - special case ili251x sample read out Input: elantench - fix misreporting trackpoint coordinates Input: synaptics-rmi4 - Fix device hierarchy Input: i8042 - Add quirk for Fujitsu Lifebook T725 Input: cap11xx - add support for cap1206 Input: remove unused header <linux/input/cy8ctmg110_pdata.h> Input: ili210x - add ili251x firmware update support Input: ili210x - export ili251x version details via sysfs Input: ili210x - use resolution from ili251x firmware Input: pm8941-pwrkey - respect reboot_mode for warm reset reboot: export symbol 'reboot_mode' Input: max77693-haptic - drop unneeded MODULE_ALIAS Input: cpcap-pwrbutton - do not set input parent explicitly Input: max8925_onkey - don't mark comment as kernel-doc Input: ads7846 - do not attempt IRQ workaround when deferring probe Input: ads7846 - use input_set_capability() ... |
||
Daniel Bristot de Oliveira
|
d7458bc0d8 |
tracing/osnoise: Make osnoise_instances static
Make the struct list_head osnoise_instances definition static.
Link: https://lore.kernel.org/all/202111120052.ZuikQSJi-lkp@intel.com/
Link: https://lkml.kernel.org/r/d001f0eeac66e2b2eeec7d2a15e9e7abede0453a.1636667971.git.bristot@kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Fixes:
|
||
Linus Torvalds
|
ca2ef2d9f2 |
KCSAN pull request for v5.16
This series contains initialization fixups, testing improvements, addition of instruction pointer to data-race reports, and scoped data-race checks. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmGNQO4THHBhdWxtY2tA a2VybmVsLm9yZwAKCRCevxLzctn7jIECD/49FaTsFhtZdEDlvLI2u2QJnxkVjwda PBZkJrB66jDk0Dyc0oUxOu4GGSw64vze8HOJxWhaBA4tmqWGDA0DmTqRFQ3VJ4uW Csl1uCzkIR9R0dgkDFwkvnq2fNbcr4SwDu0i+7Iig3zws7nhnZlSJPSze6gFkVX2 mLtUXybSR4FvlFMRePHd6cxltmwUohLKOklsI6emOfnSgBBFQ3584wEZ2HN5KwwO 8EwVxE5YNWyZQKqIj76tUoa8qkWbp5SoiiK6mzSQbJpgX8gLN3GngeAc9ZrfY09R aiSQK9FnkcNkpnRROKA6Go6ze5NGa+1NvF32swZ1nSYOb/LFBDtwt4G8Y8cqdmLv UtsxjFX4hhxdZzBSbGK3GwDWtDLWgHrmf5K/qPNHkM+QwdoyS27C5Kzfs4jkbtZ0 rAEWBxTrtdTCd+xMIz04ZDlio05CqSqme2/t4xaxGpcYGHLcuSi3uFa1cRvfaew8 rSfq2WKd9Cu2dKmjyF+EtN4Y2o8l8IaxJyeq5bVrBHeijIBH0KdCWkeDhWIJcMmE Wo36PYsFLyCdAwr66IoNFHvOxbtAQsERZa0/2FGlOKBAzntNA72BdlAFgKJWiLKg M5K1Q+r7kfns/T1JhftTByryZBd5JM+OiZ/rwU0hCRY48L93ftTzGYSyLVfPBeZ0 lDgc/oJQziv9fA== =MjQ1 -----END PGP SIGNATURE----- Merge tag 'kcsan.2021.11.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull KCSAN updates from Paul McKenney: "This contains initialization fixups, testing improvements, addition of instruction pointer to data-race reports, and scoped data-race checks" * tag 'kcsan.2021.11.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: kcsan: selftest: Cleanup and add missing __init kcsan: Move ctx to start of argument list kcsan: Support reporting scoped read-write access type kcsan: Start stack trace with explicit location if provided kcsan: Save instruction pointer for scoped accesses kcsan: Add ability to pass instruction pointer of access to reporting kcsan: test: Fix flaky test case kcsan: test: Use kunit_skip() to skip tests kcsan: test: Defer kcsan_test_init() after kunit initialization |
||
Linus Torvalds
|
600b18f88f |
Two tracing fixes:
- Add mutex protection to ring_buffer_reset() - Fix deadlock in modify_ftrace_direct_multi() -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYY0ivBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qhAvAPsHNmAXJ32HuOgVrTCm4WxOSDdukri+ E5KirCzp0jtQQwEAxwz8neUalfZ8RQyHdpDe9vP9Ay0lCjbfrPxD0DUtiQE= =VwcI -----END PGP SIGNATURE----- Merge tag 'trace-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Two locking fixes: - Add mutex protection to ring_buffer_reset() - Fix deadlock in modify_ftrace_direct_multi()" * tag 'trace-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/direct: Fix lockup in modify_ftrace_direct_multi ring-buffer: Protect ring_buffer_reset() from reentrancy |
||
Linus Torvalds
|
f54ca91fe6 |
Networking fixes for 5.16-rc1, including fixes from bpf, can
and netfilter. Current release - regressions: - bpf: do not reject when the stack read size is different from the tracked scalar size - net: fix premature exit from NAPI state polling in napi_disable() - riscv, bpf: fix RV32 broken build, and silence RV64 warning Current release - new code bugs: - net: fix possible NULL deref in sock_reserve_memory - amt: fix error return code in amt_init(); fix stopping the workqueue - ax88796c: use the correct ioctl callback Previous releases - always broken: - bpf: stop caching subprog index in the bpf_pseudo_func insn - security: fixups for the security hooks in sctp - nfc: add necessary privilege flags in netlink layer, limit operations to admin only - vsock: prevent unnecessary refcnt inc for non-blocking connect - net/smc: fix sk_refcnt underflow on link down and fallback - nfnetlink_queue: fix OOB when mac header was cleared - can: j1939: ignore invalid messages per standard - bpf, sockmap: - fix race in ingress receive verdict with redirect to self - fix incorrect sk_skb data_end access when src_reg = dst_reg - strparser, and tls are reusing qdisc_skb_cb and colliding - ethtool: fix ethtool msg len calculation for pause stats - vlan: fix a UAF in vlan_dev_real_dev() when ref-holder tries to access an unregistering real_dev - udp6: make encap_rcv() bump the v6 not v4 stats - drv: prestera: add explicit padding to fix m68k build - drv: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge - drv: mvpp2: fix wrong SerDes reconfiguration order Misc & small latecomers: - ipvs: auto-load ipvs on genl access - mctp: sanity check the struct sockaddr_mctp padding fields - libfs: support RENAME_EXCHANGE in simple_rename() - avoid double accounting for pure zerocopy skbs Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGNQdwACgkQMUZtbf5S IrsiMQ//f66lTJ8PJ5Qj70hX9dC897olx7uGHB9eiKoyOcJI459hFlfXwRU2T4Tf fPNwPNUQ9Mynw9tX/jWEi+7zd6r6TSHGXK49U9/rIbQ95QjKY4LHowIE63x+vPl2 5Cpf+80zXC3DUX1fijgyG1ujnU3kBaqopTxDLmlsHw2PGkwT5Ox1DUwkhc370eEL xlpq3PYGWA8/AQNyhSVBkG/UmoLaq0jYNP5yVcOj4jGjgcgLe1SLrqczENr35QHZ cRkuBsFBMBZF7wSX2f9qQIB/+b1pcLlD9IO+K3S7Ruq+rUd7qfL/tmwNxEh0axYK AyIun1Bxcy7QJGjtpGAz+Ku7jS9T3HxzyxhqilQo3co8jAW0WJ1YwHl+XPgQXyjV DLG6Vxt4syiwsoSXGn8MQugs4nlBT+0qWl8YamIR+o7KkAYPc2QWkXlzEDfNeIW8 JNCZA3sy7VGi1ytorZGx16sQsEWnyRG9a6/WV20Dr+HVs1SKPcFzIfG6mVngR07T mQMHnbAF6Z5d8VTcPQfMxd7UH48s1bHtk5lcSTa3j0Cw+GkA6ytTmjPdJ1qRcdkH dl9jAfADe4O6frG+9XH7FEFqhmkghVI7bOCA4ZOhClVaIcDGgEZc2y7sY9/oZ7P4 KXBD2R5X1caCUM0UtzwL7/8ddOtPtHIrFnhY+7+I6ijt9qmI0BY= =Ttgq -----END PGP SIGNATURE----- Merge tag 'net-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, can and netfilter. Current release - regressions: - bpf: do not reject when the stack read size is different from the tracked scalar size - net: fix premature exit from NAPI state polling in napi_disable() - riscv, bpf: fix RV32 broken build, and silence RV64 warning Current release - new code bugs: - net: fix possible NULL deref in sock_reserve_memory - amt: fix error return code in amt_init(); fix stopping the workqueue - ax88796c: use the correct ioctl callback Previous releases - always broken: - bpf: stop caching subprog index in the bpf_pseudo_func insn - security: fixups for the security hooks in sctp - nfc: add necessary privilege flags in netlink layer, limit operations to admin only - vsock: prevent unnecessary refcnt inc for non-blocking connect - net/smc: fix sk_refcnt underflow on link down and fallback - nfnetlink_queue: fix OOB when mac header was cleared - can: j1939: ignore invalid messages per standard - bpf, sockmap: - fix race in ingress receive verdict with redirect to self - fix incorrect sk_skb data_end access when src_reg = dst_reg - strparser, and tls are reusing qdisc_skb_cb and colliding - ethtool: fix ethtool msg len calculation for pause stats - vlan: fix a UAF in vlan_dev_real_dev() when ref-holder tries to access an unregistering real_dev - udp6: make encap_rcv() bump the v6 not v4 stats - drv: prestera: add explicit padding to fix m68k build - drv: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge - drv: mvpp2: fix wrong SerDes reconfiguration order Misc & small latecomers: - ipvs: auto-load ipvs on genl access - mctp: sanity check the struct sockaddr_mctp padding fields - libfs: support RENAME_EXCHANGE in simple_rename() - avoid double accounting for pure zerocopy skbs" * tag 'net-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (123 commits) selftests/net: udpgso_bench_rx: fix port argument net: wwan: iosm: fix compilation warning cxgb4: fix eeprom len when diagnostics not implemented net: fix premature exit from NAPI state polling in napi_disable() net/smc: fix sk_refcnt underflow on linkdown and fallback net/mlx5: Lag, fix a potential Oops with mlx5_lag_create_definer() gve: fix unmatched u64_stats_update_end() net: ethernet: lantiq_etop: Fix compilation error selftests: forwarding: Fix packet matching in mirroring selftests vsock: prevent unnecessary refcnt inc for nonblocking connect net: marvell: mvpp2: Fix wrong SerDes reconfiguration order net: ethernet: ti: cpsw_ale: Fix access to un-initialized memory net: stmmac: allow a tc-taprio base-time of zero selftests: net: test_vxlan_under_vrf: fix HV connectivity test net: hns3: allow configure ETS bandwidth of all TCs net: hns3: remove check VF uc mac exist when set by PF net: hns3: fix some mac statistics is always 0 in device version V2 net: hns3: fix kernel crash when unload VF while it is being reset net: hns3: sync rx ring head in echo common pull net: hns3: fix pfc packet number incorrect after querying pfc parameters ... |
||
Greg Thelen
|
4716023a8f |
perf/core: Avoid put_page() when GUP fails
PEBS PERF_SAMPLE_PHYS_ADDR events use perf_virt_to_phys() to convert PMU
sampled virtual addresses to physical using get_user_page_fast_only()
and page_to_phys().
Some get_user_page_fast_only() error cases return false, indicating no
page reference, but still initialize the output page pointer with an
unreferenced page. In these error cases perf_virt_to_phys() calls
put_page(). This causes page reference count underflow, which can lead
to unintentional page sharing.
Fix perf_virt_to_phys() to only put_page() if get_user_page_fast_only()
returns a referenced page.
Fixes:
|
||
Valentin Schneider
|
a8b76910e4 |
preempt: Restore preemption model selection configs
Commit
|
||
Mathias Krause
|
b027789e5e |
sched/fair: Prevent dead task groups from regaining cfs_rq's
Kevin is reporting crashes which point to a use-after-free of a cfs_rq in update_blocked_averages(). Initial debugging revealed that we've live cfs_rq's (on_list=1) in an about to be kfree()'d task group in free_fair_sched_group(). However, it was unclear how that can happen. His kernel config happened to lead to a layout of struct sched_entity that put the 'my_q' member directly into the middle of the object which makes it incidentally overlap with SLUB's freelist pointer. That, in combination with SLAB_FREELIST_HARDENED's freelist pointer mangling, leads to a reliable access violation in form of a #GP which made the UAF fail fast. Michal seems to have run into the same issue[1]. He already correctly diagnosed that commit |
||
Vincent Donnefort
|
42dc938a59 |
sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
Nothing protects the access to the per_cpu variable sd_llc_id. When testing
the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
update_top_cache_domain(). One scenario being:
CPU1 CPU2
==================================================================
per_cpu(sd_llc_id, CPUX) => 0
partition_sched_domains_locked()
detach_destroy_domains()
cpus_share_cache(CPUX, CPUX) update_top_cache_domain(CPUX)
per_cpu(sd_llc_id, CPUX) => 0
per_cpu(sd_llc_id, CPUX) = CPUX
per_cpu(sd_llc_id, CPUX) => CPUX
return false
ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
is a warning triggered from ttwu_queue_wakelist().
Avoid a such race in cpus_share_cache() by always returning true when
this_cpu == that_cpu.
Fixes:
|
||
Thomas Gleixner
|
9c8e9c9681 |
PCI/MSI: Move non-mask check back into low level accessors
The recent rework of PCI/MSI[X] masking moved the non-mask checks from the
low level accessors into the higher level mask/unmask functions.
This missed the fact that these accessors can be invoked from other places
as well. The missing checks break XEN-PV which sets pci_msi_ignore_mask and
also violates the virtual MSIX and the msi_attrib.maskbit protections.
Instead of sprinkling checks all over the place, lift them back into the
low level accessor functions. To avoid checking three different conditions
combine them into one property of msi_desc::msi_attrib.
[ josef: Fixed the missed conversion in the core code ]
Fixes:
|
||
Linus Torvalds
|
5147da902e |
Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit cleanups from Eric Biederman: "While looking at some issues related to the exit path in the kernel I found several instances where the code is not using the existing abstractions properly. This set of changes introduces force_fatal_sig a way of sending a signal and not allowing it to be caught, and corrects the misuse of the existing abstractions that I found. A lot of the misuse of the existing abstractions are silly things such as doing something after calling a no return function, rolling BUG by hand, doing more work than necessary to terminate a kernel thread, or calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL). In the review a deficiency in force_fatal_sig and force_sig_seccomp where ptrace or sigaction could prevent the delivery of the signal was found. I have added a change that adds SA_IMMUTABLE to change that makes it impossible to interrupt the delivery of those signals, and allows backporting to fix force_sig_seccomp And Arnd found an issue where a function passed to kthread_run had the wrong prototype, and after my cleanup was failing to build." * 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits) soc: ti: fix wkup_m3_rproc_boot_thread return type signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV) exit/r8188eu: Replace the macro thread_exit with a simple return 0 exit/rtl8712: Replace the macro thread_exit with a simple return 0 exit/rtl8723bs: Replace the macro thread_exit with a simple return 0 signal/x86: In emulate_vsyscall force a signal instead of calling do_exit signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails exit/syscall_user_dispatch: Send ordinary signals on failure signal: Implement force_fatal_sig exit/kthread: Have kernel threads return instead of calling do_exit signal/s390: Use force_sigsegv in default_trap_handler signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved. signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON signal/sparc: In setup_tsb_params convert open coded BUG into BUG signal/powerpc: On swapcontext failure force SIGSEGV signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL) signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT signal/sparc32: Remove unreachable do_exit in do_sparc_fault ... |
||
Linus Torvalds
|
a41b74451b |
kernel.sys.v5.16
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYYvEbgAKCRCRxhvAZXjc og17AQDj+gsxk2lT4GsRo+WrI9qegGSvYHaxbOoqqSL6rHrrsQD+IU92dwVfuUXE oP+De6/TBmsdygnlECxITp8p4ByhGAM= =wi2X -----END PGP SIGNATURE----- Merge tag 'kernel.sys.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull prctl updates from Christian Brauner: "This contains the missing prctl uapi pieces for PR_SCHED_CORE. In order to activate core scheduling the caller is expected to specify the scope of the new core scheduling domain. For example, passing 2 in the 4th argument of prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, <pid>, 2, 0); would indicate that the new core scheduling domain encompasses all tasks in the process group of <pid>. Specifying 0 would only create a core scheduling domain for the thread identified by <pid> and 2 would encompass the whole thread-group of <pid>. Note, the values 0, 1, and 2 correspond to PIDTYPE_PID, PIDTYPE_TGID, and PIDTYPE_PGID. A first version tried to expose those values directly to which I objected because: - PIDTYPE_* is an enum that is kernel internal which we should not expose to userspace directly. - PIDTYPE_* indicates what a given struct pid is used for it doesn't express a scope. But what the 4th argument of PR_SCHED_CORE prctl() expresses is the scope of the operation, i.e. the scope of the core scheduling domain at creation time. So Eugene's patch now simply introduces three new defines PR_SCHED_CORE_SCOPE_THREAD, PR_SCHED_CORE_SCOPE_THREAD_GROUP, and PR_SCHED_CORE_SCOPE_PROCESS_GROUP. They simply express what happens. This has been on the mailing list for quite a while with all relevant scheduler folks Cced. I announced multiple times that I'd pick this up if I don't see or her anyone else doing it. None of this touches proper scheduler code but only concerns uapi so I think this is fine. With core scheduling being quite common now for vm managers (e.g. moving individual vcpu threads into their own core scheduling domain) and container managers (e.g. moving the init process into its own core scheduling domain and letting all created children inherit it) having to rely on raw numbers passed as the 4th argument in prctl() is a bit annoying and everyone is starting to come up with their own defines" * tag 'kernel.sys.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument |
||
Linus Torvalds
|
6752de1aeb |
pidfd.v5.16
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYYvE0wAKCRCRxhvAZXjc oo36AQCQRC9+LsfBsfoqrrdfWqp9ifs9DuytUg+CTftsy1Pn0QD/ZtySkNx9mnNl 0/lSTN5dJBfEYm6Xcfxuu/vu/iauhw0= =dY6T -----END PGP SIGNATURE----- Merge tag 'pidfd.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd updates from Christian Brauner: "Various places in the kernel have picked up pidfds. The two most recent additions have probably been the ability to use pidfds in bpf maps and the usage of pidfds in mm-based syscalls such as process_mrelease() and process_madvise(). The same pattern to turn a pidfd into a struct task exists in two places. One of those places used PIDTYPE_TGID while the other one used PIDTYPE_PID even though it is clearly documented in all pidfd-helpers that pidfds __currently__ only refer to thread-group leaders (subject to change in the future if need be). This isn't a bug per se but has the potential to be one if we allow pidfds to refer to individual threads. If that happens we want to audit all codepaths that make use of them to ensure they can deal with pidfds refering to individual threads. This adds a simple helper to turn a pidfd into a struct task making it easy to grep for such places. Plus, it gets rid of code-duplication" * tag 'pidfd.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: mm: use pidfd_get_task() pid: add pidfd_get_task() helper |
||
Jiri Olsa
|
2e6e9058d1 |
ftrace/direct: Fix lockup in modify_ftrace_direct_multi
We can't call unregister_ftrace_function under ftrace_lock.
Link: https://lkml.kernel.org/r/20211109114217.1645296-1-jolsa@kernel.org
Fixes:
|
||
Steven Rostedt (VMware)
|
51d1579466 |
ring-buffer: Protect ring_buffer_reset() from reentrancy
The resetting of the entire ring buffer use to simply go through and reset
each individual CPU buffer that had its own protection and synchronization.
But this was very slow, due to performing a synchronization for each CPU.
The code was reshuffled to do one disabling of all CPU buffers, followed
by a single RCU synchronization, and then the resetting of each of the CPU
buffers. But unfortunately, the mutex that prevented multiple occurrences
of resetting the buffer was not moved to the upper function, and there is
nothing to protect from it.
Take the ring buffer mutex around the global reset.
Cc: stable@vger.kernel.org
Fixes:
|
||
Linus Torvalds
|
372594985c |
dma-mapping updates for Linux 5.16
- convert sparc32 to the generic dma-direct code - use bitmap_zalloc (Christophe JAILLET) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmGKfNYLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYMEIRAAhOocEFpeaSg8iLMd7QLzm5vvzAuR43iykkKCvdvV Q4P+g8H9Jr65ThsGS90AuuDKuyKh3tmbL7loHlyDygmRHhHALOO4127um4RAnOAL 1y2qCRwgHEZTu1uiu65cB+RRrlJP6T4sHV7+U3uZ3P5nfQoVVIoHKMceSTLIa3dx WPyJXP33TWK50ZvGYuzMhO5hQPA8sKSePiaN3gz3anF0lMnqlUNh1Iso6nasUW40 XifOFM2Bg/SO7HpBGssrku6Zc5x9TpyuQtLP0u+LpjrbUYUZvz/OteyVu5cTZdbP QG7MG6jcvDuU41sjKYNjaNpGZlvmXrEs4pXiwbOhzHTG8TFIEiR/LRsrvBGS7DJ8 y0NKNryIKR3+9fMKDH0PWHC7NszJbAQR0J7OT7+GP8cx9M62x5MuV8d2uOXp6TPY v3VO0SJQrBZLKpY7vixZ6TOYMz15kmULMRrkGzf95+z5MpM2RjJ4lY8Kqlm2PBRR Q3k53Ii8ya9U61SvgcCH39gR1fGT+WO8E5UFttCfhUhn49KJc7DqbEUiOC8Ta7QC OONXxhGLdXAkti5NLFAexk8zdLBVRMnzfG44tBnP/JWDbQu3lMNuQfUXzsJK9yDb zWr/832qwTIzT01NGZDFWdKUPNpafyuDQ1lP9rZZ2ZLo+f/EXNsHvczXvkwP08xS cyY= =DvuN -----END PGP SIGNATURE----- Merge tag 'dma-mapping-5.16' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: "Just a small set of changes this time. The request dma_direct_alloc cleanups are still under review and haven't made the cut. Summary: - convert sparc32 to the generic dma-direct code - use bitmap_zalloc (Christophe JAILLET)" * tag 'dma-mapping-5.16' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: use 'bitmap_zalloc()' when applicable sparc32: use DMA_DIRECT_REMAP sparc32: remove dma_make_coherent sparc32: remove the call to dma_make_coherent in arch_dma_free |
||
Linus Torvalds
|
59a2ceeef6 |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: "87 patches. Subsystems affected by this patch series: mm (pagecache and hugetlb), procfs, misc, MAINTAINERS, lib, checkpatch, binfmt, kallsyms, ramfs, init, codafs, nilfs2, hfs, crash_dump, signals, seq_file, fork, sysvfs, kcov, gdb, resource, selftests, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (87 commits) ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL ipc: check checkpoint_restore_ns_capable() to modify C/R proc files selftests/kselftest/runner/run_one(): allow running non-executable files virtio-mem: disallow mapping virtio-mem memory via /dev/mem kernel/resource: disallow access to exclusive system RAM regions kernel/resource: clean up and optimize iomem_is_exclusive() scripts/gdb: handle split debug for vmlinux kcov: replace local_irq_save() with a local_lock_t kcov: avoid enable+disable interrupts if !in_task() kcov: allocate per-CPU memory on the relevant node Documentation/kcov: define `ip' in the example Documentation/kcov: include types.h in the example sysv: use BUILD_BUG_ON instead of runtime check kernel/fork.c: unshare(): use swap() to make code cleaner seq_file: fix passing wrong private data seq_file: move seq_escape() to a header signal: remove duplicate include in signal.h crash_dump: remove duplicate include in crash_dump.h crash_dump: fix boolreturn.cocci warning hfs/hfsplus: use WARN_ON for sanity check ... |
||
David Hildenbrand
|
a9e7b8d4f6 |
kernel/resource: disallow access to exclusive system RAM regions
virtio-mem dynamically exposes memory inside a device memory region as system RAM to Linux, coordinating with the hypervisor which parts are actually "plugged" and consequently usable/accessible. On the one hand, the virtio-mem driver adds/removes whole memory blocks, creating/removing busy IORESOURCE_SYSTEM_RAM resources, on the other hand, it logically (un)plugs memory inside added memory blocks, dynamically either exposing them to the buddy or hiding them from the buddy and marking them PG_offline. In contrast to physical devices, like a DIMM, the virtio-mem driver is required to actually make use of any of the device-provided memory, because it performs the handshake with the hypervisor. virtio-mem memory cannot simply be access via /dev/mem without a driver. There is no safe way to: a) Access plugged memory blocks via /dev/mem, as they might contain unplugged holes or might get silently unplugged by the virtio-mem driver and consequently turned inaccessible. b) Access unplugged memory blocks via /dev/mem because the virtio-mem driver is required to make them actually accessible first. The virtio-spec states that unplugged memory blocks MUST NOT be written, and only selected unplugged memory blocks MAY be read. We want to make sure, this is the case in sane environments -- where the virtio-mem driver was loaded. We want to make sure that in a sane environment, nobody "accidentially" accesses unplugged memory inside the device managed region. For example, a user might spot a memory region in /proc/iomem and try accessing it via /dev/mem via gdb or dumping it via something else. By the time the mmap() happens, the memory might already have been removed by the virtio-mem driver silently: the mmap() would succeeed and user space might accidentially access unplugged memory. So once the driver was loaded and detected the device along the device-managed region, we just want to disallow any access via /dev/mem to it. In an ideal world, we would mark the whole region as busy ("owned by a driver") and exclude it; however, that would be wrong, as we don't really have actual system RAM at these ranges added to Linux ("busy system RAM"). Instead, we want to mark such ranges as "not actual busy system RAM but still soft-reserved and prepared by a driver for future use." Let's teach iomem_is_exclusive() to reject access to any range with "IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE", even if not busy and even if "iomem=relaxed" is set. Introduce EXCLUSIVE_SYSTEM_RAM to make it easier for applicable drivers to depend on this setting in their Kconfig. For now, there are no applicable ranges and we'll modify virtio-mem next to properly set IORESOURCE_EXCLUSIVE on the parent resource container it creates to contain all actual busy system RAM added via add_memory_driver_managed(). Link: https://lkml.kernel.org/r/20210920142856.17758-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
David Hildenbrand
|
b78dfa059f |
kernel/resource: clean up and optimize iomem_is_exclusive()
Patch series "virtio-mem: disallow mapping virtio-mem memory via /dev/mem", v5. Let's add the basic infrastructure to exclude some physical memory regions marked as "IORESOURCE_SYSTEM_RAM" completely from /dev/mem access, even though they are not marked IORESOURCE_BUSY and even though "iomem=relaxed" is set. Resource IORESOURCE_EXCLUSIVE for that purpose instead of adding new flags to express something similar to "soft-busy" or "not busy yet, but already prepared by a driver and not to be mapped by user space". Use it for virtio-mem, to disallow mapping any virtio-mem memory via /dev/mem to user space after the virtio-mem driver was loaded. This patch (of 3): We end up traversing subtrees of ranges we are not interested in; let's optimize this case, skipping such subtrees, cleaning up the function a bit. For example, in the following configuration (/proc/iomem): 00000000-00000fff : Reserved 00001000-00057fff : System RAM 00058000-00058fff : Reserved 00059000-0009cfff : System RAM 0009d000-000fffff : Reserved 000a0000-000bffff : PCI Bus 0000:00 000c0000-000c3fff : PCI Bus 0000:00 000c4000-000c7fff : PCI Bus 0000:00 000c8000-000cbfff : PCI Bus 0000:00 000cc000-000cffff : PCI Bus 0000:00 000d0000-000d3fff : PCI Bus 0000:00 000d4000-000d7fff : PCI Bus 0000:00 000d8000-000dbfff : PCI Bus 0000:00 000dc000-000dffff : PCI Bus 0000:00 000e0000-000e3fff : PCI Bus 0000:00 000e4000-000e7fff : PCI Bus 0000:00 000e8000-000ebfff : PCI Bus 0000:00 000ec000-000effff : PCI Bus 0000:00 000f0000-000fffff : PCI Bus 0000:00 000f0000-000fffff : System ROM 00100000-3fffffff : System RAM 40000000-403fffff : Reserved 40000000-403fffff : pnp 00:00 40400000-80a79fff : System RAM ... We don't have to look at any children of "0009d000-000fffff : Reserved" if we can just skip these 15 items directly because the parent range is not of interest. Link: https://lkml.kernel.org/r/20210920142856.17758-1-david@redhat.com Link: https://lkml.kernel.org/r/20210920142856.17758-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Sebastian Andrzej Siewior
|
d5d2c51f1e |
kcov: replace local_irq_save() with a local_lock_t
The kcov code mixes local_irq_save() and spin_lock() in kcov_remote_{start|end}(). This creates a warning on PREEMPT_RT because local_irq_save() disables interrupts and spin_lock_t is turned into a sleeping lock which can not be acquired in a section with disabled interrupts. The kcov_remote_lock is used to synchronize the access to the hash-list kcov_remote_map. The local_irq_save() block protects access to the per-CPU data kcov_percpu_data. There is no compelling reason to change the lock type to raw_spin_lock_t to make it work with local_irq_save(). Changing it would require to move memory allocation (in kcov_remote_add()) and deallocation outside of the locked section. Adding an unlimited amount of entries to the hashlist will increase the IRQ-off time during lookup. It could be argued that this is debug code and the latency does not matter. There is however no need to do so and it would allow to use this facility in an RT enabled build. Using a local_lock_t instead of local_irq_save() has the befit of adding a protection scope within the source which makes it obvious what is protected. On a !PREEMPT_RT && !LOCKDEP build the local_lock_irqsave() maps directly to local_irq_save() so there is overhead at runtime. Replace the local_irq_save() section with a local_lock_t. Link: https://lkml.kernel.org/r/20210923164741.1859522-6-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210830172627.267989-6-bigeasy@linutronix.de Reported-by: Clark Williams <williams@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Sebastian Andrzej Siewior
|
22036abe17 |
kcov: avoid enable+disable interrupts if !in_task()
kcov_remote_start() may need to allocate memory in the in_task() case (otherwise per-CPU memory has been pre-allocated) and therefore requires enabled interrupts. The interrupts are enabled before checking if the allocation is required so if no allocation is required then the interrupts are needlessly enabled and disabled again. Enable interrupts only if memory allocation is performed. Link: https://lkml.kernel.org/r/20210923164741.1859522-5-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210830172627.267989-5-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Clark Williams <williams@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Sebastian Andrzej Siewior
|
741ddd4519 |
kcov: allocate per-CPU memory on the relevant node
During boot kcov allocates per-CPU memory which is used later if remote/ softirq processing is enabled. Allocate the per-CPU memory on the CPU local node to avoid cross node memory access. Link: https://lkml.kernel.org/r/20210923164741.1859522-4-bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210830172627.267989-4-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Clark Williams <williams@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Ran Xiaokai
|
ba1f70ddd1 |
kernel/fork.c: unshare(): use swap() to make code cleaner
Use swap() instead of reimplementing it. Link: https://lkml.kernel.org/r/20210909022046.8151-1-ran.xiaokai@zte.com.cn Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Gabriel Krisman Bertazi <krisman@collabora.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexey Gladkov <legion@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Kefeng Wang
|
808b64565b |
extable: use is_kernel_text() helper
The core_kernel_text() should check the gate area, as it is part of kernel text range, use is_kernel_text() in core_kernel_text(). Link: https://lkml.kernel.org/r/20210930071143.63410-9-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Potapenko <glider@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Kefeng Wang
|
b9ad8fe7b8 |
sections: move is_kernel_inittext() into sections.h
The is_kernel_inittext() and init_kernel_text() are with same functionality, let's just keep is_kernel_inittext() and move it into sections.h, then update all the callers. Link: https://lkml.kernel.org/r/20210930071143.63410-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Potapenko <glider@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Kefeng Wang
|
a20deb3a34 |
sections: move and rename core_kernel_data() to is_kernel_core_data()
Move core_kernel_data() into sections.h and rename it to is_kernel_core_data(), also make it return bool value, then update all the callers. Link: https://lkml.kernel.org/r/20210930071143.63410-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexander Potapenko <glider@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Kefeng Wang
|
1b1ad288b8 |
kallsyms: remove arch specific text and data check
Patch series "sections: Unify kernel sections range check and use", v4.
There are three head files(kallsyms.h, kernel.h and sections.h) which
include the kernel sections range check, let's make some cleanup and unify
them.
1. cleanup arch specific text/data check and fix address boundary check
in kallsyms.h
2. make all the basic/core kernel range check function into sections.h
3. update all the callers, and use the helper in sections.h to simplify
the code
After this series, we have 5 APIs about kernel sections range check in
sections.h
* is_kernel_rodata() --- already in sections.h
* is_kernel_core_data() --- come from core_kernel_data() in kernel.h
* is_kernel_inittext() --- come from kernel.h and kallsyms.h
* __is_kernel_text() --- add new internal helper
* __is_kernel() --- add new internal helper
Note: For the last two helpers, people should not use directly, consider to
use corresponding function in kallsyms.h.
This patch (of 11):
Remove arch specific text and data check after commit
|
||
Linus Torvalds
|
e851dfae43 |
kgdb patches for 5.16
A single patch this cycle. We replace some open-coded routines to classify task states with the scheduler's own function to do this. Alongside the obvious benefits of removing funky code and aligning more exactly with the scheduler's task classification, this also fixes a long standing compiler warning by removing the open-coded routines that generated the warning. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAmGJA90ACgkQfOMlXTn3 iKFJcw/9FLy94FS+y/F8xpTU89d2j92f8q1mxS9g7ToDzDOiIPLyNazbX+4PsXVQ FGgLpTqzNZX3+D5eAnOA/BNwWXtsvdxpsNnkY5ZCsVY5kZ0zsBYHe1O5CM2TbcMg bOvJRQVI/FjydlrwqxIz9gAD7FmT/QvyecIbHZm/zFiCxdQwZy3rFwREd5ENsjoG wumCCCH8Gh/afi9Pu3ZKHoZggNy/gmtSP3h3wmyoQneVFIJ4Vw5J61GFCvMPD+pN wuAXWpuzWaND5IPTr4aZMKHNSaxqADQoEpNWxkgRh0cNL4NGBKsdLZMcqTTiyWww TJSDtQKqocQB99eouwzQoA8SBsZwRvKRf/33QUXrWCAjl5YRK+9fSd8+dEf9Zd0o A3sh99ecmHXknY6K2uO7NFjUPLSA/QeMGBzNx9lt7RoL+14tjqZkrAvWXooZzBY3 j39gwI1kSplmmCSoXwoW3AFVcCLJcGzE9qh0NUmZgt3kv8K1SUo3gxotKs8KwKj/ xVozOokmZV2ZuCTf8oIw7ntLwIFjiUaYBE7JY+c8mT8VWCbs5ztyOb11I35YYT0V InXMDICLxZBD85eNOHyPC0fAud5emfboHl5GSUxo2hPgrRKuBmqElGtxG9CC8DLR SItPjKfrYI1CJd4uoFX54nC3GmwLVSAq3xDwpYsN4A4lLbJJytc= =vIfI -----END PGP SIGNATURE----- Merge tag 'kgdb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb update from Daniel Thompson: "A single patch this cycle. We replace some open-coded routines to classify task states with the scheduler's own function to do this. Alongside the obvious benefits of removing funky code and aligning more exactly with the scheduler's task classification, this also fixes a long standing compiler warning by removing the open-coded routines that generated the warning" * tag 'kgdb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kdb: Adopt scheduler's task classification |
||
Linus Torvalds
|
67b7e1f241 |
modules patches for 5.16-rc1
As requested by Jessica I'm stepping in to help with modules maintenance. This is my first pull request to you. I've collected only two patches for modules for the 5.16-rc1 merge window. These patches are from Shuah Khan as she debugged some corner case error with modules. The error messages are improved for elf_validity_check(). While doing this work a corner case fix was spotted on validate_section_offset() due to a possible overflow bug on 64-bit. The impact of this fix is low given this just limits module section headers placed within the 32-bit boundary, and we obviously don't have insane module sizes. Even if a specially crafted module is constructed later checks would invalidate the module right away. I've let this sit through 0-day testing since October 15th with no issues found. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmGFrvcSHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinhFAP/1BBXuM/vevC1IdZaEU4M8pg07NOpkZt PYJc8CxWKTtEg5hrLJMqOexXGwvAg/nq28IFWvUKh3bGtEghPyrQu6+I4mXsjnjJ t9/AO+BOYU14DJGDAYEuReNsaAcyeRooHLriuUaNvhhaN9q+v+FRyBWNphmA6Tz7 VkCtmCNMFJZlhd9Cu4jOZpJe6CIe9gZ0czYfRshAl/3ZRSQjYaddtbYf1Cs8Vwah by4o2YyvctrRzeOj/Fy+kbqZw2St39nZ5fKYwijRn1ZwHRQo6NQqrlMeS8rI0LgG 1YwWgNWO1FjaPzyIFcAhk2bUF2TxEf5/eVpXn2qXHnmVZ55oBPP/O7Th0/5OK9gD utOMbO1nqBLBXUyX/1dO/UT36XcrqtUP0Y9VgjIvj9n8Y82RGYmBScH/TOU1f7A7 sH56sW9/3YvIOe8AShBHJ7IKqZXU0inIGasFYwKKm2pAOLtajaC9Sr5fqVbuyfNF J2+nXipVzjI0f9SGTqmE41jynFGln6nfd1pgOOiysg9ZqxieINB0J8l0OHe6fZz/ zU4TehXZHE9DApP8D+rVpP0ltwR2YWs2u0zRqHr/0GEWYH00JZu2ymDR13W7izSp KiiveBxhwBpewgV5cyua8TDyeKhn3mEJFNmijlaq4yq1P2oKeWTQRDRZjwUP8EZY s16oV+BW7Kp+ =Evek -----END PGP SIGNATURE----- Merge tag 'modules-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module updates from Luis Chamberlain: "As requested by Jessica I'm stepping in to help with modules maintenance. This is my first pull request to you. I've collected only two patches for modules for the 5.16-rc1 merge window. These patches are from Shuah Khan as she debugged some corner case error with modules. The error messages are improved for elf_validity_check(). While doing this work a corner case fix was spotted on validate_section_offset() due to a possible overflow bug on 64-bit. The impact of this fix is low given this just limits module section headers placed within the 32-bit boundary, and we obviously don't have insane module sizes. Even if a specially crafted module is constructed later checks would invalidate the module right away. I've let this sit through 0-day testing since October 15th with no issues found" * tag 'modules-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: change to print useful messages from elf_validity_check() module: fix validate_section_offset() overflow bug on 64-bit |