2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-13 08:04:45 +08:00
Commit Graph

32874 Commits

Author SHA1 Message Date
Paul E. McKenney
e4fe5dd6f2 rcu-tasks: Further refactor RCU-tasks to allow adding more variants
This commit refactors RCU tasks to allow variants to be added.  These
variants will share the current Tasks-RCU tasklist scan and the holdout
list processing.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
c97d12a63c rcu-tasks: Use unique names for RCU-Tasks kthreads and messages
This commit causes the flavors of RCU Tasks to use different names
for their kthreads and in their console messages.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
3d6e43c75d rcutorture: Add torture tests for RCU Tasks Rude
This commit adds the definitions required to torture the rude flavor of
RCU tasks.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
c84aad7654 rcu-tasks: Add an RCU-tasks rude variant
This commit adds a "rude" variant of RCU-tasks that has as quiescent
states schedule(), cond_resched_tasks_rcu_qs(), userspace execution,
and (in theory, anyway) cond_resched().  In other words, RCU-tasks rude
readers are regions of code with preemption disabled, but excluding code
early in the CPU-online sequence and late in the CPU-offline sequence.
Updates make use of IPIs and force an IPI and a context switch on each
online CPU.  This variant is useful in some situations in tracing.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
[ paulmck: Apply EXPORT_SYMBOL_GPL() feedback from Qiujun Huang. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[ paulmck: Apply review feedback from Steve Rostedt. ]
2020-04-27 11:03:51 -07:00
Paul E. McKenney
5873b8a94e rcu-tasks: Refactor RCU-tasks to allow variants to be added
This commit splits out generic processing from RCU-tasks-specific
processing in order to allow additional flavors to be added.  It also
adds a def_bool TASKS_RCU_GENERIC to enable the common RCU-tasks
infrastructure code.

This is primarily, but not entirely, a code-movement commit.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
9cf8fc6fab rcutorture: Add a test for synchronize_rcu_mult()
This commit adds a crude test for synchronize_rcu_mult().  This is
currently a smoke test rather than a high-quality stress test.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
07e105158d rcu-tasks: Create struct to hold state information
This commit creates an rcu_tasks struct to hold state information for
RCU Tasks.  This is a preparation commit for adding additional flavors
of Tasks RCU, each of which would have its own rcu_tasks struct.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
eacd6f04a1 rcu-tasks: Move Tasks RCU to its own file
This code-movement-only commit is in preparation for adding an additional
flavor of Tasks RCU, which relies on workqueues to detect grace periods.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
5bef8da66a rcu: Add per-task state to RCU CPU stall warnings
Currently, an RCU-preempt CPU stall warning simply lists the PIDs of
those tasks holding up the current grace period.  This can be helpful,
but more can be even more helpful.

To this end, this commit adds the nesting level, whether the task
thinks it was preempted in its current RCU read-side critical section,
whether RCU core has asked this task for a quiescent state, whether the
expedited-grace-period hint is set, and whether the task believes that
it is on the blocked-tasks list (it must be, or it would not be printed,
but if things are broken, best not to take too much for granted).

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
2beaf3280e sched/core: Add function to sample state of locked-down task
A running task's state can be sampled in a consistent manner (for example,
for diagnostic purposes) simply by invoking smp_call_function_single()
on its CPU, which may be obtained using task_cpu(), then having the
IPI handler verify that the desired task is in fact still running.
However, if the task is not running, this sampling can in theory be done
immediately and directly.  In practice, the task might start running at
any time, including during the sampling period.  Gaining a consistent
sample of a not-running task therefore requires that something be done
to lock down the target task's state.

This commit therefore adds a try_invoke_on_locked_down_task() function
that invokes a specified function if the specified task can be locked
down, returning true if successful and if the specified function returns
true.  Otherwise this function simply returns false.  Given that the
function passed to try_invoke_on_nonrunning_task() might be invoked with
a runqueue lock held, that function had better be quite lightweight.

The function is passed the target task's task_struct pointer and the
argument passed to try_invoke_on_locked_down_task(), allowing easy access
to task state and to a location for further variables to be passed in
and out.

Note that the specified function will be called even if the specified
task is currently running.  The function can use ->on_rq and task_curr()
to quickly and easily determine the task's state, and can return false
if this state is not to the function's liking.  The caller of the
try_invoke_on_locked_down_task() would then see the false return value,
and could take appropriate action, for example, trying again later or
sending an IPI if matters are more urgent.

It is expected that use cases such as the RCU CPU stall warning code will
simply return false if the task is currently running.  However, there are
use cases involving nohz_full CPUs where the specified function might
instead fall back to an alternative sampling scheme that relies on heavier
synchronization (such as memory barriers) in the target task.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
[ paulmck: Apply feedback from Peter Zijlstra and Steven Rostedt. ]
[ paulmck: Invoke if running to handle feedback from Mathieu Desnoyers. ]
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
66777e5821 rcu-tasks: Use context-switch hook for PREEMPT=y kernels
Currently, the PREEMPT=y version of rcu_note_context_switch() does not
invoke rcu_tasks_qs(), and we need it to in order to keep RCU Tasks
Trace's IPIs down to a dull roar.  This commit therefore enables this
hook.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
ac3caf8274 rcu: Add comments marking transitions between RCU watching and not
It is not as clear as it might be just where in RCU's idle entry/exit
code RCU stops and starts watching the current CPU.  This commit therefore
adds comments calling out the transitions.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
52b1fc3f79 rcutorture: Add test of holding scheduler locks across rcu_read_unlock()
Now that it should be safe to hold scheduler locks across
rcu_read_unlock(), even in cases where the corresponding RCU read-side
critical section might have been preempted and boosted, the commit adds
a test of this capability to rcutorture.  This has been tested on current
mainline (which can deadlock in this situation), and lockdep duly reported
the expected deadlock.  On -rcu, lockdep is silent, thus far, anyway.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Lai Jiangshan
5f5fa7ea89 rcu: Don't use negative nesting depth in __rcu_read_unlock()
Now that RCU flavors have been consolidated, an RCU-preempt
rcu_read_unlock() in an interrupt or softirq handler cannot possibly
end the RCU read-side critical section.  Consider the old vulnerability
involving rcu_read_unlock() being invoked within such a handler that
interrupted an __rcu_read_unlock_special(), in which a wakeup might be
invoked with a scheduler lock held.  Because rcu_read_unlock_special()
no longer does wakeups in such situations, it is no longer necessary
for __rcu_read_unlock() to set the nesting level negative.

This commit therefore removes this recursion-protection code from
__rcu_read_unlock().

[ paulmck: Let rcu_exp_handler() continue to call rcu_report_exp_rdp(). ]
[ paulmck: Adjust other checks given no more negative nesting. ]
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Lai Jiangshan
f0bdf6d473 rcu: Remove unused ->rcu_read_unlock_special.b.deferred_qs field
The ->rcu_read_unlock_special.b.deferred_qs field is set to true in
rcu_read_unlock_special() but never set to false.  This is not
particularly useful, so this commit removes this field.

The only possible justification for this field is to ease debugging
of RCU deferred quiscent states, but the combination of the other
->rcu_read_unlock_special fields plus ->rcu_blocked_node and of course
->rcu_read_lock_nesting should cover debugging needs.  And if this last
proves incorrect, this patch can always be reverted, along with the
required setting of ->rcu_read_unlock_special.b.deferred_qs to false
in rcu_preempt_deferred_qs_irqrestore().

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Lai Jiangshan
07b4a930fc rcu: Don't set nesting depth negative in rcu_preempt_deferred_qs()
Now that RCU flavors have been consolidated, an RCU-preempt
rcu_read_unlock() in an interrupt or softirq handler cannot possibly
end the RCU read-side critical section.  Consider the old vulnerability
involving rcu_preempt_deferred_qs() being invoked within such a handler
that interrupted an extended RCU read-side critical section, in which
a wakeup might be invoked with a scheduler lock held.  Because
rcu_read_unlock_special() no longer does wakeups in such situations,
it is no longer necessary for rcu_preempt_deferred_qs() to set the
nesting level negative.

This commit therefore removes this recursion-protection code from
rcu_preempt_deferred_qs().

[ paulmck: Fix typo in commit log per Steve Rostedt. ]
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
e4453d8a1c rcu: Make rcu_read_unlock_special() safe for rq/pi locks
The scheduler is currently required to hold rq/pi locks across the entire
RCU read-side critical section or not at all.  This is inconvenient and
leaves traps for the unwary, including the author of this commit.

But now that excessively long grace periods enable scheduling-clock
interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in
rcu_read_unlock_special() can be dispensed with.  In other words, the
rcu_read_unlock_special() function can refrain from doing wakeups unless
such wakeups are guaranteed safe.

This commit therefore avoids unsafe wakeups, freeing the scheduler to
hold rq/pi locks across rcu_read_unlock() even if the corresponding RCU
read-side critical section might have been preempted.  This commit also
updates RCU's requirements documentation.

This commit is inspired by a patch from Lai Jiangshan:
https://lore.kernel.org/lkml/20191102124559.1135-2-laijs@linux.alibaba.com
This commit is further intended to be a step towards his goal of permitting
the inlining of RCU-preempt's rcu_read_lock() and rcu_read_unlock().

Cc: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
c76e7e0bce rcu: Add KCSAN stubs to update.c
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(),
and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros
to move ahead.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
353159365e rcu: Add KCSAN stubs
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(),
and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to
move ahead.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:00:06 -07:00
Linus Torvalds
3e0dea5768 An update for the proc interface of time namespaces: Use symbolic names
instead of clockid numbers. The usability nuisance of numbers was noticed
 by Michael when polishing the man page.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6cVQsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWBjEAC0dCUHKDLoG0FeyG4tb4FEBW2iTqM8
 UFirH26K18s8QSePdvfJlaxtN2SdfNZG7UgYN7wz1fDFQy05zTz7Rek8UrDuu3rh
 mVph/UZtUJl+6ypW2Lw9x5RWpT5yzay2iowUyBPnNxU9F/0uRKvXQFju3L83Lo/z
 Z4ni7gVEw87dQi5E74tEv6iaydgPuCBpGxoMahotnHyclqMjA0QuAK6nhN5ZTcAn
 senoorS/VqkSF5qEvIUwe7+F+kkMbwQryT7merJyNwh/F49xTTXRyBmiys1MF8Og
 MTEvldXKy2pCh2UfRa/x84WWwOUVNivTXdIXjhalsblczL0j1z9MsQ8b3AOXOiLf
 S+/Ntbb2dGo4qE22jekMwZ54Pm4x5NzChCU8+3pvd6IrPWZKi6vue74Kd0RNHQg/
 0kWOlZnIP2ArVW0bFqV6jhMYkjmVdK6gm7cUpFV66L2H8zbfFuc4OlxJYEFYivye
 9Yck+rFQmMwA15ZXYIpggkd7Rf/5CGF1CiMBAvP/ILubpgbJqnn6/tGByq8tDKdy
 mqXX+NHF0M/7rJd5vr7wP6p3E5nQ9l/41rh9ii9EDLXf4jsWVO3EyobJ7fFHwprs
 5tTWGxVJymUQLq/LQPXOVVENGK+ZsXXNGn/4n8IOVroeypxADTGyhtSh122kFFhv
 jPcVHqpBUd0g4Q==
 =slEk
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull time namespace fix from Thomas Gleixner:
 "An update for the proc interface of time namespaces: Use symbolic
  names instead of clockid numbers. The usability nuisance of numbers
  was noticed by Michael when polishing the man page"

* tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets
2020-04-19 11:46:21 -07:00
Linus Torvalds
80ade29e1e A set of fixes/updates for the interrupt subsystem:
- Remove setup_irq() and remove_irq(). All users have been converted so
    remove them before new users surface.
 
  - A set of bugfixes for various interrupt chip drivers
 
  - Add a few missing static attributes to address sparse warnings.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6cUuMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYi7EACOFPrwdOlKqDdgU1FGReEzhJeNSSyH
 yUp1m2nNckz8Y2B+ihnLsfvcktZSXYRuDTZ/u/rmaKqq2wH5Q/h4DNQxEfoMNUep
 IVBlvAFcGsvpdSbrlc+nx6sEo0K2b22AQVHdyPECiQYFZJikstAtEfzEv+ZaUr2S
 Lcds295BIQylbugQpcVZL73j6iUKQ+P5YU0Wlkd/Vhlnxe9UdMd/N1P3GoRaRlOa
 QxYDJCnZJjWkN+cEVRCAZVTat6pd3zaMHvEabI39Lzx4U+nu4vh62TILwk+wdpuA
 DzgA+ENFXzv2zLlnF8gB0wKWw3J99No9gfRpuK/vWBQ68UeZsPlM5PKEr93oD4cC
 To9D70r71UM+LS+Km8ciFlqeT4N+hIMb/x8rpIf5Tcfn5spXjNEhR4U6/d/D2ZYy
 cQiu82th9kSOMGBhlrfkJ0gAT20UfAktDHU1M4JhwI5Y/DLusb6mfg0CRMj8ucOV
 0xrKkgHxhX162oRTKzy5OTMWQRGTvIQZg1QE3xxtrT2qCq4ypu0EHQbh3GdfcIVQ
 8n+s/Dde6etmbSwDDdEuRi///zM+hvaiXi5KOV28LYgRDbU78cAX8uRgX9sq2pg+
 WxK9ulprkW6Ci1yTts9Q6FY+ZBekg7NBKXXDCJdPwXxTLRrdci68pPZip12AaWxP
 2HYxWhE8LvmKAw==
 =jaX5
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes/updates for the interrupt subsystem:

   - Remove setup_irq() and remove_irq(). All users have been converted
     so remove them before new users surface.

   - A set of bugfixes for various interrupt chip drivers

   - Add a few missing static attributes to address sparse warnings"

* tag 'irq-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-bcm7038-l1: Make bcm7038_l1_of_init() static
  irqchip/irq-mvebu-icu: Make legacy_bindings static
  irqchip/meson-gpio: Fix HARDIRQ-safe -> HARDIRQ-unsafe lock order
  irqchip/sifive-plic: Fix maximum priority threshold value
  irqchip/ti-sci-inta: Fix processing of masked irqs
  irqchip/mbigen: Free msi_desc on device teardown
  irqchip/gic-v4.1: Update effective affinity of virtual SGIs
  irqchip/gic-v4.1: Add support for VPENDBASER's Dirty+Valid signaling
  genirq: Remove setup_irq() and remove_irq()
2020-04-19 11:23:33 -07:00
Linus Torvalds
08dd387277 Two fixes for the scheduler:
- Work around an uninitializaed variable warning where GCC can't figure it
    out.
 
  - Allow 'isolcpus=' to skip unknown subparameters so that older kernels
    work with the commandline of a newer kernel. Improve the error output
    while at it.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6cVFwTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZAaD/9i9QgLuj1Ka59kNPAs68i5Kjar72VS
 us1dM2n0Tx6lIUEYsdJsu4GTRi5NEBqLbmwSgsXROnhI6Jd17hHp5JViezk1GZWc
 Zg2uARAj9Jsqh2q5IjriNOwzq47PDC4dmSUzaecJff8PqGkk9Lpry6qvx3A02uSn
 tVVQAXqwCbPTaQzuhEf/q6mbfRaO90tVqGdseD+1wE0FBFfPLwddegLEGhL1vYsA
 55UhpKCGsS9lrfmgkxk1Xb3e0pJBObiV0SXdn2qHqJTpVUaDTZzsWgJHXg+0Fe1V
 0ZsuGfmaaisYPBZmqRo4HALbkgnvVECSbp7FAnhvqiQMyNaciiwkkFv9Ap5+aayf
 c8wXz/emAmuEMNzipovyFUITCIOs6IL1CkESsbO8Bgx9sTHO+pcgNEYrsX1953UC
 45GjhXR3ymnclqsVqfMWIcNRukk0g9W38yp1DgA5IIhVz1rHogEquD9F10qsCGb1
 FgSOnyGlU0I0JR5tEfqR0TeCuqLGKB2NvnEgLU4OVpsdEC5ac87uvzWEZuOmR5Z4
 vQCkps1z1ABW5fB/kFO83OiA5BZfDGnq5Vvh6XDOv6EeWjhIXrolu6VeTYpBSInq
 w0oNpsaA9wsy7WIy1RJ8jtSNsgS8fULCE5lUBtFeSUY/T7IcWd0lwnTlL97A4qzg
 GdYVT/UAHLCzCA==
 =AKgh
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:
 "Two fixes for the scheduler:

   - Work around an uninitialized variable warning where GCC can't
     figure it out.

   - Allow 'isolcpus=' to skip unknown subparameters so that older
     kernels work with the commandline of a newer kernel. Improve the
     error output while at it"

* tag 'sched-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/vtime: Work around an unitialized variable warning
  sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters
2020-04-19 11:18:20 -07:00
Linus Torvalds
5e7de58127 A single bugfix for RCU to prevent taking a lock in NMI context.
-----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6cUf8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofFtD/92Ufh1t+1uSe/txUJ/lTdY98cpcD5i
 UZJYdF102VLkhwA3Ors0Udtrnvuyxb5FFSfJZ3/N7tsNaXgcz1QOQsqoZz4SSgLJ
 pPj+2v+LNI6rIWDXuwszbLM/nT1mJGAK9NQ6+AhvIyBMPbht4/Lajsv7vkFMTzXw
 8o+a5NqLKWDca7/eLcgbcMiSsulRZxRld0D7MSP7RBeEWeylt31q3JIBp7ldzJ77
 0KCdQEd3TAkP+hOZW98CNgcLgGtCxiOJ5EgjUOFJyD/+5mj219czKF53HXnn4amk
 5lmdzSB0RKV2JTNFKNZQOobgMPp8VIIf6R6QlDp5MdrGYWTIbV0p5Hak2u41Cyma
 BfxxkVZJipjC7mgAvZLgy0/Md7n2Eu5uAW0e72AYEmv7IwOGyHh9YL7IYiZQld67
 5q8xEgrIIpaCwscVjZqP3+GHc8KGWYuv8puMCeOk1v7UeWsRlc8j/eGWIXnY4E8v
 wvqWAB91dHlBh+CHaFtdy97buYinVzW/Tv2NTxLFKgxyGJTg82H2hdTpjRVYi5Z4
 DM2NQRLcD1ozvh8tsFzXWP+/uemlE+EUPBZofCjJ0WtzH1GWarf3YNqviFqldRLr
 GyEFoyIc3Ra/hTEzD9yCC0UWwJubhAVLWOuu9pJKuaaei8s1aiusABQGbz5sNG9l
 AcpB9KFMtsLAXA==
 =QjbA
 -----END PGP SIGNATURE-----

Merge tag 'core-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU fix from Thomas Gleixner:
 "A single bugfix for RCU to prevent taking a lock in NMI context"

* tag 'core-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Don't acquire lock in NMI handler in rcu_nmi_enter_common()
2020-04-19 11:16:00 -07:00
Linus Torvalds
774acb2a09 for-linus-2020-04-18
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXprWIAAKCRCRxhvAZXjc
 omUyAQCQcvJQhilLv0b7FtBAbN7+TkzV8vAQTzEITuHPa6m/HwEA2Gp9ZDTJfQbV
 T6utOrTm/LT0mfBkiDLSnLPtVzh7mgE=
 =Jz3d
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2020-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread fixes from Christian Brauner:
 "A few fixes and minor improvements:

   - Correctly validate the cgroup file descriptor when clone3() is used
     with CLONE_INTO_CGROUP.

   - Check that a new enough version of struct clone_args is passed
     which supports the cgroup file descriptor argument when
     CLONE_INTO_CGROUP is set in the flags argument.

   - Catch nonsensical struct clone_args layouts at build time.

   - Catch extensions of struct clone_args without updating the uapi
     visible size definitions at build time.

   - Check whether the signal is valid early in kill_pid_usb_asyncio()
     before doing further work.

   - Replace open-coded rcu_read_lock()+kill_pid_info()+rcu_read_unlock()
     sequence in kill_something_info() with kill_proc_info() which is a
     dedicated helper to do just that"

* tag 'for-linus-2020-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  clone3: add build-time CLONE_ARGS_SIZE_VER* validity checks
  clone3: add a check for the user struct size if CLONE_INTO_CGROUP is set
  clone3: fix cgroup argument sanity check
  signal: use kill_proc_info instead of kill_pid_info in kill_something_info
  signal: check sig before setting info in kill_pid_usb_asyncio
2020-04-18 11:38:51 -07:00
Linus Torvalds
c8372665b4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Disable RISCV BPF JIT builds when !MMU, from Björn Töpel.

 2) nf_tables leaves dangling pointer after free, fix from Eric Dumazet.

 3) Out of boundary write in __xsk_rcv_memcpy(), fix from Li RongQing.

 4) Adjust icmp6 message source address selection when routes have a
    preferred source address set, from Tim Stallard.

 5) Be sure to validate HSR protocol version when creating new links,
    from Taehee Yoo.

 6) CAP_NET_ADMIN should be sufficient to manage l2tp tunnels even in
    non-initial namespaces, from Michael Weiß.

 7) Missing release firmware call in mlx5, from Eran Ben Elisha.

 8) Fix variable type in macsec_changelink(), caught by KASAN. Fix from
    Taehee Yoo.

 9) Fix pause frame negotiation in marvell phy driver, from Clemens
    Gruber.

10) Record RX queue early enough in tun packet paths such that XDP
    programs will see the correct RX queue index, from Gilberto Bertin.

11) Fix double unlock in mptcp, from Florian Westphal.

12) Fix offset overflow in ARM bpf JIT, from Luke Nelson.

13) marvell10g needs to soft reset PHY when coming out of low power
    mode, from Russell King.

14) Fix MTU setting regression in stmmac for some chip types, from
    Florian Fainelli.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
  amd-xgbe: Use __napi_schedule() in BH context
  mISDN: make dmril and dmrim static
  net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes
  net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
  tipc: fix incorrect increasing of link window
  Documentation: Fix tcp_challenge_ack_limit default value
  net: tulip: make early_486_chipsets static
  dt-bindings: net: ethernet-phy: add desciption for ethernet-phy-id1234.d400
  ipv6: remove redundant assignment to variable err
  net/rds: Use ERR_PTR for rds_message_alloc_sgs()
  net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge
  selftests/bpf: Check for correct program attach/detach in xdp_attach test
  libbpf: Fix type of old_fd in bpf_xdp_set_link_opts
  libbpf: Always specify expected_attach_type on program load if supported
  xsk: Add missing check on user supplied headroom size
  mac80211: fix channel switch trigger from unknown mesh peer
  mac80211: fix race in ieee80211_register_hw()
  net: marvell10g: soft-reset the PHY when coming out of low power
  net: marvell10g: report firmware version
  net/cxgb4: Check the return from t4_query_params properly
  ...
2020-04-16 14:52:29 -07:00
Andrei Vagin
94d440d618 proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets
Michael Kerrisk suggested to replace numeric clock IDs with symbolic names.

Now the content of these files looks like this:
$ cat /proc/774/timens_offsets
monotonic      864000         0
boottime      1728000         0

For setting offsets, both representations of clocks (numeric and symbolic)
can be used.

As for compatibility, it is acceptable to change things as long as
userspace doesn't care. The format of timens_offsets files is very new and
there are no userspace tools yet which rely on this format.

But three projects crun, util-linux and criu rely on the interface of
setting time offsets and this is why it's required to continue supporting
the numeric clock IDs on write.

Fixes: 04a8682a71 ("fs/proc: Introduce /proc/pid/timens_offsets")
Suggested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200411154031.642557-1-avagin@gmail.com
2020-04-16 12:10:54 +02:00
Borislav Petkov
e0d648f9d8 sched/vtime: Work around an unitialized variable warning
Work around this warning:

  kernel/sched/cputime.c: In function ‘kcpustat_field’:
  kernel/sched/cputime.c:1007:6: warning: ‘val’ may be used uninitialized in this function [-Wmaybe-uninitialized]

because GCC can't see that val is used only when err is 0.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200327214334.GF8015@zn.tnic
2020-04-15 11:06:50 +02:00
Peter Xu
3662daf023 sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters
The "isolcpus=" parameter allows sub-parameters before the cpulist is
specified, and if the parser detects an unknown sub-parameters the whole
parameter will be ignored.

This design is incompatible with itself when new sub-parameters are added.
An older kernel will not recognize the new sub-parameter and will
invalidate the whole parameter so the CPU isolation will not take
effect. It emits a warning:

    isolcpus: Error, unknown flag

The better and compatible way is to allow "isolcpus=" to skip unknown
sub-parameters, so that even if new sub-parameters are added an older
kernel will still be able to behave as usual even if with the new
sub-parameter specified on the command line.

Ideally this should have been there when the first sub-parameter for
"isolcpus=" was introduced.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200403223517.406353-1-peterx@redhat.com
2020-04-15 10:38:26 +02:00
Eugene Syromiatnikov
a966dcfe15 clone3: add build-time CLONE_ARGS_SIZE_VER* validity checks
CLONE_ARGS_SIZE_VER* macros are defined explicitly and not via
the offsets of the relevant struct clone_args fields, which makes
it rather error-prone, so it probably makes sense to add some
compile-time checks for them (including the one that breaks
on struct clone_args extension as a reminder to add a relevant
size macro and a similar check).  Function copy_clone_args_from_user
seems to be a good place for such checks.

Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200412202658.GA31499@asgard.redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 09:56:32 +02:00
Eugene Syromiatnikov
62173872ca clone3: add a check for the user struct size if CLONE_INTO_CGROUP is set
Passing CLONE_INTO_CGROUP with an under-sized structure (that doesn't
properly contain cgroup field) seems like garbage input, especially
considering the fact that fd 0 is a valid descriptor.

Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200412203123.GA5869@asgard.redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 09:56:25 +02:00
Eugene Syromiatnikov
e82a118f57 clone3: fix cgroup argument sanity check
Checking that cgroup field value of struct clone_args is less than 0
is useless, as it is defined as unsigned 64-bit integer.  Moreover,
it doesn't catch the situations where its higher bits are lost during
the assignment to the cgroup field of the cgroup field of the internal
struct kernel_clone_args (where it is declared as signed 32-bit
integer), so it is still possible to pass garbage there.  A check
against INT_MAX solves both these issues.

Fixes: ef2c41cf38 ("clone3: allow spawning processes into cgroups")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200412202533.GA29554@asgard.redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-15 09:56:12 +02:00
Xiao Yang
0bbe7f7199 tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation
Traced event can trigger 'snapshot' operation(i.e. calls snapshot_trigger()
or snapshot_count_trigger()) when register_snapshot_trigger() has completed
registration but doesn't allocate buffer for 'snapshot' event trigger.  In
the rare case, 'snapshot' operation always detects the lack of allocated
buffer so make register_snapshot_trigger() allocate buffer first.

trigger-snapshot.tc in kselftest reproduces the issue on slow vm:
-----------------------------------------------------------
cat trace
...
ftracetest-3028  [002] ....   236.784290: sched_process_fork: comm=ftracetest pid=3028 child_comm=ftracetest child_pid=3036
     <...>-2875  [003] ....   240.460335: tracing_snapshot_instance_cond: *** SNAPSHOT NOT ALLOCATED ***
     <...>-2875  [003] ....   240.460338: tracing_snapshot_instance_cond: *** stopping trace here!   ***
-----------------------------------------------------------

Link: http://lkml.kernel.org/r/20200414015145.66236-1-yangx.jy@cn.fujitsu.com

Cc: stable@vger.kernel.org
Fixes: 93e31ffbf4 ("tracing: Add 'snapshot' event trigger command")
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-14 22:02:10 -04:00
Zou Wei
89f33dcadb bpf: remove unneeded conversion to bool in __mark_reg_unknown
This issue was detected by using the Coccinelle software:

  kernel/bpf/verifier.c:1259:16-21: WARNING: conversion to bool not needed here

The conversion to bool is unneeded, remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/1586779076-101346-1-git-send-email-zou_wei@huawei.com
2020-04-14 21:40:06 +02:00
Andrii Nakryiko
1f6cb19be2 bpf: Prevent re-mmap()'ing BPF map as writable for initially r/o mapping
VM_MAYWRITE flag during initial memory mapping determines if already mmap()'ed
pages can be later remapped as writable ones through mprotect() call. To
prevent user application to rewrite contents of memory-mapped as read-only and
subsequently frozen BPF map, remove VM_MAYWRITE flag completely on initially
read-only mapping.

Alternatively, we could treat any memory-mapping on unfrozen map as writable
and bump writecnt instead. But there is little legitimate reason to map
BPF map as read-only and then re-mmap() it as writable through mprotect(),
instead of just mmap()'ing it as read/write from the very beginning.

Also, at the suggestion of Jann Horn, drop unnecessary refcounting in mmap
operations. We can just rely on VMA holding reference to BPF map's file
properly.

Fixes: fc9702273e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/bpf/20200410202613.3679837-1-andriin@fb.com
2020-04-14 21:28:57 +02:00
afzal mohammed
07d8350ede genirq: Remove setup_irq() and remove_irq()
Now that all the users of setup_irq() & remove_irq() have been replaced by
request_irq() & free_irq() respectively, delete them.

Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lkml.kernel.org/r/0aa8771ada1ac8e1312f6882980c9c08bd023148.1585320721.git.afzal.mohd.ma@gmail.com
2020-04-14 10:08:50 +02:00
Ingo Molnar
40e7d7bdc1 Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent
Pull RCU fix from Paul E. McKenney.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-14 08:36:41 +02:00
Zhiqiang Liu
3075afdf15 signal: use kill_proc_info instead of kill_pid_info in kill_something_info
signal.c provides kill_proc_info, we can use it instead of kill_pid_info
in kill_something_info func gracefully.

Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/80236965-f0b5-c888-95ff-855bdec75bb3@huawei.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-12 22:46:34 +02:00
Zhiqiang Liu
eaec2b0bd3 signal: check sig before setting info in kill_pid_usb_asyncio
In kill_pid_usb_asyncio, if signal is not valid, we do not need to
set info struct.

Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/f525fd08-1cf7-fb09-d20c-4359145eb940@huawei.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-04-12 22:46:34 +02:00
Linus Torvalds
0785249f8b Time(keeping) updates:
- Fix the time_for_children symlink in /proc/$PID/ so it properly reflects
    that it part of the 'time' namespace
 
  - Add the missing userns limit for the allowed number of time namespaces,
    which was half defined but the actual array member was not added.  This
    went unnoticed as the array has an exessive empty member at the end but
    introduced a user visible regression as the output was corrupted.
 
  - Prevent further silent ucount corruption by adding a BUILD_BUG_ON() to
    catch half updated data.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6TFe4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYob4PD/47Qwz2z2mEeO037VbbI2gY4yl/raFo
 5KPWmnwonKrtVaYAldLutA3iaG7bBbUX5fRvbSRNTS6CJIHwwfLSx7/CeWMmIXEX
 0zsBsn5QXjG89lJZXM+ot74yzjvkeoad2g0jEHv92v0WDSXFiAWhkBUwknfNFbpa
 csEjkdpyn2zTVBGBzKVHWHXddkY0o0Q0JOy0EiH09rHGpQktPoLJdYp73VCygoJd
 NRAXhTmBQq85RMcSB3eVTbSPpIuBUzZke9zoio7YZwEjl6bkvSqetPmTdIr57u4s
 ex3PX++64EXD7r8ZW36fPGDqu6v0CH2ILK7QVhwyHAYJo2LQKVd+v25muaFrzfpn
 dSG1SqabWqdIHUoW/76ORyecAFLTzGwDu07UH+6VJbXeLfmuhe/LI3hdDQFph9NQ
 BOBKhaHm8aXmAmvrkxbbAikSkJYVHrAIp5abI4PSYoPaqK1DWnSPaT1cqtaIUgYL
 Mk15z19V9np4lMCH2cucAlap8U9EvQEIfCRRdl+crDu17ZzGID1pwhY2DA8adqcT
 SUfwzzUaykd5TZtDeIe+6G9fsgf/wbSTSSbrNGKlLXDbxx+iNVXErkmx0JXLEHV4
 47cmBwQZ255DzjMfuS4HzCck2MaaP8mDWgcbszgkP+GFnkf9EAP5XNp9st937mbG
 rzP+NkjNCldN9w==
 =wOiC
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull time(keeping) updates from Thomas Gleixner:

 - Fix the time_for_children symlink in /proc/$PID/ so it properly
   reflects that it part of the 'time' namespace

 - Add the missing userns limit for the allowed number of time
   namespaces, which was half defined but the actual array member was
   not added. This went unnoticed as the array has an exessive empty
   member at the end but introduced a user visible regression as the
   output was corrupted.

 - Prevent further silent ucount corruption by adding a BUILD_BUG_ON()
   to catch half updated data.

* tag 'timers-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ucount: Make sure ucounts in /proc/sys/user don't regress again
  time/namespace: Add max_time_namespaces ucount
  time/namespace: Fix time_for_children symlink
2020-04-12 10:13:14 -07:00
Linus Torvalds
590680d139 Scheduler fixes/updates:
- Deduplicate the average computations in the scheduler core and the fair
    class code.
 
  - Fix a raise between runtime distribution and assignement which can cause
    exceeding the quota by up to 70%.
 
  - Prevent negative results in the imbalanace calculation
 
  - Remove a stale warning in the workqueue code which can be triggered
    since the call site was moved out of preempt disabled code. It's a false
    positive.
 
  - Deduplicate the print macros for procfs
 
  - Add the ucmap values to the SCHED_DEBUG procfs output for completness
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6TFEoTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRoPD/9cTERNZb/aT/SVD3TntBs5bMlnRR2U
 KsyoN7w2bmYGY6XbKdjWX+KorPyJ9YRcUpOfmH+6JmMioM5RrgCtAQ4cXP4q89k3
 6+hdHu+WAhp2AyKr5LAYZK+NYdq0fy/9xx7fdXNri8/QakQCy1vu/u6iiKRMmrEf
 R7zB7v4ddTOTKamUuGBNnmM4GwfrD7td9NksfjDqV4yJ2ZkaQ+apWAPzJK8ixfEa
 TGjVnnUZA82tZRZ+O4RDN2AVgG0CuXYRYPWRfqi3XgQ3d0+ju5mM7q+6TQeEUuAq
 19IoscLhmYGb9yYCY5p0j1AUkyr6FcSFr1SJt/8Jxpyw4JsKw+ANFBA3kkzEBZ+h
 0VcWjUw4S6A6+0HdrqUQYjr5tl7RCY+hOE+QQ/Xvrm4PpWi0L+1tB+kgR7VKNPO8
 Xqfp/CK7Rcgh2/nUHT4uxdwh4ZNsLo39QGTuahyPXzeRVJTEGUjbzktnEUVc9wmd
 OfjpC0Q2DBdXx+vnzH82flv/rtUUYor/Owhpq92E6d50Z7CjTa6xRNbmQUiVKls8
 jqehpRBUg5cB3TI0BKeb5+kzgeGDGpm7MfZbvSwvoL0stdJdzNwwc4nhOLjPkJFK
 R5m7cg82Kx+r/00Vb7k6WgYZ10WTS2Il1OKbgwHl5uTlaKuY7TPEIRVuJQV/XUSG
 88n4jIwpRbLMNw==
 =BVqY
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes/updates from Thomas Gleixner:

 - Deduplicate the average computations in the scheduler core and the
   fair class code.

 - Fix a raise between runtime distribution and assignement which can
   cause exceeding the quota by up to 70%.

 - Prevent negative results in the imbalanace calculation

 - Remove a stale warning in the workqueue code which can be triggered
   since the call site was moved out of preempt disabled code. It's a
   false positive.

 - Deduplicate the print macros for procfs

 - Add the ucmap values to the SCHED_DEBUG procfs output for completness

* tag 'sched-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/debug: Add task uclamp values to SCHED_DEBUG procfs
  sched/debug: Factor out printing formats into common macros
  sched/debug: Remove redundant macro define
  sched/core: Remove unused rq::last_load_update_tick
  workqueue: Remove the warning in wq_worker_sleeping()
  sched/fair: Fix negative imbalance in imbalance calculation
  sched/fair: Fix race between runtime distribution and assignment
  sched/fair: Align rq->avg_idle and rq->avg_scan_cost
2020-04-12 10:09:19 -07:00
Linus Torvalds
20e2aa8126 Thre fixes/updates for perf:
- Fix the perf event cgroup tracking which tries to track the cgroup even
    for disabled events.
 
  - Add Ice Lake server support for uncore events
 
  - Disable pagefaults when retrieving the physical address in the sampling
    code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6TEbMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYvgD/9Ufht3qrbE29oTS+q2x6PZRc01hvMW
 c3wh/7Anh5yBlTioUEl1tbNfSD1k9hKqMQJF6iPZQ4tzft6JAunstICngqJrJqSY
 si22mgY6QGKrA2+4UuxT7FZD5nrEhy1TrGNIOp/86Y9I59PAlrypWMxq71VUPjgB
 Yy9r6eSJqNX05r9tZy1WMloJaBBieaVvEefK9ZrO4s1XM/RU5pAl+/B1XauK8XN9
 e4bzu7d6r+w23pFuEqk3r4KWozafxczJAqV+w6/Me1lFgNj6m2GKq429bL/Bgj8d
 Re8N4donynPe9yvuDWS4eHD1z0nhOMQhOv85seBwBcEet0PnEPtyw/eUu2zZTE2J
 h1R7l8I3RIs9EmXtdoOuzLFbc8a9sw/lGQJYdP70TMEtLKPUcGkfNhwP196CRbZK
 TBVub3jP0STmo8u7PrkcMb53ZBZ17P317ous52f0xgkFB6YVqv29cB+NEyabjKMI
 fP9ZAaoJ9Rn7T9nji/8KsUcjpCUBhP//YWnf/Vax5PW+hkJhfrLOpesKpX1uAwzd
 fc4QzEXYr/a4YI30IvqSJBsJJZfvFl9ikMaXrv5TbR8nj/eszJInho5+olQj54A7
 SLzCOpVHc3jJ13d/C5Xjo0LhwSqjLv3JtvlFG34XHl+YlZUw0ua9u4Ld5vDIDYgw
 mwHHFp3P/yFAfQ==
 =5kqD
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "Three fixes/updates for perf:

   - Fix the perf event cgroup tracking which tries to track the cgroup
     even for disabled events.

   - Add Ice Lake server support for uncore events

   - Disable pagefaults when retrieving the physical address in the
     sampling code"

* tag 'perf-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Disable page faults when getting phys address
  perf/x86/intel/uncore: Add Ice Lake server uncore support
  perf/cgroup: Correct indirection in perf_less_group_idx()
  perf/core: Fix event cgroup tracking
2020-04-12 10:05:24 -07:00
Linus Torvalds
652fa53caa Three small fixes/updates for the locking core code:
- Plug a task struct reference leak in the percpu rswem implementation.
 
  - Document the refcount interaction with PID_MAX_LIMIT
 
  - Improve the 'invalid wait context' data dump in lockdep so it contains
    all information which is required to decode the problem
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6TEJoTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoeY6D/42AKqIvpAPiPA+Aod9o8o7qDCoPJQN
 kwCx939+Mwrwm8YS3WJGJvR/jgoGeq33gerY93D0+jkMyhmvYeWbUIEdCenyy0+C
 tkA7pN1j61D7jMqmL8PgrObhQMM8tTdNNsiNxg5qZA8wI2oFBMbAStx5mCficCyJ
 8t3hh/2aRVlNN1MBjo25+jEWEXUIchAzmxnmR+5t/pRYWythOhLuWW05D8b1JoN3
 9zdHj9ZScONlrK+VmpXZl41SokT385arMXcmUuS+zPx6zl4P6yTgeHMLsm4Kt3+j
 1q0hm4fUEEW2IrF1Uc9pxtJh/39UVwvX8Wfj/wJLm3b1akmO+3tQ2cpiee4NHk2H
 WogvBNGG4QeGMrRkg0seiqUE6SOaNPXdGRszPc6EkzwpOywDjHVYV5qbII1VOpbK
 z/TJrtv8g8ejgAvyG0dEtEDydm2SqPAIAZFXj9JQ0I/gUZKTW5rvGQWdMXW6HPif
 wdJJ6xb4DC3f5DrBrXgpkMSdwfea2boIoLDMf9tCcdlRs6vmvY3iRLHy2xBLcRFt
 9yVDUxO40eJUxztbFKuZsQvgJyBxZubyK+Yo9CCvWQ95GfIr2JurIJEcD2fxVyRU
 57/Avt3zt7iViZ9LHJnL4JnWRxftWOdWusF2dsufmnuveCVhtA5B5bl7nGntpIT2
 mfJEDrJ7zzhgyA==
 =Agrh
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "Three small fixes/updates for the locking core code:

   - Plug a task struct reference leak in the percpu rswem
     implementation.

   - Document the refcount interaction with PID_MAX_LIMIT

   - Improve the 'invalid wait context' data dump in lockdep so it
     contains all information which is required to decode the problem"

* tag 'locking-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Improve 'invalid wait context' splat
  locking/refcount: Document interaction with PID_MAX_LIMIT
  locking/percpu-rwsem: Fix a task_struct refcount
2020-04-12 09:47:10 -07:00
Linus Torvalds
75e7188397 dma-mapping fixes for 5.7
- fix an integer truncation in dma_direct_get_required_mask
    (Kishon Vijay Abraham)
  - fix the display of dma mapping types (Grygorii Strashko)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl6Rfy8LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPT3w/+MKNVGmyi6nSsruskr1CkLCah5GkHrDA4v+vwwkFZ
 zIeB7df/EOijzfYHlhEkiQWBcPvl9wmVBhFsuaY9XMQ/fJv06CpR5J2xtb+uZN9L
 HmfwCJPZwrHDqsonIiOtIWjyIOARLGC0CfBP+DCPJ/GDG9wys7eyfESh85PQEDvO
 QA4sPjJHuKeTTH2Q6QLM/aiqgr80Q4xrQLCVcgDAqAvqnCYi/E4HXty4pLNGpvJu
 5O2xzmbQT2l3gRXjxt5Ct4pol+nFmiWJ0sBDNVNUqge7vebBiu3e0VjEaeXUPAMq
 hQSrof2x+EwHITV10L82/poIKNQVVXW3aYgItMNXfOSBNqaNx4wVLRsrcMZWeMeJ
 0OHmCy6Sac2dK4QtY/NtZFLL1zW4ajsj8X975RjqBMcL6QAdCMxNSWGEY/j/fMQy
 FV1spy8FBfvXH+0O2POkYNBx/eICnMbGQ2ed5NZ8ONAl8wFLPM+2qIwmgb51a3TG
 8jR613Zqyw0aMnky0MUwvj5TlwwUyyXbGKpizl0vkFw7/NYXroNMaYWJSHF/ddFp
 vUChNqYhgCFY8FOhanFadyPxqU1InEoOkUS8/ij7NAR6cCrxsLuAohMYSNQJes3R
 IvmU36FtjO53rBdHaN4Eizk4EHVaZDxoll05UPU/B5IvzXqR6zy0PphoOxfzvmmq
 Qfk=
 =Af93
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.7-1' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

 - fix an integer truncation in dma_direct_get_required_mask
   (Kishon Vijay Abraham)

 - fix the display of dma mapping types (Grygorii Strashko)

* tag 'dma-mapping-5.7-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-debug: fix displaying of dma allocation type
  dma-direct: fix data truncation in dma_direct_get_required_mask()
2020-04-11 11:34:36 -07:00
Linus Torvalds
5b8b9d0c6d Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc,
   gup, hugetlb, pagemap, memremap)

 - Various other things (hfs, ocfs2, kmod, misc, seqfile)

* akpm: (34 commits)
  ipc/util.c: sysvipc_find_ipc() should increase position index
  kernel/gcov/fs.c: gcov_seq_next() should increase position index
  fs/seq_file.c: seq_read(): add info message about buggy .next functions
  drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
  change email address for Pali Rohár
  selftests: kmod: test disabling module autoloading
  selftests: kmod: fix handling test numbers above 9
  docs: admin-guide: document the kernel.modprobe sysctl
  fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
  kmod: make request_module() return an error when autoloading is disabled
  mm/memremap: set caching mode for PCI P2PDMA memory to WC
  mm/memory_hotplug: add pgprot_t to mhp_params
  powerpc/mm: thread pgprot_t through create_section_mapping()
  x86/mm: introduce __set_memory_prot()
  x86/mm: thread pgprot_t through init_memory_mapping()
  mm/memory_hotplug: rename mhp_restrictions to mhp_params
  mm/memory_hotplug: drop the flags field from struct mhp_restrictions
  mm/special: create generic fallbacks for pte_special() and pte_mkspecial()
  mm/vma: introduce VM_ACCESS_FLAGS
  mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
  ...
2020-04-10 17:57:48 -07:00
Vasily Averin
f4d74ef622 kernel/gcov/fs.c: gcov_seq_next() should increase position index
If seq_file .next function does not change position index, read after
some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Waiman Long <longman@redhat.com>
Link: http://lkml.kernel.org/r/f65c6ee7-bd00-f910-2f8a-37cc67e4ff88@virtuozzo.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:22 -07:00
Eric Biggers
d7d27cfc5c kmod: make request_module() return an error when autoloading is disabled
Patch series "module autoloading fixes and cleanups", v5.

This series fixes a bug where request_module() was reporting success to
kernel code when module autoloading had been completely disabled via
'echo > /proc/sys/kernel/modprobe'.

It also addresses the issues raised on the original thread
(https://lkml.kernel.org/lkml/20200310223731.126894-1-ebiggers@kernel.org/T/#u)
bydocumenting the modprobe sysctl, adding a self-test for the empty path
case, and downgrading a user-reachable WARN_ONCE().

This patch (of 4):

It's long been possible to disable kernel module autoloading completely
(while still allowing manual module insertion) by setting
/proc/sys/kernel/modprobe to the empty string.

This can be preferable to setting it to a nonexistent file since it
avoids the overhead of an attempted execve(), avoids potential
deadlocks, and avoids the call to security_kernel_module_request() and
thus on SELinux-based systems eliminates the need to write SELinux rules
to dontaudit module_request.

However, when module autoloading is disabled in this way,
request_module() returns 0.  This is broken because callers expect 0 to
mean that the module was successfully loaded.

Apparently this was never noticed because this method of disabling
module autoloading isn't used much, and also most callers don't use the
return value of request_module() since it's always necessary to check
whether the module registered its functionality or not anyway.

But improperly returning 0 can indeed confuse a few callers, for example
get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit:

	if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
		fs = __get_fs_type(name, len);
		WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
	}

This is easily reproduced with:

	echo > /proc/sys/kernel/modprobe
	mount -t NONEXISTENT none /

It causes:

	request_module fs-NONEXISTENT succeeded, but still no fs?
	WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0
	[...]

This should actually use pr_warn_once() rather than WARN_ONCE(), since
it's also user-reachable if userspace immediately unloads the module.
Regardless, request_module() should correctly return an error when it
fails.  So let's make it return -ENOENT, which matches the error when
the modprobe binary doesn't exist.

I've also sent patches to document and test this case.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Ben Hutchings <benh@debian.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200310223731.126894-1-ebiggers@kernel.org
Link: http://lkml.kernel.org/r/20200312202552.241885-1-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:22 -07:00
Sergey Senozhatsky
ab6f762f0f printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
printk_deferred(), similarly to printk_safe/printk_nmi, does not
immediately attempt to print a new message on the consoles, avoiding
calls into non-reentrant kernel paths, e.g. scheduler or timekeeping,
which potentially can deadlock the system.

Those printk() flavors, instead, rely on per-CPU flush irq_work to print
messages from safer contexts.  For same reasons (recursive scheduler or
timekeeping calls) printk() uses per-CPU irq_work in order to wake up
user space syslog/kmsg readers.

However, only printk_safe/printk_nmi do make sure that per-CPU areas
have been initialised and that it's safe to modify per-CPU irq_work.
This means that, for instance, should printk_deferred() be invoked "too
early", that is before per-CPU areas are initialised, printk_deferred()
will perform illegal per-CPU access.

Lech Perczak [0] reports that after commit 1b710b1b10 ("char/random:
silence a lockdep splat with printk()") user-space syslog/kmsg readers
are not able to read new kernel messages.

The reason is printk_deferred() being called too early (as was pointed
out by Petr and John).

Fix printk_deferred() and do not queue per-CPU irq_work before per-CPU
areas are initialized.

Link: https://lore.kernel.org/lkml/aa0732c6-5c4e-8a8b-a1c1-75ebe3dca05b@camlintechnologies.com/
Reported-by: Lech Perczak <l.perczak@camlintechnologies.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Jann Horn <jannh@google.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 13:18:57 -07:00
Linus Torvalds
87ad46e601 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull proc fix from Eric Biederman:
 "A brown paper bag slipped through my proc changes, and syzcaller
  caught it when the code ended up in your tree.

  I have opted to fix it the simplest cleanest way I know how, so there
  is no reasonable chance for the bug to repeat"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Use a dedicated lock in struct pid
2020-04-10 12:59:56 -07:00
Linus Torvalds
bbec2a2dc3 More power management updates for 5.7-rc1
Rework compat ioctl handling in the user space hibernation
 interface (Christoph Hellwig) and fix a typo in a function
 name in the cpuidle haltpoll driver (Yihao Wu).
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl6QPXkSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx918P+NU9lM++pMi2Ng4wHmtRtHW9mH4M6L54
 /LTjDNZ4IE/v6JlK6hzaFUAhrFViI23gmUat8a7TT/o1NUsPS0QHxatFK+nGbjEk
 blB/rBHWpC5vGo8SZbqmvI0hbRn0Q6Ah5I+iV+KAk9Z76mDEMNHZKpP2CfiqSVQE
 QYsUIJOSUo/CMT1SZRE/xDrvoU418Y1Ed6a6Kn9Ki5uXvqDTPHoAyETZ9M6tLphP
 kGBpbnbkHx3FyYZ3EyjVEd8O6cDsxS2gIWu6YUCB31N4G7v3bJ6rfgperTCN999J
 8eQY9rNVlaAWIP9t0ObC3xOpoaNUcZC+V+yaHev3LU/6LP4Sued8cNBxQn26/RWN
 vyM5M6K0OphjPll9QfVfZFRUcuoBMyAtgVYw8mwT64GVJt3ukbd2QYDQHORXBQEC
 ziTtfLQEuAiUw/DRoTGo2XYDTlnd2nu9mFoTRU6juOawOwgwPbQldOtSJiMtCDoR
 cyaQH/t528w10jGCa+mIJZToTIet+gN6ui83M3kQdxMTa5ulgPClU8Kujn1wPgKX
 6jFrf+NJ6SNm6A/EhPk9t/soe7hhzyGwqx351aMQpZGX6UH4hQQPXgC0tyZ71Uj5
 UJ7Ys5RHcXg4kub/FJsfhv/3wdSPiekwe+U89UCQY5lxXC2x3iVjp50B4auCjW23
 tQVHpAvegWg=
 =h4ud
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.7-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "Rework compat ioctl handling in the user space hibernation interface
  (Christoph Hellwig) and fix a typo in a function name in the cpuidle
  haltpoll driver (Yihao Wu)"

* tag 'pm-5.7-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle-haltpoll: Fix small typo
  PM / sleep: handle the compat case in snapshot_set_swap_area()
  PM / sleep: move SNAPSHOT_SET_SWAP_AREA handling into a helper
2020-04-10 09:50:00 -07:00
David S. Miller
40fc7ad2c8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2020-04-10

The following pull-request contains BPF updates for your *net* tree.

We've added 13 non-merge commits during the last 7 day(s) which contain
a total of 13 files changed, 137 insertions(+), 43 deletions(-).

The main changes are:

1) JIT code emission fixes for riscv and arm32, from Luke Nelson and Xi Wang.

2) Disable vmlinux BTF info if GCC_PLUGIN_RANDSTRUCT is used, from Slava Bacherikov.

3) Fix oob write in AF_XDP when meta data is used, from Li RongQing.

4) Fix bpf_get_link_xdp_id() handling on single prog when flags are specified,
   from Andrey Ignatov.

5) Fix sk_assign() BPF helper for request sockets that can have sk_reuseport
   field uninitialized, from Joe Stringer.

6) Fix mprotect() test case for the BPF LSM, from KP Singh.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-09 17:39:22 -07:00