A rather large update for timekeeping and timers:

   - The final step to get rid of auto-rearming posix-timers
 
     posix-timers are currently auto-rearmed by the kernel when the signal
     of the timer is ignored so that the timer signal can be delivered once
     the corresponding signal is unignored.
 
     This requires throttling the timer to prevent a DoS by small intervals
     and keeps the system pointlessly out of low power states for no value.
     This is a long standing non-trivial problem due to the lock order of
     posix-timer lock and the sighand lock along with life time issues as
     the timer and the sigqueue have different life time rules.
 
     Cure this by:
 
      * Embedding the sigqueue into the timer struct to have the same life
        time rules. Aside of that this also avoids the lookup of the timer
        in the signal delivery and rearm path as it's just an always valid
        container_of() now.
 
      * Queuing ignored timer signals onto a separate ignored list.
 
      * Moving queued timer signals onto the ignored list when the signal is
        switched to SIG_IGN before it could be delivered.
 
      * Walking the ignored list when SIG_IGN is lifted and requeueing the
        signals to the actual signal lists. This allows the signal delivery
        code to rearm the timer.
 
     This also required consolidating the signal delivery rules so they are
     consistent across all situations. With that all self test scenarios
     finally succeed.
 
   - Core infrastructure for VFS multigrain timestamping
 
     This is required to allow the kernel to use coarse grained time stamps
     by default and switch to fine grained time stamps when inode attributes
     are actively observed via getattr().
 
     These changes have been provided to the VFS tree as well, so that the
     VFS specific infrastructure could be built on top.
 
   - Cleanup and consolidation of the sleep() infrastructure
 
     * Move all sleep and timeout functions into one file
 
     * Rework udelay() and ndelay() into proper documented inline functions
       and replace the hardcoded magic numbers by proper defines.
 
     * Rework the fsleep() implementation to take the reality of the timer
       wheel granularity on different HZ values into account. Right now the
       boundaries are hard coded time ranges which fail to provide the
       requested accuracy on different HZ settings.
 
     * Update documentation for all sleep/timeout related functions and fix
       up stale documentation links all over the place
 
     * Fixup a few usage sites
 
   - Rework of timekeeping and adjtimex(2) to prepare for multiple PTP clocks
 
     A system can have multiple PTP clocks which are participating in
     separate and independent PTP clock domains. So far the kernel only
     considers the PTP clock which is based on CLOCK_TAI relevant as that's
     the clock which drives the timekeeping adjustments via the various user
     space daemons through adjtimex(2).
 
     The non TAI based clock domains are accessible via the file descriptor
     based posix clocks, but their usability is very limited. They can't be
     accessed fast as they always go all the way out to the hardware and
     they cannot be utilized in the kernel itself.
 
     As Time Sensitive Networking (TSN) gains traction it is required to
     provide fast user and kernel space access to these clocks.
 
     The approach taken is to utilize the timekeeping and adjtimex(2)
     infrastructure to provide this access in a way similar to how the kernel
     provides access to CLOCK_MONOTONIC, CLOCK_REALTIME etc.
 
     Instead of creating a duplicated infrastructure this rework converts
     timekeeping and adjtimex(2) into generic functionality which operates
     on pointers to data structures instead of using static variables.
 
     This allows providing time accessors and adjtimex(2) functionality for
     the independent PTP clocks in a subsequent step.
 
   - Consolidate hrtimer initialization
 
     hrtimers are set up by initializing the data structure and then
     separately setting the callback function for historical reasons.
 
     That's an extra unnecessary step and makes Rust support less straight
     forward than it should be.
 
     Provide a new set of hrtimer_setup*() functions and convert the core
     code and a few usage sites of the less frequently used interfaces over.
 
     The bulk of the hrtimer_init() to hrtimer_setup() conversion is already
     prepared and scheduled for the next merge window.
 
   - Drivers:
 
     * Ensure that the global timekeeping clocksource is utilizing the
       cluster 0 timer on MIPS multi-cluster systems.
 
       Otherwise CPUs on different clusters use their cluster specific
       clocksource which is not guaranteed to be synchronized with other
       clusters.
 
     * Mostly boring cleanups, fixes, improvements and code movement

Merge tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "A rather large update for timekeeping and timers:

   - The final step to get rid of auto-rearming posix-timers

     posix-timers are currently auto-rearmed by the kernel when the
     signal of the timer is ignored so that the timer signal can be
     delivered once the corresponding signal is unignored.

     This requires throttling the timer to prevent a DoS by small
     intervals and keeps the system pointlessly out of low power states
     for no value. This is a long standing non-trivial problem due to
     the lock order of posix-timer lock and the sighand lock along with
     life time issues as the timer and the sigqueue have different life
     time rules.

     Cure this by:

       - Embedding the sigqueue into the timer struct to have the same
         life time rules. Aside of that this also avoids the lookup of
         the timer in the signal delivery and rearm path as it's just an
         always valid container_of() now.

       - Queuing ignored timer signals onto a separate ignored list.

       - Moving queued timer signals onto the ignored list when the
         signal is switched to SIG_IGN before it could be delivered.

       - Walking the ignored list when SIG_IGN is lifted and requeueing the
         signals to the actual signal lists. This allows the signal
         delivery code to rearm the timer.

     This also required consolidating the signal delivery rules so they
     are consistent across all situations. With that all self test
     scenarios finally succeed.

   - Core infrastructure for VFS multigrain timestamping

     This is required to allow the kernel to use coarse grained time
     stamps by default and switch to fine grained time stamps when inode
     attributes are actively observed via getattr().

     These changes have been provided to the VFS tree as well, so that
     the VFS specific infrastructure could be built on top.

   - Cleanup and consolidation of the sleep() infrastructure

       - Move all sleep and timeout functions into one file

       - Rework udelay() and ndelay() into proper documented inline
         functions and replace the hardcoded magic numbers by proper
         defines.

       - Rework the fsleep() implementation to take the reality of the
         timer wheel granularity on different HZ values into account.
         Right now the boundaries are hard coded time ranges which fail
         to provide the requested accuracy on different HZ settings.

       - Update documentation for all sleep/timeout related functions
         and fix up stale documentation links all over the place

       - Fixup a few usage sites

   - Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
     clocks

     A system can have multiple PTP clocks which are participating in
     separate and independent PTP clock domains. So far the kernel only
     considers the PTP clock which is based on CLOCK_TAI relevant as
     that's the clock which drives the timekeeping adjustments via the
     various user space daemons through adjtimex(2).

     The non TAI based clock domains are accessible via the file
     descriptor based posix clocks, but their usability is very limited.
     They can't be accessed fast as they always go all the way out to
     the hardware and they cannot be utilized in the kernel itself.

     As Time Sensitive Networking (TSN) gains traction it is required to
     provide fast user and kernel space access to these clocks.

     The approach taken is to utilize the timekeeping and adjtimex(2)
     infrastructure to provide this access in a way similar to how the
     kernel provides access to CLOCK_MONOTONIC, CLOCK_REALTIME etc.

     Instead of creating a duplicated infrastructure this rework
     converts timekeeping and adjtimex(2) into generic functionality
     which operates on pointers to data structures instead of using
     static variables.

     This allows providing time accessors and adjtimex(2) functionality
     for the independent PTP clocks in a subsequent step.

   - Consolidate hrtimer initialization

     hrtimers are set up by initializing the data structure and then
     seperately setting the callback function for historical reasons.

     That's an extra unnecessary step and makes Rust support less
     straight forward than it should be.

     Provide a new set of hrtimer_setup*() functions and convert the
     core code and a few usage sites of the less frequently used
     interfaces over.

     The bulk of the hrtimer_init() to hrtimer_setup() conversion is
     already prepared and scheduled for the next merge window.

   - Drivers:

       - Ensure that the global timekeeping clocksource is utilizing the
         cluster 0 timer on MIPS multi-cluster systems.

         Otherwise CPUs on different clusters use their cluster specific
         clocksource which is not guaranteed to be synchronized with
         other clusters.

       - Mostly boring cleanups, fixes, improvements and code movement"
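
The embedding described in the posix-timers item can be illustrated with a small userspace sketch. The names below (`demo_sigqueue`, `demo_ptimer`, `ptimer_from_sigqueue`, `demo_container_of`) are hypothetical stand-ins and not the kernel's definitions. The sketch only shows that, once the queue entry lives inside the timer object, the owning timer is recovered by pointer arithmetic rather than by a lookup, and both objects share one lifetime.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified queue entry; not the kernel's struct sigqueue. */
struct demo_sigqueue {
	int signo;
	int on_ignored_list;	/* 1 while parked because the signal is ignored */
};

/* Hypothetical timer with the queue entry embedded in it. */
struct demo_ptimer {
	int id;
	long interval_ns;
	struct demo_sigqueue sigq;
};

/* Recover the owning timer from its embedded queue entry without a lookup. */
static struct demo_ptimer *ptimer_from_sigqueue(struct demo_sigqueue *q)
{
	assert(q != NULL);
	return demo_container_of(q, struct demo_ptimer, sigq);
}
```

Because the entry cannot outlive the timer in this layout, the conversion is valid whenever a queue entry is reachable, which is the property the message refers to as "always valid".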

* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
  posix-timers: Fix spurious warning on double enqueue versus do_exit()
  clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
  clocksource/drivers/gpx: Remove redundant casts
  clocksource/drivers/timer-ti-dm: Fix child node refcount handling
  dt-bindings: timer: actions,owl-timer: convert to YAML
  clocksource/drivers/ralink: Add Ralink System Tick Counter driver
  clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
  clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
  clocksource/drivers:sp804: Make user selectable
  clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
  hrtimers: Delete hrtimer_init_on_stack()
  alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
  io_uring: Switch to use hrtimer_setup_on_stack()
  sched/idle: Switch to use hrtimer_setup_on_stack()
  hrtimers: Delete hrtimer_init_sleeper_on_stack()
  wait: Switch to use hrtimer_setup_sleeper_on_stack()
  timers: Switch to use hrtimer_setup_sleeper_on_stack()
  net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
  futex: Switch to use hrtimer_setup_sleeper_on_stack()
  fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
  ...
Linus Torvalds 2024-11-19 16:35:06 -08:00
commit bf9aa14fc5
90 changed files with 2375 additions and 2177 deletions

View File

@ -470,8 +470,6 @@ API usage
 usleep_range() should be preferred over udelay(). The proper way of
 using usleep_range() is mentioned in the kernel docs.
-See: https://www.kernel.org/doc/html/latest/timers/timers-howto.html#delays-information-on-the-various-kernel-delay-sleep-mechanisms
 Comments
 --------

View File

@ -1,21 +0,0 @@
Actions Semi Owl Timer
Required properties:
- compatible : "actions,s500-timer" for S500
"actions,s700-timer" for S700
"actions,s900-timer" for S900
- reg : Offset and length of the register set for the device.
- interrupts : Should contain the interrupts.
- interrupt-names : Valid names are: "2hz0", "2hz1",
"timer0", "timer1", "timer2", "timer3"
See ../resource-names.txt
Example:
timer@b0168000 {
compatible = "actions,s500-timer";
reg = <0xb0168000 0x100>;
interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "timer0", "timer1";
};

View File

@ -0,0 +1,107 @@
# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
%YAML 1.2
---
$id: http://devicetree.org/schemas/timer/actions,owl-timer.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Actions Semi Owl timer
maintainers:
- Andreas Färber <afaerber@suse.de>
description:
Actions Semi Owl SoCs provide 32bit and 2Hz timers.
The 32bit timers support dynamic irq, as well as one-shot mode.
properties:
compatible:
enum:
- actions,s500-timer
- actions,s700-timer
- actions,s900-timer
clocks:
maxItems: 1
interrupts:
minItems: 1
maxItems: 6
interrupt-names:
minItems: 1
maxItems: 6
items:
enum:
- 2hz0
- 2hz1
- timer0
- timer1
- timer2
- timer3
reg:
maxItems: 1
required:
- compatible
- clocks
- interrupts
- interrupt-names
- reg
allOf:
- if:
properties:
compatible:
contains:
enum:
- actions,s500-timer
then:
properties:
interrupts:
minItems: 4
maxItems: 4
interrupt-names:
items:
- const: 2hz0
- const: 2hz1
- const: timer0
- const: timer1
- if:
properties:
compatible:
contains:
enum:
- actions,s700-timer
- actions,s900-timer
then:
properties:
interrupts:
minItems: 1
maxItems: 1
interrupt-names:
items:
- const: timer1
additionalProperties: false
examples:
- |
#include <dt-bindings/interrupt-controller/arm-gic.h>
#include <dt-bindings/interrupt-controller/irq.h>
soc {
#address-cells = <1>;
#size-cells = <1>;
timer@b0168000 {
compatible = "actions,s500-timer";
reg = <0xb0168000 0x100>;
clocks = <&hosc>;
interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 9 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "2hz0", "2hz1", "timer0", "timer1";
};
};
...

View File

@ -0,0 +1,121 @@
.. SPDX-License-Identifier: GPL-2.0
Delay and sleep mechanisms
==========================
This document seeks to answer the common question: "What is the
RightWay (TM) to insert a delay?"
This question is most often faced by driver writers who have to
deal with hardware delays and who may not be the most intimately
familiar with the inner workings of the Linux Kernel.
The following table gives a rough overview about the existing function
'families' and their limitations. This overview table does not replace the
reading of the function description before usage!
.. list-table::
:widths: 20 20 20 20 20
:header-rows: 2
* -
- `*delay()`
- `usleep_range*()`
- `*sleep()`
- `fsleep()`
* -
- busy-wait loop
- hrtimers based
- timer list timers based
- combines the others
* - Usage in atomic Context
- yes
- no
- no
- no
* - precise on "short intervals"
- yes
- yes
- depends
- yes
* - precise on "long intervals"
- Do not use!
- yes
- max 12.5% slack
- yes
* - interruptible variant
- no
- yes
- yes
- no
A generic advice for non atomic contexts could be:
#. Use `fsleep()` whenever unsure (as it combines all the advantages of the
others)
#. Use `*sleep()` whenever possible
#. Use `usleep_range*()` whenever accuracy of `*sleep()` is not sufficient
#. Use `*delay()` for very, very short delays
Find some more detailed information about the function 'families' in the next
sections.
`*delay()` family of functions
------------------------------
These functions use the jiffy estimation of clock speed and will busy wait for
enough loop cycles to achieve the desired delay. udelay() is the basic
implementation and ndelay() as well as mdelay() are variants.
These functions are mainly used to add a delay in atomic context. Please make
sure to ask yourself before adding a delay in atomic context: Is this really
required?
.. kernel-doc:: include/asm-generic/delay.h
:identifiers: udelay ndelay
.. kernel-doc:: include/linux/delay.h
:identifiers: mdelay
`usleep_range*()` and `*sleep()` family of functions
----------------------------------------------------
These functions use hrtimers or timer list timers to provide the requested
sleeping duration. In order to decide which function is the right one to use,
take some basic information into account:
#. hrtimers are more expensive as they are using an rb-tree (instead of hashing)
#. hrtimers are more expensive when the requested sleeping duration is the first
timer which means real hardware has to be programmed
#. timer list timers always provide some sort of slack as they are jiffy based
The generic advice is repeated here:
#. Use `fsleep()` whenever unsure (as it combines all the advantages of the
others)
#. Use `*sleep()` whenever possible
#. Use `usleep_range*()` whenever accuracy of `*sleep()` is not sufficient
First check fsleep() function description and to learn more about accuracy,
please check msleep() function description.
`usleep_range*()`
~~~~~~~~~~~~~~~~~
.. kernel-doc:: include/linux/delay.h
:identifiers: usleep_range usleep_range_idle
.. kernel-doc:: kernel/time/sleep_timeout.c
:identifiers: usleep_range_state
`*sleep()`
~~~~~~~~~~
.. kernel-doc:: kernel/time/sleep_timeout.c
:identifiers: msleep msleep_interruptible
.. kernel-doc:: include/linux/delay.h
:identifiers: ssleep fsleep
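
The advice above, together with the merge message's note that fsleep() now accounts for timer wheel granularity at different HZ values, can be sketched as a small userspace classifier. This is an illustration under stated assumptions, not the kernel's fsleep(): the function name `demo_pick_sleep`, the 10us busy-wait cut-off and the "four ticks" boundary are all assumptions chosen to show how the choice can depend on HZ.

```c
#include <assert.h>

enum demo_sleep_kind {
	DEMO_USE_DELAY,		/* busy-wait loop, *delay()-like */
	DEMO_USE_USLEEP_RANGE,	/* hrtimer based */
	DEMO_USE_MSLEEP,	/* timer list (jiffy) based */
};

/*
 * Assumed cut-offs, for illustration only:
 *  - up to 10us:        busy-wait
 *  - below four ticks:  hrtimer based sleep
 *  - everything else:   jiffy based sleep
 */
static enum demo_sleep_kind demo_pick_sleep(unsigned long usecs, unsigned int hz)
{
	unsigned long tick_us;

	assert(hz > 0);
	tick_us = 1000000UL / hz;	/* length of one jiffy in microseconds */

	if (usecs <= 10)
		return DEMO_USE_DELAY;
	if (usecs < 4 * tick_us)
		return DEMO_USE_USLEEP_RANGE;
	return DEMO_USE_MSLEEP;
}
```

With these assumed boundaries a 20ms request maps to the jiffy based sleep at HZ=1000 but to the hrtimer based sleep at HZ=100, where one tick alone is 10ms. A fixed time range cannot express that difference.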

View File

@ -12,7 +12,7 @@ Timers
 hrtimers
 no_hz
 timekeeping
-timers-howto
+delay_sleep_functions
 .. only:: subproject and html

View File

@ -1,115 +0,0 @@
===================================================================
delays - Information on the various kernel delay / sleep mechanisms
===================================================================
This document seeks to answer the common question: "What is the
RightWay (TM) to insert a delay?"
This question is most often faced by driver writers who have to
deal with hardware delays and who may not be the most intimately
familiar with the inner workings of the Linux Kernel.
Inserting Delays
----------------
The first, and most important, question you need to ask is "Is my
code in an atomic context?" This should be followed closely by "Does
it really need to delay in atomic context?" If so...
ATOMIC CONTEXT:
You must use the `*delay` family of functions. These
functions use the jiffy estimation of clock speed
and will busy wait for enough loop cycles to achieve
the desired delay:
ndelay(unsigned long nsecs)
udelay(unsigned long usecs)
mdelay(unsigned long msecs)
udelay is the generally preferred API; ndelay-level
precision may not actually exist on many non-PC devices.
mdelay is macro wrapper around udelay, to account for
possible overflow when passing large arguments to udelay.
In general, use of mdelay is discouraged and code should
be refactored to allow for the use of msleep.
NON-ATOMIC CONTEXT:
You should use the `*sleep[_range]` family of functions.
There are a few more options here, while any of them may
work correctly, using the "right" sleep function will
help the scheduler, power management, and just make your
driver better :)
-- Backed by busy-wait loop:
udelay(unsigned long usecs)
-- Backed by hrtimers:
usleep_range(unsigned long min, unsigned long max)
-- Backed by jiffies / legacy_timers
msleep(unsigned long msecs)
msleep_interruptible(unsigned long msecs)
Unlike the `*delay` family, the underlying mechanism
driving each of these calls varies, thus there are
quirks you should be aware of.
SLEEPING FOR "A FEW" USECS ( < ~10us? ):
* Use udelay
- Why not usleep?
On slower systems, (embedded, OR perhaps a speed-
stepped PC!) the overhead of setting up the hrtimers
for usleep *may* not be worth it. Such an evaluation
will obviously depend on your specific situation, but
it is something to be aware of.
SLEEPING FOR ~USECS OR SMALL MSECS ( 10us - 20ms):
* Use usleep_range
- Why not msleep for (1ms - 20ms)?
Explained originally here:
https://lore.kernel.org/r/15327.1186166232@lwn.net
msleep(1~20) may not do what the caller intends, and
will often sleep longer (~20 ms actual sleep for any
value given in the 1~20ms range). In many cases this
is not the desired behavior.
- Why is there no "usleep" / What is a good range?
Since usleep_range is built on top of hrtimers, the
wakeup will be very precise (ish), thus a simple
usleep function would likely introduce a large number
of undesired interrupts.
With the introduction of a range, the scheduler is
free to coalesce your wakeup with any other wakeup
that may have happened for other reasons, or at the
worst case, fire an interrupt for your upper bound.
The larger a range you supply, the greater a chance
that you will not trigger an interrupt; this should
be balanced with what is an acceptable upper bound on
delay / performance for your specific code path. Exact
tolerances here are very situation specific, thus it
is left to the caller to determine a reasonable range.
SLEEPING FOR LARGER MSECS ( 10ms+ )
* Use msleep or possibly msleep_interruptible
- What's the difference?
msleep sets the current task to TASK_UNINTERRUPTIBLE
whereas msleep_interruptible sets the current task to
TASK_INTERRUPTIBLE before scheduling the sleep. In
short, the difference is whether the sleep can be ended
early by a signal. In general, just use msleep unless
you know you have a need for the interruptible variant.
FLEXIBLE SLEEPING (any delay, uninterruptible)
* Use fsleep

View File

@ -1998,7 +1998,7 @@ F: Documentation/devicetree/bindings/mmc/owl-mmc.yaml
 F: Documentation/devicetree/bindings/net/actions,owl-emac.yaml
 F: Documentation/devicetree/bindings/pinctrl/actions,*
 F: Documentation/devicetree/bindings/power/actions,owl-sps.txt
-F: Documentation/devicetree/bindings/timer/actions,owl-timer.txt
+F: Documentation/devicetree/bindings/timer/actions,owl-timer.yaml
 F: arch/arm/boot/dts/actions/
 F: arch/arm/mach-actions/
 F: arch/arm64/boot/dts/actions/
@ -10138,10 +10138,12 @@ S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/core
 F: Documentation/timers/
 F: include/linux/clockchips.h
+F: include/linux/delay.h
 F: include/linux/hrtimer.h
 F: include/linux/timer.h
 F: kernel/time/clockevents.c
 F: kernel/time/hrtimer.c
+F: kernel/time/sleep_timeout.c
 F: kernel/time/timer.c
 F: kernel/time/timer_list.c
 F: kernel/time/timer_migration.*

View File

@ -93,7 +93,6 @@ static void twd_timer_stop(void)
 {
 	struct clock_event_device *clk = raw_cpu_ptr(twd_evt);
-	twd_shutdown(clk);
 	disable_percpu_irq(clk->irq);
 }

View File

@ -1,13 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 if RALINK
-config CLKEVT_RT3352
-	bool
-	depends on SOC_RT305X || SOC_MT7620
-	default y
-	select TIMER_OF
-	select CLKSRC_MMIO
 config RALINK_ILL_ACC
 	bool
 	depends on SOC_RT305X

View File

@ -10,8 +10,6 @@ ifndef CONFIG_MIPS_GIC
 obj-y += clk.o timer.o
 endif
-obj-$(CONFIG_CLKEVT_RT3352) += cevt-rt3352.o
 obj-$(CONFIG_RALINK_ILL_ACC) += ill_acc.o
 obj-$(CONFIG_IRQ_INTC) += irq.o

View File

@ -1390,21 +1390,14 @@ bool __ref rtas_busy_delay(int status)
 		 */
 		ms = clamp(ms, 1U, 1000U);
 		/*
-		 * The delay hint is an order-of-magnitude suggestion, not
-		 * a minimum. It is fine, possibly even advantageous, for
-		 * us to pause for less time than hinted. For small values,
-		 * use usleep_range() to ensure we don't sleep much longer
-		 * than actually needed.
-		 *
-		 * See Documentation/timers/timers-howto.rst for
-		 * explanation of the threshold used here. In effect we use
-		 * usleep_range() for 9900 and 9901, msleep() for
-		 * 9902-9905.
+		 * The delay hint is an order-of-magnitude suggestion, not a
+		 * minimum. It is fine, possibly even advantageous, for us to
+		 * pause for less time than hinted. To make sure pause time will
+		 * not be way longer than requested independent of HZ
+		 * configuration, use fsleep(). See fsleep() for details of
+		 * used sleeping functions.
 		 */
-		if (ms <= 20)
-			usleep_range(ms * 100, ms * 1000);
-		else
-			msleep(ms);
+		fsleep(ms * 1000);
 		break;
 	case RTAS_BUSY:
 		ret = true;

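
The effect of the hunk above can be sketched in userspace as a conversion from a delay hint to the microsecond value handed to fsleep(). The helper name `demo_hint_to_usecs` is hypothetical. The power-of-ten mapping of status codes 9900 to 9905 is an assumption inferred from the removed comment ("order-of-magnitude suggestion", with 9900 and 9901 falling at or below 20ms); only the clamp to 1..1000 ms and the multiplication by 1000 come directly from the code shown.

```c
#include <assert.h>

/*
 * Hypothetical helper: translate a delay hint status (9900..9905) into the
 * microsecond argument that would be passed to a flexible sleep.
 * Assumption: status 9900 + n hints at 10^n milliseconds.
 */
static unsigned int demo_hint_to_usecs(int status)
{
	unsigned int ms = 1;
	int order;

	assert(status >= 9900 && status <= 9905);

	for (order = status - 9900; order > 0; order--)
		ms *= 10;

	/* Mirrors the clamp(ms, 1U, 1000U) visible in the hunk. */
	if (ms > 1000)
		ms = 1000;

	/* Mirrors fsleep(ms * 1000): the sleep takes microseconds. */
	return ms * 1000;
}
```

The caller no longer chooses between sleep mechanisms by a fixed 20ms threshold; it passes one duration and leaves the choice to the sleep function.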
View File

@ -302,7 +302,6 @@ CONFIG_DEBUG_MEMORY_INIT=y
 CONFIG_DEBUG_PER_CPU_MAPS=y
 CONFIG_SOFTLOCKUP_DETECTOR=y
 CONFIG_WQ_WATCHDOG=y
-CONFIG_DEBUG_TIMEKEEPING=y
 CONFIG_DEBUG_RT_MUTEXES=y
 CONFIG_DEBUG_SPINLOCK=y
 CONFIG_DEBUG_MUTEXES=y

View File

@ -146,7 +146,6 @@ config X86
 	select ARCH_HAS_PARANOID_L1D_FLUSH
 	select BUILDTIME_TABLE_SORT
 	select CLKEVT_I8253
-	select CLOCKSOURCE_VALIDATE_LAST_CYCLE
 	select CLOCKSOURCE_WATCHDOG
 	# Word-size accesses may read uninitialized data past the trailing \0
 	# in strings and cause false KMSAN reports.

View File

@ -6,8 +6,6 @@
 #include <linux/interrupt.h>
 #include <linux/math64.h>
-#define TICK_SIZE (tick_nsec / 1000)
 unsigned long long native_sched_clock(void);
 extern void recalibrate_cpu_khz(void);

View File

@ -263,13 +263,6 @@ static void kvm_xen_stop_timer(struct kvm_vcpu *vcpu)
 	atomic_set(&vcpu->arch.xen.timer_pending, 0);
 }
-static void kvm_xen_init_timer(struct kvm_vcpu *vcpu)
-{
-	hrtimer_init(&vcpu->arch.xen.timer, CLOCK_MONOTONIC,
-		     HRTIMER_MODE_ABS_HARD);
-	vcpu->arch.xen.timer.function = xen_timer_callback;
-}
 static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 {
 	struct kvm_vcpu_xen *vx = &v->arch.xen;
@ -1070,9 +1063,6 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 		break;
 	}
-	if (!vcpu->arch.xen.timer.function)
-		kvm_xen_init_timer(vcpu);
 	/* Stop the timer (if it's running) before changing the vector */
 	kvm_xen_stop_timer(vcpu);
 	vcpu->arch.xen.timer_virq = data->u.timer.port;
@ -2235,6 +2225,8 @@ void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu)
 	vcpu->arch.xen.poll_evtchn = 0;
 	timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0);
+	hrtimer_init(&vcpu->arch.xen.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
+	vcpu->arch.xen.timer.function = xen_timer_callback;
 	kvm_gpc_init(&vcpu->arch.xen.runstate_cache, vcpu->kvm);
 	kvm_gpc_init(&vcpu->arch.xen.runstate2_cache, vcpu->kvm);
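
The hunks above still use the two-step pattern (initialize the timer, then assign its callback) that the merge message describes as the motivation for the hrtimer_setup*() functions. A userspace mock makes the difference concrete. All names here (`demo_hrtimer`, `demo_hrtimer_init`, `demo_hrtimer_setup`, `demo_callback`) are hypothetical and only imitate the shape of the pattern; they are not the kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

struct demo_hrtimer;
typedef int (*demo_hrtimer_fn)(struct demo_hrtimer *);

/* Hypothetical, heavily simplified timer object. */
struct demo_hrtimer {
	int clock_id;
	int mode;
	demo_hrtimer_fn function;
};

/* Two-step pattern, step one: the callback is left unset by init. */
static void demo_hrtimer_init(struct demo_hrtimer *t, int clock_id, int mode)
{
	t->clock_id = clock_id;
	t->mode = mode;
	t->function = NULL;	/* caller must remember to assign this later */
}

/* Combined pattern: the callback is a mandatory argument of setup. */
static void demo_hrtimer_setup(struct demo_hrtimer *t, demo_hrtimer_fn fn,
			       int clock_id, int mode)
{
	assert(fn != NULL);
	demo_hrtimer_init(t, clock_id, mode);
	t->function = fn;
}

/* Trivial callback used to exercise the mock. */
static int demo_callback(struct demo_hrtimer *t)
{
	(void)t;
	return 0;
}
```

With the combined form there is no window in which a timer exists without a callback, so checks such as the removed `if (!vcpu->arch.xen.timer.function)` have nothing left to guard against.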

View File

@ -400,7 +400,8 @@ config ARM_GT_INITIAL_PRESCALER_VAL
 	  This affects CPU_FREQ max delta from the initial frequency.
 config ARM_TIMER_SP804
-	bool "Support for Dual Timer SP804 module" if COMPILE_TEST
+	bool "Support for Dual Timer SP804 module"
+	depends on ARM || ARM64 || COMPILE_TEST
 	depends on GENERIC_SCHED_CLOCK && HAVE_CLK
 	select CLKSRC_MMIO
 	select TIMER_OF if OF
@ -753,4 +754,13 @@ config EP93XX_TIMER
 	  Enables support for the Cirrus Logic timer block
 	  EP93XX.
+config RALINK_TIMER
+	bool "Ralink System Tick Counter"
+	depends on SOC_RT305X || SOC_MT7620 || COMPILE_TEST
+	select CLKSRC_MMIO
+	select TIMER_OF
+	help
+	  Enables support for system tick counter present on
+	  Ralink SoCs RT3352 and MT7620.
 endmenu

View File

@ -91,3 +91,4 @@ obj-$(CONFIG_GOLDFISH_TIMER) += timer-goldfish.o
 obj-$(CONFIG_GXP_TIMER) += timer-gxp.o
 obj-$(CONFIG_CLKSRC_LOONGSON1_PWM) += timer-loongson1-pwm.o
 obj-$(CONFIG_EP93XX_TIMER) += timer-ep93xx.o
+obj-$(CONFIG_RALINK_TIMER) += timer-ralink.o

View File

@ -1179,8 +1179,6 @@ static void arch_timer_stop(struct clock_event_device *clk)
 	disable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi]);
 	if (arch_timer_has_nonsecure_ppi())
 		disable_percpu_irq(arch_timer_ppi[ARCH_TIMER_PHYS_NONSECURE_PPI]);
-	clk->set_state_shutdown(clk);
 }
 static int arch_timer_dying_cpu(unsigned int cpu)
@ -1430,7 +1428,7 @@ static int __init arch_timer_of_init(struct device_node *np)
 	arch_timers_present |= ARCH_TIMER_TYPE_CP15;
-	has_names = of_property_read_bool(np, "interrupt-names");
+	has_names = of_property_present(np, "interrupt-names");
 	for (i = ARCH_TIMER_PHYS_SECURE_PPI; i < ARCH_TIMER_MAX_TIMER_PPI; i++) {
 		if (has_names)

View File

@ -195,7 +195,6 @@ static int gt_dying_cpu(unsigned int cpu)
 {
 	struct clock_event_device *clk = this_cpu_ptr(gt_evt);
-	gt_clockevent_shutdown(clk);
 	disable_percpu_irq(clk->irq);
 	return 0;
 }

View File

@ -68,25 +68,6 @@ static inline void apbt_writel_relaxed(struct dw_apb_timer *timer, u32 val,
 	writel_relaxed(val, timer->base + offs);
 }
-static void apbt_disable_int(struct dw_apb_timer *timer)
-{
-	u32 ctrl = apbt_readl(timer, APBTMR_N_CONTROL);
-	ctrl |= APBTMR_CONTROL_INT;
-	apbt_writel(timer, ctrl, APBTMR_N_CONTROL);
-}
-/**
- * dw_apb_clockevent_pause() - stop the clock_event_device from running
- *
- * @dw_ced: The APB clock to stop generating events.
- */
-void dw_apb_clockevent_pause(struct dw_apb_clock_event_device *dw_ced)
-{
-	disable_irq(dw_ced->timer.irq);
-	apbt_disable_int(&dw_ced->timer);
-}
 static void apbt_eoi(struct dw_apb_timer *timer)
 {
 	apbt_readl_relaxed(timer, APBTMR_N_EOI);
@ -284,26 +265,6 @@ dw_apb_clockevent_init(int cpu, const char *name, unsigned rating,
return dw_ced; return dw_ced;
} }
/**
* dw_apb_clockevent_resume() - resume a clock that has been paused.
*
* @dw_ced: The APB clock to resume.
*/
void dw_apb_clockevent_resume(struct dw_apb_clock_event_device *dw_ced)
{
enable_irq(dw_ced->timer.irq);
}
/**
* dw_apb_clockevent_stop() - stop the clock_event_device and release the IRQ.
*
* @dw_ced: The APB clock to stop generating the events.
*/
void dw_apb_clockevent_stop(struct dw_apb_clock_event_device *dw_ced)
{
free_irq(dw_ced->timer.irq, &dw_ced->ced);
}
/** /**
* dw_apb_clockevent_register() - register the clock with the generic layer * dw_apb_clockevent_register() - register the clock with the generic layer
* *

View File

@ -496,7 +496,6 @@ static int exynos4_mct_dying_cpu(unsigned int cpu)
per_cpu_ptr(&percpu_mct_tick, cpu); per_cpu_ptr(&percpu_mct_tick, cpu);
struct clock_event_device *evt = &mevt->evt; struct clock_event_device *evt = &mevt->evt;
evt->set_state_shutdown(evt);
if (mct_int_type == MCT_INT_SPI) { if (mct_int_type == MCT_INT_SPI) {
if (evt->irq != -1) if (evt->irq != -1)
disable_irq_nosync(evt->irq); disable_irq_nosync(evt->irq);

View File

@ -166,6 +166,37 @@ static u64 gic_hpt_read(struct clocksource *cs)
return gic_read_count(); return gic_read_count();
} }
static u64 gic_hpt_read_multicluster(struct clocksource *cs)
{
unsigned int hi, hi2, lo;
u64 count;
mips_cm_lock_other(0, 0, 0, CM_GCR_Cx_OTHER_BLOCK_GLOBAL);
if (mips_cm_is64) {
count = read_gic_redir_counter();
goto out;
}
hi = read_gic_redir_counter_32h();
while (true) {
lo = read_gic_redir_counter_32l();
/* If hi didn't change then lo didn't wrap & we're done */
hi2 = read_gic_redir_counter_32h();
if (hi2 == hi)
break;
/* Otherwise, repeat with the latest hi value */
hi = hi2;
}
count = (((u64)hi) << 32) + lo;
out:
mips_cm_unlock_other();
return count;
}
static struct clocksource gic_clocksource = { static struct clocksource gic_clocksource = {
.name = "GIC", .name = "GIC",
.read = gic_hpt_read, .read = gic_hpt_read,
@ -203,6 +234,11 @@ static int __init __gic_clocksource_init(void)
gic_clocksource.rating = 200; gic_clocksource.rating = 200;
gic_clocksource.rating += clamp(gic_frequency / 10000000, 0, 99); gic_clocksource.rating += clamp(gic_frequency / 10000000, 0, 99);
if (mips_cps_multicluster_cpus()) {
gic_clocksource.read = &gic_hpt_read_multicluster;
gic_clocksource.vdso_clock_mode = VDSO_CLOCKMODE_NONE;
}
ret = clocksource_register_hz(&gic_clocksource, gic_frequency); ret = clocksource_register_hz(&gic_clocksource, gic_frequency);
if (ret < 0) if (ret < 0)
pr_warn("Unable to register clocksource\n"); pr_warn("Unable to register clocksource\n");
@ -261,7 +297,8 @@ static int __init gic_clocksource_of_init(struct device_node *node)
* stable CPU frequency or on the platforms with CM3 and CPU frequency * stable CPU frequency or on the platforms with CM3 and CPU frequency
* change performed by the CPC core clocks divider. * change performed by the CPC core clocks divider.
*/ */
if (mips_cm_revision() >= CM_REV_CM3 || !IS_ENABLED(CONFIG_CPU_FREQ)) { if ((mips_cm_revision() >= CM_REV_CM3 || !IS_ENABLED(CONFIG_CPU_FREQ)) &&
!mips_cps_multicluster_cpus()) {
sched_clock_register(mips_cm_is64 ? sched_clock_register(mips_cm_is64 ?
gic_read_count_64 : gic_read_count_2x32, gic_read_count_64 : gic_read_count_2x32,
gic_count_width, gic_frequency); gic_count_width, gic_frequency);

View File

@ -201,7 +201,6 @@ static int armada_370_xp_timer_dying_cpu(unsigned int cpu)
{ {
struct clock_event_device *evt = per_cpu_ptr(armada_370_xp_evt, cpu); struct clock_event_device *evt = per_cpu_ptr(armada_370_xp_evt, cpu);
evt->set_state_shutdown(evt);
disable_percpu_irq(evt->irq); disable_percpu_irq(evt->irq);
return 0; return 0;
} }

View File

@ -85,7 +85,7 @@ static int __init gxp_timer_init(struct device_node *node)
clk = of_clk_get(node, 0); clk = of_clk_get(node, 0);
if (IS_ERR(clk)) { if (IS_ERR(clk)) {
ret = (int)PTR_ERR(clk); ret = PTR_ERR(clk);
pr_err("%pOFn clock not found: %d\n", node, ret); pr_err("%pOFn clock not found: %d\n", node, ret);
goto err_free; goto err_free;
} }

View File

@ -130,7 +130,6 @@ static int msm_local_timer_dying_cpu(unsigned int cpu)
{ {
struct clock_event_device *evt = per_cpu_ptr(msm_evt, cpu); struct clock_event_device *evt = per_cpu_ptr(msm_evt, cpu);
evt->set_state_shutdown(evt);
disable_percpu_irq(evt->irq); disable_percpu_irq(evt->irq);
return 0; return 0;
} }

View File

@ -1,7 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
- * This file is subject to the terms and conditions of the GNU General Public
- * License. See the file "COPYING" in the main directory of this archive
- * for more details.
+ * Ralink System Tick Counter driver present on RT3352 and MT7620 SoCs.
  *
  * Copyright (C) 2013 by John Crispin <john@phrozen.org>
  */
@ -16,8 +15,6 @@
#include <linux/of_irq.h> #include <linux/of_irq.h>
#include <linux/of_address.h> #include <linux/of_address.h>
#include <asm/mach-ralink/ralink_regs.h>
#define SYSTICK_FREQ (50 * 1000) #define SYSTICK_FREQ (50 * 1000)
#define SYSTICK_CONFIG 0x00 #define SYSTICK_CONFIG 0x00
@ -40,7 +37,7 @@ static int systick_set_oneshot(struct clock_event_device *evt);
static int systick_shutdown(struct clock_event_device *evt); static int systick_shutdown(struct clock_event_device *evt);
static int systick_next_event(unsigned long delta, static int systick_next_event(unsigned long delta,
struct clock_event_device *evt) struct clock_event_device *evt)
{ {
struct systick_device *sdev; struct systick_device *sdev;
u32 count; u32 count;
@ -60,7 +57,7 @@ static void systick_event_handler(struct clock_event_device *dev)
static irqreturn_t systick_interrupt(int irq, void *dev_id) static irqreturn_t systick_interrupt(int irq, void *dev_id)
{ {
struct clock_event_device *dev = (struct clock_event_device *) dev_id; struct clock_event_device *dev = (struct clock_event_device *)dev_id;
dev->event_handler(dev); dev->event_handler(dev);

View File

@ -158,7 +158,6 @@ static int tegra_timer_stop(unsigned int cpu)
{ {
struct timer_of *to = per_cpu_ptr(&tegra_to, cpu); struct timer_of *to = per_cpu_ptr(&tegra_to, cpu);
to->clkevt.set_state_shutdown(&to->clkevt);
disable_irq_nosync(to->clkevt.irq); disable_irq_nosync(to->clkevt.irq);
return 0; return 0;

View File

@ -202,10 +202,10 @@ static bool __init dmtimer_is_preferred(struct device_node *np)
/* Secure gptimer12 is always clocked with a fixed source */ /* Secure gptimer12 is always clocked with a fixed source */
if (!of_property_read_bool(np, "ti,timer-secure")) { if (!of_property_read_bool(np, "ti,timer-secure")) {
if (!of_property_read_bool(np, "assigned-clocks")) if (!of_property_present(np, "assigned-clocks"))
return false; return false;
if (!of_property_read_bool(np, "assigned-clock-parents")) if (!of_property_present(np, "assigned-clock-parents"))
return false; return false;
} }
@ -686,9 +686,9 @@ subsys_initcall(dmtimer_percpu_timer_startup);
static int __init dmtimer_percpu_quirk_init(struct device_node *np, u32 pa) static int __init dmtimer_percpu_quirk_init(struct device_node *np, u32 pa)
{ {
struct device_node *arm_timer; struct device_node *arm_timer __free(device_node) =
of_find_compatible_node(NULL, NULL, "arm,armv7-timer");
arm_timer = of_find_compatible_node(NULL, NULL, "arm,armv7-timer");
if (of_device_is_available(arm_timer)) { if (of_device_is_available(arm_timer)) {
pr_warn_once("ARM architected timer wrap issue i940 detected\n"); pr_warn_once("ARM architected timer wrap issue i940 detected\n");
return 0; return 0;

View File

@ -1104,8 +1104,12 @@ static int omap_dm_timer_probe(struct platform_device *pdev)
return -ENOMEM; return -ENOMEM;
timer->irq = platform_get_irq(pdev, 0); timer->irq = platform_get_irq(pdev, 0);
-	if (timer->irq < 0)
-		return timer->irq;
+	if (timer->irq < 0) {
+		if (of_property_read_bool(dev->of_node, "ti,timer-pwm"))
+			dev_info(dev, "Did not find timer interrupt, timer usable in PWM mode only\n");
+		else
+			return timer->irq;
+	}
timer->io_base = devm_platform_ioremap_resource(pdev, 0); timer->io_base = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(timer->io_base)) if (IS_ERR(timer->io_base))

View File

@ -273,11 +273,6 @@ i915_request_active_engine(struct i915_request *rq,
return ret; return ret;
} }
static void __rq_init_watchdog(struct i915_request *rq)
{
rq->watchdog.timer.function = NULL;
}
static enum hrtimer_restart __rq_watchdog_expired(struct hrtimer *hrtimer) static enum hrtimer_restart __rq_watchdog_expired(struct hrtimer *hrtimer)
{ {
struct i915_request *rq = struct i915_request *rq =
@ -294,6 +289,14 @@ static enum hrtimer_restart __rq_watchdog_expired(struct hrtimer *hrtimer)
return HRTIMER_NORESTART; return HRTIMER_NORESTART;
} }
static void __rq_init_watchdog(struct i915_request *rq)
{
struct i915_request_watchdog *wdg = &rq->watchdog;
hrtimer_init(&wdg->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
wdg->timer.function = __rq_watchdog_expired;
}
static void __rq_arm_watchdog(struct i915_request *rq) static void __rq_arm_watchdog(struct i915_request *rq)
{ {
struct i915_request_watchdog *wdg = &rq->watchdog; struct i915_request_watchdog *wdg = &rq->watchdog;
@ -304,8 +307,6 @@ static void __rq_arm_watchdog(struct i915_request *rq)
i915_request_get(rq); i915_request_get(rq);
hrtimer_init(&wdg->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
wdg->timer.function = __rq_watchdog_expired;
hrtimer_start_range_ns(&wdg->timer, hrtimer_start_range_ns(&wdg->timer,
ns_to_ktime(ce->watchdog.timeout_us * ns_to_ktime(ce->watchdog.timeout_us *
NSEC_PER_USEC), NSEC_PER_USEC),
@ -317,7 +318,7 @@ static void __rq_cancel_watchdog(struct i915_request *rq)
{ {
struct i915_request_watchdog *wdg = &rq->watchdog; struct i915_request_watchdog *wdg = &rq->watchdog;
if (wdg->timer.function && hrtimer_try_to_cancel(&wdg->timer) > 0) if (hrtimer_try_to_cancel(&wdg->timer) > 0)
i915_request_put(rq); i915_request_put(rq);
} }

View File

@ -46,24 +46,15 @@ static int anysee_ctrl_msg(struct dvb_usb_device *d,
dev_dbg(&d->udev->dev, "%s: >>> %*ph\n", __func__, slen, state->buf); dev_dbg(&d->udev->dev, "%s: >>> %*ph\n", __func__, slen, state->buf);
/* We need receive one message more after dvb_usb_generic_rw due /*
to weird transaction flow, which is 1 x send + 2 x receive. */ * We need receive one message more after dvb_usbv2_generic_rw_locked()
* due to weird transaction flow, which is 1 x send + 2 x receive.
*/
ret = dvb_usbv2_generic_rw_locked(d, state->buf, sizeof(state->buf), ret = dvb_usbv2_generic_rw_locked(d, state->buf, sizeof(state->buf),
state->buf, sizeof(state->buf)); state->buf, sizeof(state->buf));
if (ret) if (ret)
goto error_unlock; goto error_unlock;
/* TODO FIXME: dvb_usb_generic_rw() fails rarely with error code -32
* (EPIPE, Broken pipe). Function supports currently msleep() as a
* parameter but I would not like to use it, since according to
* Documentation/timers/timers-howto.rst it should not be used such
* short, under < 20ms, sleeps. Repeating failed message would be
* better choice as not to add unwanted delays...
* Fixing that correctly is one of those or both;
* 1) use repeat if possible
* 2) add suitable delay
*/
/* get answer, retry few times if error returned */ /* get answer, retry few times if error returned */
for (i = 0; i < 3; i++) { for (i = 0; i < 3; i++) {
/* receive 2nd answer */ /* receive 2nd answer */

View File

@ -823,8 +823,6 @@ int rt2x00usb_probe(struct usb_interface *usb_intf,
INIT_WORK(&rt2x00dev->rxdone_work, rt2x00usb_work_rxdone); INIT_WORK(&rt2x00dev->rxdone_work, rt2x00usb_work_rxdone);
INIT_WORK(&rt2x00dev->txdone_work, rt2x00usb_work_txdone); INIT_WORK(&rt2x00dev->txdone_work, rt2x00usb_work_txdone);
hrtimer_init(&rt2x00dev->txstatus_timer, CLOCK_MONOTONIC,
HRTIMER_MODE_REL);
retval = rt2x00usb_alloc_reg(rt2x00dev); retval = rt2x00usb_alloc_reg(rt2x00dev);
if (retval) if (retval)

View File

@ -1412,10 +1412,9 @@ static inline struct charger_desc *cm_get_drv_data(struct platform_device *pdev)
return dev_get_platdata(&pdev->dev); return dev_get_platdata(&pdev->dev);
} }
static enum alarmtimer_restart cm_timer_func(struct alarm *alarm, ktime_t now) static void cm_timer_func(struct alarm *alarm, ktime_t now)
{ {
cm_timer_set = false; cm_timer_set = false;
return ALARMTIMER_NORESTART;
} }
static int charger_manager_probe(struct platform_device *pdev) static int charger_manager_probe(struct platform_device *pdev)

View File

@ -1335,7 +1335,7 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
if (until == 0 || ret < 0 || ret >= min_nr) if (until == 0 || ret < 0 || ret >= min_nr)
return ret; return ret;
hrtimer_init_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_REL); hrtimer_setup_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
if (until != KTIME_MAX) { if (until != KTIME_MAX) {
hrtimer_set_expires_range_ns(&t.timer, until, current->timer_slack_ns); hrtimer_set_expires_range_ns(&t.timer, until, current->timer_slack_ns);
hrtimer_sleeper_start_expires(&t, HRTIMER_MODE_REL); hrtimer_sleeper_start_expires(&t, HRTIMER_MODE_REL);

View File

@ -2552,8 +2552,8 @@ static int show_timer(struct seq_file *m, void *v)
seq_printf(m, "ID: %d\n", timer->it_id); seq_printf(m, "ID: %d\n", timer->it_id);
seq_printf(m, "signal: %d/%px\n", seq_printf(m, "signal: %d/%px\n",
timer->sigq->info.si_signo, timer->sigq.info.si_signo,
timer->sigq->info.si_value.sival_ptr); timer->sigq.info.si_value.sival_ptr);
seq_printf(m, "notify: %s/%s.%d\n", seq_printf(m, "notify: %s/%s.%d\n",
nstr[notify & ~SIGEV_THREAD_ID], nstr[notify & ~SIGEV_THREAD_ID],
(notify & SIGEV_THREAD_ID) ? "tid" : "pid", (notify & SIGEV_THREAD_ID) ? "tid" : "pid",

View File

@ -79,13 +79,11 @@ static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
return HRTIMER_NORESTART; return HRTIMER_NORESTART;
} }
static enum alarmtimer_restart timerfd_alarmproc(struct alarm *alarm, static void timerfd_alarmproc(struct alarm *alarm, ktime_t now)
ktime_t now)
{ {
struct timerfd_ctx *ctx = container_of(alarm, struct timerfd_ctx, struct timerfd_ctx *ctx = container_of(alarm, struct timerfd_ctx,
t.alarm); t.alarm);
timerfd_triggered(ctx); timerfd_triggered(ctx);
return ALARMTIMER_NORESTART;
} }
/* /*

View File

@ -2,6 +2,9 @@
#ifndef __ASM_GENERIC_DELAY_H #ifndef __ASM_GENERIC_DELAY_H
#define __ASM_GENERIC_DELAY_H #define __ASM_GENERIC_DELAY_H
#include <linux/math.h>
#include <vdso/time64.h>
/* Undefined functions to get compile-time errors */ /* Undefined functions to get compile-time errors */
extern void __bad_udelay(void); extern void __bad_udelay(void);
extern void __bad_ndelay(void); extern void __bad_ndelay(void);
@ -12,34 +15,73 @@ extern void __const_udelay(unsigned long xloops);
extern void __delay(unsigned long loops); extern void __delay(unsigned long loops);
 /*
- * The weird n/20000 thing suppresses a "comparison is always false due to
- * limited range of data type" warning with non-const 8-bit arguments.
+ * The microseconds/nanosecond delay multiplicators are used to convert a
+ * constant microseconds/nanoseconds value to a value which can be used by the
+ * architectures specific implementation to transform it into loops.
  */
+#define UDELAY_CONST_MULT	((unsigned long)DIV_ROUND_UP(1ULL << 32, USEC_PER_SEC))
+#define NDELAY_CONST_MULT	((unsigned long)DIV_ROUND_UP(1ULL << 32, NSEC_PER_SEC))
-/* 0x10c7 is 2**32 / 1000000 (rounded up) */
-#define udelay(n) \
-	({ \
-		if (__builtin_constant_p(n)) { \
-			if ((n) / 20000 >= 1) \
-				__bad_udelay(); \
-			else \
-				__const_udelay((n) * 0x10c7ul); \
-		} else { \
-			__udelay(n); \
-		} \
-	})
-/* 0x5 is 2**32 / 1000000000 (rounded up) */
-#define ndelay(n) \
-	({ \
-		if (__builtin_constant_p(n)) { \
-			if ((n) / 20000 >= 1) \
-				__bad_ndelay(); \
-			else \
-				__const_udelay((n) * 5ul); \
-		} else { \
-			__ndelay(n); \
-		} \
-	})
+/*
+ * The maximum constant udelay/ndelay value picked out of thin air to prevent
+ * too long constant udelays/ndelays.
+ */
+#define DELAY_CONST_MAX 20000
+/**
+ * udelay - Inserting a delay based on microseconds with busy waiting
+ * @usec:	requested delay in microseconds
+ *
+ * When delaying in an atomic context ndelay(), udelay() and mdelay() are the
+ * only valid variants of delaying/sleeping to go with.
+ *
+ * When inserting delays in non atomic context which are shorter than the time
+ * which is required to queue e.g. an hrtimer and to enter then the scheduler,
+ * it is also valuable to use udelay(). But it is not simple to specify a
+ * generic threshold for this which will fit for all systems. An approximation
+ * is a threshold for all delays up to 10 microseconds.
+ *
+ * When having a delay which is larger than the architecture specific
+ * %MAX_UDELAY_MS value, please make sure mdelay() is used. Otherwise an
+ * overflow risk is given.
+ *
+ * Please note that ndelay(), udelay() and mdelay() may return early for several
+ * reasons (https://lists.openwall.net/linux-kernel/2011/01/09/56):
+ *
+ * #. computed loops_per_jiffy too low (due to the time taken to execute the
+ *    timer interrupt.)
+ * #. cache behaviour affecting the time it takes to execute the loop function.
+ * #. CPU clock rate changes.
+ */
+static __always_inline void udelay(unsigned long usec)
+{
+	if (__builtin_constant_p(usec)) {
+		if (usec >= DELAY_CONST_MAX)
+			__bad_udelay();
+		else
+			__const_udelay(usec * UDELAY_CONST_MULT);
+	} else {
+		__udelay(usec);
+	}
+}
+/**
+ * ndelay - Inserting a delay based on nanoseconds with busy waiting
+ * @nsec:	requested delay in nanoseconds
+ *
+ * See udelay() for basic information about ndelay() and its variants.
+ */
+static __always_inline void ndelay(unsigned long nsec)
+{
+	if (__builtin_constant_p(nsec)) {
+		if (nsec >= DELAY_CONST_MAX)
+			__bad_ndelay();
+		else
+			__const_udelay(nsec * NDELAY_CONST_MULT);
+	} else {
+		__ndelay(nsec);
+	}
+}
+#define ndelay(x) ndelay(x)
#endif /* __ASM_GENERIC_DELAY_H */ #endif /* __ASM_GENERIC_DELAY_H */
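
The two multipliers replace the magic constants 0x10c7 and 5 from the old macros. The following userspace sketch recomputes them with the same DIV_ROUND_UP expression and shows how a constant delay becomes an "xloops" value. The `xloops_to_loops()` helper is an assumption added for illustration: it treats xloops as a 32.32 fixed-point fraction of a second and scales it by a made-up loops-per-second calibration, which is roughly what architecture code does but is not taken from this diff.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC	1000000ULL
#define NSEC_PER_SEC	1000000000ULL
#define DELAY_CONST_MAX	20000
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* 2^32 / 10^6 rounded up, and 2^32 / 10^9 rounded up. */
#define UDELAY_CONST_MULT	((unsigned long)DIV_ROUND_UP(1ULL << 32, USEC_PER_SEC))
#define NDELAY_CONST_MULT	((unsigned long)DIV_ROUND_UP(1ULL << 32, NSEC_PER_SEC))

/* Convert a constant microsecond delay into the scaled "xloops" value. */
static uint64_t usec_to_xloops(uint64_t usec)
{
	assert(usec < DELAY_CONST_MAX);
	return usec * UDELAY_CONST_MULT;
}

/* Convert a constant nanosecond delay into the scaled "xloops" value. */
static uint64_t nsec_to_xloops(uint64_t nsec)
{
	assert(nsec < DELAY_CONST_MAX);
	return nsec * NDELAY_CONST_MULT;
}

/*
 * Assumption for illustration only: scale xloops by a loops-per-second
 * calibration and drop the 32 fractional bits. Inputs are small enough
 * here that the 64-bit product cannot overflow.
 */
static uint64_t xloops_to_loops(uint64_t xloops, uint64_t loops_per_sec)
{
	return (xloops * loops_per_sec) >> 32;
}
```

Because both multipliers are rounded up, the resulting loop count errs slightly on the long side, which is the safe direction for a busy-wait delay.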

View File

@ -20,12 +20,6 @@ enum alarmtimer_type {
ALARM_BOOTTIME_FREEZER, ALARM_BOOTTIME_FREEZER,
}; };
enum alarmtimer_restart {
ALARMTIMER_NORESTART,
ALARMTIMER_RESTART,
};
#define ALARMTIMER_STATE_INACTIVE 0x00 #define ALARMTIMER_STATE_INACTIVE 0x00
#define ALARMTIMER_STATE_ENQUEUED 0x01 #define ALARMTIMER_STATE_ENQUEUED 0x01
@ -42,14 +36,14 @@ enum alarmtimer_restart {
struct alarm { struct alarm {
struct timerqueue_node node; struct timerqueue_node node;
struct hrtimer timer; struct hrtimer timer;
enum alarmtimer_restart (*function)(struct alarm *, ktime_t now); void (*function)(struct alarm *, ktime_t now);
enum alarmtimer_type type; enum alarmtimer_type type;
int state; int state;
void *data; void *data;
}; };
void alarm_init(struct alarm *alarm, enum alarmtimer_type type, void alarm_init(struct alarm *alarm, enum alarmtimer_type type,
enum alarmtimer_restart (*function)(struct alarm *, ktime_t)); void (*function)(struct alarm *, ktime_t));
void alarm_start(struct alarm *alarm, ktime_t start); void alarm_start(struct alarm *alarm, ktime_t start);
void alarm_start_relative(struct alarm *alarm, ktime_t start); void alarm_start_relative(struct alarm *alarm, ktime_t start);
void alarm_restart(struct alarm *alarm); void alarm_restart(struct alarm *alarm);

View File

@ -215,7 +215,6 @@ static inline s64 clocksource_cyc2ns(u64 cycles, u32 mult, u32 shift)
extern int clocksource_unregister(struct clocksource*); extern int clocksource_unregister(struct clocksource*);
extern void clocksource_touch_watchdog(void); extern void clocksource_touch_watchdog(void);
extern void clocksource_change_rating(struct clocksource *cs, int rating);
extern void clocksource_suspend(void); extern void clocksource_suspend(void);
extern void clocksource_resume(void); extern void clocksource_resume(void);
extern struct clocksource * __init clocksource_default_clock(void); extern struct clocksource * __init clocksource_default_clock(void);

View File

@ -6,21 +6,12 @@
* Copyright (C) 1993 Linus Torvalds * Copyright (C) 1993 Linus Torvalds
* *
* Delay routines, using a pre-computed "loops_per_jiffy" value. * Delay routines, using a pre-computed "loops_per_jiffy" value.
* * Sleep routines using timer list timers or hrtimers.
* Please note that ndelay(), udelay() and mdelay() may return early for
* several reasons:
* 1. computed loops_per_jiffy too low (due to the time taken to
* execute the timer interrupt.)
* 2. cache behaviour affecting the time it takes to execute the
* loop function.
* 3. CPU clock rate changes.
*
* Please see this thread:
* https://lists.openwall.net/linux-kernel/2011/01/09/56
*/ */
#include <linux/math.h> #include <linux/math.h>
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/jiffies.h>
extern unsigned long loops_per_jiffy; extern unsigned long loops_per_jiffy;
@ -35,12 +26,21 @@ extern unsigned long loops_per_jiffy;
* The 2nd mdelay() definition ensures GCC will optimize away the * The 2nd mdelay() definition ensures GCC will optimize away the
* while loop for the common cases where n <= MAX_UDELAY_MS -- Paul G. * while loop for the common cases where n <= MAX_UDELAY_MS -- Paul G.
*/ */
#ifndef MAX_UDELAY_MS #ifndef MAX_UDELAY_MS
#define MAX_UDELAY_MS 5 #define MAX_UDELAY_MS 5
#endif #endif
#ifndef mdelay #ifndef mdelay
/**
* mdelay - Inserting a delay based on milliseconds with busy waiting
* @n: requested delay in milliseconds
*
* See udelay() for basic information about mdelay() and its variants.
*
* Please double check, whether mdelay() is the right way to go or whether a
* refactoring of the code is the better variant to be able to use msleep()
* instead.
*/
#define mdelay(n) (\ #define mdelay(n) (\
(__builtin_constant_p(n) && (n)<=MAX_UDELAY_MS) ? udelay((n)*1000) : \ (__builtin_constant_p(n) && (n)<=MAX_UDELAY_MS) ? udelay((n)*1000) : \
({unsigned long __ms=(n); while (__ms--) udelay(1000);})) ({unsigned long __ms=(n); while (__ms--) udelay(1000);}))
@ -63,30 +63,75 @@ unsigned long msleep_interruptible(unsigned int msecs);
void usleep_range_state(unsigned long min, unsigned long max, void usleep_range_state(unsigned long min, unsigned long max,
unsigned int state); unsigned int state);
+/**
+ * usleep_range - Sleep for an approximate time
+ * @min:	Minimum time in microseconds to sleep
+ * @max:	Maximum time in microseconds to sleep
+ *
+ * For basic information please refer to usleep_range_state().
+ *
+ * The task will be in the state TASK_UNINTERRUPTIBLE during the sleep.
+ */
 static inline void usleep_range(unsigned long min, unsigned long max)
 {
 	usleep_range_state(min, max, TASK_UNINTERRUPTIBLE);
 }
-static inline void usleep_idle_range(unsigned long min, unsigned long max)
+/**
+ * usleep_range_idle - Sleep for an approximate time with idle time accounting
+ * @min:	Minimum time in microseconds to sleep
+ * @max:	Maximum time in microseconds to sleep
+ *
+ * For basic information please refer to usleep_range_state().
+ *
+ * The sleeping task has the state TASK_IDLE during the sleep to prevent
+ * contribution to the load average.
+ */
+static inline void usleep_range_idle(unsigned long min, unsigned long max)
 {
 	usleep_range_state(min, max, TASK_IDLE);
 }
+/**
+ * ssleep - wrapper for seconds around msleep
+ * @seconds:	Requested sleep duration in seconds
+ *
+ * Please refer to msleep() for detailed information.
+ */
 static inline void ssleep(unsigned int seconds)
 {
 	msleep(seconds * 1000);
 }
-/* see Documentation/timers/timers-howto.rst for the thresholds */
+static const unsigned int max_slack_shift = 2;
+#define USLEEP_RANGE_UPPER_BOUND	((TICK_NSEC << max_slack_shift) / NSEC_PER_USEC)
+/**
+ * fsleep - flexible sleep which autoselects the best mechanism
+ * @usecs:	requested sleep duration in microseconds
+ *
+ * fsleep() selects the best mechanism that will provide maximum 25% slack
+ * to the requested sleep duration. Therefore it uses:
+ *
+ * * udelay() loop for sleep durations <= 10 microseconds to avoid hrtimer
+ *   overhead for really short sleep durations.
+ * * usleep_range() for sleep durations which would lead with the usage of
+ *   msleep() to a slack larger than 25%. This depends on the granularity of
+ *   jiffies.
+ * * msleep() for all other sleep durations.
+ *
+ * Note: When %CONFIG_HIGH_RES_TIMERS is not set, all sleeps are processed with
+ * the granularity of jiffies and the slack might exceed 25% especially for
+ * short sleep durations.
+ */
 static inline void fsleep(unsigned long usecs)
 {
 	if (usecs <= 10)
 		udelay(usecs);
-	else if (usecs <= 20000)
-		usleep_range(usecs, 2 * usecs);
+	else if (usecs < USLEEP_RANGE_UPPER_BOUND)
+		usleep_range(usecs, usecs + (usecs >> max_slack_shift));
 	else
-		msleep(DIV_ROUND_UP(usecs, 1000));
+		msleep(DIV_ROUND_UP(usecs, USEC_PER_MSEC));
 }
#endif /* defined(_LINUX_DELAY_H) */ #endif /* defined(_LINUX_DELAY_H) */
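
The new fsleep() thresholds depend on the tick length, so the switch from usleep_range() to msleep() moves with HZ. The sketch below mirrors the selection logic in userspace so the boundaries can be inspected. `plan_fsleep()`, `struct sleep_plan` and the `hz` parameter are hypothetical names introduced for illustration, and the tick length is approximated as NSEC_PER_SEC / hz with rounding, which is assumed to match TICK_NSEC closely enough for this purpose.

```c
#include <assert.h>

#define NSEC_PER_SEC	1000000000UL
#define NSEC_PER_USEC	1000UL
#define USEC_PER_MSEC	1000UL
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

enum sleep_kind { USE_UDELAY, USE_USLEEP_RANGE, USE_MSLEEP };

struct sleep_plan {
	enum sleep_kind kind;
	unsigned long min;	/* usecs, or msecs when kind == USE_MSLEEP */
	unsigned long max;	/* differs from min only for USE_USLEEP_RANGE */
};

/* Decide which primitive fsleep() would pick for a given tick rate. */
static struct sleep_plan plan_fsleep(unsigned long usecs, unsigned long hz)
{
	const unsigned int max_slack_shift = 2;	/* 1/4 = 25% slack */
	unsigned long tick_nsec, upper_bound;
	struct sleep_plan plan;

	assert(hz > 0);

	/* Approximation of TICK_NSEC and USLEEP_RANGE_UPPER_BOUND. */
	tick_nsec = (NSEC_PER_SEC + hz / 2) / hz;
	upper_bound = (tick_nsec << max_slack_shift) / NSEC_PER_USEC;

	if (usecs <= 10) {
		plan.kind = USE_UDELAY;
		plan.min = plan.max = usecs;
	} else if (usecs < upper_bound) {
		plan.kind = USE_USLEEP_RANGE;
		plan.min = usecs;
		plan.max = usecs + (usecs >> max_slack_shift);
	} else {
		plan.kind = USE_MSLEEP;
		plan.min = plan.max = DIV_ROUND_UP(usecs, USEC_PER_MSEC);
	}

	return plan;
}
```

With hz = 1000 the upper bound works out to 4000 microseconds (four ticks), and with hz = 100 to 40000 microseconds, so the same request can land on different primitives depending on the kernel configuration.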

View File

@ -34,9 +34,6 @@ struct dw_apb_clocksource {
}; };
void dw_apb_clockevent_register(struct dw_apb_clock_event_device *dw_ced); void dw_apb_clockevent_register(struct dw_apb_clock_event_device *dw_ced);
void dw_apb_clockevent_pause(struct dw_apb_clock_event_device *dw_ced);
void dw_apb_clockevent_resume(struct dw_apb_clock_event_device *dw_ced);
void dw_apb_clockevent_stop(struct dw_apb_clock_event_device *dw_ced);
struct dw_apb_clock_event_device * struct dw_apb_clock_event_device *
dw_apb_clockevent_init(int cpu, const char *name, unsigned rating, dw_apb_clockevent_init(int cpu, const char *name, unsigned rating,

View File

@ -228,32 +228,17 @@ static inline void hrtimer_cancel_wait_running(struct hrtimer *timer)
/* Initialize timers: */ /* Initialize timers: */
extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock, extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock,
enum hrtimer_mode mode); enum hrtimer_mode mode);
-extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id,
-				 enum hrtimer_mode mode);
+extern void hrtimer_setup(struct hrtimer *timer, enum hrtimer_restart (*function)(struct hrtimer *),
+			  clockid_t clock_id, enum hrtimer_mode mode);
+extern void hrtimer_setup_on_stack(struct hrtimer *timer,
+				   enum hrtimer_restart (*function)(struct hrtimer *),
+				   clockid_t clock_id, enum hrtimer_mode mode);
+extern void hrtimer_setup_sleeper_on_stack(struct hrtimer_sleeper *sl, clockid_t clock_id,
+					   enum hrtimer_mode mode);
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
-extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock,
-				  enum hrtimer_mode mode);
-extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
-					  clockid_t clock_id,
-					  enum hrtimer_mode mode);
 extern void destroy_hrtimer_on_stack(struct hrtimer *timer);
 #else
-static inline void hrtimer_init_on_stack(struct hrtimer *timer,
-					 clockid_t which_clock,
-					 enum hrtimer_mode mode)
-{
-	hrtimer_init(timer, which_clock, mode);
-}
-static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
-						 clockid_t clock_id,
-						 enum hrtimer_mode mode)
-{
-	hrtimer_init_sleeper(sl, clock_id, mode);
-}
static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { } static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { }
#endif #endif
@ -337,6 +322,28 @@ static inline int hrtimer_callback_running(struct hrtimer *timer)
return timer->base->running == timer; return timer->base->running == timer;
} }
/**
* hrtimer_update_function - Update the timer's callback function
* @timer: Timer to update
* @function: New callback function
*
* Only safe to call if the timer is not enqueued. Can be called in the callback function if the
* timer is not enqueued at the same time (see the comments above HRTIMER_STATE_ENQUEUED).
*/
static inline void hrtimer_update_function(struct hrtimer *timer,
enum hrtimer_restart (*function)(struct hrtimer *))
{
guard(raw_spinlock_irqsave)(&timer->base->cpu_base->lock);
if (WARN_ON_ONCE(hrtimer_is_queued(timer)))
return;
if (WARN_ON_ONCE(!function))
return;
timer->function = function;
}
/* Forward a hrtimer so it expires after now: */ /* Forward a hrtimer so it expires after now: */
extern u64 extern u64
hrtimer_forward(struct hrtimer *timer, ktime_t now, ktime_t interval); hrtimer_forward(struct hrtimer *timer, ktime_t now, ktime_t interval);

View File

@ -19,19 +19,19 @@
* @op: accessor function (takes @args as its arguments) * @op: accessor function (takes @args as its arguments)
* @val: Variable to read the value into * @val: Variable to read the value into
* @cond: Break condition (usually involving @val) * @cond: Break condition (usually involving @val)
* @sleep_us: Maximum time to sleep between reads in us (0 * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please
* tight-loops). Should be less than ~20ms since usleep_range * read usleep_range() function description for details and
* is used (see Documentation/timers/timers-howto.rst). * limitations.
* @timeout_us: Timeout in us, 0 means never timeout * @timeout_us: Timeout in us, 0 means never timeout
* @sleep_before_read: if it is true, sleep @sleep_us before read. * @sleep_before_read: if it is true, sleep @sleep_us before read.
* @args: arguments for @op poll * @args: arguments for @op poll
* *
* Returns 0 on success and -ETIMEDOUT upon a timeout. In either
* case, the last read value at @args is stored in @val. Must not
* be called from atomic context if sleep_us or timeout_us are used.
*
* When available, you'll probably want to use one of the specialized * When available, you'll probably want to use one of the specialized
* macros defined below rather than this macro directly. * macros defined below rather than this macro directly.
*
* Returns: 0 on success and -ETIMEDOUT upon a timeout. In either
* case, the last read value at @args is stored in @val. Must not
* be called from atomic context if sleep_us or timeout_us are used.
*/ */
#define read_poll_timeout(op, val, cond, sleep_us, timeout_us, \ #define read_poll_timeout(op, val, cond, sleep_us, timeout_us, \
sleep_before_read, args...) \ sleep_before_read, args...) \
@ -64,22 +64,22 @@
* @op: accessor function (takes @args as its arguments) * @op: accessor function (takes @args as its arguments)
* @val: Variable to read the value into * @val: Variable to read the value into
* @cond: Break condition (usually involving @val) * @cond: Break condition (usually involving @val)
* @delay_us: Time to udelay between reads in us (0 tight-loops). Should * @delay_us: Time to udelay between reads in us (0 tight-loops). Please
* be less than ~10us since udelay is used (see * read udelay() function description for details and
* Documentation/timers/timers-howto.rst). * limitations.
* @timeout_us: Timeout in us, 0 means never timeout * @timeout_us: Timeout in us, 0 means never timeout
* @delay_before_read: if it is true, delay @delay_us before read. * @delay_before_read: if it is true, delay @delay_us before read.
* @args: arguments for @op poll * @args: arguments for @op poll
* *
* Returns 0 on success and -ETIMEDOUT upon a timeout. In either
* case, the last read value at @args is stored in @val.
*
* This macro does not rely on timekeeping. Hence it is safe to call even when * This macro does not rely on timekeeping. Hence it is safe to call even when
* timekeeping is suspended, at the expense of an underestimation of wall clock * timekeeping is suspended, at the expense of an underestimation of wall clock
* time, which is rather minimal with a non-zero delay_us. * time, which is rather minimal with a non-zero delay_us.
* *
* When available, you'll probably want to use one of the specialized * When available, you'll probably want to use one of the specialized
* macros defined below rather than this macro directly. * macros defined below rather than this macro directly.
*
* Returns: 0 on success and -ETIMEDOUT upon a timeout. In either
* case, the last read value at @args is stored in @val.
*/ */
#define read_poll_timeout_atomic(op, val, cond, delay_us, timeout_us, \ #define read_poll_timeout_atomic(op, val, cond, delay_us, timeout_us, \
delay_before_read, args...) \ delay_before_read, args...) \
@ -119,17 +119,17 @@
* @addr: Address to poll * @addr: Address to poll
* @val: Variable to read the value into * @val: Variable to read the value into
* @cond: Break condition (usually involving @val) * @cond: Break condition (usually involving @val)
* @sleep_us: Maximum time to sleep between reads in us (0 * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please
* tight-loops). Should be less than ~20ms since usleep_range * read usleep_range() function description for details and
* is used (see Documentation/timers/timers-howto.rst). * limitations.
* @timeout_us: Timeout in us, 0 means never timeout * @timeout_us: Timeout in us, 0 means never timeout
* *
* Returns 0 on success and -ETIMEDOUT upon a timeout. In either
* case, the last read value at @addr is stored in @val. Must not
* be called from atomic context if sleep_us or timeout_us are used.
*
* When available, you'll probably want to use one of the specialized * When available, you'll probably want to use one of the specialized
* macros defined below rather than this macro directly. * macros defined below rather than this macro directly.
*
* Returns: 0 on success and -ETIMEDOUT upon a timeout. In either
* case, the last read value at @addr is stored in @val. Must not
* be called from atomic context if sleep_us or timeout_us are used.
*/ */
#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us) \ #define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us) \
read_poll_timeout(op, val, cond, sleep_us, timeout_us, false, addr) read_poll_timeout(op, val, cond, sleep_us, timeout_us, false, addr)
@ -140,16 +140,16 @@
* @addr: Address to poll * @addr: Address to poll
* @val: Variable to read the value into * @val: Variable to read the value into
* @cond: Break condition (usually involving @val) * @cond: Break condition (usually involving @val)
* @delay_us: Time to udelay between reads in us (0 tight-loops). Should * @delay_us: Time to udelay between reads in us (0 tight-loops). Please
* be less than ~10us since udelay is used (see * read udelay() function description for details and
* Documentation/timers/timers-howto.rst). * limitations.
* @timeout_us: Timeout in us, 0 means never timeout * @timeout_us: Timeout in us, 0 means never timeout
* *
* Returns 0 on success and -ETIMEDOUT upon a timeout. In either
* case, the last read value at @addr is stored in @val.
*
* When available, you'll probably want to use one of the specialized * When available, you'll probably want to use one of the specialized
* macros defined below rather than this macro directly. * macros defined below rather than this macro directly.
*
* Returns: 0 on success and -ETIMEDOUT upon a timeout. In either
* case, the last read value at @addr is stored in @val.
*/ */
#define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \ #define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
read_poll_timeout_atomic(op, val, cond, delay_us, timeout_us, false, addr) read_poll_timeout_atomic(op, val, cond, delay_us, timeout_us, false, addr)
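
The return contract documented above (0 on success, -ETIMEDOUT on timeout, last read value always left in @val) can be illustrated with a small userspace sketch. This is not the kernel macro: `fake_dev`, `fake_read()`, `now_us()`, `sleep_for_us()` and `poll_ready()` are hypothetical names invented for the illustration, and `nanosleep()` stands in for `usleep_range()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a device whose status bit is set after N reads. */
struct fake_dev {
	unsigned int reads;
	unsigned int ready_after;
};

static uint32_t fake_read(struct fake_dev *dev)
{
	dev->reads++;
	return dev->reads >= dev->ready_after ? 1u : 0u;
}

/* Monotonic time in microseconds (userspace substitute for ktime_get()). */
static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

static void sleep_for_us(unsigned int us)
{
	struct timespec ts = {
		.tv_sec = us / 1000000u,
		.tv_nsec = (long)(us % 1000000u) * 1000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Poll until bit 0 of the value is set or the timeout expires.
 * sleep_us == 0 tight-loops, timeout_us == 0 never times out.
 * The last value read is always left in *val.
 */
static int poll_ready(struct fake_dev *dev, uint32_t *val,
		      unsigned int sleep_us, uint64_t timeout_us)
{
	uint64_t deadline = now_us() + timeout_us;

	for (;;) {
		*val = fake_read(dev);
		if (*val & 1u)
			break;
		if (timeout_us && now_us() > deadline) {
			/* One final read so *val reflects the latest state. */
			*val = fake_read(dev);
			break;
		}
		if (sleep_us)
			sleep_for_us(sleep_us);
	}
	return (*val & 1u) ? 0 : -ETIMEDOUT;
}
```

A caller checks the return value first and can still inspect the value left in `val` on timeout, for example to log the last status seen.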

@ -502,7 +502,7 @@ static inline unsigned long _msecs_to_jiffies(const unsigned int m)
* - all other values are converted to jiffies by either multiplying * - all other values are converted to jiffies by either multiplying
* the input value by a factor or dividing it with a factor and * the input value by a factor or dividing it with a factor and
* handling any 32-bit overflows. * handling any 32-bit overflows.
* for the details see __msecs_to_jiffies() * for the details see _msecs_to_jiffies()
* *
* msecs_to_jiffies() checks for the passed in value being a constant * msecs_to_jiffies() checks for the passed in value being a constant
* via __builtin_constant_p() allowing gcc to eliminate most of the * via __builtin_constant_p() allowing gcc to eliminate most of the
@ -526,6 +526,19 @@ static __always_inline unsigned long msecs_to_jiffies(const unsigned int m)
} }
} }
/**
* secs_to_jiffies: - convert seconds to jiffies
* @_secs: time in seconds
*
* Conversion is done by simple multiplication with HZ
*
* secs_to_jiffies() is defined as a macro rather than a static inline
* function so it can be used in static initializers.
*
* Return: jiffies value
*/
#define secs_to_jiffies(_secs) ((_secs) * HZ)
extern unsigned long __usecs_to_jiffies(const unsigned int u); extern unsigned long __usecs_to_jiffies(const unsigned int u);
#if !(USEC_PER_SEC % HZ) #if !(USEC_PER_SEC % HZ)
static inline unsigned long _usecs_to_jiffies(const unsigned int u) static inline unsigned long _usecs_to_jiffies(const unsigned int u)
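
The kernel-doc for secs_to_jiffies() in this file notes that it is a macro so that it can be used in static initializers. A minimal userspace sketch of that property follows. The `HZ` value of 250 and the `demo_cfg` structure are assumptions made for the illustration only; in the kernel, `HZ` comes from the build configuration.

```c
#include <assert.h>

/* Assumption: example tick rate; the real HZ is a kernel build-time constant. */
#define HZ 250

/* Same shape as the macro added in the hunk above. */
#define secs_to_jiffies(_secs) ((_secs) * HZ)

/* Hypothetical driver defaults, used only to show a static initializer. */
struct demo_cfg {
	unsigned long poll_interval;	/* in jiffies */
	unsigned long watchdog;		/* in jiffies */
};

/*
 * A function call would not be a constant expression here; the macro
 * expands to a constant multiplication and is therefore accepted.
 */
static const struct demo_cfg demo_defaults = {
	.poll_interval = secs_to_jiffies(2),
	.watchdog = secs_to_jiffies(30),
};
```

The parentheses around `_secs` in the macro keep an argument such as `1 + 1` from being split by operator precedence.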

@ -1378,12 +1378,13 @@ int phy_read_mmd(struct phy_device *phydev, int devad, u32 regnum);
* @regnum: The register on the MMD to read * @regnum: The register on the MMD to read
* @val: Variable to read the register into * @val: Variable to read the register into
* @cond: Break condition (usually involving @val) * @cond: Break condition (usually involving @val)
* @sleep_us: Maximum time to sleep between reads in us (0 * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please
* tight-loops). Should be less than ~20ms since usleep_range * read usleep_range() function description for details and
* is used (see Documentation/timers/timers-howto.rst). * limitations.
* @timeout_us: Timeout in us, 0 means never timeout * @timeout_us: Timeout in us, 0 means never timeout
* @sleep_before_read: if it is true, sleep @sleep_us before read. * @sleep_before_read: if it is true, sleep @sleep_us before read.
* Returns 0 on success and -ETIMEDOUT upon a timeout. In either *
* Returns: 0 on success and -ETIMEDOUT upon a timeout. In either
* case, the last read value at @args is stored in @val. Must not * case, the last read value at @args is stored in @val. Must not
* be called from atomic context if sleep_us or timeout_us are used. * be called from atomic context if sleep_us or timeout_us are used.
*/ */

@ -5,12 +5,16 @@
#include <linux/alarmtimer.h> #include <linux/alarmtimer.h>
#include <linux/list.h> #include <linux/list.h>
#include <linux/mutex.h> #include <linux/mutex.h>
#include <linux/pid.h>
#include <linux/posix-timers_types.h> #include <linux/posix-timers_types.h>
#include <linux/rcuref.h>
#include <linux/spinlock.h> #include <linux/spinlock.h>
#include <linux/timerqueue.h> #include <linux/timerqueue.h>
struct kernel_siginfo; struct kernel_siginfo;
struct task_struct; struct task_struct;
struct sigqueue;
struct k_itimer;
static inline clockid_t make_process_cpuclock(const unsigned int pid, static inline clockid_t make_process_cpuclock(const unsigned int pid,
const clockid_t clock) const clockid_t clock)
@ -35,6 +39,8 @@ static inline int clockid_to_fd(const clockid_t clk)
#ifdef CONFIG_POSIX_TIMERS #ifdef CONFIG_POSIX_TIMERS
#include <linux/signal_types.h>
/** /**
* cpu_timer - Posix CPU timer representation for k_itimer * cpu_timer - Posix CPU timer representation for k_itimer
* @node: timerqueue node to queue in the task/sig * @node: timerqueue node to queue in the task/sig
@ -42,6 +48,7 @@ static inline int clockid_to_fd(const clockid_t clk)
* @pid: Pointer to target task PID * @pid: Pointer to target task PID
* @elist: List head for the expiry list * @elist: List head for the expiry list
* @firing: Timer is currently firing * @firing: Timer is currently firing
* @nanosleep: Timer is used for nanosleep and is not a regular posix-timer
* @handling: Pointer to the task which handles expiry * @handling: Pointer to the task which handles expiry
*/ */
struct cpu_timer { struct cpu_timer {
@ -49,7 +56,8 @@ struct cpu_timer {
struct timerqueue_head *head; struct timerqueue_head *head;
struct pid *pid; struct pid *pid;
struct list_head elist; struct list_head elist;
int firing; bool firing;
bool nanosleep;
struct task_struct __rcu *handling; struct task_struct __rcu *handling;
}; };
@ -101,6 +109,12 @@ static inline void posix_cputimers_rt_watchdog(struct posix_cputimers *pct,
pct->bases[CPUCLOCK_SCHED].nextevt = runtime; pct->bases[CPUCLOCK_SCHED].nextevt = runtime;
} }
void posixtimer_rearm_itimer(struct task_struct *p);
bool posixtimer_init_sigqueue(struct sigqueue *q);
void posixtimer_send_sigqueue(struct k_itimer *tmr);
bool posixtimer_deliver_signal(struct kernel_siginfo *info, struct sigqueue *timer_sigq);
void posixtimer_free_timer(struct k_itimer *timer);
/* Init task static initializer */ /* Init task static initializer */
#define INIT_CPU_TIMERBASE(b) { \ #define INIT_CPU_TIMERBASE(b) { \
.nextevt = U64_MAX, \ .nextevt = U64_MAX, \
@ -122,6 +136,10 @@ struct cpu_timer { };
static inline void posix_cputimers_init(struct posix_cputimers *pct) { } static inline void posix_cputimers_init(struct posix_cputimers *pct) { }
static inline void posix_cputimers_group_init(struct posix_cputimers *pct, static inline void posix_cputimers_group_init(struct posix_cputimers *pct,
u64 cpu_limit) { } u64 cpu_limit) { }
static inline void posixtimer_rearm_itimer(struct task_struct *p) { }
static inline bool posixtimer_deliver_signal(struct kernel_siginfo *info,
struct sigqueue *timer_sigq) { return false; }
static inline void posixtimer_free_timer(struct k_itimer *timer) { }
#endif #endif
#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK #ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
@ -132,50 +150,56 @@ static inline void clear_posix_cputimers_work(struct task_struct *p) { }
static inline void posix_cputimers_init_work(void) { } static inline void posix_cputimers_init_work(void) { }
#endif #endif
#define REQUEUE_PENDING 1
/** /**
* struct k_itimer - POSIX.1b interval timer structure. * struct k_itimer - POSIX.1b interval timer structure.
* @list: List head for binding the timer to signals->posix_timers * @list: List node for binding the timer to tsk::signal::posix_timers
* @ignored_list: List node for tracking ignored timers in tsk::signal::ignored_posix_timers
* @t_hash: Entry in the posix timer hash table * @t_hash: Entry in the posix timer hash table
* @it_lock: Lock protecting the timer * @it_lock: Lock protecting the timer
* @kclock: Pointer to the k_clock struct handling this timer * @kclock: Pointer to the k_clock struct handling this timer
* @it_clock: The posix timer clock id * @it_clock: The posix timer clock id
* @it_id: The posix timer id for identifying the timer * @it_id: The posix timer id for identifying the timer
* @it_active: Marker that timer is active * @it_status: The status of the timer
* @it_sig_periodic: The periodic status at signal delivery
* @it_overrun: The overrun counter for pending signals * @it_overrun: The overrun counter for pending signals
* @it_overrun_last: The overrun at the time of the last delivered signal * @it_overrun_last: The overrun at the time of the last delivered signal
* @it_requeue_pending: Indicator that timer waits for being requeued on * @it_signal_seq: Sequence count to control signal delivery
* signal delivery * @it_sigqueue_seq: The sequence count at the point where the signal was queued
* @it_sigev_notify: The notify word of sigevent struct for signal delivery * @it_sigev_notify: The notify word of sigevent struct for signal delivery
* @it_interval: The interval for periodic timers * @it_interval: The interval for periodic timers
* @it_signal: Pointer to the creators signal struct * @it_signal: Pointer to the creators signal struct
* @it_pid: The pid of the process/task targeted by the signal * @it_pid: The pid of the process/task targeted by the signal
* @it_process: The task to wakeup on clock_nanosleep (CPU timers) * @it_process: The task to wakeup on clock_nanosleep (CPU timers)
* @sigq: Pointer to preallocated sigqueue * @rcuref: Reference count for life time management
* @sigq: Embedded sigqueue
* @it: Union representing the various posix timer type * @it: Union representing the various posix timer type
* internals. * internals.
* @rcu: RCU head for freeing the timer. * @rcu: RCU head for freeing the timer.
*/ */
struct k_itimer { struct k_itimer {
struct hlist_node list; struct hlist_node list;
struct hlist_node ignored_list;
struct hlist_node t_hash; struct hlist_node t_hash;
spinlock_t it_lock; spinlock_t it_lock;
const struct k_clock *kclock; const struct k_clock *kclock;
clockid_t it_clock; clockid_t it_clock;
timer_t it_id; timer_t it_id;
int it_active; int it_status;
bool it_sig_periodic;
s64 it_overrun; s64 it_overrun;
s64 it_overrun_last; s64 it_overrun_last;
int it_requeue_pending; unsigned int it_signal_seq;
unsigned int it_sigqueue_seq;
int it_sigev_notify; int it_sigev_notify;
enum pid_type it_pid_type;
ktime_t it_interval; ktime_t it_interval;
struct signal_struct *it_signal; struct signal_struct *it_signal;
union { union {
struct pid *it_pid; struct pid *it_pid;
struct task_struct *it_process; struct task_struct *it_process;
}; };
struct sigqueue *sigq; struct sigqueue sigq;
rcuref_t rcuref;
union { union {
struct { struct {
struct hrtimer timer; struct hrtimer timer;
@ -196,5 +220,29 @@ void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
int update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new); int update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new);
void posixtimer_rearm(struct kernel_siginfo *info); #ifdef CONFIG_POSIX_TIMERS
static inline void posixtimer_putref(struct k_itimer *tmr)
{
if (rcuref_put(&tmr->rcuref))
posixtimer_free_timer(tmr);
}
static inline void posixtimer_sigqueue_getref(struct sigqueue *q)
{
struct k_itimer *tmr = container_of(q, struct k_itimer, sigq);
WARN_ON_ONCE(!rcuref_get(&tmr->rcuref));
}
static inline void posixtimer_sigqueue_putref(struct sigqueue *q)
{
struct k_itimer *tmr = container_of(q, struct k_itimer, sigq);
posixtimer_putref(tmr);
}
#else /* CONFIG_POSIX_TIMERS */
static inline void posixtimer_sigqueue_getref(struct sigqueue *q) { }
static inline void posixtimer_sigqueue_putref(struct sigqueue *q) { }
#endif /* !CONFIG_POSIX_TIMERS */
#endif #endif
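
The helpers added at the end of this file rely on the sigqueue being embedded in struct k_itimer, so the timer can be recovered from a sigqueue pointer with container_of() and both share one reference count. The following userspace sketch mirrors that pattern under simplifying assumptions: all `demo_*` names are hypothetical, and a plain counter replaces `rcuref_t`, so it has none of the kernel's concurrency or RCU guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, stripped-down stand-ins for sigqueue and k_itimer. */
struct demo_sigqueue {
	int signo;
};

struct demo_timer {
	int id;
	unsigned int refs;		/* plain counter, not rcuref_t */
	struct demo_sigqueue sigq;	/* embedded, shares the timer's lifetime */
};

static int demo_timers_freed;

static struct demo_timer *demo_timer_alloc(int id)
{
	struct demo_timer *tmr = calloc(1, sizeof(*tmr));

	if (tmr) {
		tmr->id = id;
		tmr->refs = 1;	/* reference held by the creator */
	}
	return tmr;
}

/* Take a reference on the timer that owns @q. */
static void demo_sigqueue_getref(struct demo_sigqueue *q)
{
	struct demo_timer *tmr = container_of(q, struct demo_timer, sigq);

	tmr->refs++;
}

/* Drop a reference; the last put frees the timer and its sigqueue together. */
static void demo_sigqueue_putref(struct demo_sigqueue *q)
{
	struct demo_timer *tmr = container_of(q, struct demo_timer, sigq);

	if (--tmr->refs == 0) {
		free(tmr);
		demo_timers_freed++;
	}
}
```

Because the sigqueue lives inside the timer, no separate lookup or allocation is needed on the signal path: any valid sigqueue pointer of this kind leads straight back to its timer.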

@ -106,17 +106,17 @@ struct reg_sequence {
* @addr: Address to poll * @addr: Address to poll
* @val: Unsigned integer variable to read the value into * @val: Unsigned integer variable to read the value into
* @cond: Break condition (usually involving @val) * @cond: Break condition (usually involving @val)
* @sleep_us: Maximum time to sleep between reads in us (0 * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please
* tight-loops). Should be less than ~20ms since usleep_range * read usleep_range() function description for details and
* is used (see Documentation/timers/timers-howto.rst). * limitations.
* @timeout_us: Timeout in us, 0 means never timeout * @timeout_us: Timeout in us, 0 means never timeout
* *
* Returns 0 on success and -ETIMEDOUT upon a timeout or the regmap_read * This is modelled after the readx_poll_timeout macros in linux/iopoll.h.
*
* Returns: 0 on success and -ETIMEDOUT upon a timeout or the regmap_read
* error return value in case of a error read. In the two former cases, * error return value in case of a error read. In the two former cases,
* the last read value at @addr is stored in @val. Must not be called * the last read value at @addr is stored in @val. Must not be called
* from atomic context if sleep_us or timeout_us are used. * from atomic context if sleep_us or timeout_us are used.
*
* This is modelled after the readx_poll_timeout macros in linux/iopoll.h.
*/ */
#define regmap_read_poll_timeout(map, addr, val, cond, sleep_us, timeout_us) \ #define regmap_read_poll_timeout(map, addr, val, cond, sleep_us, timeout_us) \
({ \ ({ \
@ -133,20 +133,20 @@ struct reg_sequence {
* @addr: Address to poll * @addr: Address to poll
* @val: Unsigned integer variable to read the value into * @val: Unsigned integer variable to read the value into
* @cond: Break condition (usually involving @val) * @cond: Break condition (usually involving @val)
* @delay_us: Time to udelay between reads in us (0 tight-loops). * @delay_us: Time to udelay between reads in us (0 tight-loops). Please
* Should be less than ~10us since udelay is used * read udelay() function description for details and
* (see Documentation/timers/timers-howto.rst). * limitations.
* @timeout_us: Timeout in us, 0 means never timeout * @timeout_us: Timeout in us, 0 means never timeout
* *
* Returns 0 on success and -ETIMEDOUT upon a timeout or the regmap_read
* error return value in case of a error read. In the two former cases,
* the last read value at @addr is stored in @val.
*
* This is modelled after the readx_poll_timeout_atomic macros in linux/iopoll.h. * This is modelled after the readx_poll_timeout_atomic macros in linux/iopoll.h.
* *
* Note: In general regmap cannot be used in atomic context. If you want to use * Note: In general regmap cannot be used in atomic context. If you want to use
* this macro then first setup your regmap for atomic use (flat or no cache * this macro then first setup your regmap for atomic use (flat or no cache
* and MMIO regmap). * and MMIO regmap).
*
* Returns: 0 on success and -ETIMEDOUT upon a timeout or the regmap_read
* error return value in case of a error read. In the two former cases,
* the last read value at @addr is stored in @val.
*/ */
#define regmap_read_poll_timeout_atomic(map, addr, val, cond, delay_us, timeout_us) \ #define regmap_read_poll_timeout_atomic(map, addr, val, cond, delay_us, timeout_us) \
({ \ ({ \
@ -177,17 +177,17 @@ struct reg_sequence {
* @field: Regmap field to read from * @field: Regmap field to read from
* @val: Unsigned integer variable to read the value into * @val: Unsigned integer variable to read the value into
* @cond: Break condition (usually involving @val) * @cond: Break condition (usually involving @val)
* @sleep_us: Maximum time to sleep between reads in us (0 * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please
* tight-loops). Should be less than ~20ms since usleep_range * read usleep_range() function description for details and
* is used (see Documentation/timers/timers-howto.rst). * limitations.
* @timeout_us: Timeout in us, 0 means never timeout * @timeout_us: Timeout in us, 0 means never timeout
* *
* Returns 0 on success and -ETIMEDOUT upon a timeout or the regmap_field_read * This is modelled after the readx_poll_timeout macros in linux/iopoll.h.
*
* Returns: 0 on success and -ETIMEDOUT upon a timeout or the regmap_field_read
* error return value in case of a error read. In the two former cases, * error return value in case of a error read. In the two former cases,
* the last read value at @addr is stored in @val. Must not be called * the last read value at @addr is stored in @val. Must not be called
* from atomic context if sleep_us or timeout_us are used. * from atomic context if sleep_us or timeout_us are used.
*
* This is modelled after the readx_poll_timeout macros in linux/iopoll.h.
*/ */
#define regmap_field_read_poll_timeout(field, val, cond, sleep_us, timeout_us) \ #define regmap_field_read_poll_timeout(field, val, cond, sleep_us, timeout_us) \
({ \ ({ \

@ -138,6 +138,7 @@ struct signal_struct {
/* POSIX.1b Interval Timers */ /* POSIX.1b Interval Timers */
unsigned int next_posix_timer_id; unsigned int next_posix_timer_id;
struct hlist_head posix_timers; struct hlist_head posix_timers;
struct hlist_head ignored_posix_timers;
/* ITIMER_REAL timer for the process */ /* ITIMER_REAL timer for the process */
struct hrtimer real_timer; struct hrtimer real_timer;
@ -338,9 +339,6 @@ extern void force_fatal_sig(int);
extern void force_exit_sig(int); extern void force_exit_sig(int);
extern int send_sig(int, struct task_struct *, int); extern int send_sig(int, struct task_struct *, int);
extern int zap_other_threads(struct task_struct *p); extern int zap_other_threads(struct task_struct *p);
extern struct sigqueue *sigqueue_alloc(void);
extern void sigqueue_free(struct sigqueue *);
extern int send_sigqueue(struct sigqueue *, struct pid *, enum pid_type);
extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *); extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *);
static inline void clear_notify_signal(void) static inline void clear_notify_signal(void)

@ -20,12 +20,10 @@ extern void __init tick_init(void);
extern void tick_suspend_local(void); extern void tick_suspend_local(void);
/* Should be core only, but XEN resume magic and ARM BL switcher require it */ /* Should be core only, but XEN resume magic and ARM BL switcher require it */
extern void tick_resume_local(void); extern void tick_resume_local(void);
extern void tick_cleanup_dead_cpu(int cpu);
#else /* CONFIG_GENERIC_CLOCKEVENTS */ #else /* CONFIG_GENERIC_CLOCKEVENTS */
static inline void tick_init(void) { } static inline void tick_init(void) { }
static inline void tick_suspend_local(void) { } static inline void tick_suspend_local(void) { }
static inline void tick_resume_local(void) { } static inline void tick_resume_local(void) { }
static inline void tick_cleanup_dead_cpu(int cpu) { }
#endif /* !CONFIG_GENERIC_CLOCKEVENTS */ #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
#if defined(CONFIG_GENERIC_CLOCKEVENTS) && defined(CONFIG_HOTPLUG_CPU) #if defined(CONFIG_GENERIC_CLOCKEVENTS) && defined(CONFIG_HOTPLUG_CPU)

@ -26,7 +26,7 @@
* occupies a single 64byte cache line. * occupies a single 64byte cache line.
* *
* The struct is separate from struct timekeeper as it is also used * The struct is separate from struct timekeeper as it is also used
* for a fast NMI safe accessors. * for the fast NMI safe accessors.
* *
* @base_real is for the fast NMI safe accessor to allow reading clock * @base_real is for the fast NMI safe accessor to allow reading clock
* realtime from any context. * realtime from any context.
@ -44,33 +44,38 @@ struct tk_read_base {
/** /**
* struct timekeeper - Structure holding internal timekeeping values. * struct timekeeper - Structure holding internal timekeeping values.
* @tkr_mono: The readout base structure for CLOCK_MONOTONIC * @tkr_mono: The readout base structure for CLOCK_MONOTONIC
* @tkr_raw: The readout base structure for CLOCK_MONOTONIC_RAW * @xtime_sec: Current CLOCK_REALTIME time in seconds
* @xtime_sec: Current CLOCK_REALTIME time in seconds * @ktime_sec: Current CLOCK_MONOTONIC time in seconds
* @ktime_sec: Current CLOCK_MONOTONIC time in seconds * @wall_to_monotonic: CLOCK_REALTIME to CLOCK_MONOTONIC offset
* @wall_to_monotonic: CLOCK_REALTIME to CLOCK_MONOTONIC offset * @offs_real: Offset clock monotonic -> clock realtime
* @offs_real: Offset clock monotonic -> clock realtime * @offs_boot: Offset clock monotonic -> clock boottime
* @offs_boot: Offset clock monotonic -> clock boottime * @offs_tai: Offset clock monotonic -> clock tai
* @offs_tai: Offset clock monotonic -> clock tai * @tai_offset: The current UTC to TAI offset in seconds
* @tai_offset: The current UTC to TAI offset in seconds * @tkr_raw: The readout base structure for CLOCK_MONOTONIC_RAW
* @clock_was_set_seq: The sequence number of clock was set events * @raw_sec: CLOCK_MONOTONIC_RAW time in seconds
* @cs_was_changed_seq: The sequence number of clocksource change events * @clock_was_set_seq: The sequence number of clock was set events
* @next_leap_ktime: CLOCK_MONOTONIC time value of a pending leap-second * @cs_was_changed_seq: The sequence number of clocksource change events
* @raw_sec: CLOCK_MONOTONIC_RAW time in seconds * @monotonic_to_boot: CLOCK_MONOTONIC to CLOCK_BOOTTIME offset
* @monotonic_to_boot: CLOCK_MONOTONIC to CLOCK_BOOTTIME offset * @cycle_interval: Number of clock cycles in one NTP interval
* @cycle_interval: Number of clock cycles in one NTP interval * @xtime_interval: Number of clock shifted nano seconds in one NTP
* @xtime_interval: Number of clock shifted nano seconds in one NTP * interval.
* interval. * @xtime_remainder: Shifted nano seconds left over when rounding
* @xtime_remainder: Shifted nano seconds left over when rounding * @cycle_interval
* @cycle_interval * @raw_interval: Shifted raw nano seconds accumulated per NTP interval.
* @raw_interval: Shifted raw nano seconds accumulated per NTP interval. * @next_leap_ktime: CLOCK_MONOTONIC time value of a pending leap-second
* @ntp_error: Difference between accumulated time and NTP time in ntp * @ntp_tick: The ntp_tick_length() value currently being
* shifted nano seconds. * used. This cached copy ensures we consistently
* @ntp_error_shift: Shift conversion between clock shifted nano seconds and * apply the tick length for an entire tick, as
* ntp shifted nano seconds. * ntp_tick_length may change mid-tick, and we don't
* @last_warning: Warning ratelimiter (DEBUG_TIMEKEEPING) * want to apply that new value to the tick in
* @underflow_seen: Underflow warning flag (DEBUG_TIMEKEEPING) * progress.
* @overflow_seen: Overflow warning flag (DEBUG_TIMEKEEPING) * @ntp_error: Difference between accumulated time and NTP time in ntp
* shifted nano seconds.
* @ntp_error_shift: Shift conversion between clock shifted nano seconds and
* ntp shifted nano seconds.
* @ntp_err_mult: Multiplication factor for scaled math conversion
* @skip_second_overflow: Flag used to avoid updating NTP twice with same second
* *
* Note: For timespec(64) based interfaces wall_to_monotonic is what * Note: For timespec(64) based interfaces wall_to_monotonic is what
* we need to add to xtime (or xtime corrected for sub jiffy times) * we need to add to xtime (or xtime corrected for sub jiffy times)
@ -88,10 +93,28 @@ struct tk_read_base {
* *
* @monotonic_to_boottime is a timespec64 representation of @offs_boot to * @monotonic_to_boottime is a timespec64 representation of @offs_boot to
* accelerate the VDSO update for CLOCK_BOOTTIME. * accelerate the VDSO update for CLOCK_BOOTTIME.
*
* The cacheline ordering of the structure is optimized for in kernel usage of
* the ktime_get() and ktime_get_ts64() family of time accessors. Struct
* timekeeper is prepended in the core timekeeping code with a sequence count,
* which results in the following cacheline layout:
*
* 0: seqcount, tkr_mono
* 1: xtime_sec ... tai_offset
* 2: tkr_raw, raw_sec
* 3,4: Internal variables
*
* Cacheline 0,1 contain the data which is used for accessing
* CLOCK_MONOTONIC/REALTIME/BOOTTIME/TAI, while cacheline 2 contains the
* data for accessing CLOCK_MONOTONIC_RAW. Cacheline 3,4 are internal
* variables which are only accessed during timekeeper updates once per
* tick.
*/ */
struct timekeeper { struct timekeeper {
/* Cacheline 0 (together with prepended seqcount of timekeeper core): */
struct tk_read_base tkr_mono; struct tk_read_base tkr_mono;
struct tk_read_base tkr_raw;
/* Cacheline 1: */
u64 xtime_sec; u64 xtime_sec;
unsigned long ktime_sec; unsigned long ktime_sec;
struct timespec64 wall_to_monotonic; struct timespec64 wall_to_monotonic;
@ -99,43 +122,28 @@ struct timekeeper {
ktime_t offs_boot; ktime_t offs_boot;
ktime_t offs_tai; ktime_t offs_tai;
s32 tai_offset; s32 tai_offset;
/* Cacheline 2: */
struct tk_read_base tkr_raw;
u64 raw_sec;
/* Cacheline 3 and 4 (timekeeping internal variables): */
unsigned int clock_was_set_seq; unsigned int clock_was_set_seq;
u8 cs_was_changed_seq; u8 cs_was_changed_seq;
ktime_t next_leap_ktime;
u64 raw_sec;
struct timespec64 monotonic_to_boot; struct timespec64 monotonic_to_boot;
/* The following members are for timekeeping internal use */
u64 cycle_interval; u64 cycle_interval;
u64 xtime_interval; u64 xtime_interval;
s64 xtime_remainder; s64 xtime_remainder;
u64 raw_interval; u64 raw_interval;
/* The ntp_tick_length() value currently being used.
* This cached copy ensures we consistently apply the tick ktime_t next_leap_ktime;
* length for an entire tick, as ntp_tick_length may change
* mid-tick, and we don't want to apply that new value to
* the tick in progress.
*/
u64 ntp_tick; u64 ntp_tick;
/* Difference between accumulated time and NTP time in ntp
* shifted nano seconds. */
s64 ntp_error; s64 ntp_error;
u32 ntp_error_shift; u32 ntp_error_shift;
u32 ntp_err_mult; u32 ntp_err_mult;
/* Flag used to avoid updating NTP twice with same second */
u32 skip_second_overflow; u32 skip_second_overflow;
#ifdef CONFIG_DEBUG_TIMEKEEPING
long last_warning;
/*
* These simple flag variables are managed
* without locks, which is racy, but they are
* ok since we don't really care about being
* super precise about how many events were
* seen, just that a problem was observed.
*/
int underflow_seen;
int overflow_seen;
#endif
}; };
#ifdef CONFIG_GENERIC_TIME_VSYSCALL #ifdef CONFIG_GENERIC_TIME_VSYSCALL

@ -280,6 +280,7 @@ struct ktime_timestamps {
* counter value * counter value
* @cycles: Clocksource counter value to produce the system times * @cycles: Clocksource counter value to produce the system times
* @real: Realtime system time * @real: Realtime system time
* @boot: Boot time
* @raw: Monotonic raw system time * @raw: Monotonic raw system time
* @cs_id: Clocksource ID * @cs_id: Clocksource ID
* @clock_was_set_seq: The sequence number of clock-was-set events * @clock_was_set_seq: The sequence number of clock-was-set events
@ -288,6 +289,7 @@ struct ktime_timestamps {
struct system_time_snapshot { struct system_time_snapshot {
u64 cycles; u64 cycles;
ktime_t real; ktime_t real;
ktime_t boot;
ktime_t raw; ktime_t raw;
enum clocksource_ids cs_id; enum clocksource_ids cs_id;
unsigned int clock_was_set_seq; unsigned int clock_was_set_seq;

@ -139,14 +139,6 @@ unsigned long random_get_entropy_fallback(void);
#define MAXSEC 2048 /* max interval between updates (s) */ #define MAXSEC 2048 /* max interval between updates (s) */
#define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5) /* beyond max. dispersion */ #define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5) /* beyond max. dispersion */
/*
* kernel variables
* Note: maximum error = NTP sync distance = dispersion + delay / 2;
* estimated error = NTP dispersion.
*/
extern unsigned long tick_usec; /* USER_HZ period (usec) */
extern unsigned long tick_nsec; /* SHIFTED_HZ period (nsec) */
/* Required to safely shift negative values */ /* Required to safely shift negative values */
#define shift_right(x, s) ({ \ #define shift_right(x, s) ({ \
__typeof__(x) __x = (x); \ __typeof__(x) __x = (x); \

@ -542,8 +542,8 @@ do { \
int __ret = 0; \ int __ret = 0; \
struct hrtimer_sleeper __t; \ struct hrtimer_sleeper __t; \
\ \
hrtimer_init_sleeper_on_stack(&__t, CLOCK_MONOTONIC, \ hrtimer_setup_sleeper_on_stack(&__t, CLOCK_MONOTONIC, \
HRTIMER_MODE_REL); \ HRTIMER_MODE_REL); \
if ((timeout) != KTIME_MAX) { \ if ((timeout) != KTIME_MAX) { \
hrtimer_set_expires_range_ns(&__t.timer, timeout, \ hrtimer_set_expires_range_ns(&__t.timer, timeout, \
current->timer_slack_ns); \ current->timer_slack_ns); \

@ -46,7 +46,7 @@ union __sifields {
__kernel_timer_t _tid; /* timer id */ __kernel_timer_t _tid; /* timer id */
int _overrun; /* overrun count */ int _overrun; /* overrun count */
sigval_t _sigval; /* same as below */ sigval_t _sigval; /* same as below */
int _sys_private; /* not to be passed to user */ int _sys_private; /* Not used by the kernel. Historic leftover. Always 0. */
} _timer; } _timer;
/* POSIX.1b signals */ /* POSIX.1b signals */

View File

@ -30,8 +30,9 @@ static struct signal_struct init_signals = {
.cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex), .cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex),
.exec_update_lock = __RWSEM_INITIALIZER(init_signals.exec_update_lock), .exec_update_lock = __RWSEM_INITIALIZER(init_signals.exec_update_lock),
#ifdef CONFIG_POSIX_TIMERS #ifdef CONFIG_POSIX_TIMERS
.posix_timers = HLIST_HEAD_INIT, .posix_timers = HLIST_HEAD_INIT,
.cputimer = { .ignored_posix_timers = HLIST_HEAD_INIT,
.cputimer = {
.cputime_atomic = INIT_CPUTIME_ATOMIC, .cputime_atomic = INIT_CPUTIME_ATOMIC,
}, },
#endif #endif

@ -2408,13 +2408,14 @@ static int io_cqring_schedule_timeout(struct io_wait_queue *iowq,
{ {
ktime_t timeout; ktime_t timeout;
hrtimer_init_on_stack(&iowq->t, clock_id, HRTIMER_MODE_ABS);
if (iowq->min_timeout) { if (iowq->min_timeout) {
timeout = ktime_add_ns(iowq->min_timeout, start_time); timeout = ktime_add_ns(iowq->min_timeout, start_time);
iowq->t.function = io_cqring_min_timer_wakeup; hrtimer_setup_on_stack(&iowq->t, io_cqring_min_timer_wakeup, clock_id,
HRTIMER_MODE_ABS);
} else { } else {
timeout = iowq->timeout; timeout = iowq->timeout;
iowq->t.function = io_cqring_timer_wakeup; hrtimer_setup_on_stack(&iowq->t, io_cqring_timer_wakeup, clock_id,
HRTIMER_MODE_ABS);
} }
hrtimer_set_expires_range_ns(&iowq->t, timeout, 0); hrtimer_set_expires_range_ns(&iowq->t, timeout, 0);

@ -1176,7 +1176,7 @@ static u64 io_hybrid_iopoll_delay(struct io_ring_ctx *ctx, struct io_kiocb *req)
req->flags |= REQ_F_IOPOLL_STATE; req->flags |= REQ_F_IOPOLL_STATE;
mode = HRTIMER_MODE_REL; mode = HRTIMER_MODE_REL;
hrtimer_init_sleeper_on_stack(&timer, CLOCK_MONOTONIC, mode); hrtimer_setup_sleeper_on_stack(&timer, CLOCK_MONOTONIC, mode);
hrtimer_set_expires(&timer.timer, kt); hrtimer_set_expires(&timer.timer, kt);
set_current_state(TASK_INTERRUPTIBLE); set_current_state(TASK_INTERRUPTIBLE);
hrtimer_sleeper_start_expires(&timer, mode); hrtimer_sleeper_start_expires(&timer, mode);

@ -76,7 +76,6 @@ static void io_timeout_complete(struct io_kiocb *req, struct io_tw_state *ts)
/* re-arm timer */ /* re-arm timer */
spin_lock_irq(&ctx->timeout_lock); spin_lock_irq(&ctx->timeout_lock);
list_add(&timeout->list, ctx->timeout_list.prev); list_add(&timeout->list, ctx->timeout_list.prev);
data->timer.function = io_timeout_fn;
hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), data->mode); hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), data->mode);
spin_unlock_irq(&ctx->timeout_lock); spin_unlock_irq(&ctx->timeout_lock);
return; return;

@ -1339,7 +1339,6 @@ static int takedown_cpu(unsigned int cpu)
cpuhp_bp_sync_dead(cpu); cpuhp_bp_sync_dead(cpu);
lockdep_cleanup_dead_cpu(cpu, idle_thread_get(cpu)); lockdep_cleanup_dead_cpu(cpu, idle_thread_get(cpu));
tick_cleanup_dead_cpu(cpu);
/* /*
* Callbacks must be re-integrated right away to the RCU state machine. * Callbacks must be re-integrated right away to the RCU state machine.

@ -1862,6 +1862,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
#ifdef CONFIG_POSIX_TIMERS #ifdef CONFIG_POSIX_TIMERS
INIT_HLIST_HEAD(&sig->posix_timers); INIT_HLIST_HEAD(&sig->posix_timers);
INIT_HLIST_HEAD(&sig->ignored_posix_timers);
hrtimer_init(&sig->real_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); hrtimer_init(&sig->real_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
sig->real_timer.function = it_real_fn; sig->real_timer.function = it_real_fn;
#endif #endif

@ -140,9 +140,9 @@ futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
if (!time) if (!time)
return NULL; return NULL;
hrtimer_init_sleeper_on_stack(timeout, (flags & FLAGS_CLOCKRT) ? hrtimer_setup_sleeper_on_stack(timeout,
CLOCK_REALTIME : CLOCK_MONOTONIC, (flags & FLAGS_CLOCKRT) ? CLOCK_REALTIME : CLOCK_MONOTONIC,
HRTIMER_MODE_ABS); HRTIMER_MODE_ABS);
/* /*
* If range_ns is 0, calling hrtimer_set_expires_range_ns() is * If range_ns is 0, calling hrtimer_set_expires_range_ns() is
* effectively the same as calling hrtimer_set_expires(). * effectively the same as calling hrtimer_set_expires().

@ -398,8 +398,8 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
cpuidle_use_deepest_state(latency_ns); cpuidle_use_deepest_state(latency_ns);
it.done = 0; it.done = 0;
hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD); hrtimer_setup_on_stack(&it.timer, idle_inject_timer_fn, CLOCK_MONOTONIC,
it.timer.function = idle_inject_timer_fn; HRTIMER_MODE_REL_HARD);
hrtimer_start(&it.timer, ns_to_ktime(duration_ns), hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
HRTIMER_MODE_REL_PINNED_HARD); HRTIMER_MODE_REL_PINNED_HARD);
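The conversions above replace the two-step pattern (hrtimer_init*() followed by a separate assignment to .function) with a single hrtimer_setup*() call that takes the callback as an argument. The userspace sketch below mocks both patterns to show the difference. struct mock_timer and all mock_* functions are hypothetical stand-ins, not kernel APIs:

```c
#include <stddef.h>

/* Hypothetical stand-in for a timer object; not the kernel's struct hrtimer. */
struct mock_timer {
	int (*function)(struct mock_timer *t);
	int clock_id;
	int mode;
};

/* Old pattern: init leaves the callback unset until the caller assigns it. */
static void mock_timer_init(struct mock_timer *t, int clock_id, int mode)
{
	t->function = NULL;
	t->clock_id = clock_id;
	t->mode = mode;
}

/* New pattern: the callback is a mandatory argument of the setup call. */
static void mock_timer_setup(struct mock_timer *t,
			     int (*fn)(struct mock_timer *t),
			     int clock_id, int mode)
{
	t->function = fn;
	t->clock_id = clock_id;
	t->mode = mode;
}

/* Example callback: reports the mode the timer was set up with. */
static int mock_callback(struct mock_timer *t)
{
	return t->mode;
}

/* Fires the timer; returns -1 when no callback was ever assigned. */
static int mock_timer_fire(struct mock_timer *t)
{
	return t->function ? t->function(t) : -1;
}
```

In this sketch a timer prepared with mock_timer_init() alone has no callback when it fires, whereas mock_timer_setup() cannot leave that gap.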

@ -59,6 +59,8 @@
#include <asm/cacheflush.h> #include <asm/cacheflush.h>
#include <asm/syscall.h> /* for syscall_get_* */ #include <asm/syscall.h> /* for syscall_get_* */
#include "time/posix-timers.h"
/* /*
* SLAB caches for signal bits. * SLAB caches for signal bits.
*/ */
@ -396,16 +398,9 @@ void task_join_group_stop(struct task_struct *task)
task_set_jobctl_pending(task, mask | JOBCTL_STOP_PENDING); task_set_jobctl_pending(task, mask | JOBCTL_STOP_PENDING);
} }
/* static struct ucounts *sig_get_ucounts(struct task_struct *t, int sig,
* allocate a new signal queue record int override_rlimit)
* - this may be called without locks if and only if t == current, otherwise an
* appropriate lock must be held to stop the target task from exiting
*/
static struct sigqueue *
__sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
int override_rlimit, const unsigned int sigqueue_flags)
{ {
struct sigqueue *q = NULL;
struct ucounts *ucounts; struct ucounts *ucounts;
long sigpending; long sigpending;
@ -425,26 +420,53 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
if (!sigpending) if (!sigpending)
return NULL; return NULL;
if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) { if (unlikely(!override_rlimit && sigpending > task_rlimit(t, RLIMIT_SIGPENDING))) {
q = kmem_cache_alloc(sigqueue_cachep, gfp_flags); dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
} else {
print_dropped_signal(sig); print_dropped_signal(sig);
return NULL;
} }
if (unlikely(q == NULL)) { return ucounts;
}
static void __sigqueue_init(struct sigqueue *q, struct ucounts *ucounts,
const unsigned int sigqueue_flags)
{
INIT_LIST_HEAD(&q->list);
q->flags = sigqueue_flags;
q->ucounts = ucounts;
}
/*
* allocate a new signal queue record
* - this may be called without locks if and only if t == current, otherwise an
* appropriate lock must be held to stop the target task from exiting
*/
static struct sigqueue *sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
int override_rlimit)
{
struct ucounts *ucounts = sig_get_ucounts(t, sig, override_rlimit);
struct sigqueue *q;
if (!ucounts)
return NULL;
q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
if (!q) {
dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
} else { return NULL;
INIT_LIST_HEAD(&q->list);
q->flags = sigqueue_flags;
q->ucounts = ucounts;
} }
__sigqueue_init(q, ucounts, 0);
return q; return q;
} }
static void __sigqueue_free(struct sigqueue *q) static void __sigqueue_free(struct sigqueue *q)
{ {
if (q->flags & SIGQUEUE_PREALLOC) if (q->flags & SIGQUEUE_PREALLOC) {
posixtimer_sigqueue_putref(q);
return; return;
}
if (q->ucounts) { if (q->ucounts) {
dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING); dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
q->ucounts = NULL; q->ucounts = NULL;
@ -479,42 +501,6 @@ void flush_signals(struct task_struct *t)
} }
EXPORT_SYMBOL(flush_signals); EXPORT_SYMBOL(flush_signals);
#ifdef CONFIG_POSIX_TIMERS
static void __flush_itimer_signals(struct sigpending *pending)
{
sigset_t signal, retain;
struct sigqueue *q, *n;
signal = pending->signal;
sigemptyset(&retain);
list_for_each_entry_safe(q, n, &pending->list, list) {
int sig = q->info.si_signo;
if (likely(q->info.si_code != SI_TIMER)) {
sigaddset(&retain, sig);
} else {
sigdelset(&signal, sig);
list_del_init(&q->list);
__sigqueue_free(q);
}
}
sigorsets(&pending->signal, &signal, &retain);
}
void flush_itimer_signals(void)
{
struct task_struct *tsk = current;
unsigned long flags;
spin_lock_irqsave(&tsk->sighand->siglock, flags);
__flush_itimer_signals(&tsk->pending);
__flush_itimer_signals(&tsk->signal->shared_pending);
spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
}
#endif
void ignore_signals(struct task_struct *t) void ignore_signals(struct task_struct *t)
{ {
int i; int i;
@ -564,7 +550,7 @@ bool unhandled_signal(struct task_struct *tsk, int sig)
} }
static void collect_signal(int sig, struct sigpending *list, kernel_siginfo_t *info, static void collect_signal(int sig, struct sigpending *list, kernel_siginfo_t *info,
bool *resched_timer) struct sigqueue **timer_sigq)
{ {
struct sigqueue *q, *first = NULL; struct sigqueue *q, *first = NULL;
@ -587,12 +573,17 @@ still_pending:
list_del_init(&first->list); list_del_init(&first->list);
copy_siginfo(info, &first->info); copy_siginfo(info, &first->info);
*resched_timer = /*
(first->flags & SIGQUEUE_PREALLOC) && * posix-timer signals are preallocated and freed when the last
(info->si_code == SI_TIMER) && * reference count is dropped in posixtimer_deliver_signal() or
(info->si_sys_private); * immediately on timer deletion when the signal is not pending.
* Spare the extra round through __sigqueue_free() which is
__sigqueue_free(first); * ignoring preallocated signals.
*/
if (unlikely((first->flags & SIGQUEUE_PREALLOC) && (info->si_code == SI_TIMER)))
*timer_sigq = first;
else
__sigqueue_free(first);
} else { } else {
/* /*
* Ok, it wasn't in the queue. This must be * Ok, it wasn't in the queue. This must be
@ -609,12 +600,12 @@ still_pending:
} }
static int __dequeue_signal(struct sigpending *pending, sigset_t *mask, static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
kernel_siginfo_t *info, bool *resched_timer) kernel_siginfo_t *info, struct sigqueue **timer_sigq)
{ {
int sig = next_signal(pending, mask); int sig = next_signal(pending, mask);
if (sig) if (sig)
collect_signal(sig, pending, info, resched_timer); collect_signal(sig, pending, info, timer_sigq);
return sig; return sig;
} }
@ -626,42 +617,22 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type) int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type)
{ {
struct task_struct *tsk = current; struct task_struct *tsk = current;
bool resched_timer = false; struct sigqueue *timer_sigq;
int signr; int signr;
lockdep_assert_held(&tsk->sighand->siglock); lockdep_assert_held(&tsk->sighand->siglock);
again:
*type = PIDTYPE_PID; *type = PIDTYPE_PID;
signr = __dequeue_signal(&tsk->pending, mask, info, &resched_timer); timer_sigq = NULL;
signr = __dequeue_signal(&tsk->pending, mask, info, &timer_sigq);
if (!signr) { if (!signr) {
*type = PIDTYPE_TGID; *type = PIDTYPE_TGID;
signr = __dequeue_signal(&tsk->signal->shared_pending, signr = __dequeue_signal(&tsk->signal->shared_pending,
mask, info, &resched_timer); mask, info, &timer_sigq);
#ifdef CONFIG_POSIX_TIMERS
/*
* itimer signal ?
*
* itimers are process shared and we restart periodic
* itimers in the signal delivery path to prevent DoS
* attacks in the high resolution timer case. This is
* compliant with the old way of self-restarting
* itimers, as the SIGALRM is a legacy signal and only
* queued once. Changing the restart behaviour to
* restart the timer in the signal dequeue path is
* reducing the timer noise on heavy loaded !highres
* systems too.
*/
if (unlikely(signr == SIGALRM)) {
struct hrtimer *tmr = &tsk->signal->real_timer;
if (!hrtimer_is_queued(tmr) && if (unlikely(signr == SIGALRM))
tsk->signal->it_real_incr != 0) { posixtimer_rearm_itimer(tsk);
hrtimer_forward(tmr, tmr->base->get_time(),
tsk->signal->it_real_incr);
hrtimer_restart(tmr);
}
}
#endif
} }
recalc_sigpending(); recalc_sigpending();
@ -683,22 +654,12 @@ int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type)
*/ */
current->jobctl |= JOBCTL_STOP_DEQUEUED; current->jobctl |= JOBCTL_STOP_DEQUEUED;
} }
#ifdef CONFIG_POSIX_TIMERS
if (resched_timer) {
/*
* Release the siglock to ensure proper locking order
* of timer locks outside of siglocks. Note, we leave
* irqs disabled here, since the posix-timers code is
* about to disable them again anyway.
*/
spin_unlock(&tsk->sighand->siglock);
posixtimer_rearm(info);
spin_lock(&tsk->sighand->siglock);
/* Don't expose the si_sys_private value to userspace */ if (IS_ENABLED(CONFIG_POSIX_TIMERS) && unlikely(timer_sigq)) {
info->si_sys_private = 0; if (!posixtimer_deliver_signal(info, timer_sigq))
goto again;
} }
#endif
return signr; return signr;
} }
EXPORT_SYMBOL_GPL(dequeue_signal); EXPORT_SYMBOL_GPL(dequeue_signal);
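The reworked dequeue_signal() jumps back to the again: label when posixtimer_deliver_signal() returns false. That function's body is not part of this hunk. The sketch below assumes it rejects a queued timer signal whose sequence snapshot no longer matches the timer, for example because the timer was re-set or deleted after the signal was queued. All names are hypothetical and the fixed-size queue is purely illustrative:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical timer signal with a sequence snapshot taken at queue time. */
struct tsig_sketch {
	int signo;
	unsigned int signal_seq;   /* assumed to change when the timer is re-set or deleted */
	unsigned int sigqueue_seq; /* snapshot of signal_seq taken when queued */
};

/* Tiny fixed-capacity FIFO; callers must not queue more than 4 entries. */
struct queue_sketch {
	struct tsig_sketch *slots[4];
	size_t head, tail;
};

static void queue_signal_sketch(struct queue_sketch *q, struct tsig_sketch *s)
{
	s->sigqueue_seq = s->signal_seq;
	q->slots[q->tail++] = s;
}

/* A signal is only delivered when its snapshot still matches the timer. */
static bool deliver_sketch(const struct tsig_sketch *s)
{
	return s->sigqueue_seq == s->signal_seq;
}

/* Returns the next valid signal number, or 0 when nothing is deliverable. */
static int dequeue_sketch(struct queue_sketch *q)
{
	while (q->head < q->tail) {
		struct tsig_sketch *s = q->slots[q->head++];

		if (deliver_sketch(s))
			return s->signo;
		/* Stale signal: drop it and retry, like the "goto again" above. */
	}
	return 0;
}
```

A stale entry is skipped silently, so the caller never sees a signal from a timer configuration that no longer exists.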
@ -773,17 +734,24 @@ void signal_wake_up_state(struct task_struct *t, unsigned int state)
kick_process(t); kick_process(t);
} }
/* static inline void posixtimer_sig_ignore(struct task_struct *tsk, struct sigqueue *q);
* Remove signals in mask from the pending set and queue.
* Returns 1 if any signals were found. static void sigqueue_free_ignored(struct task_struct *tsk, struct sigqueue *q)
* {
* All callers must be holding the siglock. if (likely(!(q->flags & SIGQUEUE_PREALLOC) || q->info.si_code != SI_TIMER))
*/ __sigqueue_free(q);
static void flush_sigqueue_mask(sigset_t *mask, struct sigpending *s) else
posixtimer_sig_ignore(tsk, q);
}
/* Remove signals in mask from the pending set and queue. */
static void flush_sigqueue_mask(struct task_struct *p, sigset_t *mask, struct sigpending *s)
{ {
struct sigqueue *q, *n; struct sigqueue *q, *n;
sigset_t m; sigset_t m;
lockdep_assert_held(&p->sighand->siglock);
sigandsets(&m, mask, &s->signal); sigandsets(&m, mask, &s->signal);
if (sigisemptyset(&m)) if (sigisemptyset(&m))
return; return;
@ -792,7 +760,7 @@ static void flush_sigqueue_mask(sigset_t *mask, struct sigpending *s)
list_for_each_entry_safe(q, n, &s->list, list) { list_for_each_entry_safe(q, n, &s->list, list) {
if (sigismember(mask, q->info.si_signo)) { if (sigismember(mask, q->info.si_signo)) {
list_del_init(&q->list); list_del_init(&q->list);
__sigqueue_free(q); sigqueue_free_ignored(p, q);
} }
} }
} }
@ -917,18 +885,18 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
* This is a stop signal. Remove SIGCONT from all queues. * This is a stop signal. Remove SIGCONT from all queues.
*/ */
siginitset(&flush, sigmask(SIGCONT)); siginitset(&flush, sigmask(SIGCONT));
flush_sigqueue_mask(&flush, &signal->shared_pending); flush_sigqueue_mask(p, &flush, &signal->shared_pending);
for_each_thread(p, t) for_each_thread(p, t)
flush_sigqueue_mask(&flush, &t->pending); flush_sigqueue_mask(p, &flush, &t->pending);
} else if (sig == SIGCONT) { } else if (sig == SIGCONT) {
unsigned int why; unsigned int why;
/* /*
* Remove all stop signals from all queues, wake all threads. * Remove all stop signals from all queues, wake all threads.
*/ */
siginitset(&flush, SIG_KERNEL_STOP_MASK); siginitset(&flush, SIG_KERNEL_STOP_MASK);
flush_sigqueue_mask(&flush, &signal->shared_pending); flush_sigqueue_mask(p, &flush, &signal->shared_pending);
for_each_thread(p, t) { for_each_thread(p, t) {
flush_sigqueue_mask(&flush, &t->pending); flush_sigqueue_mask(p, &flush, &t->pending);
task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING); task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
if (likely(!(t->ptrace & PT_SEIZED))) { if (likely(!(t->ptrace & PT_SEIZED))) {
t->jobctl &= ~JOBCTL_STOPPED; t->jobctl &= ~JOBCTL_STOPPED;
@ -1115,7 +1083,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
else else
override_rlimit = 0; override_rlimit = 0;
q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit, 0); q = sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit);
if (q) { if (q) {
list_add_tail(&q->list, &pending->list); list_add_tail(&q->list, &pending->list);
@ -1923,112 +1891,242 @@ int kill_pid(struct pid *pid, int sig, int priv)
} }
EXPORT_SYMBOL(kill_pid); EXPORT_SYMBOL(kill_pid);
#ifdef CONFIG_POSIX_TIMERS
/* /*
* These functions support sending signals using preallocated sigqueue * These functions handle POSIX timer signals. POSIX timers use
* structures. This is needed "because realtime applications cannot * preallocated sigqueue structs for sending signals.
* afford to lose notifications of asynchronous events, like timer
* expirations or I/O completions". In the case of POSIX Timers
* we allocate the sigqueue structure from the timer_create. If this
* allocation fails we are able to report the failure to the application
* with an EAGAIN error.
*/ */
struct sigqueue *sigqueue_alloc(void) static void __flush_itimer_signals(struct sigpending *pending)
{ {
return __sigqueue_alloc(-1, current, GFP_KERNEL, 0, SIGQUEUE_PREALLOC); sigset_t signal, retain;
} struct sigqueue *q, *n;
void sigqueue_free(struct sigqueue *q) signal = pending->signal;
{ sigemptyset(&retain);
spinlock_t *lock = &current->sighand->siglock;
unsigned long flags;
if (WARN_ON_ONCE(!(q->flags & SIGQUEUE_PREALLOC))) list_for_each_entry_safe(q, n, &pending->list, list) {
return; int sig = q->info.si_signo;
/*
* We must hold ->siglock while testing q->list
* to serialize with collect_signal() or with
* __exit_signal()->flush_sigqueue().
*/
spin_lock_irqsave(lock, flags);
q->flags &= ~SIGQUEUE_PREALLOC;
/*
* If it is queued it will be freed when dequeued,
* like the "regular" sigqueue.
*/
if (!list_empty(&q->list))
q = NULL;
spin_unlock_irqrestore(lock, flags);
if (q) if (likely(q->info.si_code != SI_TIMER)) {
__sigqueue_free(q); sigaddset(&retain, sig);
} } else {
sigdelset(&signal, sig);
int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) list_del_init(&q->list);
{ __sigqueue_free(q);
int sig = q->info.si_signo; }
struct sigpending *pending;
struct task_struct *t;
unsigned long flags;
int ret, result;
if (WARN_ON_ONCE(!(q->flags & SIGQUEUE_PREALLOC)))
return 0;
if (WARN_ON_ONCE(q->info.si_code != SI_TIMER))
return 0;
ret = -1;
rcu_read_lock();
/*
* This function is used by POSIX timers to deliver a timer signal.
* Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
* set), the signal must be delivered to the specific thread (queues
* into t->pending).
*
* Where type is not PIDTYPE_PID, signals must be delivered to the
* process. In this case, prefer to deliver to current if it is in
* the same thread group as the target process, which avoids
* unnecessarily waking up a potentially idle task.
*/
t = pid_task(pid, type);
if (!t)
goto ret;
if (type != PIDTYPE_PID && same_thread_group(t, current))
t = current;
if (!likely(lock_task_sighand(t, &flags)))
goto ret;
ret = 1; /* the signal is ignored */
result = TRACE_SIGNAL_IGNORED;
if (!prepare_signal(sig, t, false))
goto out;
ret = 0;
if (unlikely(!list_empty(&q->list))) {
/*
* If an SI_TIMER entry is already queue just increment
* the overrun count.
*/
q->info.si_overrun++;
result = TRACE_SIGNAL_ALREADY_PENDING;
goto out;
} }
q->info.si_overrun = 0;
sigorsets(&pending->signal, &signal, &retain);
}
void flush_itimer_signals(void)
{
struct task_struct *tsk = current;
guard(spinlock_irqsave)(&tsk->sighand->siglock);
__flush_itimer_signals(&tsk->pending);
__flush_itimer_signals(&tsk->signal->shared_pending);
}
bool posixtimer_init_sigqueue(struct sigqueue *q)
{
struct ucounts *ucounts = sig_get_ucounts(current, -1, 0);
if (!ucounts)
return false;
clear_siginfo(&q->info);
__sigqueue_init(q, ucounts, SIGQUEUE_PREALLOC);
return true;
}
static void posixtimer_queue_sigqueue(struct sigqueue *q, struct task_struct *t, enum pid_type type)
{
struct sigpending *pending;
int sig = q->info.si_signo;
signalfd_notify(t, sig); signalfd_notify(t, sig);
pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending; pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
list_add_tail(&q->list, &pending->list); list_add_tail(&q->list, &pending->list);
sigaddset(&pending->signal, sig); sigaddset(&pending->signal, sig);
complete_signal(sig, t, type); complete_signal(sig, t, type);
}
/*
* This function is used by POSIX timers to deliver a timer signal.
* Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
* set), the signal must be delivered to the specific thread (queues
* into t->pending).
*
* Where type is not PIDTYPE_PID, signals must be delivered to the
* process. In this case, prefer to deliver to current if it is in
* the same thread group as the target process, which avoids
* unnecessarily waking up a potentially idle task.
*/
static inline struct task_struct *posixtimer_get_target(struct k_itimer *tmr)
{
struct task_struct *t = pid_task(tmr->it_pid, tmr->it_pid_type);
if (t && tmr->it_pid_type != PIDTYPE_PID && same_thread_group(t, current))
t = current;
return t;
}
void posixtimer_send_sigqueue(struct k_itimer *tmr)
{
struct sigqueue *q = &tmr->sigq;
int sig = q->info.si_signo;
struct task_struct *t;
unsigned long flags;
int result;
guard(rcu)();
t = posixtimer_get_target(tmr);
if (!t)
return;
if (!likely(lock_task_sighand(t, &flags)))
return;
/*
* Update @tmr::it_sigqueue_seq for posix timer signals with sighand
* locked to prevent a race against dequeue_signal().
*/
tmr->it_sigqueue_seq = tmr->it_signal_seq;
/*
* Set the signal delivery status under sighand lock, so that the
* ignored signal handling can distinguish between a periodic and a
* non-periodic timer.
*/
tmr->it_sig_periodic = tmr->it_status == POSIX_TIMER_REQUEUE_PENDING;
if (!prepare_signal(sig, t, false)) {
result = TRACE_SIGNAL_IGNORED;
if (!list_empty(&q->list)) {
/*
* If task group is exiting with the signal already pending,
* wait for __exit_signal() to do its job. Otherwise if
* ignored, it's not supposed to be queued. Try to survive.
*/
WARN_ON_ONCE(!(t->signal->flags & SIGNAL_GROUP_EXIT));
goto out;
}
/* Periodic timers with SIG_IGN are queued on the ignored list */
if (tmr->it_sig_periodic) {
/*
* Already queued means the timer was rearmed after
* the previous expiry got it on the ignore list.
* Nothing to do for that case.
*/
if (hlist_unhashed(&tmr->ignored_list)) {
/*
* Take a signal reference and queue it on
* the ignored list.
*/
posixtimer_sigqueue_getref(q);
posixtimer_sig_ignore(t, q);
}
} else if (!hlist_unhashed(&tmr->ignored_list)) {
/*
* Covers the case where a timer was periodic and
* then the signal was ignored. Later it was rearmed
* as oneshot timer. The previous signal is invalid
* now, and this oneshot signal has to be dropped.
* Remove it from the ignored list and drop the
* reference count as the signal is no longer
* queued.
*/
hlist_del_init(&tmr->ignored_list);
posixtimer_putref(tmr);
}
goto out;
}
/* This should never happen and leaks a reference count */
if (WARN_ON_ONCE(!hlist_unhashed(&tmr->ignored_list)))
hlist_del_init(&tmr->ignored_list);
if (unlikely(!list_empty(&q->list))) {
/* This holds a reference count already */
result = TRACE_SIGNAL_ALREADY_PENDING;
goto out;
}
posixtimer_sigqueue_getref(q);
posixtimer_queue_sigqueue(q, t, tmr->it_pid_type);
result = TRACE_SIGNAL_DELIVERED; result = TRACE_SIGNAL_DELIVERED;
out: out:
trace_signal_generate(sig, &q->info, t, type != PIDTYPE_PID, result); trace_signal_generate(sig, &q->info, t, tmr->it_pid_type != PIDTYPE_PID, result);
unlock_task_sighand(t, &flags); unlock_task_sighand(t, &flags);
ret:
rcu_read_unlock();
return ret;
} }
static inline void posixtimer_sig_ignore(struct task_struct *tsk, struct sigqueue *q)
{
struct k_itimer *tmr = container_of(q, struct k_itimer, sigq);
/*
* If the timer is marked deleted already or the signal originates
* from a non-periodic timer, then just drop the reference
* count. Otherwise queue it on the ignored list.
*/
if (tmr->it_signal && tmr->it_sig_periodic)
hlist_add_head(&tmr->ignored_list, &tsk->signal->ignored_posix_timers);
else
posixtimer_putref(tmr);
}
static void posixtimer_sig_unignore(struct task_struct *tsk, int sig)
{
struct hlist_head *head = &tsk->signal->ignored_posix_timers;
struct hlist_node *tmp;
struct k_itimer *tmr;
if (likely(hlist_empty(head)))
return;
/*
* Rearming a timer with sighand lock held is not possible due to
* lock ordering vs. tmr::it_lock. Just stick the sigqueue back and
* let the signal delivery path deal with it whether it needs to be
* rearmed or not. This cannot be decided here w/o dropping sighand
* lock and creating a loop retry horror show.
*/
hlist_for_each_entry_safe(tmr, tmp, head, ignored_list) {
struct task_struct *target;
/*
* tmr::sigq.info.si_signo is immutable, so accessing it
* without holding tmr::it_lock is safe.
*/
if (tmr->sigq.info.si_signo != sig)
continue;
hlist_del_init(&tmr->ignored_list);
/* This should never happen and leaks a reference count */
if (WARN_ON_ONCE(!list_empty(&tmr->sigq.list)))
continue;
/*
* Get the target for the signal. If target is a thread and
* has exited by now, drop the reference count.
*/
guard(rcu)();
target = posixtimer_get_target(tmr);
if (target)
posixtimer_queue_sigqueue(&tmr->sigq, target, tmr->it_pid_type);
else
posixtimer_putref(tmr);
}
}
#else /* CONFIG_POSIX_TIMERS */
static inline void posixtimer_sig_ignore(struct task_struct *tsk, struct sigqueue *q) { }
static inline void posixtimer_sig_unignore(struct task_struct *tsk, int sig) { }
#endif /* !CONFIG_POSIX_TIMERS */
void do_notify_pidfd(struct task_struct *task) void do_notify_pidfd(struct task_struct *task)
{ {
struct pid *pid = task_pid(task); struct pid *pid = task_pid(task);
@ -4145,8 +4243,8 @@ void kernel_sigaction(int sig, __sighandler_t action)
sigemptyset(&mask); sigemptyset(&mask);
sigaddset(&mask, sig); sigaddset(&mask, sig);
flush_sigqueue_mask(&mask, &current->signal->shared_pending); flush_sigqueue_mask(current, &mask, &current->signal->shared_pending);
flush_sigqueue_mask(&mask, &current->pending); flush_sigqueue_mask(current, &mask, &current->pending);
recalc_sigpending(); recalc_sigpending();
} }
spin_unlock_irq(&current->sighand->siglock); spin_unlock_irq(&current->sighand->siglock);
@ -4196,6 +4294,8 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
sigaction_compat_abi(act, oact); sigaction_compat_abi(act, oact);
if (act) { if (act) {
bool was_ignored = k->sa.sa_handler == SIG_IGN;
sigdelsetmask(&act->sa.sa_mask, sigdelsetmask(&act->sa.sa_mask,
sigmask(SIGKILL) | sigmask(SIGSTOP)); sigmask(SIGKILL) | sigmask(SIGSTOP));
*k = *act; *k = *act;
@ -4213,9 +4313,11 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
if (sig_handler_ignored(sig_handler(p, sig), sig)) { if (sig_handler_ignored(sig_handler(p, sig), sig)) {
sigemptyset(&mask); sigemptyset(&mask);
sigaddset(&mask, sig); sigaddset(&mask, sig);
flush_sigqueue_mask(&mask, &p->signal->shared_pending); flush_sigqueue_mask(p, &mask, &p->signal->shared_pending);
for_each_thread(p, t) for_each_thread(p, t)
flush_sigqueue_mask(&mask, &t->pending); flush_sigqueue_mask(p, &mask, &t->pending);
} else if (was_ignored) {
posixtimer_sig_unignore(p, sig);
} }
} }
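The ignore and unignore paths above (posixtimer_sig_ignore() and posixtimer_sig_unignore()) can be sketched in userspace as follows. This is a simplified model under stated assumptions: there is no locking, no RCU and no check for deleted timers, the ignored list is a plain singly linked list, and every *_sketch name is hypothetical. It keeps the two ideas visible in the diff: the sigqueue is embedded in the timer, so container_of() always recovers the timer, and only periodic timers are parked on the ignored list.

```c
#include <stdbool.h>
#include <stddef.h>

#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sigq_sketch {
	int signo;
	bool queued;		/* true while on the pending queue */
};

struct timer_sketch {
	struct sigq_sketch sigq;	/* embedded: shares the timer's lifetime */
	bool periodic;
	bool on_ignored_list;
	int refcount;
	struct timer_sketch *next_ignored;
};

struct process_sketch {
	struct timer_sketch *ignored;	/* head of the ignored list */
	int pending;			/* number of signals moved to pending */
};

/* Signal is ignored: park periodic timers, drop the reference otherwise. */
static void sig_ignore_sketch(struct process_sketch *p, struct sigq_sketch *q)
{
	struct timer_sketch *tmr = container_of_sketch(q, struct timer_sketch, sigq);

	if (tmr->periodic) {
		tmr->next_ignored = p->ignored;
		p->ignored = tmr;
		tmr->on_ignored_list = true;
	} else {
		tmr->refcount--;
	}
}

/* SIG_IGN lifted: move matching timers back to pending, return the count. */
static int sig_unignore_sketch(struct process_sketch *p, int sig)
{
	struct timer_sketch **link = &p->ignored;
	int requeued = 0;

	while (*link) {
		struct timer_sketch *tmr = *link;

		if (tmr->sigq.signo != sig) {
			link = &tmr->next_ignored;
			continue;
		}
		*link = tmr->next_ignored;	/* unlink from the ignored list */
		tmr->next_ignored = NULL;
		tmr->on_ignored_list = false;
		tmr->sigq.queued = true;
		p->pending++;
		requeued++;
	}
	return requeued;
}
```

As in the kernel code, the unignore step does not rearm anything itself. It only puts the signal back on the queue and leaves the rearm decision to the delivery path.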

@ -17,11 +17,6 @@ config ARCH_CLOCKSOURCE_DATA
config ARCH_CLOCKSOURCE_INIT config ARCH_CLOCKSOURCE_INIT
bool bool
# Clocksources require validation of the clocksource against the last
# cycle update - x86/TSC misfeature
config CLOCKSOURCE_VALIDATE_LAST_CYCLE
bool
# Timekeeping vsyscall support # Timekeeping vsyscall support
config GENERIC_TIME_VSYSCALL config GENERIC_TIME_VSYSCALL
bool bool

@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0 # SPDX-License-Identifier: GPL-2.0
obj-y += time.o timer.o hrtimer.o obj-y += time.o timer.o hrtimer.o sleep_timeout.o
obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o
obj-y += timeconv.o timecounter.o alarmtimer.o obj-y += timeconv.o timecounter.o alarmtimer.o

@ -197,28 +197,15 @@ static enum hrtimer_restart alarmtimer_fired(struct hrtimer *timer)
{ {
struct alarm *alarm = container_of(timer, struct alarm, timer); struct alarm *alarm = container_of(timer, struct alarm, timer);
struct alarm_base *base = &alarm_bases[alarm->type]; struct alarm_base *base = &alarm_bases[alarm->type];
unsigned long flags;
int ret = HRTIMER_NORESTART;
int restart = ALARMTIMER_NORESTART;
spin_lock_irqsave(&base->lock, flags); scoped_guard (spinlock_irqsave, &base->lock)
alarmtimer_dequeue(base, alarm); alarmtimer_dequeue(base, alarm);
spin_unlock_irqrestore(&base->lock, flags);
if (alarm->function) if (alarm->function)
restart = alarm->function(alarm, base->get_ktime()); alarm->function(alarm, base->get_ktime());
spin_lock_irqsave(&base->lock, flags);
if (restart != ALARMTIMER_NORESTART) {
hrtimer_set_expires(&alarm->timer, alarm->node.expires);
alarmtimer_enqueue(base, alarm);
ret = HRTIMER_RESTART;
}
spin_unlock_irqrestore(&base->lock, flags);
trace_alarmtimer_fired(alarm, base->get_ktime()); trace_alarmtimer_fired(alarm, base->get_ktime());
return ret; return HRTIMER_NORESTART;
} }
ktime_t alarm_expires_remaining(const struct alarm *alarm) ktime_t alarm_expires_remaining(const struct alarm *alarm)
@ -334,10 +321,9 @@ static int alarmtimer_resume(struct device *dev)
static void static void
__alarm_init(struct alarm *alarm, enum alarmtimer_type type, __alarm_init(struct alarm *alarm, enum alarmtimer_type type,
enum alarmtimer_restart (*function)(struct alarm *, ktime_t)) void (*function)(struct alarm *, ktime_t))
{ {
timerqueue_init(&alarm->node); timerqueue_init(&alarm->node);
alarm->timer.function = alarmtimer_fired;
alarm->function = function; alarm->function = function;
alarm->type = type; alarm->type = type;
alarm->state = ALARMTIMER_STATE_INACTIVE; alarm->state = ALARMTIMER_STATE_INACTIVE;
@ -350,10 +336,10 @@ __alarm_init(struct alarm *alarm, enum alarmtimer_type type,
* @function: callback that is run when the alarm fires * @function: callback that is run when the alarm fires
*/ */
void alarm_init(struct alarm *alarm, enum alarmtimer_type type, void alarm_init(struct alarm *alarm, enum alarmtimer_type type,
enum alarmtimer_restart (*function)(struct alarm *, ktime_t)) void (*function)(struct alarm *, ktime_t))
{ {
hrtimer_init(&alarm->timer, alarm_bases[type].base_clockid, hrtimer_setup(&alarm->timer, alarmtimer_fired, alarm_bases[type].base_clockid,
HRTIMER_MODE_ABS); HRTIMER_MODE_ABS);
__alarm_init(alarm, type, function); __alarm_init(alarm, type, function);
} }
EXPORT_SYMBOL_GPL(alarm_init); EXPORT_SYMBOL_GPL(alarm_init);
@ -480,35 +466,11 @@ u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval)
} }
EXPORT_SYMBOL_GPL(alarm_forward); EXPORT_SYMBOL_GPL(alarm_forward);
static u64 __alarm_forward_now(struct alarm *alarm, ktime_t interval, bool throttle)
{
struct alarm_base *base = &alarm_bases[alarm->type];
ktime_t now = base->get_ktime();
if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS) && throttle) {
/*
* Same issue as with posix_timer_fn(). Timers which are
* periodic but the signal is ignored can starve the system
* with a very small interval. The real fix which was
* promised in the context of posix_timer_fn() never
* materialized, but someone should really work on it.
*
* To prevent DOS fake @now to be 1 jiffy out which keeps
* the overrun accounting correct but creates an
* inconsistency vs. timer_gettime(2).
*/
ktime_t kj = NSEC_PER_SEC / HZ;
if (interval < kj)
now = ktime_add(now, kj);
}
return alarm_forward(alarm, now, interval);
}
u64 alarm_forward_now(struct alarm *alarm, ktime_t interval) u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
{ {
return __alarm_forward_now(alarm, interval, false); struct alarm_base *base = &alarm_bases[alarm->type];
return alarm_forward(alarm, base->get_ktime(), interval);
} }
EXPORT_SYMBOL_GPL(alarm_forward_now); EXPORT_SYMBOL_GPL(alarm_forward_now);
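With the throttling removed, alarm_forward_now() simply forwards the alarm relative to the current time and returns the overrun count. The body of alarm_forward() is not shown in this hunk. The sketch below assumes the usual approach of advancing the expiry by whole multiples of the interval until it lies in the future. forward_sketch and its parameters are hypothetical names, times are plain nanosecond integers, and interval must be greater than zero:

```c
#include <stdint.h>

/*
 * Advance *expires past now in steps of interval and return how many
 * intervals were skipped (the overrun count). Returns 0 and leaves
 * *expires untouched when the timer has not expired yet.
 */
static uint64_t forward_sketch(int64_t *expires, int64_t now, int64_t interval)
{
	int64_t delta = now - *expires;
	uint64_t overrun = 1;

	if (delta < 0)
		return 0;

	if (delta >= interval) {
		/* Skip all fully elapsed intervals in one step. */
		overrun = (uint64_t)(delta / interval);
		*expires += (int64_t)overrun * interval;
		overrun++;
	}

	/* One more interval puts the expiry strictly after now. */
	*expires += interval;
	return overrun;
}
```

For example, with an expiry of 100, an interval of 100 and a current time of 350, the expirations at 100, 200 and 300 have passed, so the sketch reports three overruns and a new expiry of 400.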
@ -567,30 +529,12 @@ static enum alarmtimer_type clock2alarm(clockid_t clockid)
* *
* Return: whether the timer is to be restarted * Return: whether the timer is to be restarted
*/ */
static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm, static void alarm_handle_timer(struct alarm *alarm, ktime_t now)
ktime_t now)
{ {
struct k_itimer *ptr = container_of(alarm, struct k_itimer, struct k_itimer *ptr = container_of(alarm, struct k_itimer, it.alarm.alarmtimer);
it.alarm.alarmtimer);
enum alarmtimer_restart result = ALARMTIMER_NORESTART;
unsigned long flags;
spin_lock_irqsave(&ptr->it_lock, flags); guard(spinlock_irqsave)(&ptr->it_lock);
posix_timer_queue_signal(ptr);
if (posix_timer_queue_signal(ptr) && ptr->it_interval) {
/*
* Handle ignored signals and rearm the timer. This will go
* away once we handle ignored signals proper. Ensure that
* small intervals cannot starve the system.
*/
ptr->it_overrun += __alarm_forward_now(alarm, ptr->it_interval, true);
++ptr->it_requeue_pending;
ptr->it_active = 1;
result = ALARMTIMER_RESTART;
}
spin_unlock_irqrestore(&ptr->it_lock, flags);
return result;
} }
/** /**
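The hunk above swaps the explicit spin_lock_irqsave()/spin_unlock_irqrestore() pair for guard(spinlock_irqsave)(), which releases the lock automatically when the scope ends. A rough userspace sketch of the idea, assuming the GCC/Clang `cleanup` variable attribute; `sketch_lock`, `SKETCH_GUARD()` and the other names are hypothetical and the toy lock provides no real mutual exclusion:

```c
/* Hypothetical toy lock; it only records state for illustration. */
struct sketch_lock {
	int held;
	int acquisitions;
};

static void sketch_lock_acquire(struct sketch_lock *l)
{
	l->held = 1;
	l->acquisitions++;
}

/* Cleanup handler: receives a pointer to the guarded variable. */
static void sketch_lock_release(struct sketch_lock **lp)
{
	(*lp)->held = 0;
}

/* Acquire the lock now and release it when the enclosing scope ends. */
#define SKETCH_GUARD(l) \
	struct sketch_lock *sketch_guard_var \
		__attribute__((cleanup(sketch_lock_release), unused)) = (l); \
	sketch_lock_acquire(sketch_guard_var)

static int sketch_handle_event(struct sketch_lock *l, int early_exit)
{
	SKETCH_GUARD(l);

	/* The return value is evaluated while the lock is still held. */
	if (early_exit)
		return l->held;

	return l->held + 1;
}
```

Every return path drops the lock without an explicit unlock call, which is why the converted functions no longer need a `flags` variable or a `result` local that is carried to a single exit point.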
@ -751,18 +695,14 @@ static int alarm_timer_create(struct k_itimer *new_timer)
* @now: time at the timer expiration * @now: time at the timer expiration
* *
* Wakes up the task that set the alarmtimer * Wakes up the task that set the alarmtimer
*
* Return: ALARMTIMER_NORESTART
*/ */
static enum alarmtimer_restart alarmtimer_nsleep_wakeup(struct alarm *alarm, static void alarmtimer_nsleep_wakeup(struct alarm *alarm, ktime_t now)
ktime_t now)
{ {
struct task_struct *task = alarm->data; struct task_struct *task = alarm->data;
alarm->data = NULL; alarm->data = NULL;
if (task) if (task)
wake_up_process(task); wake_up_process(task);
return ALARMTIMER_NORESTART;
} }
/** /**
@ -814,10 +754,10 @@ static int alarmtimer_do_nsleep(struct alarm *alarm, ktime_t absexp,
static void static void
alarm_init_on_stack(struct alarm *alarm, enum alarmtimer_type type, alarm_init_on_stack(struct alarm *alarm, enum alarmtimer_type type,
enum alarmtimer_restart (*function)(struct alarm *, ktime_t)) void (*function)(struct alarm *, ktime_t))
{ {
hrtimer_init_on_stack(&alarm->timer, alarm_bases[type].base_clockid, hrtimer_setup_on_stack(&alarm->timer, alarmtimer_fired, alarm_bases[type].base_clockid,
HRTIMER_MODE_ABS); HRTIMER_MODE_ABS);
__alarm_init(alarm, type, function); __alarm_init(alarm, type, function);
} }


@ -337,13 +337,21 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
} }
/* /*
* Called after a notify add to make devices available which were * Called after a clockevent has been added which might
* released from the notifier call. * have replaced a current regular or broadcast device. A
* released normal device might be a suitable replacement
* for the current broadcast device. Similarly a released
* broadcast device might be a suitable replacement for a
* normal device.
*/ */
static void clockevents_notify_released(void) static void clockevents_notify_released(void)
{ {
struct clock_event_device *dev; struct clock_event_device *dev;
/*
* Keep iterating as long as tick_check_new_device()
* replaces a device.
*/
while (!list_empty(&clockevents_released)) { while (!list_empty(&clockevents_released)) {
dev = list_entry(clockevents_released.next, dev = list_entry(clockevents_released.next,
struct clock_event_device, list); struct clock_event_device, list);
@ -610,39 +618,30 @@ void clockevents_resume(void)
#ifdef CONFIG_HOTPLUG_CPU #ifdef CONFIG_HOTPLUG_CPU
# ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
/** /**
* tick_offline_cpu - Take CPU out of the broadcast mechanism * tick_offline_cpu - Shutdown all clock events related
* to this CPU and take it out of the
* broadcast mechanism.
* @cpu: The outgoing CPU * @cpu: The outgoing CPU
* *
* Called on the outgoing CPU after it took itself offline. * Called by the dying CPU during teardown.
*/ */
void tick_offline_cpu(unsigned int cpu) void tick_offline_cpu(unsigned int cpu)
{
raw_spin_lock(&clockevents_lock);
tick_broadcast_offline(cpu);
raw_spin_unlock(&clockevents_lock);
}
# endif
/**
* tick_cleanup_dead_cpu - Cleanup the tick and clockevents of a dead cpu
* @cpu: The dead CPU
*/
void tick_cleanup_dead_cpu(int cpu)
{ {
struct clock_event_device *dev, *tmp; struct clock_event_device *dev, *tmp;
unsigned long flags;
raw_spin_lock_irqsave(&clockevents_lock, flags); raw_spin_lock(&clockevents_lock);
tick_broadcast_offline(cpu);
tick_shutdown(cpu); tick_shutdown(cpu);
/* /*
* Unregister the clock event devices which were * Unregister the clock event devices which were
* released from the users in the notify chain. * released above.
*/ */
list_for_each_entry_safe(dev, tmp, &clockevents_released, list) list_for_each_entry_safe(dev, tmp, &clockevents_released, list)
list_del(&dev->list); list_del(&dev->list);
/* /*
* Now check whether the CPU has left unused per cpu devices * Now check whether the CPU has left unused per cpu devices
*/ */
@ -654,7 +653,8 @@ void tick_cleanup_dead_cpu(int cpu)
list_del(&dev->list); list_del(&dev->list);
} }
} }
raw_spin_unlock_irqrestore(&clockevents_lock, flags);
raw_spin_unlock(&clockevents_lock);
} }
#endif #endif


@ -20,6 +20,8 @@
#include "tick-internal.h" #include "tick-internal.h"
#include "timekeeping_internal.h" #include "timekeeping_internal.h"
static void clocksource_enqueue(struct clocksource *cs);
static noinline u64 cycles_to_nsec_safe(struct clocksource *cs, u64 start, u64 end) static noinline u64 cycles_to_nsec_safe(struct clocksource *cs, u64 start, u64 end)
{ {
u64 delta = clocksource_delta(end, start, cs->mask); u64 delta = clocksource_delta(end, start, cs->mask);
@ -171,7 +173,6 @@ static inline void clocksource_watchdog_unlock(unsigned long *flags)
} }
static int clocksource_watchdog_kthread(void *data); static int clocksource_watchdog_kthread(void *data);
static void __clocksource_change_rating(struct clocksource *cs, int rating);
static void clocksource_watchdog_work(struct work_struct *work) static void clocksource_watchdog_work(struct work_struct *work)
{ {
@ -191,6 +192,13 @@ static void clocksource_watchdog_work(struct work_struct *work)
kthread_run(clocksource_watchdog_kthread, NULL, "kwatchdog"); kthread_run(clocksource_watchdog_kthread, NULL, "kwatchdog");
} }
static void clocksource_change_rating(struct clocksource *cs, int rating)
{
list_del(&cs->list);
cs->rating = rating;
clocksource_enqueue(cs);
}
static void __clocksource_unstable(struct clocksource *cs) static void __clocksource_unstable(struct clocksource *cs)
{ {
cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG); cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
@ -697,7 +705,7 @@ static int __clocksource_watchdog_kthread(void)
list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) { list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
if (cs->flags & CLOCK_SOURCE_UNSTABLE) { if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
list_del_init(&cs->wd_list); list_del_init(&cs->wd_list);
__clocksource_change_rating(cs, 0); clocksource_change_rating(cs, 0);
select = 1; select = 1;
} }
if (cs->flags & CLOCK_SOURCE_RESELECT) { if (cs->flags & CLOCK_SOURCE_RESELECT) {
@ -1255,34 +1263,6 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
} }
EXPORT_SYMBOL_GPL(__clocksource_register_scale); EXPORT_SYMBOL_GPL(__clocksource_register_scale);
static void __clocksource_change_rating(struct clocksource *cs, int rating)
{
list_del(&cs->list);
cs->rating = rating;
clocksource_enqueue(cs);
}
/**
* clocksource_change_rating - Change the rating of a registered clocksource
* @cs: clocksource to be changed
* @rating: new rating
*/
void clocksource_change_rating(struct clocksource *cs, int rating)
{
unsigned long flags;
mutex_lock(&clocksource_mutex);
clocksource_watchdog_lock(&flags);
__clocksource_change_rating(cs, rating);
clocksource_watchdog_unlock(&flags);
clocksource_select();
clocksource_select_watchdog(false);
clocksource_suspend_select(false);
mutex_unlock(&clocksource_mutex);
}
EXPORT_SYMBOL(clocksource_change_rating);
/* /*
* Unbind clocksource @cs. Called with clocksource_mutex held * Unbind clocksource @cs. Called with clocksource_mutex held
*/ */


@ -417,6 +417,11 @@ static inline void debug_hrtimer_init(struct hrtimer *timer)
debug_object_init(timer, &hrtimer_debug_descr); debug_object_init(timer, &hrtimer_debug_descr);
} }
static inline void debug_hrtimer_init_on_stack(struct hrtimer *timer)
{
debug_object_init_on_stack(timer, &hrtimer_debug_descr);
}
static inline void debug_hrtimer_activate(struct hrtimer *timer, static inline void debug_hrtimer_activate(struct hrtimer *timer,
enum hrtimer_mode mode) enum hrtimer_mode mode)
{ {
@ -428,28 +433,6 @@ static inline void debug_hrtimer_deactivate(struct hrtimer *timer)
debug_object_deactivate(timer, &hrtimer_debug_descr); debug_object_deactivate(timer, &hrtimer_debug_descr);
} }
static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
enum hrtimer_mode mode);
void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t clock_id,
enum hrtimer_mode mode)
{
debug_object_init_on_stack(timer, &hrtimer_debug_descr);
__hrtimer_init(timer, clock_id, mode);
}
EXPORT_SYMBOL_GPL(hrtimer_init_on_stack);
static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
clockid_t clock_id, enum hrtimer_mode mode);
void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
clockid_t clock_id, enum hrtimer_mode mode)
{
debug_object_init_on_stack(&sl->timer, &hrtimer_debug_descr);
__hrtimer_init_sleeper(sl, clock_id, mode);
}
EXPORT_SYMBOL_GPL(hrtimer_init_sleeper_on_stack);
void destroy_hrtimer_on_stack(struct hrtimer *timer) void destroy_hrtimer_on_stack(struct hrtimer *timer)
{ {
debug_object_free(timer, &hrtimer_debug_descr); debug_object_free(timer, &hrtimer_debug_descr);
@ -459,6 +442,7 @@ EXPORT_SYMBOL_GPL(destroy_hrtimer_on_stack);
#else #else
static inline void debug_hrtimer_init(struct hrtimer *timer) { } static inline void debug_hrtimer_init(struct hrtimer *timer) { }
static inline void debug_hrtimer_init_on_stack(struct hrtimer *timer) { }
static inline void debug_hrtimer_activate(struct hrtimer *timer, static inline void debug_hrtimer_activate(struct hrtimer *timer,
enum hrtimer_mode mode) { } enum hrtimer_mode mode) { }
static inline void debug_hrtimer_deactivate(struct hrtimer *timer) { } static inline void debug_hrtimer_deactivate(struct hrtimer *timer) { }
@ -472,6 +456,13 @@ debug_init(struct hrtimer *timer, clockid_t clockid,
trace_hrtimer_init(timer, clockid, mode); trace_hrtimer_init(timer, clockid, mode);
} }
static inline void debug_init_on_stack(struct hrtimer *timer, clockid_t clockid,
enum hrtimer_mode mode)
{
debug_hrtimer_init_on_stack(timer);
trace_hrtimer_init(timer, clockid, mode);
}
static inline void debug_activate(struct hrtimer *timer, static inline void debug_activate(struct hrtimer *timer,
enum hrtimer_mode mode) enum hrtimer_mode mode)
{ {
@ -1544,6 +1535,11 @@ static inline int hrtimer_clockid_to_base(clockid_t clock_id)
return HRTIMER_BASE_MONOTONIC; return HRTIMER_BASE_MONOTONIC;
} }
static enum hrtimer_restart hrtimer_dummy_timeout(struct hrtimer *unused)
{
return HRTIMER_NORESTART;
}
static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
enum hrtimer_mode mode) enum hrtimer_mode mode)
{ {
@ -1580,6 +1576,18 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
timerqueue_init(&timer->node); timerqueue_init(&timer->node);
} }
static void __hrtimer_setup(struct hrtimer *timer,
enum hrtimer_restart (*function)(struct hrtimer *),
clockid_t clock_id, enum hrtimer_mode mode)
{
__hrtimer_init(timer, clock_id, mode);
if (WARN_ON_ONCE(!function))
timer->function = hrtimer_dummy_timeout;
else
timer->function = function;
}
/** /**
* hrtimer_init - initialize a timer to the given clock * hrtimer_init - initialize a timer to the given clock
* @timer: the timer to be initialized * @timer: the timer to be initialized
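__hrtimer_setup() above stores the callback at initialization time and substitutes a harmless dummy when a caller passes NULL, so expiry never dereferences a NULL function pointer. A small userspace sketch of that fallback pattern, with hypothetical `sketch_*` names and without the warning that the kernel emits:

```c
#include <stddef.h>

enum sketch_restart { SKETCH_NORESTART, SKETCH_RESTART };

/* Hypothetical stand-in for a timer that carries its callback. */
struct sketch_hrtimer {
	enum sketch_restart (*function)(struct sketch_hrtimer *);
};

/* Fallback callback: does nothing and never restarts the timer. */
static enum sketch_restart sketch_dummy_timeout(struct sketch_hrtimer *unused)
{
	(void)unused;
	return SKETCH_NORESTART;
}

/* Example of a real callback that asks to be restarted. */
static enum sketch_restart sketch_periodic(struct sketch_hrtimer *t)
{
	(void)t;
	return SKETCH_RESTART;
}

/* Initialize the timer and make sure the callback is never NULL. */
static void sketch_setup(struct sketch_hrtimer *t,
			 enum sketch_restart (*function)(struct sketch_hrtimer *))
{
	t->function = function ? function : sketch_dummy_timeout;
}

/* Expiry path: may call the callback unconditionally. */
static enum sketch_restart sketch_expire(struct sketch_hrtimer *t)
{
	return t->function(t);
}
```

Binding the callback in the setup call, rather than assigning it in a separate statement after initialization, leaves no window in which the timer exists without a valid callback.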
@ -1600,6 +1608,46 @@ void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
} }
EXPORT_SYMBOL_GPL(hrtimer_init); EXPORT_SYMBOL_GPL(hrtimer_init);
/**
* hrtimer_setup - initialize a timer to the given clock
* @timer: the timer to be initialized
* @function: the callback function
* @clock_id: the clock to be used
* @mode: The modes which are relevant for initialization:
* HRTIMER_MODE_ABS, HRTIMER_MODE_REL, HRTIMER_MODE_ABS_SOFT,
* HRTIMER_MODE_REL_SOFT
*
* The PINNED variants of the above can be handed in,
* but the PINNED bit is ignored as pinning happens
* when the hrtimer is started
*/
void hrtimer_setup(struct hrtimer *timer, enum hrtimer_restart (*function)(struct hrtimer *),
clockid_t clock_id, enum hrtimer_mode mode)
{
debug_init(timer, clock_id, mode);
__hrtimer_setup(timer, function, clock_id, mode);
}
EXPORT_SYMBOL_GPL(hrtimer_setup);
/**
* hrtimer_setup_on_stack - initialize a timer on stack memory
* @timer: The timer to be initialized
* @function: the callback function
* @clock_id: The clock to be used
* @mode: The timer mode
*
* Similar to hrtimer_setup(), except that this one must be used if struct hrtimer is in stack
* memory.
*/
void hrtimer_setup_on_stack(struct hrtimer *timer,
enum hrtimer_restart (*function)(struct hrtimer *),
clockid_t clock_id, enum hrtimer_mode mode)
{
debug_init_on_stack(timer, clock_id, mode);
__hrtimer_setup(timer, function, clock_id, mode);
}
EXPORT_SYMBOL_GPL(hrtimer_setup_on_stack);
/* /*
* A timer is active, when it is enqueued into the rbtree or the * A timer is active, when it is enqueued into the rbtree or the
* callback function is running or it's in the state of being migrated * callback function is running or it's in the state of being migrated
@ -1944,7 +1992,7 @@ void hrtimer_sleeper_start_expires(struct hrtimer_sleeper *sl,
* Make the enqueue delivery mode check work on RT. If the sleeper * Make the enqueue delivery mode check work on RT. If the sleeper
* was initialized for hard interrupt delivery, force the mode bit. * was initialized for hard interrupt delivery, force the mode bit.
* This is a special case for hrtimer_sleepers because * This is a special case for hrtimer_sleepers because
* hrtimer_init_sleeper() determines the delivery mode on RT so the * __hrtimer_init_sleeper() determines the delivery mode on RT so the
* fiddling with this decision is avoided at the call sites. * fiddling with this decision is avoided at the call sites.
*/ */
if (IS_ENABLED(CONFIG_PREEMPT_RT) && sl->timer.is_hard) if (IS_ENABLED(CONFIG_PREEMPT_RT) && sl->timer.is_hard)
@ -1987,19 +2035,18 @@ static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
} }
/** /**
* hrtimer_init_sleeper - initialize sleeper to the given clock * hrtimer_setup_sleeper_on_stack - initialize a sleeper in stack memory
* @sl: sleeper to be initialized * @sl: sleeper to be initialized
* @clock_id: the clock to be used * @clock_id: the clock to be used
* @mode: timer mode abs/rel * @mode: timer mode abs/rel
*/ */
void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, void hrtimer_setup_sleeper_on_stack(struct hrtimer_sleeper *sl,
enum hrtimer_mode mode) clockid_t clock_id, enum hrtimer_mode mode)
{ {
debug_init(&sl->timer, clock_id, mode); debug_init_on_stack(&sl->timer, clock_id, mode);
__hrtimer_init_sleeper(sl, clock_id, mode); __hrtimer_init_sleeper(sl, clock_id, mode);
} }
EXPORT_SYMBOL_GPL(hrtimer_init_sleeper); EXPORT_SYMBOL_GPL(hrtimer_setup_sleeper_on_stack);
int nanosleep_copyout(struct restart_block *restart, struct timespec64 *ts) int nanosleep_copyout(struct restart_block *restart, struct timespec64 *ts)
{ {
@ -2060,8 +2107,7 @@ static long __sched hrtimer_nanosleep_restart(struct restart_block *restart)
struct hrtimer_sleeper t; struct hrtimer_sleeper t;
int ret; int ret;
hrtimer_init_sleeper_on_stack(&t, restart->nanosleep.clockid, hrtimer_setup_sleeper_on_stack(&t, restart->nanosleep.clockid, HRTIMER_MODE_ABS);
HRTIMER_MODE_ABS);
hrtimer_set_expires_tv64(&t.timer, restart->nanosleep.expires); hrtimer_set_expires_tv64(&t.timer, restart->nanosleep.expires);
ret = do_nanosleep(&t, HRTIMER_MODE_ABS); ret = do_nanosleep(&t, HRTIMER_MODE_ABS);
destroy_hrtimer_on_stack(&t.timer); destroy_hrtimer_on_stack(&t.timer);
@ -2075,7 +2121,7 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode,
struct hrtimer_sleeper t; struct hrtimer_sleeper t;
int ret = 0; int ret = 0;
hrtimer_init_sleeper_on_stack(&t, clockid, mode); hrtimer_setup_sleeper_on_stack(&t, clockid, mode);
hrtimer_set_expires_range_ns(&t.timer, rqtp, current->timer_slack_ns); hrtimer_set_expires_range_ns(&t.timer, rqtp, current->timer_slack_ns);
ret = do_nanosleep(&t, mode); ret = do_nanosleep(&t, mode);
if (ret != -ERESTART_RESTARTBLOCK) if (ret != -ERESTART_RESTARTBLOCK)
@ -2242,123 +2288,3 @@ void __init hrtimers_init(void)
hrtimers_prepare_cpu(smp_processor_id()); hrtimers_prepare_cpu(smp_processor_id());
open_softirq(HRTIMER_SOFTIRQ, hrtimer_run_softirq); open_softirq(HRTIMER_SOFTIRQ, hrtimer_run_softirq);
} }
/**
* schedule_hrtimeout_range_clock - sleep until timeout
* @expires: timeout value (ktime_t)
* @delta: slack in expires timeout (ktime_t)
* @mode: timer mode
* @clock_id: timer clock to be used
*/
int __sched
schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta,
const enum hrtimer_mode mode, clockid_t clock_id)
{
struct hrtimer_sleeper t;
/*
* Optimize when a zero timeout value is given. It does not
* matter whether this is an absolute or a relative time.
*/
if (expires && *expires == 0) {
__set_current_state(TASK_RUNNING);
return 0;
}
/*
* A NULL parameter means "infinite"
*/
if (!expires) {
schedule();
return -EINTR;
}
hrtimer_init_sleeper_on_stack(&t, clock_id, mode);
hrtimer_set_expires_range_ns(&t.timer, *expires, delta);
hrtimer_sleeper_start_expires(&t, mode);
if (likely(t.task))
schedule();
hrtimer_cancel(&t.timer);
destroy_hrtimer_on_stack(&t.timer);
__set_current_state(TASK_RUNNING);
return !t.task ? 0 : -EINTR;
}
EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock);
/**
* schedule_hrtimeout_range - sleep until timeout
* @expires: timeout value (ktime_t)
* @delta: slack in expires timeout (ktime_t)
* @mode: timer mode
*
* Make the current task sleep until the given expiry time has
* elapsed. The routine will return immediately unless
* the current task state has been set (see set_current_state()).
*
* The @delta argument gives the kernel the freedom to schedule the
* actual wakeup to a time that is both power and performance friendly
* for regular (non RT/DL) tasks.
* The kernel gives the normal best effort behavior for "@expires+@delta",
* but may decide to fire the timer earlier, but no earlier than @expires.
*
* You can set the task state as follows -
*
* %TASK_UNINTERRUPTIBLE - at least @timeout time is guaranteed to
* pass before the routine returns unless the current task is explicitly
* woken up, (e.g. by wake_up_process()).
*
* %TASK_INTERRUPTIBLE - the routine may return early if a signal is
* delivered to the current task or the current task is explicitly woken
* up.
*
* The current task state is guaranteed to be TASK_RUNNING when this
* routine returns.
*
* Returns 0 when the timer has expired. If the task was woken before the
* timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or
* by an explicit wakeup, it returns -EINTR.
*/
int __sched schedule_hrtimeout_range(ktime_t *expires, u64 delta,
const enum hrtimer_mode mode)
{
return schedule_hrtimeout_range_clock(expires, delta, mode,
CLOCK_MONOTONIC);
}
EXPORT_SYMBOL_GPL(schedule_hrtimeout_range);
/**
* schedule_hrtimeout - sleep until timeout
* @expires: timeout value (ktime_t)
* @mode: timer mode
*
* Make the current task sleep until the given expiry time has
* elapsed. The routine will return immediately unless
* the current task state has been set (see set_current_state()).
*
* You can set the task state as follows -
*
* %TASK_UNINTERRUPTIBLE - at least @timeout time is guaranteed to
* pass before the routine returns unless the current task is explicitly
* woken up, (e.g. by wake_up_process()).
*
* %TASK_INTERRUPTIBLE - the routine may return early if a signal is
* delivered to the current task or the current task is explicitly woken
* up.
*
* The current task state is guaranteed to be TASK_RUNNING when this
* routine returns.
*
* Returns 0 when the timer has expired. If the task was woken before the
* timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or
* by an explicit wakeup, it returns -EINTR.
*/
int __sched schedule_hrtimeout(ktime_t *expires,
const enum hrtimer_mode mode)
{
return schedule_hrtimeout_range(expires, 0, mode);
}
EXPORT_SYMBOL_GPL(schedule_hrtimeout);


@ -151,7 +151,27 @@ COMPAT_SYSCALL_DEFINE2(getitimer, int, which,
#endif #endif
/* /*
* The timer is automagically restarted, when interval != 0 * Invoked from dequeue_signal() when SIGALRM is delivered.
*
* Restart the ITIMER_REAL timer if it is armed as periodic timer. Doing
* this in the signal delivery path instead of self rearming prevents a DoS
* with small increments in the high resolution timer case and reduces timer
* noise in general.
*/
void posixtimer_rearm_itimer(struct task_struct *tsk)
{
struct hrtimer *tmr = &tsk->signal->real_timer;
if (!hrtimer_is_queued(tmr) && tsk->signal->it_real_incr != 0) {
hrtimer_forward(tmr, tmr->base->get_time(),
tsk->signal->it_real_incr);
hrtimer_restart(tmr);
}
}
/*
* Interval timers are restarted in the signal delivery path. See
* posixtimer_rearm_itimer().
*/ */
enum hrtimer_restart it_real_fn(struct hrtimer *timer) enum hrtimer_restart it_real_fn(struct hrtimer *timer)
{ {

File diff suppressed because it is too large


@ -453,7 +453,6 @@ static void disarm_timer(struct k_itimer *timer, struct task_struct *p)
struct cpu_timer *ctmr = &timer->it.cpu; struct cpu_timer *ctmr = &timer->it.cpu;
struct posix_cputimer_base *base; struct posix_cputimer_base *base;
timer->it_active = 0;
if (!cpu_timer_dequeue(ctmr)) if (!cpu_timer_dequeue(ctmr))
return; return;
@ -494,19 +493,28 @@ static int posix_cpu_timer_del(struct k_itimer *timer)
*/ */
WARN_ON_ONCE(ctmr->head || timerqueue_node_queued(&ctmr->node)); WARN_ON_ONCE(ctmr->head || timerqueue_node_queued(&ctmr->node));
} else { } else {
if (timer->it.cpu.firing) if (timer->it.cpu.firing) {
/*
* Prevent signal delivery. The timer cannot be dequeued
* because it is on the firing list which is not protected
* by sighand->lock. The delivery path is waiting for
* the timer lock. So go back, unlock and retry.
*/
timer->it.cpu.firing = false;
ret = TIMER_RETRY; ret = TIMER_RETRY;
else } else {
disarm_timer(timer, p); disarm_timer(timer, p);
}
unlock_task_sighand(p, &flags); unlock_task_sighand(p, &flags);
} }
out: out:
rcu_read_unlock(); rcu_read_unlock();
if (!ret)
put_pid(ctmr->pid);
if (!ret) {
put_pid(ctmr->pid);
timer->it_status = POSIX_TIMER_DISARMED;
}
return ret; return ret;
} }
@ -560,7 +568,7 @@ static void arm_timer(struct k_itimer *timer, struct task_struct *p)
struct cpu_timer *ctmr = &timer->it.cpu; struct cpu_timer *ctmr = &timer->it.cpu;
u64 newexp = cpu_timer_getexpires(ctmr); u64 newexp = cpu_timer_getexpires(ctmr);
timer->it_active = 1; timer->it_status = POSIX_TIMER_ARMED;
if (!cpu_timer_enqueue(&base->tqhead, ctmr)) if (!cpu_timer_enqueue(&base->tqhead, ctmr))
return; return;
@ -586,29 +594,20 @@ static void cpu_timer_fire(struct k_itimer *timer)
{ {
struct cpu_timer *ctmr = &timer->it.cpu; struct cpu_timer *ctmr = &timer->it.cpu;
timer->it_active = 0; timer->it_status = POSIX_TIMER_DISARMED;
if (unlikely(timer->sigq == NULL)) {
if (unlikely(ctmr->nanosleep)) {
/* /*
* This is a special case for clock_nanosleep, * This is a special case for clock_nanosleep,
* not a normal timer from sys_timer_create. * not a normal timer from sys_timer_create.
*/ */
wake_up_process(timer->it_process); wake_up_process(timer->it_process);
cpu_timer_setexpires(ctmr, 0); cpu_timer_setexpires(ctmr, 0);
} else if (!timer->it_interval) { } else {
/*
* One-shot timer. Clear it as soon as it's fired.
*/
posix_timer_queue_signal(timer); posix_timer_queue_signal(timer);
cpu_timer_setexpires(ctmr, 0); /* Disable oneshot timers */
} else if (posix_timer_queue_signal(timer)) { if (!timer->it_interval)
/* cpu_timer_setexpires(ctmr, 0);
* The signal did not get queued because the signal
* was ignored, so we won't get any callback to
* reload the timer. But we need to keep it
* ticking in case the signal is deliverable next time.
*/
posix_cpu_timer_rearm(timer);
++timer->it_requeue_pending;
} }
} }
@ -667,11 +666,17 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
old_expires = cpu_timer_getexpires(ctmr); old_expires = cpu_timer_getexpires(ctmr);
if (unlikely(timer->it.cpu.firing)) { if (unlikely(timer->it.cpu.firing)) {
timer->it.cpu.firing = -1; /*
* Prevent signal delivery. The timer cannot be dequeued
* because it is on the firing list which is not protected
* by sighand->lock. The delivery path is waiting for
* the timer lock. So go back, unlock and retry.
*/
timer->it.cpu.firing = false;
ret = TIMER_RETRY; ret = TIMER_RETRY;
} else { } else {
cpu_timer_dequeue(ctmr); cpu_timer_dequeue(ctmr);
timer->it_active = 0; timer->it_status = POSIX_TIMER_DISARMED;
} }
/* /*
@ -745,7 +750,7 @@ static void __posix_cpu_timer_get(struct k_itimer *timer, struct itimerspec64 *i
* - Timers which expired, but the signal has not yet been * - Timers which expired, but the signal has not yet been
* delivered * delivered
*/ */
if (iv && ((timer->it_requeue_pending & REQUEUE_PENDING) || sigev_none)) if (iv && timer->it_status != POSIX_TIMER_ARMED)
expires = bump_cpu_timer(timer, now); expires = bump_cpu_timer(timer, now);
else else
expires = cpu_timer_getexpires(&timer->it.cpu); expires = cpu_timer_getexpires(&timer->it.cpu);
@ -808,7 +813,7 @@ static u64 collect_timerqueue(struct timerqueue_head *head,
if (++i == MAX_COLLECTED || now < expires) if (++i == MAX_COLLECTED || now < expires)
return expires; return expires;
ctmr->firing = 1; ctmr->firing = true;
/* See posix_cpu_timer_wait_running() */ /* See posix_cpu_timer_wait_running() */
rcu_assign_pointer(ctmr->handling, current); rcu_assign_pointer(ctmr->handling, current);
cpu_timer_dequeue(ctmr); cpu_timer_dequeue(ctmr);
@ -1363,7 +1368,7 @@ static void handle_posix_cpu_timers(struct task_struct *tsk)
* timer call will interfere. * timer call will interfere.
*/ */
list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) { list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
int cpu_firing; bool cpu_firing;
/* /*
* spin_lock() is sufficient here even independent of the * spin_lock() is sufficient here even independent of the
@ -1375,13 +1380,13 @@ static void handle_posix_cpu_timers(struct task_struct *tsk)
spin_lock(&timer->it_lock); spin_lock(&timer->it_lock);
list_del_init(&timer->it.cpu.elist); list_del_init(&timer->it.cpu.elist);
cpu_firing = timer->it.cpu.firing; cpu_firing = timer->it.cpu.firing;
timer->it.cpu.firing = 0; timer->it.cpu.firing = false;
/* /*
* The firing flag is -1 if we collided with a reset * If the firing flag is cleared then this raced with a
* of the timer, which already reported this * timer rearm/delete operation. So don't generate an
* almost-firing as an overrun. So don't generate an event. * event.
*/ */
if (likely(cpu_firing >= 0)) if (likely(cpu_firing))
cpu_timer_fire(timer); cpu_timer_fire(timer);
/* See posix_cpu_timer_wait_running() */ /* See posix_cpu_timer_wait_running() */
rcu_assign_pointer(timer->it.cpu.handling, NULL); rcu_assign_pointer(timer->it.cpu.handling, NULL);
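The hunks above turn `firing` into a plain boolean: expiry collection sets it, a racing timer_settime()/timer_delete() clears it and retries, and the handler only generates an event when the flag is still set. A simplified single-threaded sketch of that handshake, with hypothetical `sketch_*` names and all locking omitted:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a CPU timer on the firing list. */
struct sketch_cpu_timer {
	bool firing;
	unsigned int events;
};

/* Expiry collection: mark the timer as about to fire. */
static void sketch_collect(struct sketch_cpu_timer *t)
{
	t->firing = true;
}

/*
 * Set/delete path: if the timer is already on the firing list it
 * cannot be dequeued, so cancel the pending delivery instead.
 * Returns true when the caller has to back off and retry.
 */
static bool sketch_cancel_if_firing(struct sketch_cpu_timer *t)
{
	if (!t->firing)
		return false;
	t->firing = false;
	return true;
}

/* Handler: deliver only if nobody cancelled in the meantime. */
static void sketch_handle(struct sketch_cpu_timer *t)
{
	bool was_firing = t->firing;

	t->firing = false;
	if (was_firing)
		t->events++;
}
```

In the kernel the flag accesses are serialized by the timer lock; the sketch only shows the order of the state changes.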
@ -1478,6 +1483,7 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
timer.it_overrun = -1; timer.it_overrun = -1;
error = posix_cpu_timer_create(&timer); error = posix_cpu_timer_create(&timer);
timer.it_process = current; timer.it_process = current;
timer.it.cpu.nanosleep = true;
if (!error) { if (!error) {
static struct itimerspec64 zero_it; static struct itimerspec64 zero_it;


@ -233,11 +233,12 @@ __initcall(init_posix_timers);
* The siginfo si_overrun field and the return value of timer_getoverrun(2) * The siginfo si_overrun field and the return value of timer_getoverrun(2)
* are of type int. Clamp the overrun value to INT_MAX * are of type int. Clamp the overrun value to INT_MAX
*/ */
static inline int timer_overrun_to_int(struct k_itimer *timr, int baseval) static inline int timer_overrun_to_int(struct k_itimer *timr)
{ {
s64 sum = timr->it_overrun_last + (s64)baseval; if (timr->it_overrun_last > (s64)INT_MAX)
return INT_MAX;
return sum > (s64)INT_MAX ? INT_MAX : (int)sum; return (int)timr->it_overrun_last;
} }
static void common_hrtimer_rearm(struct k_itimer *timr) static void common_hrtimer_rearm(struct k_itimer *timr)
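The reworked timer_overrun_to_int() no longer adds a base value and only clamps the stored 64-bit overrun count to the `int` range that si_overrun and timer_getoverrun(2) can report. A standalone sketch of that clamp, using the hypothetical name `sketch_overrun_to_int()`:

```c
#include <limits.h>
#include <stdint.h>

/* Clamp a 64-bit overrun count to what fits into an int. */
static int sketch_overrun_to_int(int64_t overrun_last)
{
	if (overrun_last > (int64_t)INT_MAX)
		return INT_MAX;

	return (int)overrun_last;
}
```

Values at or below INT_MAX pass through unchanged, including the -1 that the surrounding code uses as the reset value for it_overrun.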
@ -249,62 +250,62 @@ static void common_hrtimer_rearm(struct k_itimer *timr)
hrtimer_restart(timer); hrtimer_restart(timer);
} }
/* static bool __posixtimer_deliver_signal(struct kernel_siginfo *info, struct k_itimer *timr)
* This function is called from the signal delivery code if
* info->si_sys_private is not zero, which indicates that the timer has to
* be rearmed. Restart the timer and update info::si_overrun.
*/
void posixtimer_rearm(struct kernel_siginfo *info)
{ {
struct k_itimer *timr; guard(spinlock)(&timr->it_lock);
unsigned long flags;
timr = lock_timer(info->si_tid, &flags);
if (!timr)
return;
if (timr->it_interval && timr->it_requeue_pending == info->si_sys_private) {
timr->kclock->timer_rearm(timr);
timr->it_active = 1;
timr->it_overrun_last = timr->it_overrun;
timr->it_overrun = -1LL;
++timr->it_requeue_pending;
info->si_overrun = timer_overrun_to_int(timr, info->si_overrun);
}
unlock_timer(timr, flags);
}
int posix_timer_queue_signal(struct k_itimer *timr)
{
int ret, si_private = 0;
enum pid_type type;
lockdep_assert_held(&timr->it_lock);
timr->it_active = 0;
if (timr->it_interval)
si_private = ++timr->it_requeue_pending;
/* /*
* FIXME: if ->sigq is queued we can race with * Check if the timer is still alive or whether it got modified
* dequeue_signal()->posixtimer_rearm(). * since the signal was queued. In either case, don't rearm and
* * drop the signal.
* If dequeue_signal() sees the "right" value of
* si_sys_private it calls posixtimer_rearm().
* We re-queue ->sigq and drop ->it_lock().
* posixtimer_rearm() locks the timer
* and re-schedules it while ->sigq is pending.
* Not really bad, but not that we want.
*/ */
timr->sigq->info.si_sys_private = si_private; if (timr->it_signal_seq != timr->it_sigqueue_seq || WARN_ON_ONCE(!timr->it_signal))
return false;
type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID; if (!timr->it_interval || WARN_ON_ONCE(timr->it_status != POSIX_TIMER_REQUEUE_PENDING))
ret = send_sigqueue(timr->sigq, timr->it_pid, type); return true;
/* If we failed to send the signal the timer stops. */
return ret > 0; timr->kclock->timer_rearm(timr);
timr->it_status = POSIX_TIMER_ARMED;
timr->it_overrun_last = timr->it_overrun;
timr->it_overrun = -1LL;
++timr->it_signal_seq;
info->si_overrun = timer_overrun_to_int(timr);
return true;
}
/*
* This function is called from the signal delivery code. It decides
* whether the signal should be dropped and rearms interval timers. The
* timer can be unconditionally accessed as there is a reference held on
* it.
*/
bool posixtimer_deliver_signal(struct kernel_siginfo *info, struct sigqueue *timer_sigq)
{
struct k_itimer *timr = container_of(timer_sigq, struct k_itimer, sigq);
bool ret;
/*
* Release siglock to ensure proper locking order versus
* timr::it_lock. Keep interrupts disabled.
*/
spin_unlock(&current->sighand->siglock);
ret = __posixtimer_deliver_signal(info, timr);
/* Drop the reference which was acquired when the signal was queued */
posixtimer_putref(timr);
spin_lock(&current->sighand->siglock);
return ret;
}
void posix_timer_queue_signal(struct k_itimer *timr)
{
lockdep_assert_held(&timr->it_lock);
timr->it_status = timr->it_interval ? POSIX_TIMER_REQUEUE_PENDING : POSIX_TIMER_DISARMED;
posixtimer_send_sigqueue(timr);
} }
/* /*
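The delivery path above drops a signal when the timer was modified after the signal was queued, and rearms periodic timers only from the REQUEUE_PENDING state. A compact sketch of that sequence check follows. The names are hypothetical, and taking the sequence snapshot at queue time is an assumption about posixtimer_send_sigqueue(), which this diff does not show.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_status {
	SKETCH_DISARMED,
	SKETCH_ARMED,
	SKETCH_REQUEUE_PENDING,
};

/* Hypothetical stand-in for the relevant k_itimer fields. */
struct sketch_ptimer {
	enum sketch_status status;
	int64_t interval;
	unsigned int signal_seq;	/* bumped on rearm or modification */
	unsigned int sigqueue_seq;	/* snapshot taken at queue time (assumed) */
	unsigned int rearms;
};

/* Expiry: periodic timers wait for delivery, one-shot timers are done. */
static void sketch_queue_signal(struct sketch_ptimer *t)
{
	t->status = t->interval ? SKETCH_REQUEUE_PENDING : SKETCH_DISARMED;
	t->sigqueue_seq = t->signal_seq;
}

/* Delivery: returns false when the signal is stale and must be dropped. */
static bool sketch_deliver_signal(struct sketch_ptimer *t)
{
	if (t->signal_seq != t->sigqueue_seq)
		return false;

	/* One-shot timers deliver the signal without rearming. */
	if (!t->interval || t->status != SKETCH_REQUEUE_PENDING)
		return true;

	t->status = SKETCH_ARMED;
	t->rearms++;
	t->signal_seq++;
	return true;
}
```

Because the rearm happens only at delivery time, a periodic timer whose signal is never dequeued is never restarted, which is what makes the old one-jiffy throttle unnecessary.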
@ -317,62 +318,10 @@ int posix_timer_queue_signal(struct k_itimer *timr)
static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer) static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer)
{ {
struct k_itimer *timr = container_of(timer, struct k_itimer, it.real.timer); struct k_itimer *timr = container_of(timer, struct k_itimer, it.real.timer);
enum hrtimer_restart ret = HRTIMER_NORESTART;
unsigned long flags;
spin_lock_irqsave(&timr->it_lock, flags); guard(spinlock_irqsave)(&timr->it_lock);
posix_timer_queue_signal(timr);
if (posix_timer_queue_signal(timr)) { return HRTIMER_NORESTART;
/*
* The signal was not queued due to SIG_IGN. As a
* consequence the timer is not going to be rearmed from
* the signal delivery path. But as a real signal handler
* can be installed later the timer must be rearmed here.
*/
if (timr->it_interval != 0) {
ktime_t now = hrtimer_cb_get_time(timer);
/*
* FIXME: What we really want, is to stop this
* timer completely and restart it in case the
* SIG_IGN is removed. This is a non trivial
* change to the signal handling code.
*
* For now let timers with an interval less than a
* jiffy expire every jiffy and recheck for a
* valid signal handler.
*
* This avoids interrupt starvation in case of a
* very small interval, which would expire the
* timer immediately again.
*
* Moving now ahead of time by one jiffy tricks
* hrtimer_forward() to expire the timer later,
* while it still maintains the overrun accuracy
* for the price of a slight inconsistency in the
* timer_gettime() case. This is at least better
* than a timer storm.
*
* Only required when high resolution timers are
* enabled as the periodic tick based timers are
* automatically aligned to the next tick.
*/
if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS)) {
ktime_t kj = TICK_NSEC;
if (timr->it_interval < kj)
now = ktime_add(now, kj);
}
timr->it_overrun += hrtimer_forward(timer, now, timr->it_interval);
ret = HRTIMER_RESTART;
++timr->it_requeue_pending;
timr->it_active = 1;
}
}
unlock_timer(timr, flags);
return ret;
} }
static struct pid *good_sigevent(sigevent_t * event) static struct pid *good_sigevent(sigevent_t * event)
@ -399,32 +348,27 @@ static struct pid *good_sigevent(sigevent_t * event)
} }
} }
static struct k_itimer * alloc_posix_timer(void) static struct k_itimer *alloc_posix_timer(void)
{ {
struct k_itimer *tmr = kmem_cache_zalloc(posix_timers_cache, GFP_KERNEL); struct k_itimer *tmr = kmem_cache_zalloc(posix_timers_cache, GFP_KERNEL);
if (!tmr) if (!tmr)
return tmr; return tmr;
if (unlikely(!(tmr->sigq = sigqueue_alloc()))) {
if (unlikely(!posixtimer_init_sigqueue(&tmr->sigq))) {
kmem_cache_free(posix_timers_cache, tmr); kmem_cache_free(posix_timers_cache, tmr);
return NULL; return NULL;
} }
clear_siginfo(&tmr->sigq->info); rcuref_init(&tmr->rcuref, 1);
return tmr; return tmr;
} }
static void k_itimer_rcu_free(struct rcu_head *head) void posixtimer_free_timer(struct k_itimer *tmr)
{
struct k_itimer *tmr = container_of(head, struct k_itimer, rcu);
kmem_cache_free(posix_timers_cache, tmr);
}
static void posix_timer_free(struct k_itimer *tmr)
{ {
put_pid(tmr->it_pid); put_pid(tmr->it_pid);
sigqueue_free(tmr->sigq); if (tmr->sigq.ucounts)
call_rcu(&tmr->rcu, k_itimer_rcu_free); dec_rlimit_put_ucounts(tmr->sigq.ucounts, UCOUNT_RLIMIT_SIGPENDING);
kfree_rcu(tmr, rcu);
} }
static void posix_timer_unhash_and_free(struct k_itimer *tmr) static void posix_timer_unhash_and_free(struct k_itimer *tmr)
@ -432,7 +376,7 @@ static void posix_timer_unhash_and_free(struct k_itimer *tmr)
spin_lock(&hash_lock); spin_lock(&hash_lock);
hlist_del_rcu(&tmr->t_hash); hlist_del_rcu(&tmr->t_hash);
spin_unlock(&hash_lock); spin_unlock(&hash_lock);
posix_timer_free(tmr); posixtimer_putref(tmr);
} }
static int common_timer_create(struct k_itimer *new_timer) static int common_timer_create(struct k_itimer *new_timer)
@ -467,7 +411,7 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
*/ */
new_timer_id = posix_timer_add(new_timer); new_timer_id = posix_timer_add(new_timer);
if (new_timer_id < 0) { if (new_timer_id < 0) {
posix_timer_free(new_timer); posixtimer_free_timer(new_timer);
return new_timer_id; return new_timer_id;
} }
@ -485,18 +429,23 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
goto out; goto out;
} }
new_timer->it_sigev_notify = event->sigev_notify; new_timer->it_sigev_notify = event->sigev_notify;
new_timer->sigq->info.si_signo = event->sigev_signo; new_timer->sigq.info.si_signo = event->sigev_signo;
new_timer->sigq->info.si_value = event->sigev_value; new_timer->sigq.info.si_value = event->sigev_value;
} else { } else {
new_timer->it_sigev_notify = SIGEV_SIGNAL; new_timer->it_sigev_notify = SIGEV_SIGNAL;
new_timer->sigq->info.si_signo = SIGALRM; new_timer->sigq.info.si_signo = SIGALRM;
memset(&new_timer->sigq->info.si_value, 0, sizeof(sigval_t)); memset(&new_timer->sigq.info.si_value, 0, sizeof(sigval_t));
new_timer->sigq->info.si_value.sival_int = new_timer->it_id; new_timer->sigq.info.si_value.sival_int = new_timer->it_id;
new_timer->it_pid = get_pid(task_tgid(current)); new_timer->it_pid = get_pid(task_tgid(current));
} }
new_timer->sigq->info.si_tid = new_timer->it_id; if (new_timer->it_sigev_notify & SIGEV_THREAD_ID)
new_timer->sigq->info.si_code = SI_TIMER; new_timer->it_pid_type = PIDTYPE_PID;
else
new_timer->it_pid_type = PIDTYPE_TGID;
new_timer->sigq.info.si_tid = new_timer->it_id;
new_timer->sigq.info.si_code = SI_TIMER;
if (copy_to_user(created_timer_id, &new_timer_id, sizeof (new_timer_id))) { if (copy_to_user(created_timer_id, &new_timer_id, sizeof (new_timer_id))) {
error = -EFAULT; error = -EFAULT;
@ -580,7 +529,14 @@ static struct k_itimer *__lock_timer(timer_t timer_id, unsigned long *flags)
* 1) Set timr::it_signal to NULL with timr::it_lock held * 1) Set timr::it_signal to NULL with timr::it_lock held
* 2) Release timr::it_lock * 2) Release timr::it_lock
* 3) Remove from the hash under hash_lock * 3) Remove from the hash under hash_lock
* 4) Call RCU for removal after the grace period * 4) Put the reference count.
*
* The reference count might not drop to zero if timr::sigq is
* queued. In that case the signal delivery or flush will put the
* last reference count.
*
* When the reference count reaches zero, the timer is scheduled
* for RCU removal after the grace period.
* *
* Holding rcu_read_lock() across the lookup ensures that * Holding rcu_read_lock() across the lookup ensures that
* the timer cannot be freed. * the timer cannot be freed.
@ -647,10 +603,10 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
/* interval timer ? */ /* interval timer ? */
if (iv) { if (iv) {
cur_setting->it_interval = ktime_to_timespec64(iv); cur_setting->it_interval = ktime_to_timespec64(iv);
} else if (!timr->it_active) { } else if (timr->it_status == POSIX_TIMER_DISARMED) {
/* /*
* SIGEV_NONE oneshot timers are never queued and therefore * SIGEV_NONE oneshot timers are never queued and therefore
* timr->it_active is always false. The check below * timr->it_status is always DISARMED. The check below
* vs. remaining time will handle this case. * vs. remaining time will handle this case.
* *
* For all other timers there is nothing to update here, so * For all other timers there is nothing to update here, so
@ -667,7 +623,7 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
* is a SIGEV_NONE timer move the expiry time forward by intervals, * is a SIGEV_NONE timer move the expiry time forward by intervals,
* so expiry is > now. * so expiry is > now.
*/ */
if (iv && (timr->it_requeue_pending & REQUEUE_PENDING || sig_none)) if (iv && timr->it_status != POSIX_TIMER_ARMED)
timr->it_overrun += kc->timer_forward(timr, now); timr->it_overrun += kc->timer_forward(timr, now);
remaining = kc->timer_remaining(timr, now); remaining = kc->timer_remaining(timr, now);
@ -775,7 +731,7 @@ SYSCALL_DEFINE1(timer_getoverrun, timer_t, timer_id)
if (!timr) if (!timr)
return -EINVAL; return -EINVAL;
overrun = timer_overrun_to_int(timr, 0); overrun = timer_overrun_to_int(timr);
unlock_timer(timr, flags); unlock_timer(timr, flags);
return overrun; return overrun;
@ -867,8 +823,6 @@ void posix_timer_set_common(struct k_itimer *timer, struct itimerspec64 *new_set
else else
timer->it_interval = 0; timer->it_interval = 0;
/* Prevent reloading in case there is a signal pending */
timer->it_requeue_pending = (timer->it_requeue_pending + 2) & ~REQUEUE_PENDING;
/* Reset overrun accounting */ /* Reset overrun accounting */
timer->it_overrun_last = 0; timer->it_overrun_last = 0;
timer->it_overrun = -1LL; timer->it_overrun = -1LL;
@ -886,8 +840,6 @@ int common_timer_set(struct k_itimer *timr, int flags,
if (old_setting) if (old_setting)
common_timer_get(timr, old_setting); common_timer_get(timr, old_setting);
/* Prevent rearming by clearing the interval */
timr->it_interval = 0;
/* /*
* Careful here. On SMP systems the timer expiry function could be * Careful here. On SMP systems the timer expiry function could be
* active and spinning on timr->it_lock. * active and spinning on timr->it_lock.
@ -895,7 +847,7 @@ int common_timer_set(struct k_itimer *timr, int flags,
if (kc->timer_try_to_cancel(timr) < 0) if (kc->timer_try_to_cancel(timr) < 0)
return TIMER_RETRY; return TIMER_RETRY;
timr->it_active = 0; timr->it_status = POSIX_TIMER_DISARMED;
posix_timer_set_common(timr, new_setting); posix_timer_set_common(timr, new_setting);
/* Keep timer disarmed when it_value is zero */ /* Keep timer disarmed when it_value is zero */
@ -908,7 +860,8 @@ int common_timer_set(struct k_itimer *timr, int flags,
sigev_none = timr->it_sigev_notify == SIGEV_NONE; sigev_none = timr->it_sigev_notify == SIGEV_NONE;
kc->timer_arm(timr, expires, flags & TIMER_ABSTIME, sigev_none); kc->timer_arm(timr, expires, flags & TIMER_ABSTIME, sigev_none);
timr->it_active = !sigev_none; if (!sigev_none)
timr->it_status = POSIX_TIMER_ARMED;
return 0; return 0;
} }
@ -936,6 +889,9 @@ retry:
if (old_spec64) if (old_spec64)
old_spec64->it_interval = ktime_to_timespec64(timr->it_interval); old_spec64->it_interval = ktime_to_timespec64(timr->it_interval);
/* Prevent signal delivery and rearming. */
timr->it_signal_seq++;
kc = timr->kclock; kc = timr->kclock;
if (WARN_ON_ONCE(!kc || !kc->timer_set)) if (WARN_ON_ONCE(!kc || !kc->timer_set))
error = -EINVAL; error = -EINVAL;
@ -1004,17 +960,31 @@ int common_timer_del(struct k_itimer *timer)
{ {
const struct k_clock *kc = timer->kclock; const struct k_clock *kc = timer->kclock;
timer->it_interval = 0;
if (kc->timer_try_to_cancel(timer) < 0) if (kc->timer_try_to_cancel(timer) < 0)
return TIMER_RETRY; return TIMER_RETRY;
timer->it_active = 0; timer->it_status = POSIX_TIMER_DISARMED;
return 0; return 0;
} }
/*
* If the deleted timer is on the ignored list, remove it and
* drop the associated reference.
*/
static inline void posix_timer_cleanup_ignored(struct k_itimer *tmr)
{
if (!hlist_unhashed(&tmr->ignored_list)) {
hlist_del_init(&tmr->ignored_list);
posixtimer_putref(tmr);
}
}
static inline int timer_delete_hook(struct k_itimer *timer) static inline int timer_delete_hook(struct k_itimer *timer)
{ {
const struct k_clock *kc = timer->kclock; const struct k_clock *kc = timer->kclock;
/* Prevent signal delivery and rearming. */
timer->it_signal_seq++;
if (WARN_ON_ONCE(!kc || !kc->timer_del)) if (WARN_ON_ONCE(!kc || !kc->timer_del))
return -EINVAL; return -EINVAL;
return kc->timer_del(timer); return kc->timer_del(timer);
@ -1040,12 +1010,18 @@ retry_delete:
spin_lock(&current->sighand->siglock); spin_lock(&current->sighand->siglock);
hlist_del(&timer->list); hlist_del(&timer->list);
spin_unlock(&current->sighand->siglock); posix_timer_cleanup_ignored(timer);
/* /*
* A concurrent lookup could check timer::it_signal lockless. It * A concurrent lookup could check timer::it_signal lockless. It
* will reevaluate with timer::it_lock held and observe the NULL. * will reevaluate with timer::it_lock held and observe the NULL.
*
* It must be written with siglock held so that the signal code
* observes timer->it_signal == NULL in do_sigaction(SIG_IGN),
* which prevents it from moving a pending signal of a deleted
* timer to the ignore list.
*/ */
WRITE_ONCE(timer->it_signal, NULL); WRITE_ONCE(timer->it_signal, NULL);
spin_unlock(&current->sighand->siglock);
unlock_timer(timer, flags); unlock_timer(timer, flags);
posix_timer_unhash_and_free(timer); posix_timer_unhash_and_free(timer);
@ -1091,6 +1067,8 @@ retry_delete:
} }
hlist_del(&timer->list); hlist_del(&timer->list);
posix_timer_cleanup_ignored(timer);
/* /*
* Setting timer::it_signal to NULL is technically not required * Setting timer::it_signal to NULL is technically not required
* here as nothing can access the timer anymore legitimately via * here as nothing can access the timer anymore legitimately via
@ -1123,6 +1101,19 @@ void exit_itimers(struct task_struct *tsk)
/* The timers are no longer accessible via tsk::signal */ /* The timers are no longer accessible via tsk::signal */
while (!hlist_empty(&timers)) while (!hlist_empty(&timers))
itimer_delete(hlist_entry(timers.first, struct k_itimer, list)); itimer_delete(hlist_entry(timers.first, struct k_itimer, list));
/*
* There should be no timers on the ignored list. itimer_delete() has
* mopped them up.
*/
if (!WARN_ON_ONCE(!hlist_empty(&tsk->signal->ignored_posix_timers)))
return;
hlist_move_list(&tsk->signal->ignored_posix_timers, &timers);
while (!hlist_empty(&timers)) {
posix_timer_cleanup_ignored(hlist_entry(timers.first, struct k_itimer,
ignored_list));
}
} }
SYSCALL_DEFINE2(clock_settime, const clockid_t, which_clock, SYSCALL_DEFINE2(clock_settime, const clockid_t, which_clock,


@ -1,6 +1,12 @@
/* SPDX-License-Identifier: GPL-2.0 */ /* SPDX-License-Identifier: GPL-2.0 */
#define TIMER_RETRY 1 #define TIMER_RETRY 1
enum posix_timer_state {
POSIX_TIMER_DISARMED,
POSIX_TIMER_ARMED,
POSIX_TIMER_REQUEUE_PENDING,
};
struct k_clock { struct k_clock {
int (*clock_getres)(const clockid_t which_clock, int (*clock_getres)(const clockid_t which_clock,
struct timespec64 *tp); struct timespec64 *tp);
@ -36,7 +42,7 @@ extern const struct k_clock clock_process;
extern const struct k_clock clock_thread; extern const struct k_clock clock_thread;
extern const struct k_clock alarm_clock; extern const struct k_clock alarm_clock;
int posix_timer_queue_signal(struct k_itimer *timr); void posix_timer_queue_signal(struct k_itimer *timr);
void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting); void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting);
int common_timer_set(struct k_itimer *timr, int flags, int common_timer_set(struct k_itimer *timr, int flags,

kernel/time/sleep_timeout.c Normal file

@ -0,0 +1,377 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Kernel internal schedule timeout and sleeping functions
*/
#include <linux/delay.h>
#include <linux/jiffies.h>
#include <linux/timer.h>
#include <linux/sched/signal.h>
#include <linux/sched/debug.h>
#include "tick-internal.h"
/*
* Since schedule_timeout()'s timer is defined on the stack, it must store
* the target task on the stack as well.
*/
struct process_timer {
struct timer_list timer;
struct task_struct *task;
};
static void process_timeout(struct timer_list *t)
{
struct process_timer *timeout = from_timer(timeout, t, timer);
wake_up_process(timeout->task);
}
/**
* schedule_timeout - sleep until timeout
* @timeout: timeout value in jiffies
*
* Make the current task sleep until @timeout jiffies have elapsed.
* The function behavior depends on the current task state
* (see also set_current_state() description):
*
* %TASK_RUNNING - the scheduler is called, but the task does not sleep
* at all. That happens because sched_submit_work() does nothing for
* tasks in %TASK_RUNNING state.
*
* %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
* pass before the routine returns unless the current task is explicitly
* woken up, (e.g. by wake_up_process()).
*
* %TASK_INTERRUPTIBLE - the routine may return early if a signal is
* delivered to the current task or the current task is explicitly woken
* up.
*
* The current task state is guaranteed to be %TASK_RUNNING when this
* routine returns.
*
* Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
* the CPU away without a bound on the timeout. In this case the return
* value will be %MAX_SCHEDULE_TIMEOUT.
*
* Returns: 0 when the timer has expired otherwise the remaining time in
* jiffies will be returned. In all cases the return value is guaranteed
* to be non-negative.
*/
signed long __sched schedule_timeout(signed long timeout)
{
struct process_timer timer;
unsigned long expire;
switch (timeout) {
case MAX_SCHEDULE_TIMEOUT:
/*
* These two special cases are useful to be comfortable
* in the caller. Nothing more. We could take
* MAX_SCHEDULE_TIMEOUT from one of the negative value
* but I'd like to return a valid offset (>=0) to allow
* the caller to do everything it wants with the retval.
*/
schedule();
goto out;
default:
/*
* Another bit of PARANOID. Note that the retval will be
* 0 since no piece of kernel is supposed to do a check
* for a negative retval of schedule_timeout() (since it
* should never happen anyway). You just have the printk()
* that will tell you if something has gone wrong and where.
*/
if (timeout < 0) {
pr_err("%s: wrong timeout value %lx\n", __func__, timeout);
dump_stack();
__set_current_state(TASK_RUNNING);
goto out;
}
}
expire = timeout + jiffies;
timer.task = current;
timer_setup_on_stack(&timer.timer, process_timeout, 0);
timer.timer.expires = expire;
add_timer(&timer.timer);
schedule();
del_timer_sync(&timer.timer);
/* Remove the timer from the object tracker */
destroy_timer_on_stack(&timer.timer);
timeout = expire - jiffies;
out:
return timeout < 0 ? 0 : timeout;
}
EXPORT_SYMBOL(schedule_timeout);
/*
* __set_current_state() can be used in schedule_timeout_*() functions, because
* schedule_timeout() calls schedule() unconditionally.
*/
/**
* schedule_timeout_interruptible - sleep until timeout (interruptible)
* @timeout: timeout value in jiffies
*
* See schedule_timeout() for details.
*
* Task state is set to TASK_INTERRUPTIBLE before starting the timeout.
*/
signed long __sched schedule_timeout_interruptible(signed long timeout)
{
__set_current_state(TASK_INTERRUPTIBLE);
return schedule_timeout(timeout);
}
EXPORT_SYMBOL(schedule_timeout_interruptible);
/**
* schedule_timeout_killable - sleep until timeout (killable)
* @timeout: timeout value in jiffies
*
* See schedule_timeout() for details.
*
* Task state is set to TASK_KILLABLE before starting the timeout.
*/
signed long __sched schedule_timeout_killable(signed long timeout)
{
__set_current_state(TASK_KILLABLE);
return schedule_timeout(timeout);
}
EXPORT_SYMBOL(schedule_timeout_killable);
/**
* schedule_timeout_uninterruptible - sleep until timeout (uninterruptible)
* @timeout: timeout value in jiffies
*
* See schedule_timeout() for details.
*
* Task state is set to TASK_UNINTERRUPTIBLE before starting the timeout.
*/
signed long __sched schedule_timeout_uninterruptible(signed long timeout)
{
__set_current_state(TASK_UNINTERRUPTIBLE);
return schedule_timeout(timeout);
}
EXPORT_SYMBOL(schedule_timeout_uninterruptible);
/**
* schedule_timeout_idle - sleep until timeout (idle)
* @timeout: timeout value in jiffies
*
* See schedule_timeout() for details.
*
* Task state is set to TASK_IDLE before starting the timeout. It is similar to
* schedule_timeout_uninterruptible(), except this task will not contribute to
* load average.
*/
signed long __sched schedule_timeout_idle(signed long timeout)
{
__set_current_state(TASK_IDLE);
return schedule_timeout(timeout);
}
EXPORT_SYMBOL(schedule_timeout_idle);
/**
* schedule_hrtimeout_range_clock - sleep until timeout
* @expires: timeout value (ktime_t)
* @delta: slack in expires timeout (ktime_t)
* @mode: timer mode
* @clock_id: timer clock to be used
*
* Details are explained in schedule_hrtimeout_range() function description as
* this function is commonly used.
*/
int __sched schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta,
const enum hrtimer_mode mode, clockid_t clock_id)
{
struct hrtimer_sleeper t;
/*
* Optimize when a zero timeout value is given. It does not
* matter whether this is an absolute or a relative time.
*/
if (expires && *expires == 0) {
__set_current_state(TASK_RUNNING);
return 0;
}
/*
* A NULL parameter means "infinite"
*/
if (!expires) {
schedule();
return -EINTR;
}
hrtimer_setup_sleeper_on_stack(&t, clock_id, mode);
hrtimer_set_expires_range_ns(&t.timer, *expires, delta);
hrtimer_sleeper_start_expires(&t, mode);
if (likely(t.task))
schedule();
hrtimer_cancel(&t.timer);
destroy_hrtimer_on_stack(&t.timer);
__set_current_state(TASK_RUNNING);
return !t.task ? 0 : -EINTR;
}
EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock);
/**
* schedule_hrtimeout_range - sleep until timeout
* @expires: timeout value (ktime_t)
* @delta: slack in expires timeout (ktime_t)
* @mode: timer mode
*
* Make the current task sleep until the given expiry time has
* elapsed. The routine will return immediately unless
* the current task state has been set (see set_current_state()).
*
* The @delta argument gives the kernel the freedom to schedule the
* actual wakeup to a time that is both power and performance friendly
* for regular (non RT/DL) tasks.
* The kernel gives the normal best effort behavior for "@expires+@delta",
* but may decide to fire the timer earlier, but no earlier than @expires.
*
* You can set the task state as follows -
*
* %TASK_UNINTERRUPTIBLE - at least @timeout time is guaranteed to
* pass before the routine returns unless the current task is explicitly
* woken up, (e.g. by wake_up_process()).
*
* %TASK_INTERRUPTIBLE - the routine may return early if a signal is
* delivered to the current task or the current task is explicitly woken
* up.
*
* The current task state is guaranteed to be TASK_RUNNING when this
* routine returns.
*
* Returns: 0 when the timer has expired. If the task was woken before the
* timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or
* by an explicit wakeup, it returns -EINTR.
*/
int __sched schedule_hrtimeout_range(ktime_t *expires, u64 delta,
const enum hrtimer_mode mode)
{
return schedule_hrtimeout_range_clock(expires, delta, mode,
CLOCK_MONOTONIC);
}
EXPORT_SYMBOL_GPL(schedule_hrtimeout_range);
/**
* schedule_hrtimeout - sleep until timeout
* @expires: timeout value (ktime_t)
* @mode: timer mode
*
* See schedule_hrtimeout_range() for details. @delta argument of
* schedule_hrtimeout_range() is set to 0 and has therefore no impact.
*/
int __sched schedule_hrtimeout(ktime_t *expires, const enum hrtimer_mode mode)
{
return schedule_hrtimeout_range(expires, 0, mode);
}
EXPORT_SYMBOL_GPL(schedule_hrtimeout);
/**
* msleep - sleep safely even with waitqueue interruptions
* @msecs: Requested sleep duration in milliseconds
*
* msleep() uses jiffy based timeouts for the sleep duration. Because of the
* design of the timer wheel, the maximum additional percentage delay (slack) is
* 12.5%. This is only valid for timers which will end up in level 1 or a higher
* level of the timer wheel. For explanation of those 12.5% please check the
* detailed description about the basics of the timer wheel.
*
* The slack of timers which will end up in level 0 depends on sleep duration
* (msecs) and HZ configuration and can be calculated in the following way (with
* the timer wheel design restriction that the slack is not less than 12.5%):
*
* ``slack = MSECS_PER_TICK / msecs``
*
* When the allowed slack of the callsite is known, the calculation could be
* turned around to find the minimal allowed sleep duration to meet the
* constraints. For example:
*
* * ``HZ=1000`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 1 / (1/4) = 4``:
* all sleep durations greater or equal 4ms will meet the constraints.
* * ``HZ=1000`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 1 / (1/8) = 8``:
* all sleep durations greater or equal 8ms will meet the constraints.
* * ``HZ=250`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 4 / (1/4) = 16``:
* all sleep durations greater or equal 16ms will meet the constraints.
* * ``HZ=250`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 4 / (1/8) = 32``:
* all sleep durations greater or equal 32ms will meet the constraints.
*
* See also the signal aware variant msleep_interruptible().
*/
void msleep(unsigned int msecs)
{
unsigned long timeout = msecs_to_jiffies(msecs);
while (timeout)
timeout = schedule_timeout_uninterruptible(timeout);
}
EXPORT_SYMBOL(msleep);
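The level 0 slack rule in the msleep() comment above can be turned around mechanically. The following is a small userspace sketch of that arithmetic, not kernel code: `min_sleep_msecs()` is a hypothetical helper, the slack is passed as a divisor (4 for 25%, 8 for 12.5%), and HZ is assumed to divide 1000 evenly.

```c
#include <assert.h>

/*
 * Hypothetical helper: smallest msleep() duration in milliseconds for
 * which the level 0 slack (MSECS_PER_TICK / msecs) stays within
 * 1/slack_div. The divisor is capped at 8 because the timer wheel
 * never provides less than 12.5% slack.
 */
static unsigned int min_sleep_msecs(unsigned int hz, unsigned int slack_div)
{
	assert(hz > 0 && hz <= 1000 && 1000 % hz == 0);
	assert(slack_div > 0 && slack_div <= 8);

	unsigned int msecs_per_tick = 1000 / hz;

	/* MSECS_PER_TICK / (1 / slack_div) == MSECS_PER_TICK * slack_div */
	return msecs_per_tick * slack_div;
}
```

With these assumptions, `min_sleep_msecs(1000, 4)` yields 4 and `min_sleep_msecs(250, 8)` yields 32, matching the examples listed in the comment.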
/**
* msleep_interruptible - sleep waiting for signals
* @msecs: Requested sleep duration in milliseconds
*
* See msleep() for some basic information.
*
* The difference between msleep() and msleep_interruptible() is that the sleep
* could be interrupted by a signal delivery and then returns early.
*
* Returns: The remaining time of the sleep duration transformed to msecs (see
* schedule_timeout() for details).
*/
unsigned long msleep_interruptible(unsigned int msecs)
{
unsigned long timeout = msecs_to_jiffies(msecs);
while (timeout && !signal_pending(current))
timeout = schedule_timeout_interruptible(timeout);
return jiffies_to_msecs(timeout);
}
EXPORT_SYMBOL(msleep_interruptible);
/**
* usleep_range_state - Sleep for an approximate time in a given state
* @min: Minimum time in usecs to sleep
* @max: Maximum time in usecs to sleep
* @state: State the current task will be in while sleeping
*
* usleep_range_state() sleeps at least for the minimum specified time but not
* longer than the maximum specified amount of time. The range might reduce
* power usage by allowing hrtimers to coalesce an already scheduled interrupt
* with this hrtimer. In the worst case, an interrupt is scheduled for the upper
* bound.
*
* The sleeping task is set to the specified state before starting the sleep.
*
* In non-atomic context where the exact wakeup time is flexible, use
* usleep_range() or its variants instead of udelay(). The sleep improves
* responsiveness by avoiding the CPU-hogging busy-wait of udelay().
*/
void __sched usleep_range_state(unsigned long min, unsigned long max, unsigned int state)
{
ktime_t exp = ktime_add_us(ktime_get(), min);
u64 delta = (u64)(max - min) * NSEC_PER_USEC;
if (WARN_ON_ONCE(max < min))
delta = 0;
for (;;) {
__set_current_state(state);
/* Do not return before the requested sleep time has elapsed */
if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS))
break;
}
}
EXPORT_SYMBOL(usleep_range_state);


@ -25,6 +25,7 @@ extern int tick_do_timer_cpu __read_mostly;
extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast); extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast);
extern void tick_handle_periodic(struct clock_event_device *dev); extern void tick_handle_periodic(struct clock_event_device *dev);
extern void tick_check_new_device(struct clock_event_device *dev); extern void tick_check_new_device(struct clock_event_device *dev);
extern void tick_offline_cpu(unsigned int cpu);
extern void tick_shutdown(unsigned int cpu); extern void tick_shutdown(unsigned int cpu);
extern void tick_suspend(void); extern void tick_suspend(void);
extern void tick_resume(void); extern void tick_resume(void);
@ -142,10 +143,8 @@ static inline bool tick_broadcast_oneshot_available(void) { return tick_oneshot_
#endif /* !(BROADCAST && ONESHOT) */ #endif /* !(BROADCAST && ONESHOT) */
#if defined(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST) && defined(CONFIG_HOTPLUG_CPU) #if defined(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST) && defined(CONFIG_HOTPLUG_CPU)
extern void tick_offline_cpu(unsigned int cpu);
extern void tick_broadcast_offline(unsigned int cpu); extern void tick_broadcast_offline(unsigned int cpu);
#else #else
static inline void tick_offline_cpu(unsigned int cpu) { }
static inline void tick_broadcast_offline(unsigned int cpu) { } static inline void tick_broadcast_offline(unsigned int cpu) { }
#endif #endif


@ -311,14 +311,6 @@ static enum hrtimer_restart tick_nohz_handler(struct hrtimer *timer)
return HRTIMER_RESTART; return HRTIMER_RESTART;
} }
static void tick_sched_timer_cancel(struct tick_sched *ts)
{
if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES))
hrtimer_cancel(&ts->sched_timer);
else if (tick_sched_flag_test(ts, TS_FLAG_NOHZ))
tick_program_event(KTIME_MAX, 1);
}
#ifdef CONFIG_NO_HZ_FULL #ifdef CONFIG_NO_HZ_FULL
cpumask_var_t tick_nohz_full_mask; cpumask_var_t tick_nohz_full_mask;
EXPORT_SYMBOL_GPL(tick_nohz_full_mask); EXPORT_SYMBOL_GPL(tick_nohz_full_mask);
@ -1061,7 +1053,10 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
* the tick timer. * the tick timer.
*/ */
if (unlikely(expires == KTIME_MAX)) { if (unlikely(expires == KTIME_MAX)) {
tick_sched_timer_cancel(ts); if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES))
hrtimer_cancel(&ts->sched_timer);
else
tick_program_event(KTIME_MAX, 1);
return; return;
} }
@ -1610,21 +1605,13 @@ void tick_setup_sched_timer(bool hrtimer)
*/ */
void tick_sched_timer_dying(int cpu) void tick_sched_timer_dying(int cpu)
{ {
struct tick_device *td = &per_cpu(tick_cpu_device, cpu);
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu); struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
struct clock_event_device *dev = td->evtdev;
ktime_t idle_sleeptime, iowait_sleeptime; ktime_t idle_sleeptime, iowait_sleeptime;
unsigned long idle_calls, idle_sleeps; unsigned long idle_calls, idle_sleeps;
/* This must happen before hrtimers are migrated! */ /* This must happen before hrtimers are migrated! */
tick_sched_timer_cancel(ts); if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES))
hrtimer_cancel(&ts->sched_timer);
/*
* If the clockevents doesn't support CLOCK_EVT_STATE_ONESHOT_STOPPED,
* make sure not to call low-res tick handler.
*/
if (tick_sched_flag_test(ts, TS_FLAG_NOHZ))
dev->event_handler = clockevents_handle_noop;
idle_sleeptime = ts->idle_sleeptime; idle_sleeptime = ts->idle_sleeptime;
iowait_sleeptime = ts->iowait_sleeptime; iowait_sleeptime = ts->iowait_sleeptime;


@ -556,9 +556,9 @@ EXPORT_SYMBOL(ns_to_timespec64);
* - all other values are converted to jiffies by either multiplying * - all other values are converted to jiffies by either multiplying
* the input value by a factor or dividing it with a factor and * the input value by a factor or dividing it with a factor and
* handling any 32-bit overflows. * handling any 32-bit overflows.
* for the details see __msecs_to_jiffies() * for the details see _msecs_to_jiffies()
* *
* __msecs_to_jiffies() checks for the passed in value being a constant * msecs_to_jiffies() checks for the passed in value being a constant
* via __builtin_constant_p() allowing gcc to eliminate most of the * via __builtin_constant_p() allowing gcc to eliminate most of the
* code, __msecs_to_jiffies() is called if the value passed does not * code, __msecs_to_jiffies() is called if the value passed does not
* allow constant folding and the actual conversion must be done at * allow constant folding and the actual conversion must be done at
@ -866,7 +866,7 @@ struct timespec64 timespec64_add_safe(const struct timespec64 lhs,
* *
* Handles compat or 32-bit modes. * Handles compat or 32-bit modes.
* *
* Return: %0 on success or negative errno on error * Return: 0 on success or negative errno on error
*/ */
int get_timespec64(struct timespec64 *ts, int get_timespec64(struct timespec64 *ts,
const struct __kernel_timespec __user *uts) const struct __kernel_timespec __user *uts)
@ -897,7 +897,7 @@ EXPORT_SYMBOL_GPL(get_timespec64);
* @ts: input &struct timespec64 * @ts: input &struct timespec64
* @uts: user's &struct __kernel_timespec * @uts: user's &struct __kernel_timespec
* *
* Return: %0 on success or negative errno on error * Return: 0 on success or negative errno on error
*/ */
int put_timespec64(const struct timespec64 *ts, int put_timespec64(const struct timespec64 *ts,
struct __kernel_timespec __user *uts) struct __kernel_timespec __user *uts)
@ -944,7 +944,7 @@ static int __put_old_timespec32(const struct timespec64 *ts64,
* *
* Handles X86_X32_ABI compatibility conversion. * Handles X86_X32_ABI compatibility conversion.
* *
* Return: %0 on success or negative errno on error * Return: 0 on success or negative errno on error
*/ */
int get_old_timespec32(struct timespec64 *ts, const void __user *uts) int get_old_timespec32(struct timespec64 *ts, const void __user *uts)
{ {
@ -963,7 +963,7 @@ EXPORT_SYMBOL_GPL(get_old_timespec32);
* *
* Handles X86_X32_ABI compatibility conversion. * Handles X86_X32_ABI compatibility conversion.
* *
* Return: %0 on success or negative errno on error * Return: 0 on success or negative errno on error
*/ */
int put_old_timespec32(const struct timespec64 *ts, void __user *uts) int put_old_timespec32(const struct timespec64 *ts, void __user *uts)
{ {
@ -979,7 +979,7 @@ EXPORT_SYMBOL_GPL(put_old_timespec32);
* @it: destination &struct itimerspec64 * @it: destination &struct itimerspec64
* @uit: user's &struct __kernel_itimerspec * @uit: user's &struct __kernel_itimerspec
* *
* Return: %0 on success or negative errno on error * Return: 0 on success or negative errno on error
*/ */
int get_itimerspec64(struct itimerspec64 *it, int get_itimerspec64(struct itimerspec64 *it,
const struct __kernel_itimerspec __user *uit) const struct __kernel_itimerspec __user *uit)
@ -1002,7 +1002,7 @@ EXPORT_SYMBOL_GPL(get_itimerspec64);
* @it: input &struct itimerspec64 * @it: input &struct itimerspec64
* @uit: user's &struct __kernel_itimerspec * @uit: user's &struct __kernel_itimerspec
* *
* Return: %0 on success or negative errno on error * Return: 0 on success or negative errno on error
*/ */
int put_itimerspec64(const struct itimerspec64 *it, int put_itimerspec64(const struct itimerspec64 *it,
struct __kernel_itimerspec __user *uit) struct __kernel_itimerspec __user *uit)
@ -1024,7 +1024,7 @@ EXPORT_SYMBOL_GPL(put_itimerspec64);
* @its: destination &struct itimerspec64 * @its: destination &struct itimerspec64
* @uits: user's &struct old_itimerspec32 * @uits: user's &struct old_itimerspec32
* *
* Return: %0 on success or negative errno on error * Return: 0 on success or negative errno on error
*/ */
int get_old_itimerspec32(struct itimerspec64 *its, int get_old_itimerspec32(struct itimerspec64 *its,
const struct old_itimerspec32 __user *uits) const struct old_itimerspec32 __user *uits)
@ -1043,7 +1043,7 @@ EXPORT_SYMBOL_GPL(get_old_itimerspec32);
* @its: input &struct itimerspec64 * @its: input &struct itimerspec64
* @uits: user's &struct old_itimerspec32 * @uits: user's &struct old_itimerspec32
* *
* Return: %0 on success or negative errno on error * Return: 0 on success or negative errno on error
*/ */
int put_old_itimerspec32(const struct itimerspec64 *its, int put_old_itimerspec32(const struct itimerspec64 *its,
struct old_itimerspec32 __user *uits) struct old_itimerspec32 __user *uits)


@ -30,8 +30,9 @@
#include "timekeeping_internal.h" #include "timekeeping_internal.h"
#define TK_CLEAR_NTP (1 << 0) #define TK_CLEAR_NTP (1 << 0)
#define TK_MIRROR (1 << 1) #define TK_CLOCK_WAS_SET (1 << 1)
#define TK_CLOCK_WAS_SET (1 << 2)
#define TK_UPDATE_ALL (TK_CLEAR_NTP | TK_CLOCK_WAS_SET)
enum timekeeping_adv_mode { enum timekeeping_adv_mode {
/* Update timekeeper when a tick has passed */ /* Update timekeeper when a tick has passed */
@ -41,20 +42,18 @@ enum timekeeping_adv_mode {
TK_ADV_FREQ TK_ADV_FREQ
}; };
DEFINE_RAW_SPINLOCK(timekeeper_lock);
/* /*
* The most important data for readout fits into a single 64 byte * The most important data for readout fits into a single 64 byte
* cache line. * cache line.
*/ */
static struct { struct tk_data {
seqcount_raw_spinlock_t seq; seqcount_raw_spinlock_t seq;
struct timekeeper timekeeper; struct timekeeper timekeeper;
} tk_core ____cacheline_aligned = { struct timekeeper shadow_timekeeper;
.seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_core.seq, &timekeeper_lock), raw_spinlock_t lock;
}; } ____cacheline_aligned;
static struct timekeeper shadow_timekeeper; static struct tk_data tk_core;
/* flag for if timekeeping is suspended */ /* flag for if timekeeping is suspended */
int __read_mostly timekeeping_suspended; int __read_mostly timekeeping_suspended;
@ -114,6 +113,19 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = {
.base[1] = FAST_TK_INIT, .base[1] = FAST_TK_INIT,
}; };
unsigned long timekeeper_lock_irqsave(void)
{
unsigned long flags;
raw_spin_lock_irqsave(&tk_core.lock, flags);
return flags;
}
void timekeeper_unlock_irqrestore(unsigned long flags)
{
raw_spin_unlock_irqrestore(&tk_core.lock, flags);
}
/* /*
* Multigrain timestamps require tracking the latest fine-grained timestamp * Multigrain timestamps require tracking the latest fine-grained timestamp
* that has been issued, and never returning a coarse-grained timestamp that is * that has been issued, and never returning a coarse-grained timestamp that is
@ -178,13 +190,15 @@ static void tk_set_wall_to_mono(struct timekeeper *tk, struct timespec64 wtm)
WARN_ON_ONCE(tk->offs_real != timespec64_to_ktime(tmp)); WARN_ON_ONCE(tk->offs_real != timespec64_to_ktime(tmp));
tk->wall_to_monotonic = wtm; tk->wall_to_monotonic = wtm;
set_normalized_timespec64(&tmp, -wtm.tv_sec, -wtm.tv_nsec); set_normalized_timespec64(&tmp, -wtm.tv_sec, -wtm.tv_nsec);
tk->offs_real = timespec64_to_ktime(tmp); /* Paired with READ_ONCE() in ktime_mono_to_any() */
tk->offs_tai = ktime_add(tk->offs_real, ktime_set(tk->tai_offset, 0)); WRITE_ONCE(tk->offs_real, timespec64_to_ktime(tmp));
WRITE_ONCE(tk->offs_tai, ktime_add(tk->offs_real, ktime_set(tk->tai_offset, 0)));
} }
static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta) static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
{ {
tk->offs_boot = ktime_add(tk->offs_boot, delta); /* Paired with READ_ONCE() in ktime_mono_to_any() */
WRITE_ONCE(tk->offs_boot, ktime_add(tk->offs_boot, delta));
/* /*
* Timespec representation for VDSO update to avoid 64bit division * Timespec representation for VDSO update to avoid 64bit division
* on every update. * on every update.
@ -201,7 +215,7 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
* the tkr's clocksource may change between the read reference, and the * the tkr's clocksource may change between the read reference, and the
* clock reference passed to the read function. This can cause crashes if * clock reference passed to the read function. This can cause crashes if
* the wrong clocksource is passed to the wrong read function. * the wrong clocksource is passed to the wrong read function.
* This isn't necessary to use when holding the timekeeper_lock or doing * This isn't necessary to use when holding the tk_core.lock or doing
* a read of the fast-timekeeper tkrs (which is protected by its own locking * a read of the fast-timekeeper tkrs (which is protected by its own locking
* and update logic). * and update logic).
*/ */
@ -212,97 +226,6 @@ static inline u64 tk_clock_read(const struct tk_read_base *tkr)
return clock->read(clock); return clock->read(clock);
} }
#ifdef CONFIG_DEBUG_TIMEKEEPING
#define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */
static void timekeeping_check_update(struct timekeeper *tk, u64 offset)
{
u64 max_cycles = tk->tkr_mono.clock->max_cycles;
const char *name = tk->tkr_mono.clock->name;
if (offset > max_cycles) {
printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow danger\n",
offset, name, max_cycles);
printk_deferred(" timekeeping: Your kernel is sick, but tries to cope by capping time updates\n");
} else {
if (offset > (max_cycles >> 1)) {
printk_deferred("INFO: timekeeping: Cycle offset (%lld) is larger than the '%s' clock's 50%% safety margin (%lld)\n",
offset, name, max_cycles >> 1);
printk_deferred(" timekeeping: Your kernel is still fine, but is feeling a bit nervous\n");
}
}
if (tk->underflow_seen) {
if (jiffies - tk->last_warning > WARNING_FREQ) {
printk_deferred("WARNING: Underflow in clocksource '%s' observed, time update ignored.\n", name);
printk_deferred(" Please report this, consider using a different clocksource, if possible.\n");
printk_deferred(" Your kernel is probably still fine.\n");
tk->last_warning = jiffies;
}
tk->underflow_seen = 0;
}
if (tk->overflow_seen) {
if (jiffies - tk->last_warning > WARNING_FREQ) {
printk_deferred("WARNING: Overflow in clocksource '%s' observed, time update capped.\n", name);
printk_deferred(" Please report this, consider using a different clocksource, if possible.\n");
printk_deferred(" Your kernel is probably still fine.\n");
tk->last_warning = jiffies;
}
tk->overflow_seen = 0;
}
}
static inline u64 timekeeping_cycles_to_ns(const struct tk_read_base *tkr, u64 cycles);
static inline u64 timekeeping_debug_get_ns(const struct tk_read_base *tkr)
{
struct timekeeper *tk = &tk_core.timekeeper;
u64 now, last, mask, max, delta;
unsigned int seq;
/*
* Since we're called holding a seqcount, the data may shift
* under us while we're doing the calculation. This can cause
* false positives, since we'd note a problem but throw the
* results away. So nest another seqcount here to atomically
* grab the points we are checking with.
*/
do {
seq = read_seqcount_begin(&tk_core.seq);
now = tk_clock_read(tkr);
last = tkr->cycle_last;
mask = tkr->mask;
max = tkr->clock->max_cycles;
} while (read_seqcount_retry(&tk_core.seq, seq));
delta = clocksource_delta(now, last, mask);
/*
* Try to catch underflows by checking if we are seeing small
* mask-relative negative values.
*/
if (unlikely((~delta & mask) < (mask >> 3)))
tk->underflow_seen = 1;
/* Check for multiplication overflows */
if (unlikely(delta > max))
tk->overflow_seen = 1;
/* timekeeping_cycles_to_ns() handles both under and overflow */
return timekeeping_cycles_to_ns(tkr, now);
}
#else
static inline void timekeeping_check_update(struct timekeeper *tk, u64 offset)
{
}
static inline u64 timekeeping_debug_get_ns(const struct tk_read_base *tkr)
{
BUG();
}
#endif
/** /**
* tk_setup_internals - Set up internals to use clocksource clock. * tk_setup_internals - Set up internals to use clocksource clock.
* *
@ -407,19 +330,11 @@ static inline u64 timekeeping_cycles_to_ns(const struct tk_read_base *tkr, u64 c
return ((delta * tkr->mult) + tkr->xtime_nsec) >> tkr->shift; return ((delta * tkr->mult) + tkr->xtime_nsec) >> tkr->shift;
} }
static __always_inline u64 __timekeeping_get_ns(const struct tk_read_base *tkr) static __always_inline u64 timekeeping_get_ns(const struct tk_read_base *tkr)
{ {
return timekeeping_cycles_to_ns(tkr, tk_clock_read(tkr)); return timekeeping_cycles_to_ns(tkr, tk_clock_read(tkr));
} }
static inline u64 timekeeping_get_ns(const struct tk_read_base *tkr)
{
if (IS_ENABLED(CONFIG_DEBUG_TIMEKEEPING))
return timekeeping_debug_get_ns(tkr);
return __timekeeping_get_ns(tkr);
}
/** /**
* update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper. * update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
* @tkr: Timekeeping readout base from which we take the update * @tkr: Timekeeping readout base from which we take the update
@ -465,7 +380,7 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
seq = read_seqcount_latch(&tkf->seq); seq = read_seqcount_latch(&tkf->seq);
tkr = tkf->base + (seq & 0x01); tkr = tkf->base + (seq & 0x01);
now = ktime_to_ns(tkr->base); now = ktime_to_ns(tkr->base);
now += __timekeeping_get_ns(tkr); now += timekeeping_get_ns(tkr);
} while (read_seqcount_latch_retry(&tkf->seq, seq)); } while (read_seqcount_latch_retry(&tkf->seq, seq));
return now; return now;
@ -536,7 +451,7 @@ EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns);
* timekeeping_inject_sleeptime64() * timekeeping_inject_sleeptime64()
* __timekeeping_inject_sleeptime(tk, delta); * __timekeeping_inject_sleeptime(tk, delta);
* timestamp(); * timestamp();
* timekeeping_update(tk, TK_CLEAR_NTP...); * timekeeping_update_staged(tkd, TK_CLEAR_NTP...);
* *
* (2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be * (2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be
* partially updated. Since the tk->offs_boot update is a rare event, this * partially updated. Since the tk->offs_boot update is a rare event, this
@ -581,7 +496,7 @@ static __always_inline u64 __ktime_get_real_fast(struct tk_fast *tkf, u64 *mono)
tkr = tkf->base + (seq & 0x01); tkr = tkf->base + (seq & 0x01);
basem = ktime_to_ns(tkr->base); basem = ktime_to_ns(tkr->base);
baser = ktime_to_ns(tkr->base_real); baser = ktime_to_ns(tkr->base_real);
delta = __timekeeping_get_ns(tkr); delta = timekeeping_get_ns(tkr);
} while (raw_read_seqcount_latch_retry(&tkf->seq, seq)); } while (raw_read_seqcount_latch_retry(&tkf->seq, seq));
if (mono) if (mono)
@ -695,13 +610,11 @@ static void update_pvclock_gtod(struct timekeeper *tk, bool was_set)
int pvclock_gtod_register_notifier(struct notifier_block *nb) int pvclock_gtod_register_notifier(struct notifier_block *nb)
{ {
struct timekeeper *tk = &tk_core.timekeeper; struct timekeeper *tk = &tk_core.timekeeper;
unsigned long flags;
int ret; int ret;
raw_spin_lock_irqsave(&timekeeper_lock, flags); guard(raw_spinlock_irqsave)(&tk_core.lock);
ret = raw_notifier_chain_register(&pvclock_gtod_chain, nb); ret = raw_notifier_chain_register(&pvclock_gtod_chain, nb);
update_pvclock_gtod(tk, true); update_pvclock_gtod(tk, true);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
return ret; return ret;
} }
@ -714,14 +627,8 @@ EXPORT_SYMBOL_GPL(pvclock_gtod_register_notifier);
*/ */
int pvclock_gtod_unregister_notifier(struct notifier_block *nb) int pvclock_gtod_unregister_notifier(struct notifier_block *nb)
{ {
unsigned long flags; guard(raw_spinlock_irqsave)(&tk_core.lock);
int ret; return raw_notifier_chain_unregister(&pvclock_gtod_chain, nb);
raw_spin_lock_irqsave(&timekeeper_lock, flags);
ret = raw_notifier_chain_unregister(&pvclock_gtod_chain, nb);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
return ret;
} }
EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier); EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier);
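Note on the guard() conversions in the two pvclock notifier functions above: they rely on scope-based cleanup, so the lock is dropped automatically on every return path and the explicit unlock plus the `flags` and `ret` locals can go away. The following is a minimal userspace sketch of that idea, assuming a compiler that supports `__attribute__((cleanup))` (GCC or Clang). The names `demo_lock`, `demo_guard` and `demo_unregister` are hypothetical and exist only for illustration; this is not the kernel's cleanup.h implementation and it does no real locking or interrupt masking.

```c
#include <assert.h>

/* Hypothetical stand-in for a lock; it only counts acquire/release calls. */
struct demo_lock {
	int held;
	int acquisitions;
	int releases;
};

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);	/* no recursive acquisition in this sketch */
	l->held = 1;
	l->acquisitions++;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = 0;
	l->releases++;
}

/* Cleanup handler: called with a pointer to the guard variable at scope exit. */
static void demo_guard_exit(struct demo_lock **l)
{
	demo_lock_release(*l);
}

/* Acquire immediately, release automatically when the enclosing scope ends. */
#define demo_guard(lock)						\
	struct demo_lock *demo_guard_var				\
		__attribute__((cleanup(demo_guard_exit), unused)) =	\
		(demo_lock_acquire(lock), (lock))

/* Both return paths drop the lock without an explicit unlock call. */
static int demo_unregister(struct demo_lock *l, int fail)
{
	demo_guard(l);

	if (fail)
		return -1;
	return 0;
}
```

The return value is computed while the lock is still held and the cleanup handler runs afterwards, which is why the rewritten pvclock_gtod_unregister_notifier() can return the result of raw_notifier_chain_unregister() directly.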
@ -736,6 +643,18 @@ static inline void tk_update_leap_state(struct timekeeper *tk)
tk->next_leap_ktime = ktime_sub(tk->next_leap_ktime, tk->offs_real); tk->next_leap_ktime = ktime_sub(tk->next_leap_ktime, tk->offs_real);
} }
/*
* Leap state update for both shadow and the real timekeeper
* Separate to spare a full memcpy() of the timekeeper.
*/
static void tk_update_leap_state_all(struct tk_data *tkd)
{
write_seqcount_begin(&tkd->seq);
tk_update_leap_state(&tkd->shadow_timekeeper);
tkd->timekeeper.next_leap_ktime = tkd->shadow_timekeeper.next_leap_ktime;
write_seqcount_end(&tkd->seq);
}
/* /*
* Update the ktime_t based scalar nsec members of the timekeeper * Update the ktime_t based scalar nsec members of the timekeeper
*/ */
@ -769,9 +688,30 @@ static inline void tk_update_ktime_data(struct timekeeper *tk)
tk->tkr_raw.base = ns_to_ktime(tk->raw_sec * NSEC_PER_SEC); tk->tkr_raw.base = ns_to_ktime(tk->raw_sec * NSEC_PER_SEC);
} }
/* must hold timekeeper_lock */ /*
static void timekeeping_update(struct timekeeper *tk, unsigned int action) * Restore the shadow timekeeper from the real timekeeper.
*/
static void timekeeping_restore_shadow(struct tk_data *tkd)
{ {
lockdep_assert_held(&tkd->lock);
memcpy(&tkd->shadow_timekeeper, &tkd->timekeeper, sizeof(tkd->timekeeper));
}
static void timekeeping_update_from_shadow(struct tk_data *tkd, unsigned int action)
{
struct timekeeper *tk = &tk_core.shadow_timekeeper;
lockdep_assert_held(&tkd->lock);
/*
* Block out readers before running the updates below because that
* updates VDSO and other time related infrastructure. Not blocking
* the readers might let a reader see time going backwards when
* reading from the VDSO after the VDSO update and then reading in
* the kernel from the timekeeper before that got updated.
*/
write_seqcount_begin(&tkd->seq);
if (action & TK_CLEAR_NTP) { if (action & TK_CLEAR_NTP) {
tk->ntp_error = 0; tk->ntp_error = 0;
ntp_clear(); ntp_clear();
@ -789,14 +729,17 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
if (action & TK_CLOCK_WAS_SET) if (action & TK_CLOCK_WAS_SET)
tk->clock_was_set_seq++; tk->clock_was_set_seq++;
/* /*
* The mirroring of the data to the shadow-timekeeper needs * Update the real timekeeper.
* to happen last here to ensure we don't over-write the *
* timekeeper structure on the next update with stale data * We could avoid this memcpy() by switching pointers, but that has
* the downside that the reader side does not longer benefit from
* the cacheline optimized data layout of the timekeeper and requires
* another indirection.
*/ */
if (action & TK_MIRROR) memcpy(&tkd->timekeeper, tk, sizeof(*tk));
memcpy(&shadow_timekeeper, &tk_core.timekeeper, write_seqcount_end(&tkd->seq);
sizeof(tk_core.timekeeper));
} }
/** /**
@ -949,6 +892,14 @@ ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs)
unsigned int seq; unsigned int seq;
ktime_t tconv; ktime_t tconv;
if (IS_ENABLED(CONFIG_64BIT)) {
/*
* Paired with WRITE_ONCE()s in tk_set_wall_to_mono() and
* tk_update_sleep_time().
*/
return ktime_add(tmono, READ_ONCE(*offset));
}
do { do {
seq = read_seqcount_begin(&tk_core.seq); seq = read_seqcount_begin(&tk_core.seq);
tconv = ktime_add(tmono, *offset); tconv = ktime_add(tmono, *offset);
@ -1079,6 +1030,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
unsigned int seq; unsigned int seq;
ktime_t base_raw; ktime_t base_raw;
ktime_t base_real; ktime_t base_real;
ktime_t base_boot;
u64 nsec_raw; u64 nsec_raw;
u64 nsec_real; u64 nsec_real;
u64 now; u64 now;
@ -1093,6 +1045,8 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq; systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
base_real = ktime_add(tk->tkr_mono.base, base_real = ktime_add(tk->tkr_mono.base,
tk_core.timekeeper.offs_real); tk_core.timekeeper.offs_real);
base_boot = ktime_add(tk->tkr_mono.base,
tk_core.timekeeper.offs_boot);
base_raw = tk->tkr_raw.base; base_raw = tk->tkr_raw.base;
nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now); nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now); nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
@ -1100,6 +1054,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
systime_snapshot->cycles = now; systime_snapshot->cycles = now;
systime_snapshot->real = ktime_add_ns(base_real, nsec_real); systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw); systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
} }
EXPORT_SYMBOL_GPL(ktime_get_snapshot); EXPORT_SYMBOL_GPL(ktime_get_snapshot);
@ -1459,45 +1414,35 @@ EXPORT_SYMBOL_GPL(timekeeping_clocksource_has_base);
*/ */
int do_settimeofday64(const struct timespec64 *ts) int do_settimeofday64(const struct timespec64 *ts)
{ {
struct timekeeper *tk = &tk_core.timekeeper;
struct timespec64 ts_delta, xt; struct timespec64 ts_delta, xt;
unsigned long flags;
int ret = 0;
if (!timespec64_valid_settod(ts)) if (!timespec64_valid_settod(ts))
return -EINVAL; return -EINVAL;
raw_spin_lock_irqsave(&timekeeper_lock, flags); scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
write_seqcount_begin(&tk_core.seq); struct timekeeper *tks = &tk_core.shadow_timekeeper;
timekeeping_forward_now(tk); timekeeping_forward_now(tks);
xt = tk_xtime(tk); xt = tk_xtime(tks);
ts_delta = timespec64_sub(*ts, xt); ts_delta = timespec64_sub(*ts, xt);
if (timespec64_compare(&tk->wall_to_monotonic, &ts_delta) > 0) { if (timespec64_compare(&tks->wall_to_monotonic, &ts_delta) > 0) {
ret = -EINVAL; timekeeping_restore_shadow(&tk_core);
goto out; return -EINVAL;
}
tk_set_wall_to_mono(tks, timespec64_sub(tks->wall_to_monotonic, ts_delta));
tk_set_xtime(tks, ts);
timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
} }
tk_set_wall_to_mono(tk, timespec64_sub(tk->wall_to_monotonic, ts_delta));
tk_set_xtime(tk, ts);
out:
timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
write_seqcount_end(&tk_core.seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
/* Signal hrtimers about time change */ /* Signal hrtimers about time change */
clock_was_set(CLOCK_SET_WALL); clock_was_set(CLOCK_SET_WALL);
if (!ret) { audit_tk_injoffset(ts_delta);
audit_tk_injoffset(ts_delta); add_device_randomness(ts, sizeof(*ts));
add_device_randomness(ts, sizeof(*ts)); return 0;
}
return ret;
} }
EXPORT_SYMBOL(do_settimeofday64); EXPORT_SYMBOL(do_settimeofday64);
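Note on the new do_settimeofday64() flow above: all modifications are made to the shadow timekeeper, a failed validation discards them via timekeeping_restore_shadow(), and only a successful update is copied to the real timekeeper inside the seqcount write section. The following is a single-threaded userspace sketch of that pattern under simplifying assumptions: `demo_tk`, `demo_tkd` and the `demo_*` functions are hypothetical reductions, time is kept in whole seconds, and the sequence counter is a plain integer without the memory barriers or locking the kernel uses.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct timekeeper. */
struct demo_tk {
	long long xtime_sec;
	long long wall_to_mono_sec;
};

/* Hypothetical stand-in for struct tk_data. */
struct demo_tkd {
	unsigned int seq;	/* odd while an update is being published */
	struct demo_tk live;	/* the copy readers look at */
	struct demo_tk shadow;	/* scratch copy that updaters modify */
};

/* Discard a partial update: resynchronize the shadow from the live copy. */
static void demo_restore_shadow(struct demo_tkd *tkd)
{
	memcpy(&tkd->shadow, &tkd->live, sizeof(tkd->live));
}

/* Publish the shadow; readers are only blocked out around the memcpy(). */
static void demo_update_from_shadow(struct demo_tkd *tkd)
{
	tkd->seq++;	/* write_seqcount_begin() analogue, no barriers here */
	memcpy(&tkd->live, &tkd->shadow, sizeof(tkd->shadow));
	tkd->seq++;	/* write_seqcount_end() analogue */
}

static int demo_settime(struct demo_tkd *tkd, long long elapsed, long long new_sec)
{
	struct demo_tk *tks = &tkd->shadow;
	long long delta;

	/* Like timekeeping_forward_now(): the shadow changes before validation. */
	tks->xtime_sec += elapsed;

	delta = new_sec - tks->xtime_sec;
	if (tks->wall_to_mono_sec > delta) {
		/* The live copy was never touched, so drop the shadow changes. */
		demo_restore_shadow(tkd);
		return -1;
	}

	tks->wall_to_mono_sec -= delta;
	tks->xtime_sec = new_sec;
	demo_update_from_shadow(tkd);
	return 0;
}
```

In this sketch a rejected request leaves both `live` and `seq` untouched, which corresponds to the early `return -EINVAL` in the new code; the old code instead went through `goto out` and still called timekeeping_update() on the error path.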
@ -1509,40 +1454,31 @@ EXPORT_SYMBOL(do_settimeofday64);
*/ */
static int timekeeping_inject_offset(const struct timespec64 *ts) static int timekeeping_inject_offset(const struct timespec64 *ts)
{ {
struct timekeeper *tk = &tk_core.timekeeper;
unsigned long flags;
struct timespec64 tmp;
int ret = 0;
if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC) if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
return -EINVAL; return -EINVAL;
raw_spin_lock_irqsave(&timekeeper_lock, flags); scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
write_seqcount_begin(&tk_core.seq); struct timekeeper *tks = &tk_core.shadow_timekeeper;
struct timespec64 tmp;
timekeeping_forward_now(tk); timekeeping_forward_now(tks);
/* Make sure the proposed value is valid */ /* Make sure the proposed value is valid */
tmp = timespec64_add(tk_xtime(tk), *ts); tmp = timespec64_add(tk_xtime(tks), *ts);
if (timespec64_compare(&tk->wall_to_monotonic, ts) > 0 || if (timespec64_compare(&tks->wall_to_monotonic, ts) > 0 ||
!timespec64_valid_settod(&tmp)) { !timespec64_valid_settod(&tmp)) {
ret = -EINVAL; timekeeping_restore_shadow(&tk_core);
goto error; return -EINVAL;
}
tk_xtime_add(tks, ts);
tk_set_wall_to_mono(tks, timespec64_sub(tks->wall_to_monotonic, *ts));
timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
} }
tk_xtime_add(tk, ts);
tk_set_wall_to_mono(tk, timespec64_sub(tk->wall_to_monotonic, *ts));
error: /* even if we error out, we forwarded the time, so call update */
timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
write_seqcount_end(&tk_core.seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
/* Signal hrtimers about time change */ /* Signal hrtimers about time change */
clock_was_set(CLOCK_SET_WALL); clock_was_set(CLOCK_SET_WALL);
return 0;
return ret;
} }
/* /*
@ -1595,43 +1531,34 @@ static void __timekeeping_set_tai_offset(struct timekeeper *tk, s32 tai_offset)
*/ */
static int change_clocksource(void *data) static int change_clocksource(void *data)
{ {
struct timekeeper *tk = &tk_core.timekeeper; struct clocksource *new = data, *old = NULL;
struct clocksource *new, *old = NULL;
unsigned long flags;
bool change = false;
new = (struct clocksource *) data;
/* /*
* If the cs is in module, get a module reference. Succeeds * If the clocksource is in a module, get a module reference.
* for built-in code (owner == NULL) as well. * Succeeds for built-in code (owner == NULL) as well. Abort if the
* reference can't be acquired.
*/ */
if (try_module_get(new->owner)) { if (!try_module_get(new->owner))
if (!new->enable || new->enable(new) == 0) return 0;
change = true;
else /* Abort if the device can't be enabled */
module_put(new->owner); if (new->enable && new->enable(new) != 0) {
module_put(new->owner);
return 0;
} }
raw_spin_lock_irqsave(&timekeeper_lock, flags); scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
write_seqcount_begin(&tk_core.seq); struct timekeeper *tks = &tk_core.shadow_timekeeper;
timekeeping_forward_now(tk); timekeeping_forward_now(tks);
old = tks->tkr_mono.clock;
if (change) { tk_setup_internals(tks, new);
old = tk->tkr_mono.clock; timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
tk_setup_internals(tk, new);
} }
timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
write_seqcount_end(&tk_core.seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
if (old) { if (old) {
if (old->disable) if (old->disable)
old->disable(old); old->disable(old);
module_put(old->owner); module_put(old->owner);
} }
@ -1756,6 +1683,12 @@ read_persistent_wall_and_boot_offset(struct timespec64 *wall_time,
*boot_offset = ns_to_timespec64(local_clock()); *boot_offset = ns_to_timespec64(local_clock());
} }
static __init void tkd_basic_setup(struct tk_data *tkd)
{
raw_spin_lock_init(&tkd->lock);
seqcount_raw_spinlock_init(&tkd->seq, &tkd->lock);
}
/* /*
* Flag reflecting whether timekeeping_resume() has injected sleeptime. * Flag reflecting whether timekeeping_resume() has injected sleeptime.
* *
@ -1780,9 +1713,10 @@ static bool persistent_clock_exists;
void __init timekeeping_init(void) void __init timekeeping_init(void)
{ {
struct timespec64 wall_time, boot_offset, wall_to_mono; struct timespec64 wall_time, boot_offset, wall_to_mono;
struct timekeeper *tk = &tk_core.timekeeper; struct timekeeper *tks = &tk_core.shadow_timekeeper;
struct clocksource *clock; struct clocksource *clock;
unsigned long flags;
tkd_basic_setup(&tk_core);
read_persistent_wall_and_boot_offset(&wall_time, &boot_offset); read_persistent_wall_and_boot_offset(&wall_time, &boot_offset);
if (timespec64_valid_settod(&wall_time) && if (timespec64_valid_settod(&wall_time) &&
@ -1802,24 +1736,21 @@ void __init timekeeping_init(void)
*/ */
wall_to_mono = timespec64_sub(boot_offset, wall_time); wall_to_mono = timespec64_sub(boot_offset, wall_time);
raw_spin_lock_irqsave(&timekeeper_lock, flags); guard(raw_spinlock_irqsave)(&tk_core.lock);
write_seqcount_begin(&tk_core.seq);
ntp_init(); ntp_init();
clock = clocksource_default_clock(); clock = clocksource_default_clock();
if (clock->enable) if (clock->enable)
clock->enable(clock); clock->enable(clock);
tk_setup_internals(tk, clock); tk_setup_internals(tks, clock);
tk_set_xtime(tk, &wall_time); tk_set_xtime(tks, &wall_time);
tk->raw_sec = 0; tks->raw_sec = 0;
tk_set_wall_to_mono(tk, wall_to_mono); tk_set_wall_to_mono(tks, wall_to_mono);
timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET); timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET);
write_seqcount_end(&tk_core.seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
} }
/* time in seconds when suspend began for persistent clock */ /* time in seconds when suspend began for persistent clock */
@ -1897,22 +1828,14 @@ bool timekeeping_rtc_skipsuspend(void)
*/ */
void timekeeping_inject_sleeptime64(const struct timespec64 *delta) void timekeeping_inject_sleeptime64(const struct timespec64 *delta)
{ {
struct timekeeper *tk = &tk_core.timekeeper; scoped_guard(raw_spinlock_irqsave, &tk_core.lock) {
unsigned long flags; struct timekeeper *tks = &tk_core.shadow_timekeeper;
raw_spin_lock_irqsave(&timekeeper_lock, flags); suspend_timing_needed = false;
write_seqcount_begin(&tk_core.seq); timekeeping_forward_now(tks);
__timekeeping_inject_sleeptime(tks, delta);
suspend_timing_needed = false; timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
}
timekeeping_forward_now(tk);
__timekeeping_inject_sleeptime(tk, delta);
timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
write_seqcount_end(&tk_core.seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
/* Signal hrtimers about time change */ /* Signal hrtimers about time change */
clock_was_set(CLOCK_SET_WALL | CLOCK_SET_BOOT); clock_was_set(CLOCK_SET_WALL | CLOCK_SET_BOOT);
@ -1924,20 +1847,19 @@ void timekeeping_inject_sleeptime64(const struct timespec64 *delta)
*/ */
void timekeeping_resume(void) void timekeeping_resume(void)
{ {
struct timekeeper *tk = &tk_core.timekeeper; struct timekeeper *tks = &tk_core.shadow_timekeeper;
struct clocksource *clock = tk->tkr_mono.clock; struct clocksource *clock = tks->tkr_mono.clock;
unsigned long flags;
struct timespec64 ts_new, ts_delta; struct timespec64 ts_new, ts_delta;
u64 cycle_now, nsec;
bool inject_sleeptime = false; bool inject_sleeptime = false;
u64 cycle_now, nsec;
unsigned long flags;
read_persistent_clock64(&ts_new); read_persistent_clock64(&ts_new);
clockevents_resume(); clockevents_resume();
clocksource_resume(); clocksource_resume();
raw_spin_lock_irqsave(&timekeeper_lock, flags); raw_spin_lock_irqsave(&tk_core.lock, flags);
write_seqcount_begin(&tk_core.seq);
/* /*
* After system resumes, we need to calculate the suspended time and * After system resumes, we need to calculate the suspended time and
@ -1951,7 +1873,7 @@ void timekeeping_resume(void)
* The less preferred source will only be tried if there is no better * The less preferred source will only be tried if there is no better
* usable source. The rtc part is handled separately in rtc core code. * usable source. The rtc part is handled separately in rtc core code.
*/ */
cycle_now = tk_clock_read(&tk->tkr_mono); cycle_now = tk_clock_read(&tks->tkr_mono);
nsec = clocksource_stop_suspend_timing(clock, cycle_now); nsec = clocksource_stop_suspend_timing(clock, cycle_now);
if (nsec > 0) { if (nsec > 0) {
ts_delta = ns_to_timespec64(nsec); ts_delta = ns_to_timespec64(nsec);
@ -1963,18 +1885,17 @@ void timekeeping_resume(void)
if (inject_sleeptime) { if (inject_sleeptime) {
suspend_timing_needed = false; suspend_timing_needed = false;
__timekeeping_inject_sleeptime(tk, &ts_delta); __timekeeping_inject_sleeptime(tks, &ts_delta);
} }
/* Re-base the last cycle value */ /* Re-base the last cycle value */
tk->tkr_mono.cycle_last = cycle_now; tks->tkr_mono.cycle_last = cycle_now;
tk->tkr_raw.cycle_last = cycle_now; tks->tkr_raw.cycle_last = cycle_now;
tk->ntp_error = 0; tks->ntp_error = 0;
timekeeping_suspended = 0; timekeeping_suspended = 0;
timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET); timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET);
write_seqcount_end(&tk_core.seq); raw_spin_unlock_irqrestore(&tk_core.lock, flags);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
touch_softlockup_watchdog(); touch_softlockup_watchdog();
@ -1986,11 +1907,11 @@ void timekeeping_resume(void)
int timekeeping_suspend(void) int timekeeping_suspend(void)
{ {
struct timekeeper *tk = &tk_core.timekeeper; struct timekeeper *tks = &tk_core.shadow_timekeeper;
unsigned long flags; struct timespec64 delta, delta_delta;
struct timespec64 delta, delta_delta; static struct timespec64 old_delta;
static struct timespec64 old_delta;
struct clocksource *curr_clock; struct clocksource *curr_clock;
unsigned long flags;
u64 cycle_now; u64 cycle_now;
read_persistent_clock64(&timekeeping_suspend_time); read_persistent_clock64(&timekeeping_suspend_time);
@ -2005,9 +1926,8 @@ int timekeeping_suspend(void)
suspend_timing_needed = true; suspend_timing_needed = true;
raw_spin_lock_irqsave(&timekeeper_lock, flags); raw_spin_lock_irqsave(&tk_core.lock, flags);
write_seqcount_begin(&tk_core.seq); timekeeping_forward_now(tks);
timekeeping_forward_now(tk);
timekeeping_suspended = 1; timekeeping_suspended = 1;
/* /*
@ -2015,8 +1935,8 @@ int timekeeping_suspend(void)
* just read from the current clocksource. Save this to potentially * just read from the current clocksource. Save this to potentially
* use in suspend timing. * use in suspend timing.
*/ */
curr_clock = tk->tkr_mono.clock; curr_clock = tks->tkr_mono.clock;
cycle_now = tk->tkr_mono.cycle_last; cycle_now = tks->tkr_mono.cycle_last;
clocksource_start_suspend_timing(curr_clock, cycle_now); clocksource_start_suspend_timing(curr_clock, cycle_now);
if (persistent_clock_exists) { if (persistent_clock_exists) {
@ -2026,7 +1946,7 @@ int timekeeping_suspend(void)
* try to compensate so the difference in system time * try to compensate so the difference in system time
* and persistent_clock time stays close to constant. * and persistent_clock time stays close to constant.
*/ */
delta = timespec64_sub(tk_xtime(tk), timekeeping_suspend_time); delta = timespec64_sub(tk_xtime(tks), timekeeping_suspend_time);
delta_delta = timespec64_sub(delta, old_delta); delta_delta = timespec64_sub(delta, old_delta);
if (abs(delta_delta.tv_sec) >= 2) { if (abs(delta_delta.tv_sec) >= 2) {
/* /*
@ -2041,10 +1961,9 @@ int timekeeping_suspend(void)
} }
} }
timekeeping_update(tk, TK_MIRROR); timekeeping_update_from_shadow(&tk_core, 0);
halt_fast_timekeeper(tk); halt_fast_timekeeper(tks);
write_seqcount_end(&tk_core.seq); raw_spin_unlock_irqrestore(&tk_core.lock, flags);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
tick_suspend(); tick_suspend();
clocksource_suspend(); clocksource_suspend();
@ -2149,16 +2068,17 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
*/ */
static void timekeeping_adjust(struct timekeeper *tk, s64 offset) static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
{ {
u64 ntp_tl = ntp_tick_length();
u32 mult; u32 mult;
/* /*
* Determine the multiplier from the current NTP tick length. * Determine the multiplier from the current NTP tick length.
* Avoid expensive division when the tick length doesn't change. * Avoid expensive division when the tick length doesn't change.
*/ */
if (likely(tk->ntp_tick == ntp_tick_length())) { if (likely(tk->ntp_tick == ntp_tl)) {
mult = tk->tkr_mono.mult - tk->ntp_err_mult; mult = tk->tkr_mono.mult - tk->ntp_err_mult;
} else { } else {
tk->ntp_tick = ntp_tick_length(); tk->ntp_tick = ntp_tl;
mult = div64_u64((tk->ntp_tick >> tk->ntp_error_shift) - mult = div64_u64((tk->ntp_tick >> tk->ntp_error_shift) -
tk->xtime_remainder, tk->cycle_interval); tk->xtime_remainder, tk->cycle_interval);
} }
@ -2297,28 +2217,24 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
*/ */
static bool timekeeping_advance(enum timekeeping_adv_mode mode) static bool timekeeping_advance(enum timekeeping_adv_mode mode)
{ {
struct timekeeper *tk = &tk_core.shadow_timekeeper;
struct timekeeper *real_tk = &tk_core.timekeeper; struct timekeeper *real_tk = &tk_core.timekeeper;
struct timekeeper *tk = &shadow_timekeeper;
u64 offset;
int shift = 0, maxshift;
unsigned int clock_set = 0; unsigned int clock_set = 0;
unsigned long flags; int shift = 0, maxshift;
u64 offset;
raw_spin_lock_irqsave(&timekeeper_lock, flags); guard(raw_spinlock_irqsave)(&tk_core.lock);
/* Make sure we're fully resumed: */ /* Make sure we're fully resumed: */
if (unlikely(timekeeping_suspended)) if (unlikely(timekeeping_suspended))
goto out; return false;
offset = clocksource_delta(tk_clock_read(&tk->tkr_mono), offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
tk->tkr_mono.cycle_last, tk->tkr_mono.mask); tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
/* Check if there's really nothing to do */ /* Check if there's really nothing to do */
if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK) if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
goto out; return false;
/* Do some additional sanity checking */
timekeeping_check_update(tk, offset);
/* /*
* With NO_HZ we may have to accumulate many cycle_intervals * With NO_HZ we may have to accumulate many cycle_intervals
@ -2334,8 +2250,7 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode)
maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1; maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1;
shift = min(shift, maxshift); shift = min(shift, maxshift);
while (offset >= tk->cycle_interval) { while (offset >= tk->cycle_interval) {
offset = logarithmic_accumulation(tk, offset, shift, offset = logarithmic_accumulation(tk, offset, shift, &clock_set);
&clock_set);
if (offset < tk->cycle_interval<<shift) if (offset < tk->cycle_interval<<shift)
shift--; shift--;
} }
@ -2349,23 +2264,7 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode)
*/ */
clock_set |= accumulate_nsecs_to_secs(tk); clock_set |= accumulate_nsecs_to_secs(tk);
write_seqcount_begin(&tk_core.seq); timekeeping_update_from_shadow(&tk_core, clock_set);
/*
* Update the real timekeeper.
*
* We could avoid this memcpy by switching pointers, but that
* requires changes to all other timekeeper usage sites as
* well, i.e. move the timekeeper pointer getter into the
* spinlocked/seqcount protected sections. And we trade this
* memcpy under the tk_core.seq against one before we start
* updating.
*/
timekeeping_update(tk, clock_set);
memcpy(real_tk, tk, sizeof(*tk));
/* The memcpy must come last. Do not put anything here! */
write_seqcount_end(&tk_core.seq);
out:
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
return !!clock_set; return !!clock_set;
} }
@ -2658,13 +2557,10 @@ EXPORT_SYMBOL_GPL(random_get_entropy_fallback);
*/ */
int do_adjtimex(struct __kernel_timex *txc) int do_adjtimex(struct __kernel_timex *txc)
{ {
struct timekeeper *tk = &tk_core.timekeeper;
struct audit_ntp_data ad; struct audit_ntp_data ad;
bool offset_set = false; bool offset_set = false;
bool clock_set = false; bool clock_set = false;
struct timespec64 ts; struct timespec64 ts;
unsigned long flags;
s32 orig_tai, tai;
int ret; int ret;
/* Validate the data before disabling interrupts */ /* Validate the data before disabling interrupts */
@ -2675,6 +2571,7 @@ int do_adjtimex(struct __kernel_timex *txc)
if (txc->modes & ADJ_SETOFFSET) { if (txc->modes & ADJ_SETOFFSET) {
struct timespec64 delta; struct timespec64 delta;
delta.tv_sec = txc->time.tv_sec; delta.tv_sec = txc->time.tv_sec;
delta.tv_nsec = txc->time.tv_usec; delta.tv_nsec = txc->time.tv_usec;
if (!(txc->modes & ADJ_NANO)) if (!(txc->modes & ADJ_NANO))
@ -2692,21 +2589,21 @@ int do_adjtimex(struct __kernel_timex *txc)
ktime_get_real_ts64(&ts); ktime_get_real_ts64(&ts);
add_device_randomness(&ts, sizeof(ts)); add_device_randomness(&ts, sizeof(ts));
raw_spin_lock_irqsave(&timekeeper_lock, flags); scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
write_seqcount_begin(&tk_core.seq); struct timekeeper *tks = &tk_core.shadow_timekeeper;
s32 orig_tai, tai;
orig_tai = tai = tk->tai_offset; orig_tai = tai = tks->tai_offset;
ret = __do_adjtimex(txc, &ts, &tai, &ad); ret = __do_adjtimex(txc, &ts, &tai, &ad);
if (tai != orig_tai) { if (tai != orig_tai) {
__timekeeping_set_tai_offset(tk, tai); __timekeeping_set_tai_offset(tks, tai);
timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET); timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET);
clock_set = true; clock_set = true;
} else {
tk_update_leap_state_all(&tk_core);
}
} }
tk_update_leap_state(tk);
write_seqcount_end(&tk_core.seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
audit_ntp_log(&ad); audit_ntp_log(&ad);
@ -2730,15 +2627,8 @@ int do_adjtimex(struct __kernel_timex *txc)
*/ */
void hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts) void hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts)
{ {
unsigned long flags; guard(raw_spinlock_irqsave)(&tk_core.lock);
raw_spin_lock_irqsave(&timekeeper_lock, flags);
write_seqcount_begin(&tk_core.seq);
__hardpps(phase_ts, raw_ts); __hardpps(phase_ts, raw_ts);
write_seqcount_end(&tk_core.seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
} }
EXPORT_SYMBOL(hardpps); EXPORT_SYMBOL(hardpps);
#endif /* CONFIG_NTP_PPS */ #endif /* CONFIG_NTP_PPS */


@ -30,7 +30,6 @@ static inline void timekeeping_inc_mg_floor_swaps(void)
#endif #endif
#ifdef CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE
static inline u64 clocksource_delta(u64 now, u64 last, u64 mask) static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
{ {
u64 ret = (now - last) & mask; u64 ret = (now - last) & mask;
@ -41,14 +40,9 @@ static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
*/ */
return ret & ~(mask >> 1) ? 0 : ret; return ret & ~(mask >> 1) ? 0 : ret;
} }
#else
static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
{
return (now - last) & mask;
}
#endif
/* Semi public for serialization of non timekeeper VDSO updates. */ /* Semi public for serialization of non timekeeper VDSO updates. */
extern raw_spinlock_t timekeeper_lock; unsigned long timekeeper_lock_irqsave(void);
void timekeeper_unlock_irqrestore(unsigned long flags);
#endif /* _TIMEKEEPING_INTERNAL_H */ #endif /* _TIMEKEEPING_INTERNAL_H */


@ -37,7 +37,6 @@
#include <linux/tick.h> #include <linux/tick.h>
#include <linux/kallsyms.h> #include <linux/kallsyms.h>
#include <linux/irq_work.h> #include <linux/irq_work.h>
#include <linux/sched/signal.h>
#include <linux/sched/sysctl.h> #include <linux/sched/sysctl.h>
#include <linux/sched/nohz.h> #include <linux/sched/nohz.h>
#include <linux/sched/debug.h> #include <linux/sched/debug.h>
@ -2422,7 +2421,8 @@ static inline void __run_timers(struct timer_base *base)
static void __run_timer_base(struct timer_base *base) static void __run_timer_base(struct timer_base *base)
{ {
if (time_before(jiffies, base->next_expiry)) /* Can race against a remote CPU updating next_expiry under the lock */
if (time_before(jiffies, READ_ONCE(base->next_expiry)))
return; return;
timer_base_lock_expiry(base); timer_base_lock_expiry(base);
@ -2526,141 +2526,6 @@ void update_process_times(int user_tick)
run_posix_cpu_timers(); run_posix_cpu_timers();
} }
/*
* Since schedule_timeout()'s timer is defined on the stack, it must store
* the target task on the stack as well.
*/
struct process_timer {
struct timer_list timer;
struct task_struct *task;
};
static void process_timeout(struct timer_list *t)
{
struct process_timer *timeout = from_timer(timeout, t, timer);
wake_up_process(timeout->task);
}
/**
* schedule_timeout - sleep until timeout
* @timeout: timeout value in jiffies
*
* Make the current task sleep until @timeout jiffies have elapsed.
* The function behavior depends on the current task state
* (see also set_current_state() description):
*
* %TASK_RUNNING - the scheduler is called, but the task does not sleep
* at all. That happens because sched_submit_work() does nothing for
* tasks in %TASK_RUNNING state.
*
* %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
* pass before the routine returns unless the current task is explicitly
* woken up, (e.g. by wake_up_process()).
*
* %TASK_INTERRUPTIBLE - the routine may return early if a signal is
* delivered to the current task or the current task is explicitly woken
* up.
*
* The current task state is guaranteed to be %TASK_RUNNING when this
* routine returns.
*
* Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
* the CPU away without a bound on the timeout. In this case the return
* value will be %MAX_SCHEDULE_TIMEOUT.
*
* Returns 0 when the timer has expired otherwise the remaining time in
* jiffies will be returned. In all cases the return value is guaranteed
* to be non-negative.
*/
signed long __sched schedule_timeout(signed long timeout)
{
struct process_timer timer;
unsigned long expire;
switch (timeout)
{
case MAX_SCHEDULE_TIMEOUT:
/*
* These two special cases are useful to be comfortable
* in the caller. Nothing more. We could take
* MAX_SCHEDULE_TIMEOUT from one of the negative value
* but I' d like to return a valid offset (>=0) to allow
* the caller to do everything it want with the retval.
*/
schedule();
goto out;
default:
/*
* Another bit of PARANOID. Note that the retval will be
* 0 since no piece of kernel is supposed to do a check
* for a negative retval of schedule_timeout() (since it
* should never happens anyway). You just have the printk()
* that will tell you if something is gone wrong and where.
*/
if (timeout < 0) {
printk(KERN_ERR "schedule_timeout: wrong timeout "
"value %lx\n", timeout);
dump_stack();
__set_current_state(TASK_RUNNING);
goto out;
}
}
expire = timeout + jiffies;
timer.task = current;
timer_setup_on_stack(&timer.timer, process_timeout, 0);
__mod_timer(&timer.timer, expire, MOD_TIMER_NOTPENDING);
schedule();
del_timer_sync(&timer.timer);
/* Remove the timer from the object tracker */
destroy_timer_on_stack(&timer.timer);
timeout = expire - jiffies;
out:
return timeout < 0 ? 0 : timeout;
}
EXPORT_SYMBOL(schedule_timeout);
/*
* We can use __set_current_state() here because schedule_timeout() calls
* schedule() unconditionally.
*/
signed long __sched schedule_timeout_interruptible(signed long timeout)
{
__set_current_state(TASK_INTERRUPTIBLE);
return schedule_timeout(timeout);
}
EXPORT_SYMBOL(schedule_timeout_interruptible);
signed long __sched schedule_timeout_killable(signed long timeout)
{
__set_current_state(TASK_KILLABLE);
return schedule_timeout(timeout);
}
EXPORT_SYMBOL(schedule_timeout_killable);
signed long __sched schedule_timeout_uninterruptible(signed long timeout)
{
__set_current_state(TASK_UNINTERRUPTIBLE);
return schedule_timeout(timeout);
}
EXPORT_SYMBOL(schedule_timeout_uninterruptible);
/*
* Like schedule_timeout_uninterruptible(), except this task will not contribute
* to load average.
*/
signed long __sched schedule_timeout_idle(signed long timeout)
{
__set_current_state(TASK_IDLE);
return schedule_timeout(timeout);
}
EXPORT_SYMBOL(schedule_timeout_idle);
#ifdef CONFIG_HOTPLUG_CPU #ifdef CONFIG_HOTPLUG_CPU
static void migrate_timer_list(struct timer_base *new_base, struct hlist_head *head) static void migrate_timer_list(struct timer_base *new_base, struct hlist_head *head)
{ {
@ -2757,59 +2622,3 @@ void __init init_timers(void)
posix_cputimers_init_work(); posix_cputimers_init_work();
open_softirq(TIMER_SOFTIRQ, run_timer_softirq); open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
} }
/**
* msleep - sleep safely even with waitqueue interruptions
* @msecs: Time in milliseconds to sleep for
*/
void msleep(unsigned int msecs)
{
unsigned long timeout = msecs_to_jiffies(msecs);
while (timeout)
timeout = schedule_timeout_uninterruptible(timeout);
}
EXPORT_SYMBOL(msleep);
/**
* msleep_interruptible - sleep waiting for signals
* @msecs: Time in milliseconds to sleep for
*/
unsigned long msleep_interruptible(unsigned int msecs)
{
unsigned long timeout = msecs_to_jiffies(msecs);
while (timeout && !signal_pending(current))
timeout = schedule_timeout_interruptible(timeout);
return jiffies_to_msecs(timeout);
}
EXPORT_SYMBOL(msleep_interruptible);
/**
* usleep_range_state - Sleep for an approximate time in a given state
* @min: Minimum time in usecs to sleep
* @max: Maximum time in usecs to sleep
* @state: State of the current task that will be while sleeping
*
* In non-atomic context where the exact wakeup time is flexible, use
* usleep_range_state() instead of udelay(). The sleep improves responsiveness
* by avoiding the CPU-hogging busy-wait of udelay(), and the range reduces
* power usage by allowing hrtimers to take advantage of an already-
* scheduled interrupt instead of scheduling a new one just for this sleep.
*/
void __sched usleep_range_state(unsigned long min, unsigned long max,
unsigned int state)
{
ktime_t exp = ktime_add_us(ktime_get(), min);
u64 delta = (u64)(max - min) * NSEC_PER_USEC;
for (;;) {
__set_current_state(state);
/* Do not return before the requested sleep time has elapsed */
if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS))
break;
}
}
EXPORT_SYMBOL(usleep_range_state);


@ -151,9 +151,8 @@ void update_vsyscall_tz(void)
unsigned long vdso_update_begin(void) unsigned long vdso_update_begin(void)
{ {
struct vdso_data *vdata = __arch_get_k_vdso_data(); struct vdso_data *vdata = __arch_get_k_vdso_data();
unsigned long flags; unsigned long flags = timekeeper_lock_irqsave();
raw_spin_lock_irqsave(&timekeeper_lock, flags);
vdso_write_begin(vdata); vdso_write_begin(vdata);
return flags; return flags;
} }
@ -172,5 +171,5 @@ void vdso_update_end(unsigned long flags)
vdso_write_end(vdata); vdso_write_end(vdata);
__arch_sync_vdso_data(vdata); __arch_sync_vdso_data(vdata);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags); timekeeper_unlock_irqrestore(flags);
} }


@ -1328,19 +1328,6 @@ config SCHEDSTATS
endmenu endmenu
config DEBUG_TIMEKEEPING
bool "Enable extra timekeeping sanity checking"
help
This option will enable additional timekeeping sanity checks
which may be helpful when diagnosing issues where timekeeping
problems are suspected.
This may include checks in the timekeeping hotpaths, so this
option may have a (very small) performance impact to some
workloads.
If unsure, say N.
config DEBUG_PREEMPT config DEBUG_PREEMPT
bool "Debug preemptible kernel" bool "Debug preemptible kernel"
depends on DEBUG_KERNEL && PREEMPTION && TRACE_IRQFLAGS_SUPPORT depends on DEBUG_KERNEL && PREEMPTION && TRACE_IRQFLAGS_SUPPORT


@ -1906,11 +1906,10 @@ static unsigned long damos_wmark_wait_us(struct damos *scheme)
static void kdamond_usleep(unsigned long usecs) static void kdamond_usleep(unsigned long usecs)
{ {
/* See Documentation/timers/timers-howto.rst for the thresholds */ if (usecs >= USLEEP_RANGE_UPPER_BOUND)
if (usecs > 20 * USEC_PER_MSEC)
schedule_timeout_idle(usecs_to_jiffies(usecs)); schedule_timeout_idle(usecs_to_jiffies(usecs));
else else
usleep_idle_range(usecs, usecs + 1); usleep_range_idle(usecs, usecs + 1);
} }
/* Returns negative error code if it's not activated but should return */ /* Returns negative error code if it's not activated but should return */


@ -42,8 +42,6 @@
#define ZERO_KEY "\x00\x00\x00\x00\x00\x00\x00\x00" \ #define ZERO_KEY "\x00\x00\x00\x00\x00\x00\x00\x00" \
"\x00\x00\x00\x00\x00\x00\x00\x00" "\x00\x00\x00\x00\x00\x00\x00\x00"
#define secs_to_jiffies(_secs) msecs_to_jiffies((_secs) * 1000)
/* Handle HCI Event packets */ /* Handle HCI Event packets */
static void *hci_ev_skb_pull(struct hci_dev *hdev, struct sk_buff *skb, static void *hci_ev_skb_pull(struct hci_dev *hdev, struct sk_buff *skb,


@ -2285,7 +2285,7 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
s64 remaining; s64 remaining;
struct hrtimer_sleeper t; struct hrtimer_sleeper t;
hrtimer_init_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); hrtimer_setup_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
hrtimer_set_expires(&t.timer, spin_until); hrtimer_set_expires(&t.timer, spin_until);
remaining = ktime_to_ns(hrtimer_expires_remaining(&t.timer)); remaining = ktime_to_ns(hrtimer_expires_remaining(&t.timer));


@ -107,14 +107,12 @@ static void idletimer_tg_expired(struct timer_list *t)
schedule_work(&timer->work); schedule_work(&timer->work);
} }
static enum alarmtimer_restart idletimer_tg_alarmproc(struct alarm *alarm, static void idletimer_tg_alarmproc(struct alarm *alarm, ktime_t now)
ktime_t now)
{ {
struct idletimer_tg *timer = alarm->data; struct idletimer_tg *timer = alarm->data;
pr_debug("alarm %s expired\n", timer->attr.attr.name); pr_debug("alarm %s expired\n", timer->attr.attr.name);
schedule_work(&timer->work); schedule_work(&timer->work);
return ALARMTIMER_NORESTART;
} }
static int idletimer_check_sysfs_name(const char *name, unsigned int size) static int idletimer_check_sysfs_name(const char *name, unsigned int size)


@ -6597,11 +6597,11 @@ sub process {
# ignore udelay's < 10, however # ignore udelay's < 10, however
if (! ($delay < 10) ) { if (! ($delay < 10) ) {
CHK("USLEEP_RANGE", CHK("USLEEP_RANGE",
"usleep_range is preferred over udelay; see Documentation/timers/timers-howto.rst\n" . $herecurr); "usleep_range is preferred over udelay; see function description of usleep_range() and udelay().\n" . $herecurr);
} }
if ($delay > 2000) { if ($delay > 2000) {
WARN("LONG_UDELAY", WARN("LONG_UDELAY",
"long udelay - prefer mdelay; see arch/arm/include/asm/delay.h\n" . $herecurr); "long udelay - prefer mdelay; see function description of mdelay().\n" . $herecurr);
} }
} }
@ -6609,7 +6609,7 @@ sub process {
if ($line =~ /\bmsleep\s*\((\d+)\);/) { if ($line =~ /\bmsleep\s*\((\d+)\);/) {
if ($1 < 20) { if ($1 < 20) {
WARN("MSLEEP", WARN("MSLEEP",
"msleep < 20ms can sleep for up to 20ms; see Documentation/timers/timers-howto.rst\n" . $herecurr); "msleep < 20ms can sleep for up to 20ms; see function description of msleep().\n" . $herecurr);
} }
} }
@ -7077,11 +7077,11 @@ sub process {
my $max = $7; my $max = $7;
if ($min eq $max) { if ($min eq $max) {
WARN("USLEEP_RANGE", WARN("USLEEP_RANGE",
"usleep_range should not use min == max args; see Documentation/timers/timers-howto.rst\n" . "$here\n$stat\n"); "usleep_range should not use min == max args; see function description of usleep_range().\n" . "$here\n$stat\n");
} elsif ($min =~ /^\d+$/ && $max =~ /^\d+$/ && } elsif ($min =~ /^\d+$/ && $max =~ /^\d+$/ &&
$min > $max) { $min > $max) {
WARN("USLEEP_RANGE", WARN("USLEEP_RANGE",
"usleep_range args reversed, use min then max; see Documentation/timers/timers-howto.rst\n" . "$here\n$stat\n"); "usleep_range args reversed, use min then max; see function description of usleep_range().\n" . "$here\n$stat\n");
} }
} }


@ -597,12 +597,12 @@ snd_sof_is_chain_dma_supported(struct snd_sof_dev *sdev, u32 dai_type)
* @addr: Address to poll * @addr: Address to poll
* @val: Variable to read the value into * @val: Variable to read the value into
* @cond: Break condition (usually involving @val) * @cond: Break condition (usually involving @val)
* @sleep_us: Maximum time to sleep between reads in us (0 * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please
* tight-loops). Should be less than ~20ms since usleep_range * read usleep_range() function description for details and
* is used (see Documentation/timers/timers-howto.rst). * limitations.
* @timeout_us: Timeout in us, 0 means never timeout * @timeout_us: Timeout in us, 0 means never timeout
* *
* Returns 0 on success and -ETIMEDOUT upon a timeout. In either * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either
* case, the last read value at @addr is stored in @val. Must not * case, the last read value at @addr is stored in @val. Must not
* be called from atomic context if sleep_us or timeout_us are used. * be called from atomic context if sleep_us or timeout_us are used.
* *


@ -31,7 +31,6 @@ CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y CONFIG_SCHEDSTATS=y
CONFIG_SCHED_STACK_END_CHECK=y CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_DEBUG_TIMEKEEPING=y
CONFIG_DEBUG_PREEMPT=y CONFIG_DEBUG_PREEMPT=y
CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_SPINLOCK=y