linux/arch/arm
Linus Torvalds d08c407f71 A large set of updates and features for timers and timekeeping:

Merge tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "A large set of updates and features for timers and timekeeping:

   - The hierarchical timer pull model

     When timer wheel timers are armed they are placed into the timer
     wheel of a CPU which is likely to be busy at the time of expiry.
     This is done to avoid wakeups on potentially idle CPUs.

     This is wrong in several aspects:

       1) The heuristics to select the target CPU are wrong by
          definition as the chance to get the prediction right is
          close to zero.

       2) Due to #1 it is possible that timers are accumulated on
          a single target CPU.

       3) The required computation in the enqueue path is just overhead
          for dubious value especially under the consideration that the
          vast majority of timer wheel timers are either canceled or
          rearmed before they expire.

     The timer pull model avoids the above by removing the target
     computation on enqueue and queueing timers always on the CPU on
     which they get armed.

     This is achieved by having separate wheels for CPU pinned timers
     and global timers which do not care about where they expire.

     As long as a CPU is busy it handles both the pinned and the global
     timers which are queued on the CPU local timer wheels.

     When a CPU goes idle it evaluates its own timer wheels:

       - If the first expiring timer is a pinned timer, then the global
         timers can be ignored as the CPU will wake up before they
         expire.

       - If the first expiring timer is a global timer, then the expiry
         time is propagated into the timer pull hierarchy and the CPU
         makes sure to wake up for the first pinned timer.
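
     The two cases above can be sketched in C as follows. This is only an
     illustration under stated assumptions, not the kernel implementation:
     the names sketch_idle_decision, sketch_evaluate_idle() and
     SKETCH_NO_EXPIRY are hypothetical, expiry times are plain integers in
     an arbitrary time unit, and a tie is treated as "pinned first".

```c
#include <stdint.h>

/* Hypothetical marker meaning "no timer queued / nothing to propagate". */
#define SKETCH_NO_EXPIRY UINT64_MAX

/* Hypothetical result of evaluating the local wheels on idle entry. */
struct sketch_idle_decision {
	uint64_t local_wakeup;   /* when this CPU has to wake up by itself */
	uint64_t global_expiry;  /* expiry handed to the hierarchy, if any */
};

/*
 * next_pinned: first expiry in the CPU pinned wheel
 * next_global: first expiry in the global wheel
 */
static struct sketch_idle_decision
sketch_evaluate_idle(uint64_t next_pinned, uint64_t next_global)
{
	/* In both cases the CPU itself only wakes up for its pinned timer. */
	struct sketch_idle_decision d = {
		.local_wakeup = next_pinned,
		.global_expiry = SKETCH_NO_EXPIRY,
	};

	/*
	 * A global timer expires first: propagate its expiry into the
	 * hierarchy. Otherwise the global timers can be ignored because
	 * the CPU wakes up for the pinned timer before they expire.
	 */
	if (next_global < next_pinned)
		d.global_expiry = next_global;

	return d;
}
```

     With a pinned timer at 100 and a global timer at 200, nothing is
     propagated. With the pinned timer at 300 instead, the value 200 is
     handed to the hierarchy while the CPU still wakes up at 300.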

     The timer pull hierarchy organizes CPUs in groups of eight at the
     lowest level and at the next levels groups of eight groups up to
     the point where no further aggregation of groups is required, i.e.
     the number of levels is log8(NR_CPUS). The magic number of eight
     has been established by experimentation, but can be adjusted if
     needed.
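
     As a rough illustration of the log8(NR_CPUS) relation, the sketch
     below counts how many levels are needed until a single top-level
     group covers all CPUs. The helper name sketch_hierarchy_levels() is
     hypothetical and this is not how the kernel computes it; the sketch
     rounds up and assumes at least one level even for a single CPU.

```c
/*
 * Hypothetical helper: number of hierarchy levels needed so that one
 * top-level group covers nr_cpus CPUs when every group holds up to
 * group_size children (eight in the description above).
 */
static unsigned int sketch_hierarchy_levels(unsigned int nr_cpus,
					    unsigned int group_size)
{
	unsigned long long capacity = group_size;
	unsigned int levels = 1;

	/* A group size below two can never aggregate anything. */
	if (group_size < 2)
		return 0;

	/* Each additional level multiplies the covered CPUs by group_size. */
	while (capacity < nr_cpus) {
		capacity *= group_size;
		levels++;
	}

	return levels;
}
```

     For example, 8 CPUs fit into one level, 64 CPUs need two levels and
     512 CPUs need three.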

     In each group one busy CPU acts as the migrator. It's only one CPU
     to avoid lock contention on remote timer wheels.

     The migrator CPU checks in its own timer wheel handling whether
     there are other CPUs in the group which have gone idle and have
     global timers to expire. If there are global timers to expire, the
     migrator locks the remote CPU timer wheel and handles the expiry.
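
     The migrator check can be pictured with the following sketch. It is
     a simplified model under assumptions: struct sketch_cpu,
     sketch_migrator_scan() and SKETCH_NO_EXPIRY are hypothetical names,
     a group is a flat array, no locking is modelled, and "handling the
     expiry" is reduced to clearing the pending global expiry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker meaning "no global timer pending". */
#define SKETCH_NO_EXPIRY UINT64_MAX

/* Hypothetical per-CPU state as seen by the migrator of a group. */
struct sketch_cpu {
	bool idle;             /* CPU has gone idle */
	uint64_t next_global;  /* first expiry in its global wheel */
};

/*
 * Called by the migrator (index self) from its own timer handling.
 * Returns the number of idle group members whose global timers were
 * expired on their behalf. The real code would lock the remote timer
 * wheel around the expiry; this sketch omits that.
 */
static unsigned int sketch_migrator_scan(struct sketch_cpu *group, size_t n,
					 size_t self, uint64_t now)
{
	unsigned int handled = 0;

	for (size_t i = 0; i < n; i++) {
		/* Busy CPUs, including the migrator, handle their own timers. */
		if (i == self || !group[i].idle)
			continue;

		if (group[i].next_global <= now) {
			group[i].next_global = SKETCH_NO_EXPIRY;
			handled++;
		}
	}

	return handled;
}
```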

     Depending on the group level in the hierarchy this handling can
     require walking the hierarchy downwards to the CPU level.

     Special care is taken when the last CPU goes idle. At this point
     the CPU is the systemwide migrator at the top of the hierarchy and
     it therefore cannot delegate to the hierarchy. It needs to arm its
     own timer device to expire either at the first expiring timer in
     the hierarchy or at the first CPU local timer, whichever expires
     first.
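
     Expressed as code, the last CPU going idle simply picks the earlier
     of the two expiries. The function name sketch_last_cpu_wakeup() is
     hypothetical and the expiry times are plain integers in an arbitrary
     time unit; this is a sketch of the rule, not the kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: wakeup time the last CPU going idle has to
 * program into its own timer device, as it cannot delegate to the
 * hierarchy anymore.
 */
static uint64_t sketch_last_cpu_wakeup(uint64_t first_local,
				       uint64_t first_in_hierarchy)
{
	return first_local < first_in_hierarchy ? first_local
						: first_in_hierarchy;
}
```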

     This completely removes the overhead from the enqueue path, which
     is a true hot path for e.g. networking, and trades it for a
     slightly more complex idle path.

     This has been in development for a couple of years and the final
     series has been extensively tested by various teams from silicon
     vendors and has run through extensive CI.

     There have been slight performance improvements observed on
     network-centric workloads and an Intel team confirmed that this
     allows them to power down a die completely on a multi-die socket
     for the first time in a mostly idle scenario.

     There is only one outstanding ~1.5% regression on a specific
     overloaded netperf test which is currently being investigated, but
     the rest is either positive or neutral performance-wise and
     positive on the power management side.

   - Fixes for the timekeeping interpolation code for cross-timestamps:

     Cross-timestamps are used for PTP to get snapshots from hardware
     timers and interpolate them back to CLOCK_MONOTONIC. The changes
     address a few corner cases in the interpolation code which got the
     math and logic wrong.

   - Simplification of the clocksource watchdog retry logic to
     automatically adjust to handle larger systems correctly instead of
     having more incomprehensible command line parameters.

   - Treewide consolidation of the VDSO data structures.

   - The usual small improvements and cleanups all over the place"

* tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  timer/migration: Fix quick check reporting late expiry
  tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n
  vdso/datapage: Quick fix - use asm/page-def.h for ARM64
  timers: Assert no next dyntick timer look-up while CPU is offline
  tick: Assume timekeeping is correctly handed over upon last offline idle call
  tick: Shut down low-res tick from dying CPU
  tick: Split nohz and highres features from nohz_mode
  tick: Move individual bit features to debuggable mask accesses
  tick: Move got_idle_tick away from common flags
  tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode
  tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
  tick: Move tick cancellation up to CPUHP_AP_TICK_DYING
  tick: Start centralizing tick related CPU hotplug operations
  tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick()
  tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick()
  tick: Use IS_ENABLED() whenever possible
  tick/sched: Remove useless oneshot ifdeffery
  tick/nohz: Remove duplicate between lowres and highres handlers
  tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer()
  hrtimer: Select housekeeping CPU during migration
  ...
2024-03-11 14:38:26 -07:00
boot i.MX fixes for 6.8, round 2: 2024-03-04 15:24:28 +01:00
common locomo: make locomo_bus_type constant and static 2024-01-04 14:38:57 +01:00
configs ARM: imx_v6_v7_defconfig: Restore CONFIG_BACKLIGHT_CLASS_DEVICE 2024-02-23 09:58:39 +08:00
crypto crypto: arm/nhpoly1305 - implement ->digest 2023-10-20 13:39:25 +08:00
include vdso/ARM: Make union vdso_data_store available for all architectures 2024-02-20 20:56:00 +01:00
kernel vdso/ARM: Make union vdso_data_store available for all architectures 2024-02-20 20:56:00 +01:00
lib ARM: 9321/1: memset: cast the constant byte to unsigned char 2023-10-05 16:15:41 +01:00
mach-actions
mach-alpine ARM: alpine: Drop unused includes 2023-08-12 10:30:59 +02:00
mach-artpec
mach-aspeed
mach-at91 ARM: at91: pm: set soc_pm.data.mode in at91_pm_secure_init() 2023-11-19 11:32:44 +02:00
mach-axxia
mach-bcm ARM: bcm: Drop unused includes 2023-07-21 10:01:47 -07:00
mach-berlin ARM: berlin: Drop unused includes 2023-08-12 10:30:59 +02:00
mach-clps711x
mach-davinci ARM updates for v6.8-rc1 2024-01-17 11:34:45 -08:00
mach-digicolor
mach-dove ARM: dove: Drop unused includes 2023-08-12 10:30:59 +02:00
mach-ep93xx ARM: ep93xx: Add terminator to gpiod_lookup_table 2024-02-20 17:19:49 +01:00
mach-exynos ARM: SoC changes for 6.5 2023-06-29 15:28:33 -07:00
mach-footbridge
mach-gemini
mach-highbank
mach-hisi ARM: hisi: Drop unused includes 2023-07-19 06:29:04 +00:00
mach-hpe ARM: hpe: Drop unused includes 2023-08-12 10:30:59 +02:00
mach-imx ARM: SoC code changes for 6.8 2024-01-11 11:42:53 -08:00
mach-ixp4xx
mach-keystone ARM: keystone: Merge PM function into main support file 2023-08-01 23:57:28 -05:00
mach-lpc18xx
mach-lpc32xx
mach-mediatek
mach-meson ARM: meson: Drop unused includes 2023-07-31 11:58:18 +02:00
mach-milbeaut
mach-mmp ARM: mmp: Drop unused includes 2023-08-12 10:31:00 +02:00
mach-mstar
mach-mv78xx0
mach-mvebu ARM: mvebu: Explicitly include correct DT includes 2023-08-12 10:31:00 +02:00
mach-mxs ARM: mxs: Do not search for "fsl,clkctrl" 2023-12-06 11:21:43 +08:00
mach-nomadik ARM: nomadik: Drop unused includes 2023-08-12 10:31:00 +02:00
mach-npcm ARM: npcm: Drop unused includes 2023-08-12 10:31:00 +02:00
mach-omap1 gpio updates for v6.7-rc1 2023-10-31 17:21:54 -10:00
mach-omap2 ARM: OMAP2+: Fix null pointer dereference and memory leak in omap_soc_device_init 2023-11-30 13:57:00 +02:00
mach-orion5x
mach-pxa ARM: SoC cleanups for 6.6 2023-08-30 16:49:40 -07:00
mach-qcom
mach-realtek
mach-rockchip ARM: rockchip: Drop unused includes 2023-08-12 10:31:00 +02:00
mach-rpc
mach-s3c ASoC: wm8996: Convert to GPIO descriptors 2023-12-08 14:32:00 +00:00
mach-s5pv210 ARM: s5pv210: Explicitly include correct DT includes 2023-08-14 18:15:48 +02:00
mach-sa1100 ARM: locomo: fix locomolcd_power declaration 2023-09-28 09:15:51 +02:00
mach-shmobile ARM: shmobile: sh73a0: Reserve boot area when SMP is enabled 2023-09-27 11:00:27 +02:00
mach-socfpga ARM: socfpga: Explicitly include correct DT includes 2023-07-20 14:38:38 -05:00
mach-spear ARM: spear: Explicitly include correct DT includes 2023-08-12 10:31:01 +02:00
mach-sti ARM: sti: Drop unused includes 2023-08-12 10:30:59 +02:00
mach-stm32
mach-sunxi ARM: sun9i: smp: fix return code check of of_property_match_string 2024-01-02 16:45:16 +01:00
mach-tegra
mach-ux500 ARM: ux500: Move power-domain driver to the genpd dir 2023-07-14 10:41:59 +02:00
mach-versatile ARM: Delete ARM11MPCore (ARM11 ARMv6K SMP) support 2023-12-22 11:43:16 +00:00
mach-vt8500
mach-zynq ARM: zynq: Explicitly include correct DT includes 2023-07-20 17:06:50 +02:00
mm arch/arm/mm: fix major fault accounting when retrying under per-VMA lock 2024-02-07 21:20:35 -08:00
net arm32, bpf: add support for 64 bit division instruction 2023-09-15 17:16:56 -07:00
nwfpe
plat-orion
probes ARM: 9303/1: kprobes: avoid missing-declaration warnings 2023-06-19 09:35:51 +01:00
tools lsm/stable-6.8 PR 20240105 2024-01-09 12:57:46 -08:00
vdso arch: vdso: consolidate gettime prototypes 2023-11-23 11:32:32 +01:00
vfp ARM: 9327/1: vfp: Add missing VFP instructions to neon_support_hook 2023-12-05 11:40:27 +00:00
xen arm/xen: fix xen_vcpu_info allocation alignment 2023-11-23 09:32:41 +01:00
Kbuild
Kconfig ARM updates for v6.8-rc1 2024-01-17 11:34:45 -08:00
Kconfig-nommu
Kconfig.assembler
Kconfig.debug ARM: debug: fix DEBUG_UNCOMPRESS help for !MULTIPLATFORM 2024-01-05 23:26:07 +01:00
Kconfig.platforms ARM: mach-nspire: Rework support and directory structure 2023-12-22 14:23:30 +00:00
Makefile ARM: mach-nspire: Rework support and directory structure 2023-12-22 14:23:30 +00:00