Commit Graph

3045 Commits

Author SHA1 Message Date
Linus Torvalds
7c9026b2b0 pstore updates for v6.12-rc1
- ramoops: Fix .rst typo (Steven Rostedt)
 
 - pstore: replace spinlock_t by raw_spinlock_t (Wen Yang)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZufrKwAKCRA2KwveOeQk
 u0PQAQDtABDUAMyV9kgAG3nU9L0En+p8s+rF+9MJLDmFEnmjkQD8C7ky6cW74/3R
 b56tAsSsuFMuHdx4AGGD7rAWrnVUswc=
 =HNdN
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore updates from Kees Cook:

 - ramoops: Fix .rst typo (Steven Rostedt)

 - pstore: replace spinlock_t by raw_spinlock_t (Wen Yang)

* tag 'pstore-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore: replace spinlock_t by raw_spinlock_t
  pstore/ramoops: Fix typo as there is no "reserver"
2024-09-18 11:47:03 +02:00
Linus Torvalds
067610ebaa RCU pull request for v6.12
This pull request contains the following branches:
 
 context_tracking.15.08.24a: Rename context tracking state related
         symbols and remove references to "dynticks" in various context
         tracking state variables and related helpers; force
         context_tracking_enabled_this_cpu() to be inlined to avoid
         leaving a noinstr section.
 
 csd.lock.15.08.24a: Enhance CSD-lock diagnostic reports; add an API
         to provide an indication of ongoing CSD-lock stall.
 
 nocb.09.09.24a: Update and simplify RCU nocb code to handle
         (de-)offloading of callbacks only for offline CPUs; fix RT
         throttling hrtimer being armed from offline CPU.
 
 rcutorture.14.08.24a: Remove redundant rcu_torture_ops get_gp_completed
         fields; add SRCU ->same_gp_state and ->get_comp_state
         functions; add generic test for NUM_ACTIVE_*RCU_POLL* for
         testing RCU and SRCU polled grace periods; add CFcommon.arch
         for arch-specific Kconfig options; print number of update types
         in rcu_torture_write_types();
         add rcutree.nohz_full_patience_delay testing to the TREE07
         scenario; add a stall_cpu_repeat module parameter to test
         repeated CPU stalls; add argument to limit number of CPUs a
         guest OS can use in torture.sh;
 
 rcustall.09.09.24a: Abbreviate RCU CPU stall warnings during CSD-lock
         stalls; Allow dump_cpu_task() to be called without disabling
         preemption; defer printing stall-warning backtrace when holding
         rcu_node lock.
 
 srcu.12.08.24a: Make SRCU gp seq wrap-around faster; add KCSAN checks
         for concurrent updates to ->srcu_n_exp_nodelay and
         ->reschedule_count which are used in heuristics governing
         auto-expediting of normal SRCU grace periods and
         grace-period-state-machine delays; mark idle SRCU-barrier
         callbacks to help identify stuck SRCU-barrier callback.
 
 rcu.tasks.14.08.24a: Remove RCU Tasks Rude asynchronous APIs as they
         are no longer used; stop testing RCU Tasks Rude asynchronous
         APIs; fix access to non-existent percpu regions; check
         processor-ID assumptions during chosen CPU calculation for
         callback enqueuing; update description of rtp->tasks_gp_seq
         grace-period sequence number; add rcu_barrier_cb_is_done()
         to identify whether a given rcu_barrier callback is stuck;
         mark idle Tasks-RCU-barrier callbacks; add
         *torture_stats_print() functions to print detailed
         diagnostics for Tasks-RCU variants; capture start time of
         rcu_barrier_tasks*() operation to help distinguish a hung
         barrier operation from a long series of barrier operations.
 
 rcu_scaling_tests.15.08.24a:
         refscale: Add a TINY scenario to support tests of Tiny RCU
         and Tiny SRCU; Optimize process_durations() operation;
 
         rcuscale: Dump stacks of stalled rcu_scale_writer() instances;
         dump grace-period statistics when rcu_scale_writer() stalls;
         mark idle RCU-barrier callbacks to identify stuck RCU-barrier
         callbacks; print detailed grace-period and barrier diagnostics
         on rcu_scale_writer() hangs for Tasks-RCU variants; warn if
         async module parameter is specified for RCU implementations
         that do not have async primitives such as RCU Tasks Rude;
         make all writer tasks report upon hang; tolerate repeated
         GFP_KERNEL failure in rcu_scale_writer(); use special allocator
         for rcu_scale_writer(); NULL out top-level pointers to heap
         memory to avoid double-free bugs on modprobe failures; maintain
         per-task instead of per-CPU callbacks count to avoid any issues
         with migration of either tasks or callbacks; constify struct
         ref_scale_ops.
 
 fixes.12.08.24a: Use system_unbound_wq for kfree_rcu work to avoid
         disturbing isolated CPUs.
 
 misc.11.08.24a: Warn on unexpected rcu_state.srs_done_tail state;
         Better define "atomic" for list_replace_rcu() and
         hlist_replace_rcu() routines; annotate struct
         kvfree_rcu_bulk_data with __counted_by().
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSi2tPIQIc2VEtjarIAHS7/6Z0wpQUCZt8+8wAKCRAAHS7/6Z0w
 pTqoAPwPN//tlEoJx2PRs6t0q+nD1YNvnZawPaRmdzgdM8zJogD+PiSN+XhqRr80
 jzyvMDU4Aa0wjUNP3XsCoaCxo7L/lQk=
 =bZ9z
 -----END PGP SIGNATURE-----

Merge tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Neeraj Upadhyay:
 "Context tracking:
   - rename context tracking state related symbols and remove references
     to "dynticks" in various context tracking state variables and
     related helpers
   - force context_tracking_enabled_this_cpu() to be inlined to avoid
     leaving a noinstr section

  CSD lock:
   - enhance CSD-lock diagnostic reports
   - add an API to provide an indication of ongoing CSD-lock stall

  nocb:
   - update and simplify RCU nocb code to handle (de-)offloading of
     callbacks only for offline CPUs
   - fix RT throttling hrtimer being armed from offline CPU

  rcutorture:
   - remove redundant rcu_torture_ops get_gp_completed fields
   - add SRCU ->same_gp_state and ->get_comp_state functions
   - add generic test for NUM_ACTIVE_*RCU_POLL* for testing RCU and SRCU
     polled grace periods
   - add CFcommon.arch for arch-specific Kconfig options
   - print number of update types in rcu_torture_write_types()
   - add rcutree.nohz_full_patience_delay testing to the TREE07 scenario
   - add a stall_cpu_repeat module parameter to test repeated CPU stalls
   - add argument to limit number of CPUs a guest OS can use in
     torture.sh

  rcustall:
   - abbreviate RCU CPU stall warnings during CSD-lock stalls
   - Allow dump_cpu_task() to be called without disabling preemption
   - defer printing stall-warning backtrace when holding rcu_node lock

  srcu:
   - make SRCU gp seq wrap-around faster
   - add KCSAN checks for concurrent updates to ->srcu_n_exp_nodelay and
     ->reschedule_count which are used in heuristics governing
     auto-expediting of normal SRCU grace periods and
     grace-period-state-machine delays
   - mark idle SRCU-barrier callbacks to help identify stuck
     SRCU-barrier callback

  rcu tasks:
   - remove RCU Tasks Rude asynchronous APIs as they are no longer used
   - stop testing RCU Tasks Rude asynchronous APIs
   - fix access to non-existent percpu regions
   - check processor-ID assumptions during chosen CPU calculation for
     callback enqueuing
   - update description of rtp->tasks_gp_seq grace-period sequence
     number
   - add rcu_barrier_cb_is_done() to identify whether a given
     rcu_barrier callback is stuck
   - mark idle Tasks-RCU-barrier callbacks
   - add *torture_stats_print() functions to print detailed diagnostics
     for Tasks-RCU variants
   - capture start time of rcu_barrier_tasks*() operation to help
     distinguish a hung barrier operation from a long series of barrier
     operations

  refscale:
   - add a TINY scenario to support tests of Tiny RCU and Tiny
     SRCU
   - optimize process_durations() operation

  rcuscale:
   - dump stacks of stalled rcu_scale_writer() instances and
     grace-period statistics when rcu_scale_writer() stalls
   - mark idle RCU-barrier callbacks to identify stuck RCU-barrier
     callbacks
   - print detailed grace-period and barrier diagnostics on
     rcu_scale_writer() hangs for Tasks-RCU variants
   - warn if async module parameter is specified for RCU implementations
     that do not have async primitives such as RCU Tasks Rude
   - make all writer tasks report upon hang
   - tolerate repeated GFP_KERNEL failure in rcu_scale_writer()
   - use special allocator for rcu_scale_writer()
   - NULL out top-level pointers to heap memory to avoid double-free
     bugs on modprobe failures
   - maintain per-task instead of per-CPU callbacks count to avoid any
     issues with migration of either tasks or callbacks
   - constify struct ref_scale_ops

  Fixes:
   - use system_unbound_wq for kfree_rcu work to avoid disturbing
     isolated CPUs

  Misc:
   - warn on unexpected rcu_state.srs_done_tail state
   - better define "atomic" for list_replace_rcu() and
     hlist_replace_rcu() routines
   - annotate struct kvfree_rcu_bulk_data with __counted_by()"

* tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (90 commits)
  rcu: Defer printing stall-warning backtrace when holding rcu_node lock
  rcu/nocb: Remove superfluous memory barrier after bypass enqueue
  rcu/nocb: Conditionally wake up rcuo if not already waiting on GP
  rcu/nocb: Fix RT throttling hrtimer armed from offline CPU
  rcu/nocb: Simplify (de-)offloading state machine
  context_tracking: Tag context_tracking_enabled_this_cpu() __always_inline
  context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching
  rcu: Update stray documentation references to rcu_dynticks_eqs_{enter, exit}()
  rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()
  rcu: Rename rcu_implicit_dynticks_qs() into rcu_watching_snap_recheck()
  rcu: Rename dyntick_save_progress_counter() into rcu_watching_snap_save()
  rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snap
  rcu: Rename struct rcu_data .dynticks_snap into .watching_snap
  rcu: Rename rcu_dynticks_zero_in_eqs() into rcu_watching_zero_in_eqs()
  rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since()
  rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs()
  rcu: Rename rcu_dynticks_eqs_online() into rcu_watching_online()
  context_tracking, rcu: Rename rcu_dynticks_curr_cpu_in_eqs() into rcu_is_watching_curr_cpu()
  context_tracking, rcu: Rename rcu_dynticks_task*() into rcu_task*()
  refscale: Constify struct ref_scale_ops
  ...
2024-09-18 07:52:24 +02:00
Linus Torvalds
85a77db95a workqueue: Changes for v6.12
Nothing major:
 
 - workqueue.panic_on_stall boot param added.
 
 - alloc_workqueue_lockdep_map() added (used by DRM).
 
 - Other cleanusp and doc updates.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZuN3gQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGR1hAP0XObdExeNsVWe1JUUUX061+H+aA6aVffb9+J/t
 b32u3QEAsn+oNWzuvzlGlSQKQMpPk+dT2na0Q0yZNxkNEzUiEQQ=
 =TeDS
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue updates from Tejun Heo:
 "Nothing major:

   - workqueue.panic_on_stall boot param added

   - alloc_workqueue_lockdep_map() added (used by DRM)

   - Other cleanusp and doc updates"

* tag 'wq-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  kernel/workqueue.c: fix DEFINE_PER_CPU_SHARED_ALIGNED expansion
  workqueue: Fix another htmldocs build warning
  workqueue: fix null-ptr-deref on __alloc_workqueue() error
  workqueue: Don't call va_start / va_end twice
  workqueue: Fix htmldocs build warning
  workqueue: Add interface for user-defined workqueue lockdep map
  workqueue: Change workqueue lockdep map to pointer
  workqueue: Split alloc_workqueue into internal function and lockdep init
  Documentation: kernel-parameters: add workqueue.panic_on_stall
  workqueue: add cmdline parameter workqueue.panic_on_stall
2024-09-18 06:59:44 +02:00
Linus Torvalds
78567e2bc7 cgroup: Changes for v6.12
- cpuset isolation improvements.
 
 - cpuset cgroup1 support is split into its own file behind the new config
   option CONFIG_CPUSET_V1. This makes it the second controller which makes
   cgroup1 support optional after memcg.
 
 - Handling of unavailable v1 controller handling improved during cgroup1
   mount operations.
 
 - union_find applied to cpuset. It makes code simpler and more efficient.
 
 - Reduce spurious events in pids.events.
 
 - Cleanups and other misc changes.
 
 - Contains a merge of cgroup/for-6.11-fixes to receive cpuset fixes that
   further changes build upon.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZuNU3Q4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGdMsAP9yqPxu//LiJ3lPWhKcVVKtdwrA3AYDLE81VSJO
 5VZJhAD+Ic+Ly/jZjDtjjQpZ1U3JsBpBRcVBqzeH0gD7eXaJgwk=
 =h/+c
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - cpuset isolation improvements

 - cpuset cgroup1 support is split into its own file behind the new
   config option CONFIG_CPUSET_V1. This makes it the second controller
   which makes cgroup1 support optional after memcg

 - Handling of unavailable v1 controller handling improved during
   cgroup1 mount operations

 - union_find applied to cpuset. It makes code simpler and more
   efficient

 - Reduce spurious events in pids.events

 - Cleanups and other misc changes

 - Contains a merge of cgroup/for-6.11-fixes to receive cpuset fixes
   that further changes build upon

* tag 'cgroup-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (34 commits)
  cgroup: Do not report unavailable v1 controllers in /proc/cgroups
  cgroup: Disallow mounting v1 hierarchies without controller implementation
  cgroup/cpuset: Expose cpuset filesystem with cpuset v1 only
  cgroup/cpuset: Move cpu.h include to cpuset-internal.h
  cgroup/cpuset: add sefltest for cpuset v1
  cgroup/cpuset: guard cpuset-v1 code under CONFIG_CPUSETS_V1
  cgroup/cpuset: rename functions shared between v1 and v2
  cgroup/cpuset: move v1 interfaces to cpuset-v1.c
  cgroup/cpuset: move validate_change_legacy to cpuset-v1.c
  cgroup/cpuset: move legacy hotplug update to cpuset-v1.c
  cgroup/cpuset: add callback_lock helper
  cgroup/cpuset: move memory_spread to cpuset-v1.c
  cgroup/cpuset: move relax_domain_level to cpuset-v1.c
  cgroup/cpuset: move memory_pressure to cpuset-v1.c
  cgroup/cpuset: move common code to cpuset-internal.h
  cgroup/cpuset: introduce cpuset-v1.c
  selftest/cgroup: Make test_cpuset_prs.sh deal with pre-isolated CPUs
  cgroup/cpuset: Account for boot time isolated CPUs
  cgroup/cpuset: remove use_parent_ecpus of cpuset
  cgroup/cpuset: remove fetch_xcpus
  ...
2024-09-18 06:39:03 +02:00
Linus Torvalds
d58db3f3a0 Another relatively mundane cycle for docs:
- The beginning of an EEVDF scheduler document
 
 - More Chinese translations
 
 - A rethrashing of our bisection documentation
 
 ...plus the usual array of smaller fixes, and more than the usual number of
 typo fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmboMnkACgkQF0NaE2wM
 flha/Qf/e8zRinIYQJ7BmombNm39w3wUiNuXr8SWq7afqhsAJJzmOZ3oyyfssL+B
 a1pSjhxb15UrKf1kMKhdBxhDndXvto5UekJRBY5gsTvcBMBmtIovN+ZK5Z5jObsw
 gzHD9of08Ti7N4C2dSBdLPHtvIBX0rVeEK4oAH7AUaQviu1cfTaLQQA0dRYsaJeX
 iXsts2NkGl6ZUF7mk4nlzj8+Y1zot+mCd6B53iSimNKxwsPODrCZUobJAvxg1qVU
 pRCQcnpx2fTBnh4ugrcLZbautyhL9bJ8VQzFeoQgYpODDgDnZyTjN6kxv65LpxAz
 dXi+hx5Vk7lP3BbTp9EeGn305/qQPA==
 =JuBw
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.12' of git://git.lwn.net/linux

Pull documentation update from Jonathan Corbet:
 "Another relatively mundane cycle for docs:

   - The beginning of an EEVDF scheduler document

   - More Chinese translations

   - A rethrashing of our bisection documentation

  ...plus the usual array of smaller fixes, and more than the usual
  number of typo fixes"

* tag 'docs-6.12' of git://git.lwn.net/linux: (48 commits)
  Remove duplicate "and" in 'Linux NVMe docs.
  docs:filesystems: fix spelling and grammar mistakes
  docs:filesystem: fix mispelled words on autofs page
  docs:mm: fixed spelling and grammar mistakes on vmalloc kernel stack page
  Documentation: PCI: fix typo in pci.rst
  docs/zh_CN: add the translation of kbuild/gcc-plugins.rst
  docs/process: fix typos
  docs:mm: fix spelling mistakes in heterogeneous memory management page
  accel/qaic: Fix a typo
  docs/zh_CN: update the translation of security-bugs
  docs: block: Fix grammar and spelling mistakes in bfq-iosched.rst
  Documentation: Fix spelling mistakes
  Documentation/gpu: Fix typo in Documentation/gpu/komeda-kms.rst
  scripts: sphinx-pre-install: remove unnecessary double check for $cur_version
  Loongarch: KVM: Add KVM hypercalls documentation for LoongArch
  Documentation: Document the kernel flag bdev_allow_write_mounted
  docs: scheduler: completion: Update member of struct completion
  docs: kerneldoc-preamble.sty: Suppress extra spaces in CJK literal blocks
  docs: submitting-patches: Advertise b4
  docs: update dev-tools/kcsan.rst url about KTSAN
  ...
2024-09-17 16:44:08 +02:00
Linus Torvalds
9ea925c806 Updates for timers and timekeeping:
- Core:
 
 	- Overhaul of posix-timers in preparation of removing the
 	  workaround for periodic timers which have signal delivery
 	  ignored.
 
         - Remove the historical extra jiffie in msleep()
 
 	  msleep() adds an extra jiffie to the timeout value to ensure
 	  minimal sleep time. The timer wheel ensures minimal sleep
 	  time since the large rewrite to a non-cascading wheel, but the
 	  extra jiffie in msleep() remained unnoticed. Remove it.
 
         - Make the timer slack handling correct for realtime tasks.
 
 	  The procfs interface is inconsistent and does neither reflect
 	  reality nor conforms to the man page. Show the correct 0 slack
 	  for real time tasks and enforce it at the core level instead of
 	  having inconsistent individual checks in various timer setup
 	  functions.
 
         - The usual set of updates and enhancements all over the place.
 
   - Drivers:
 
         - Allow the ACPI PM timer to be turned off during suspend
 
 	- No new drivers
 
 	- The usual updates and enhancements in various drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbn7jQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobqnD/9COlU0nwsulABI/aNIrsh6iYvnCC9v
 14CcNta7Qn+157Wfw9BWOyHdNhR1/fPCXE8jJ71zTyIOeW27HV2JyTtxTwe9ZcdK
 ViHAaj7YcIjcVUEC3StCoRCPnvLslEw4qJA5AOQuDyMivdQn+YVa2c0baJxKaXZt
 xk4HZdMj4NAS0jRKnoZSwtKW/+Oz6rR4GAWrZo+Zs1/8ur3HfqnQfi8lJ1hJtLLW
 V7XDCVRvamVi6Ah3ocYPPp/1P6yeQDA1ge9aMddqaza5STWISXRtSnFMUmYP3rbS
 FaL8TyL+ilfny8pkGB2WlG6nLuSbtvogtdEh1gG1k1RmZt44kAtk8ba/KiWFPBSb
 zK9cjojRMBS71f9G4kmb5F4rnXoLsg1YbD1Nzhz3wq2Cs1Z90dc2QwMren0zoQ1x
 Fn56ueRyAiagBlnrSaKyso/2RvqJTNoSdi3RkpjYeAph0UoDCqvTvKjGAf1mWiw1
 T/1lUWSVqWHnzZbM7XXzzajIN9bl6A7bbqlcAJ2O9vZIDt7273DG+bQym9Vh6Why
 0LTGGERHxzKBsG7WRg+2Gmvv6S18UPKRo8tLtlA758rHlFuPTZCShWrIriwSNl1K
 Hxon+d4BparSnm1h9W/NHPKJA574UbWRCBjdk58IkAj8DxZZY4ORD9SMP+ggkV7G
 F6p9cgoDNP9KFg==
 =jE0N
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Core:

   - Overhaul of posix-timers in preparation of removing the workaround
     for periodic timers which have signal delivery ignored.

   - Remove the historical extra jiffie in msleep()

     msleep() adds an extra jiffie to the timeout value to ensure
     minimal sleep time. The timer wheel ensures minimal sleep time
     since the large rewrite to a non-cascading wheel, but the extra
     jiffie in msleep() remained unnoticed. Remove it.

   - Make the timer slack handling correct for realtime tasks.

     The procfs interface is inconsistent and does neither reflect
     reality nor conforms to the man page. Show the correct 0 slack for
     real time tasks and enforce it at the core level instead of having
     inconsistent individual checks in various timer setup functions.

   - The usual set of updates and enhancements all over the place.

  Drivers:

   - Allow the ACPI PM timer to be turned off during suspend

   - No new drivers

   - The usual updates and enhancements in various drivers"

* tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  ntp: Make sure RTC is synchronized when time goes backwards
  treewide: Fix wrong singular form of jiffies in comments
  cpu: Use already existing usleep_range()
  timers: Rename next_expiry_recalc() to be unique
  platform/x86:intel/pmc: Fix comment for the pmc_core_acpi_pm_timer_suspend_resume function
  clocksource/drivers/jcore: Use request_percpu_irq()
  clocksource/drivers/cadence-ttc: Add missing clk_disable_unprepare in ttc_setup_clockevent
  clocksource/drivers/asm9260: Add missing clk_disable_unprepare in asm9260_timer_init
  clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
  clocksource/drivers/ingenic: Use devm_clk_get_enabled() helpers
  platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended
  clocksource: acpi_pm: Add external callback for suspend/resume
  clocksource/drivers/arm_arch_timer: Using for_each_available_child_of_node_scoped()
  dt-bindings: timer: rockchip: Add rk3576 compatible
  timers: Annotate possible non critical data race of next_expiry
  timers: Remove historical extra jiffie for timeout in msleep()
  hrtimer: Use and report correct timerslack values for realtime tasks
  hrtimer: Annotate hrtimer_cpu_base_.*_expiry() for sparse.
  timers: Add sparse annotation for timer_sync_wait_running().
  signal: Replace BUG_ON()s
  ...
2024-09-17 07:25:37 +02:00
Linus Torvalds
a430d95c5e lsm/stable-6.12 PR 20240911
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmbiGGAUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPU8BAA1+A15pmS34I9pq7c8TmRz3rNEs/a
 zrW1aWJ0X/+axNS7sW3Pwtt1EKuaOhskKU8gNSieRhljC8rgXIVjZzLw6Atgcr5k
 upulGbU9TXyVisYN+PWv9/84ito6/nYsKb7Mg3nUVsdodtIFVnsk1fxYLPHQEBig
 Pl3i26U3VqH93Kz0W5vs/QR2uduPB8ZyscdTgcbrY9Vv1Y7IDZ2g9QsJVKLvbQKL
 qcPK1JkHa+sBPJxDqS9A40zgbLbdPQgWQzsXX3dz822w1Ga7FIHSqxMBA6HwHZ+L
 kV4P58wVfavhwt/cQSKMWI/yiGPMMd0B6yD+m8ojOvGfOfRCWxGMmEMqHNuZ3m7k
 Bfll5ZgZTY8phUUhiNf3nxO3F3MM/5bHdhPOj3RReqbAbS6uWr4/fThPDYY/zIo6
 NCY3HGxx3Ae64uQ01gC2p/czC50jDsMwlbXiZbrgdBhjBm/CVk5ozb80mLVcGrLB
 +6XMzzSbC8IaNAH2fDmUJ2ABdwyNPgsSOTGZVzIanpxu1SU2/yk3SMxkp8fv5s36
 wLeODUVcLgsjVV538Mkm6PGTE4TlXaH9yi6apMyJAGp0vPYx5c3Xxk2y5A5cur5p
 hcrbDiX2QgeqFbwsz36incmPmbef2NU2c8feR8XLtPJuwNIeRcMSje0pnkaFlRmb
 TAUJ1sDQAzZ8Fy0=
 =HIAO
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

 - Move the LSM framework to static calls

   This transitions the vast majority of the LSM callbacks into static
   calls. Those callbacks which haven't been converted were left as-is
   due to the general ugliness of the changes required to support the
   static call conversion; we can revisit those callbacks at a future
   date.

 - Add the Integrity Policy Enforcement (IPE) LSM

   This adds a new LSM, Integrity Policy Enforcement (IPE). There is
   plenty of documentation about IPE in this patches, so I'll refrain
   from going into too much detail here, but the basic motivation behind
   IPE is to provide a mechanism such that administrators can restrict
   execution to only those binaries which come from integrity protected
   storage, e.g. a dm-verity protected filesystem. You will notice that
   IPE requires additional LSM hooks in the initramfs, dm-verity, and
   fs-verity code, with the associated patches carrying ACK/review tags
   from the associated maintainers. We couldn't find an obvious
   maintainer for the initramfs code, but the IPE patchset has been
   widely posted over several years.

   Both Deven Bowers and Fan Wu have contributed to IPE's development
   over the past several years, with Fan Wu agreeing to serve as the IPE
   maintainer moving forward. Once IPE is accepted into your tree, I'll
   start working with Fan to ensure he has the necessary accounts, keys,
   etc. so that he can start submitting IPE pull requests to you
   directly during the next merge window.

 - Move the lifecycle management of the LSM blobs to the LSM framework

   Management of the LSM blobs (the LSM state buffers attached to
   various kernel structs, typically via a void pointer named "security"
   or similar) has been mixed, some blobs were allocated/managed by
   individual LSMs, others were managed by the LSM framework itself.

   Starting with this pull we move management of all the LSM blobs,
   minus the XFRM blob, into the framework itself, improving consistency
   across LSMs, and reducing the amount of duplicated code across LSMs.
   Due to some additional work required to migrate the XFRM blob, it has
   been left as a todo item for a later date; from a practical
   standpoint this omission should have little impact as only SELinux
   provides a XFRM LSM implementation.

 - Fix problems with the LSM's handling of F_SETOWN

   The LSM hook for the fcntl(F_SETOWN) operation had a couple of
   problems: it was racy with itself, and it was disconnected from the
   associated DAC related logic in such a way that the LSM state could
   be updated in cases where the DAC state would not. We fix both of
   these problems by moving the security_file_set_fowner() hook into the
   same section of code where the DAC attributes are updated. Not only
   does this resolve the DAC/LSM synchronization issue, but as that code
   block is protected by a lock, it also resolve the race condition.

 - Fix potential problems with the security_inode_free() LSM hook

   Due to use of RCU to protect inodes and the placement of the LSM hook
   associated with freeing the inode, there is a bit of a challenge when
   it comes to managing any LSM state associated with an inode. The VFS
   folks are not open to relocating the LSM hook so we have to get
   creative when it comes to releasing an inode's LSM state.
   Traditionally we have used a single LSM callback within the hook that
   is triggered when the inode is "marked for death", but not actually
   released due to RCU.

   Unfortunately, this causes problems for LSMs which want to take an
   action when the inode's associated LSM state is actually released; so
   we add an additional LSM callback, inode_free_security_rcu(), that is
   called when the inode's LSM state is released in the RCU free
   callback.

 - Refactor two LSM hooks to better fit the LSM return value patterns

   The vast majority of the LSM hooks follow the "return 0 on success,
   negative values on failure" pattern, however, there are a small
   handful that have unique return value behaviors which has caused
   confusion in the past and makes it difficult for the BPF verifier to
   properly vet BPF LSM programs. This includes patches to
   convert two of these"special" LSM hooks to the common 0/-ERRNO pattern.

 - Various cleanups and improvements

   A handful of patches to remove redundant code, better leverage the
   IS_ERR_OR_NULL() helper, add missing "static" markings, and do some
   minor style fixups.

* tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (40 commits)
  security: Update file_set_fowner documentation
  fs: Fix file_set_fowner LSM hook inconsistencies
  lsm: Use IS_ERR_OR_NULL() helper function
  lsm: remove LSM_COUNT and LSM_CONFIG_COUNT
  ipe: Remove duplicated include in ipe.c
  lsm: replace indirect LSM hook calls with static calls
  lsm: count the LSMs enabled at compile time
  kernel: Add helper macros for loop unrolling
  init/main.c: Initialize early LSMs after arch code, static keys and calls.
  MAINTAINERS: add IPE entry with Fan Wu as maintainer
  documentation: add IPE documentation
  ipe: kunit test for parser
  scripts: add boot policy generation program
  ipe: enable support for fs-verity as a trust provider
  fsverity: expose verified fsverity built-in signatures to LSMs
  lsm: add security_inode_setintegrity() hook
  ipe: add support for dm-verity as a trust provider
  dm-verity: expose root hash digest and signature data to LSMs
  block,lsm: add LSM blob and new LSM hooks for block devices
  ipe: add permissive toggle
  ...
2024-09-16 18:19:47 +02:00
Linus Torvalds
e8fc317dfc vfs-6.12.procfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEwAAKCRCRxhvAZXjc
 onI2AQDXa5XhIx0VpLWE9uVImVy3QuUKc/5pI1e1DKMgxLhKCgEAh15a4ETqmVaw
 Zp3ZSzoLD8Ez1WwWb6cWQuHFYRSjtwU=
 =+LKG
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull procfs updates from Christian Brauner:
 "This contains the following changes for procfs:

   - Add config options and parameters to block forcing memory writes.

     This adds a Kconfig option and boot param to allow removing the
     FOLL_FORCE flag from /proc/<pid>/mem write calls as this can be
     used in various attacks.

     The traditional forcing behavior is kept as default because it can
     break GDB and some other use cases.

     This is the simpler version that you had requested.

   - Restrict overmounting of ephemeral entities.

     It is currently possible to mount on top of various ephemeral
     entities in procfs. This specifically includes magic links. To
     recap, magic links are links of the form /proc/<pid>/fd/<nr>. They
     serve as references to a target file and during path lookup they
     cause a jump to the target path. Such magic links disappear if the
     corresponding file descriptor is closed.

     Currently it is possible to overmount such magic links. This is
     mostly interesting for an attacker that wants to somehow trick a
     process into e.g., reopening something that it didn't intend to
     reopen or to hide a malicious file descriptor.

     But also it risks leaking mounts for long-running processes. When
     overmounting a magic link like above, the mount will not be
     detached when the file descriptor is closed. Only the target
     mountpoint will disappear. Which has the consequence of making it
     impossible to unmount that mount afterwards. So the mount will
     stick around until the process exits and the /proc/<pid>/ directory
     is cleaned up during proc_flush_pid() when the dentries are pruned
     and invalidated.

     That in turn means it's possible for a program to accidentally leak
     mounts and it's also possible to make a task leak mounts without
     it's knowledge if the attacker just keeps overmounting things under
     /proc/<pid>/fd/<nr>.

     Disallow overmounting of such ephemeral entities.

   - Cleanup the readdir method naming in some procfs file operations.

   - Replace kmalloc() and strcpy() with a simple kmemdup() call"

* tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  proc: fold kmalloc() + strcpy() into kmemdup()
  proc: block mounting on top of /proc/<pid>/fdinfo/*
  proc: block mounting on top of /proc/<pid>/fd/*
  proc: block mounting on top of /proc/<pid>/map_files/*
  proc: add proc_splice_unmountable()
  proc: proc_readfdinfo() -> proc_fdinfo_iterate()
  proc: proc_readfd() -> proc_fd_iterate()
  proc: add config & param to block forcing mem writes
2024-09-16 09:36:59 +02:00
Linus Torvalds
02824a5fd1 Power management updates for 6.12-rc1
- Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef).
 
  - Add support for Granite Rapids and Sierra Forest in OOB mode to the
    intel_pstate cpufreq driver (Srinivas Pandruvada).
 
  - Add basic support for CPU capacity scaling on x86 and make the
    intel_pstate driver set asymmetric CPU capacity on hybrid systems
    without SMT (Rafael Wysocki).
 
  - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
    driver (Jeff Johnson).
 
  - Several OF related cleanups in cpufreq drivers (Rob Herring).
 
  - Enable COMPILE_TEST for ARM drivers (Rob Herrring).
 
  - Introduce quirks for syscon failures and use socinfo to get revision
    for TI cpufreq driver (Dhruva Gole, Nishanth Menon).
 
  - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
    Ugwekar).
 
  - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
    (Danila Tikhonov, Huacai Chen, and Liu Jing).
 
  - Make amd-pstate validate return of any attempt to update EPP limits,
    which fixes the masking hardware problems (Mario Limonciello).
 
  - Move the calculation of the AMD boost numerator outside of amd-pstate,
    correcting acpi-cpufreq on systems with preferred cores (Mario
    Limonciello).
 
  - Harden preferred core detection in amd-pstate to avoid potential
    false positives (Mario Limonciello).
 
  - Add extra unit test coverage for mode state machine (Mario
    Limonciello).
 
  - Fix an "Uninitialized variables" issue in amd-pstste (Qianqiang Liu).
 
  - Add Granite Rapids Xeon support to intel_idle (Artem Bityutskiy).
 
  - Disable promotion to C1E on Jasper Lake and Elkhart Lake in
    intel_idle (Kai-Heng Feng).
 
  - Use scoped device node handling to fix missing of_node_put() and
    simplify walking OF children in the riscv-sbi cpuidle driver (Krzysztof
    Kozlowski).
 
  - Remove dead code from cpuidle_enter_state() (Dhruva Gole).
 
  - Change an error pointer to NULL to fix error handling in the
    intel_rapl power capping driver (Dan Carpenter).
 
  - Fix off by one in get_rpi() in the intel_rapl power capping
    driver (Dan Carpenter).
 
  - Add support for ArrowLake-U to the intel_rapl power capping
    driver (Sumeet Pawnikar).
 
  - Fix the energy-pkg event for AMD CPUs in the intel_rapl power capping
    driver (Dhananjay Ugwekar).
 
  - Add support for AMD family 1Ah processors to the intel_rapl power
    capping driver (Dhananjay Ugwekar).
 
  - Remove unused stub for saveable_highmem_page() and remove deprecated
    macros from power management documentation (Andy Shevchenko).
 
  - Use ysfs_emit() and sysfs_emit_at() in "show" functions in the PM
    sysfs interface (Xueqin Luo).
 
  - Update the maintainers information for the operating-points-v2-ti-cpu DT
    binding (Dhruva Gole).
 
  - Drop unnecessary of_match_ptr() from ti-opp-supply (Rob Herring).
 
  - Add missing MODULE_DESCRIPTION() macros to devfreq governors (Jeff
    Johnson).
 
  - Use devm_clk_get_enabled() in the exynos-bus devfreq driver (Anand
    Moon).
 
  - Use of_property_present() instead of of_get_property() in the imx-bus
    devfreq driver (Rob Herring).
 
  - Update directory handling and installation process in the pm-graph
    Makefile and add .gitignore to ignore sleepgraph.py artifacts to
    pm-graph (Amit Vadhavana, Yo-Jung Lin).
 
  - Make cpupower display residency value in idle-info (Aboorva
    Devarajan).
 
  - Add missing powercap_set_enabled() stub function to cpupower (John
    B. Wyatt IV).
 
  - Add SWIG support to cpupower (John B. Wyatt IV).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmbjKEQSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx8g8P/1RqL6NuCxH4eobwZigeyBS6/sLHPmKo
 wqHcerZsU7EH8DOlmBU0SH1Br2WBQAbaP8d1ukT5qkGBrZ+IM/A2ipZct0yAHH2D
 aBKwg7V3LvXo2mPuLve0knpM6W7zibPHJJlcjh8DmGQJabhWO7jr+p/0eS4JE2ek
 iE5FCXTxhvbcNJ9yWSt7+3HHmvj74P81As7txysLSzhWSZDcqXb0XJRgVJnWDt+x
 OyTAMEEAY2BuqmijHzqxxHcA1fxOBK/pa9yfPdKP7ePynLnpP7xd9A5oLbXQ4BL9
 PHqpD06ZBdSMQzKkyCODypZt8PL+FcEALE4u9chV/nzVwp7TrtDneXWA7RA0GXgq
 mp9hm51GmdptRayePR3s4TCA6a2BUw3Ue4fgs6XF/bexNpc3nx0wtP8HEevcuy8q
 Z7XQkpqW942vOohfoN42JwTjfDJhYTwSH3dcIY8UghHtzwZ5YKV1M4f97kNR7V2i
 QLJvaGJ5yTTcaHndkpc4EKknPyLRaWPh8h/yVmMRBcAaGBWaImul3a5NI07f0wLM
 LTenlpEcls7WSu9n3uvFXvT7nSS2CBV0huTbg449X4T2J0T6EooYsVuHNsFMNFLy
 Xm3lUtdm5QjAXFf+azOCO+26XQt8wObC0ttZtCC2j1b8D+9Riuwh5QHLr99rRTzn
 7Ic4U5Lkimzx
 =JM+K
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "By the number of new lines of code, the most visible change here is
  the addition of hybrid CPU capacity scaling support to the
  intel_pstate driver. Next are the amd-pstate driver changes related to
  the calculation of the AMD boost numerator and preferred core
  detection.

  As far as new hardware support is concerned, the intel_idle driver
  will now handle Granite Rapids Xeon processors natively, the
  intel_rapl power capping driver will recognize family 1Ah of AMD
  processors and Intel ArrowLake-U chipos, and intel_pstate will handle
  Granite Rapids and Sierra Forest chips in the out-of-band (OOB) mode.

  Apart from the above, there is a usual collection of assorted fixes
  and code cleanups in many places and there are tooling updates.

  Specifics:

   - Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef)

   - Add support for Granite Rapids and Sierra Forest in OOB mode to the
     intel_pstate cpufreq driver (Srinivas Pandruvada)

   - Add basic support for CPU capacity scaling on x86 and make the
     intel_pstate driver set asymmetric CPU capacity on hybrid systems
     without SMT (Rafael Wysocki)

   - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
     driver (Jeff Johnson)

   - Several OF related cleanups in cpufreq drivers (Rob Herring)

   - Enable COMPILE_TEST for ARM drivers (Rob Herrring)

   - Introduce quirks for syscon failures and use socinfo to get
     revision for TI cpufreq driver (Dhruva Gole, Nishanth Menon)

   - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
     Ugwekar)

   - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
     (Danila Tikhonov, Huacai Chen, and Liu Jing)

   - Make amd-pstate validate return of any attempt to update EPP
     limits, which fixes the masking hardware problems (Mario
     Limonciello)

   - Move the calculation of the AMD boost numerator outside of
     amd-pstate, correcting acpi-cpufreq on systems with preferred cores
     (Mario Limonciello)

   - Harden preferred core detection in amd-pstate to avoid potential
     false positives (Mario Limonciello)

   - Add extra unit test coverage for mode state machine (Mario
     Limonciello)

   - Fix an "Uninitialized variables" issue in amd-pstste (Qianqiang
     Liu)

   - Add Granite Rapids Xeon support to intel_idle (Artem Bityutskiy)

   - Disable promotion to C1E on Jasper Lake and Elkhart Lake in
     intel_idle (Kai-Heng Feng)

   - Use scoped device node handling to fix missing of_node_put() and
     simplify walking OF children in the riscv-sbi cpuidle driver
     (Krzysztof Kozlowski)

   - Remove dead code from cpuidle_enter_state() (Dhruva Gole)

   - Change an error pointer to NULL to fix error handling in the
     intel_rapl power capping driver (Dan Carpenter)

   - Fix off by one in get_rpi() in the intel_rapl power capping driver
     (Dan Carpenter)

   - Add support for ArrowLake-U to the intel_rapl power capping driver
     (Sumeet Pawnikar)

   - Fix the energy-pkg event for AMD CPUs in the intel_rapl power
     capping driver (Dhananjay Ugwekar)

   - Add support for AMD family 1Ah processors to the intel_rapl power
     capping driver (Dhananjay Ugwekar)

   - Remove unused stub for saveable_highmem_page() and remove
     deprecated macros from power management documentation (Andy
     Shevchenko)

   - Use ysfs_emit() and sysfs_emit_at() in "show" functions in the PM
     sysfs interface (Xueqin Luo)

   - Update the maintainers information for the
     operating-points-v2-ti-cpu DT binding (Dhruva Gole)

   - Drop unnecessary of_match_ptr() from ti-opp-supply (Rob Herring)

   - Add missing MODULE_DESCRIPTION() macros to devfreq governors (Jeff
     Johnson)

   - Use devm_clk_get_enabled() in the exynos-bus devfreq driver (Anand
     Moon)

   - Use of_property_present() instead of of_get_property() in the
     imx-bus devfreq driver (Rob Herring)

   - Update directory handling and installation process in the pm-graph
     Makefile and add .gitignore to ignore sleepgraph.py artifacts to
     pm-graph (Amit Vadhavana, Yo-Jung Lin)

   - Make cpupower display residency value in idle-info (Aboorva
     Devarajan)

   - Add missing powercap_set_enabled() stub function to cpupower (John
     B. Wyatt IV)

   - Add SWIG support to cpupower (John B. Wyatt IV)"

* tag 'pm-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (62 commits)
  cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
  cpufreq/amd-pstate-ut: Add test case for mode switches
  cpufreq/amd-pstate: Export symbols for changing modes
  amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
  cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
  cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
  cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
  x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
  x86/amd: Move amd_get_highest_perf() out of amd-pstate
  ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
  ACPI: CPPC: Drop check for non zero perf ratio
  x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
  ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
  x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
  PM: hibernate: Remove unused stub for saveable_highmem_page()
  pm:cpupower: Add error warning when SWIG is not installed
  MAINTAINERS: Add Maintainers for SWIG Python bindings
  pm:cpupower: Include test_raw_pylibcpupower.py
  pm:cpupower: Add SWIG bindings files for libcpupower
  pm:cpupower: Add missing powercap_set_enabled() stub function
  ...
2024-09-16 07:47:50 +02:00
Linus Torvalds
114143a595 arm64 updates for 6.12
ACPI:
 * Enable PMCG erratum workaround for HiSilicon HIP10 and 11 platforms.
 * Ensure arm64-specific IORT header is covered by MAINTAINERS.
 
 CPU Errata:
 * Enable workaround for hardware access/dirty issue on Ampere-1A cores.
 
 Memory management:
 * Define PHYSMEM_END to fix a crash in the amdgpu driver.
 * Avoid tripping over invalid kernel mappings on the kexec() path.
 * Userspace support for the Permission Overlay Extension (POE) using
   protection keys.
 
 Perf and PMUs:
 * Add support for the "fixed instruction counter" extension in the CPU
   PMU architecture.
 * Extend and fix the event encodings for Apple's M1 CPU PMU.
 * Allow LSM hooks to decide on SPE permissions for physical profiling.
 * Add support for the CMN S3 and NI-700 PMUs.
 
 Confidential Computing:
 * Add support for booting an arm64 kernel as a protected guest under
   Android's "Protected KVM" (pKVM) hypervisor.
 
 Selftests:
 * Fix vector length issues in the SVE/SME sigreturn tests
 * Fix build warning in the ptrace tests.
 
 Timers:
 * Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
   non-determinism arising from the architected counter.
 
 Miscellaneous:
 * Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
   don't succeed.
 * Minor fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmbkVNEQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNKeIB/9YtbN7JMgsXktM94GP03r3tlFF36Y1S51S
 +zdDZclAVZCTCZN+PaFeAZ/+ah2EQYrY6rtDoHUSEMQdF9kH+ycuIPDTwaJ4Qkam
 QKXMpAgtY/4yf2rX4lhDF8rEvkhLDsu7oGDhqUZQsA33GrMBHfgA3oqpYwlVjvGq
 gkm7olTo9LdWAxkPpnjGrjB6Mv5Dq8dJRhW+0Q5AntI5zx3RdYGJZA9GUSzyYCCt
 FIYOtMmWPkQ0kKxIVxOxAOm/ubhfyCs2sjSfkaa3vtvtt+Yjye1Xd81rFciIbPgP
 QlK/Mes2kBZmjhkeus8guLI5Vi7tx3DQMkNqLXkHAAzOoC4oConE
 =6osL
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "The highlights are support for Arm's "Permission Overlay Extension"
  using memory protection keys, support for running as a protected guest
  on Android as well as perf support for a bunch of new interconnect
  PMUs.

  Summary:

  ACPI:
   - Enable PMCG erratum workaround for HiSilicon HIP10 and 11
     platforms.
   - Ensure arm64-specific IORT header is covered by MAINTAINERS.

  CPU Errata:
   - Enable workaround for hardware access/dirty issue on Ampere-1A
     cores.

  Memory management:
   - Define PHYSMEM_END to fix a crash in the amdgpu driver.
   - Avoid tripping over invalid kernel mappings on the kexec() path.
   - Userspace support for the Permission Overlay Extension (POE) using
     protection keys.

  Perf and PMUs:
   - Add support for the "fixed instruction counter" extension in the
     CPU PMU architecture.
   - Extend and fix the event encodings for Apple's M1 CPU PMU.
   - Allow LSM hooks to decide on SPE permissions for physical
     profiling.
   - Add support for the CMN S3 and NI-700 PMUs.

  Confidential Computing:
   - Add support for booting an arm64 kernel as a protected guest under
     Android's "Protected KVM" (pKVM) hypervisor.

  Selftests:
   - Fix vector length issues in the SVE/SME sigreturn tests
   - Fix build warning in the ptrace tests.

  Timers:
   - Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
     non-determinism arising from the architected counter.

  Miscellaneous:
   - Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
     don't succeed.
   - Minor fixes and cleanups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
  perf: arm-ni: Fix an NULL vs IS_ERR() bug
  arm64: hibernate: Fix warning for cast from restricted gfp_t
  arm64: esr: Define ESR_ELx_EC_* constants as UL
  arm64: pkeys: remove redundant WARN
  perf: arm_pmuv3: Use BR_RETIRED for HW branch event if enabled
  MAINTAINERS: List Arm interconnect PMUs as supported
  perf: Add driver for Arm NI-700 interconnect PMU
  dt-bindings/perf: Add Arm NI-700 PMU
  perf/arm-cmn: Improve format attr printing
  perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check
  arm64/mm: use lm_alias() with addresses passed to memblock_free()
  mm: arm64: document why pte is not advanced in contpte_ptep_set_access_flags()
  arm64: Expose the end of the linear map in PHYSMEM_END
  arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec()
  arm64/mm: Delete __init region from memblock.reserved
  perf/arm-cmn: Support CMN S3
  dt-bindings: perf: arm-cmn: Add CMN S3
  perf/arm-cmn: Refactor DTC PMU register access
  perf/arm-cmn: Make cycle counts less surprising
  perf/arm-cmn: Improve build-time assertion
  ...
2024-09-16 06:55:07 +02:00
Linus Torvalds
963d0d60d6 - Add CONFIG_ option for every hw CPU mitigation. The intent is to support
configurations and scenarios where the mitigations code is irrelevant
 
 - Other small fixlets and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmbfDhUACgkQEsHwGGHe
 VUrF9A//UkVKmIihXXak0GPqFhu8XrWeYlmwLxWe/uIy2hZCLp9L7n4pg0Ikxqz3
 9D9hYk+Jykfu/jsv0sR6LH6OAUTlJi+P0w3x3VeL1sgFPUkwFtOaN2v/t5H3SW5r
 l+VQpdUXPmLH6QbhvT84U6L/OQYr2cjhiYro47uwM9vO/SNao4HcbC/pdBr2dwxM
 KzzA9sEDg3Le391phIhEOIogA1lPNV7KMScg2VjPTqQzEJ3NQVzyYmqjPO70sN9F
 sAuksdF+rnPjc9K/W+qUcvlp8e9lDB8g0oPlyoOeubjXsnZU5YchriPdBbyAl0dJ
 bjpftXIrBj8Vtmh7Tc0Jx2tlMFXNT5FrzcqdD4sviLnhrKEJSkwAoFgIMp5A+tN8
 Kl8MrlABO8I8+zGRQB7TzhwaCC4AxCqUS3UEcYd4CBf5AWqT5i12ijbtIxPtdpG4
 5itngIV4HT8casudpC8i8OTjOTggorMa7Pu/bQULhnZwagH8chlBdoOlKKQVkeVG
 FUi+L/BljL9mASic7NRZI11tk44m9xWWkbbJOPlZaGJw9YzGrxD0YOfhbgcc9iaX
 SOUMVJEhJVJMBISGiBUQDB6r51ee6B8RKJ3ByxzpAbwsUR9cXyfSYfUyE5reQJy9
 3luj/iorL3guYU6EGEAtvbuTLGbKqybrV6zOB/QRXHWyhtUgrUA=
 =GFld
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 hw mitigation updates from Borislav Petkov:

 - Add CONFIG_ option for every hw CPU mitigation. The intent is to
   support configurations and scenarios where the mitigations code is
   irrelevant

 - Other small fixlets and improvements

* tag 'x86_bugs_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Fix handling when SRSO mitigation is disabled
  x86/bugs: Add missing NO_SSB flag
  Documentation/srso: Document a method for checking safe RET operates properly
  x86/bugs: Add a separate config for GDS
  x86/bugs: Remove GDS Force Kconfig option
  x86/bugs: Add a separate config for SSB
  x86/bugs: Add a separate config for Spectre V2
  x86/bugs: Add a separate config for SRBDS
  x86/bugs: Add a separate config for Spectre v1
  x86/bugs: Add a separate config for RETBLEED
  x86/bugs: Add a separate config for L1TF
  x86/bugs: Add a separate config for MMIO Stable Data
  x86/bugs: Add a separate config for TAA
  x86/bugs: Add a separate config for MDS
2024-09-16 06:48:38 +02:00
Rafael J. Wysocki
415dff1c96 Merge branch 'pm-cpufreq'
Merge cpufreq updates for 6.12-rc1:

 - Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef).

 - Add support for Granite Rapids and Sierra Forest in OOB mode to the
   intel_pstate cpufreq driver (Srinivas Pandruvada).

 - Add basic support for CPU capacity scaling on x86 and make the
   intel_pstate driver set asymmetric CPU capacity on hybrid systems
   without SMT (Rafael Wysocki).

 - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
   driver (Jeff Johnson).

 - Several OF related cleanups in cpufreq drivers (Rob Herring).

 - Enable COMPILE_TEST for ARM drivers (Rob Herrring).

 - Introduce quirks for syscon failures and use socinfo to get revision
   for TI cpufreq driver (Dhruva Gole, Nishanth Menon).

 - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
   Ugwekar).

 - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
   (Danila Tikhonov, Huacai Chen, and Liu Jing).

 - Make amd-pstate validate return of any attempt to update EPP limits,
   which fixes the masking hardware problems (Mario Limonciello).

 - Move the calculation of the AMD boost numerator outside of amd-pstate,
   correcting acpi-cpufreq on systems with preferred cores (Mario
   Limonciello).

 - Harden preferred core detection in amd-pstate to avoid potential
   false positives (Mario Limonciello).

 - Add extra unit test coverage for mode state machine (Mario
   Limonciello).

 - Fix an "Uninitialized variables" issue in amd-pstste (Qianqiang Liu).

* pm-cpufreq: (35 commits)
  cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
  cpufreq/amd-pstate-ut: Add test case for mode switches
  cpufreq/amd-pstate: Export symbols for changing modes
  amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
  cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
  cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
  cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
  x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
  x86/amd: Move amd_get_highest_perf() out of amd-pstate
  ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
  ACPI: CPPC: Drop check for non zero perf ratio
  x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
  ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
  x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
  cpufreq/amd-pstate: Catch failures for amd_pstate_epp_update_limit()
  cpufreq: ti-cpufreq: Use socinfo to get revision in AM62 family
  cpufreq: Fix the cacography in powernv-cpufreq.c
  cpufreq: ti-cpufreq: Introduce quirks to handle syscon fails appropriately
  cpufreq: loongson3: Use raw_smp_processor_id() in do_service_request()
  cpufreq: amd-pstate: add check for cpufreq_cpu_get's return value
  ...
2024-09-11 18:25:54 +02:00
Rafael J. Wysocki
9bcf30348f second round of amd-pstate changes for 6.12 (second try):
* Move the calculation of the AMD boost numerator outside of
   amd-pstate, correcting acpi-cpufreq on systems with preferred cores
 * Harden preferred core detection to avoid potential false positives
 * Add extra unit test coverage for mode state machine
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCgA4FiEECwtuSU6dXvs5GA2aLRkspiR3AnYFAmbhviEaHG1hcmlvLmxp
 bW9uY2llbGxvQGFtZC5jb20ACgkQLRkspiR3AnYqDA//TrvmXcpk1mnVJw3Y7MG0
 /n8dsLpxqVtEf+USnlGR+iRhgSQ/W/Kr7b5a+jmdCwpHChuWHt2FnNgcHLIxDnZC
 vmEJ02/2BCRoPKvcvV4VTh0ATu3O9nqwQiBVWBdNjDy+Dzr0pzA+SQopt1hCIsO2
 mzUodhpiBqYKlMf/i6+aM1gZCGGqoRC40aGqnJsgegb61vl7zIc2ZcbTxUQlyTfv
 t6J73IXLx8+YtrjejBYc7mRHhMQ2hCKy92C/8cNoGocj5faSKsAA3OUDcWq8qX0U
 zK3GGGdW8MLHSbt3VyntstnfiLL7TnzowcjvrMudIWpjC1987GlE9BApbN9VRZ8e
 ARN3Y7/ltjut/1fRB97BwjI9aDpzA0122Qzy4UOcK8o+be1eIr+ihV3Z9EN/snWg
 0L/oq5+rGHvvIzf1BwGhoPSvgBIu7eMIYDcRxKPlEiKsbXrL4DdJC/nXgaZ/HiGO
 eHx1dNy7LFrdnEwVI1frZWC6ZuZcpmOBdhnfU+leVxzB3Z++Qc266rsxKBsc5taZ
 PPV18pxfbbl3iL85KDIbuBUCmA0aY8WEdCKtfXpl7zlB5g0fZQLyYeUbvahK08Sk
 vyQAnPECbX/4v1Vx54Z70GPk0XD2+TXdg8yApnXrmRc36z/SLdprk5hPKbKhZu/r
 iPxFUnvd0HCtjsLrsq/qUiQ=
 =R4HZ
 -----END PGP SIGNATURE-----

Merge tag 'amd-pstate-v6.12-2024-09-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux

Merge the second round of amd-pstate changes for 6.12 from Mario
Limonciello:

"* Move the calculation of the AMD boost numerator outside of
   amd-pstate, correcting acpi-cpufreq on systems with preferred cores
 * Harden preferred core detection to avoid potential false positives
 * Add extra unit test coverage for mode state machine"

* tag 'amd-pstate-v6.12-2024-09-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
  cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
  cpufreq/amd-pstate-ut: Add test case for mode switches
  cpufreq/amd-pstate: Export symbols for changing modes
  amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
  cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
  cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
  cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
  x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
  x86/amd: Move amd_get_highest_perf() out of amd-pstate
  ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
  ACPI: CPPC: Drop check for non zero perf ratio
  x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
  ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
  x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
2024-09-11 18:22:23 +02:00
Mario Limonciello
15a2b764ea amd-pstate: Add missing documentation for amd_pstate_prefcore_ranking
`amd_pstate_prefcore_ranking` reflects the dynamic rankings of a CPU
core based on platform conditions.  Explicitly include it in the
documentation.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:23 -05:00
Mario Limonciello
b96b82d1af cpufreq: amd-pstate: Add documentation for amd_pstate_hw_prefcore
Explain that the sysfs file represents both preferred core being
enabled by the user and supported by the hardware.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:23 -05:00
Mario Limonciello
ad4caad58d cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
The special case in amd_pstate_highest_perf_set() is the value used
for calculating the boost numerator.  Merge this into
amd_get_boost_ratio_numerator() and then use that to calculate boost
ratio.

This allows dropping more special casing of the highest perf value.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:23 -05:00
Thomas Gleixner
2f7eedca6c Merge branch 'linus' into timers/core
To update with the latest fixes.
2024-09-10 13:49:53 +02:00
Anna-Maria Behnsen
bd7c8ff9fe treewide: Fix wrong singular form of jiffies in comments
There are several comments all over the place, which uses a wrong singular
form of jiffies.

Replace 'jiffie' by 'jiffy'. No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Link: https://lore.kernel.org/all/20240904-devel-anna-maria-b4-timers-flseep-v1-3-e98760256370@linutronix.de
2024-09-08 20:47:40 +02:00
Neeraj Upadhyay
355debb83b Merge branches 'context_tracking.15.08.24a', 'csd.lock.15.08.24a', 'nocb.09.09.24a', 'rcutorture.14.08.24a', 'rcustall.09.09.24a', 'srcu.12.08.24a', 'rcu.tasks.14.08.24a', 'rcu_scaling_tests.15.08.24a', 'fixes.12.08.24a' and 'misc.11.08.24a' into next.09.09.24a 2024-09-09 00:09:47 +05:30
Robin Murphy
4d5a7680f2 perf: Add driver for Arm NI-700 interconnect PMU
The Arm NI-700 Network-on-Chip Interconnect has a relatively
straightforward design with a hierarchy of voltage, power, and clock
domains, where each clock domain then contains a number of interface
units and a PMU which can monitor events thereon. As such, it begets a
relatively straightforward driver to interface those PMUs with perf.

Even more so than with arm-cmn, users will require detailed knowledge of
the wider system topology in order to meaningfully analyse anything,
since the interconnect itself cannot know what lies beyond the boundary
of each inscrutably-numbered interface. Given that, for now they are
also expected to refer to the NI-700 documentation for the relevant
event IDs to provide as well. An identifier is implemented so we can
come back and add jevents if anyone really wants to.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/9933058d0ab8138c78a61cd6852ea5d5ff48e393.1725470837.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-06 12:58:28 +01:00
Guilherme G. Piccoli
e04eb52bfa Documentation: Document the kernel flag bdev_allow_write_mounted
Commit ed5cc702d3 ("block: Add config option to not allow writing to mounted
devices") added a Kconfig option along with a kernel command-line tuning to
control writes to mounted block devices, as a means to deal with fuzzers like
Syzkaller, that provokes kernel crashes by directly writing on block devices
bypassing the filesystem (so the FS has no awareness and cannot cope with that).

The patch just missed adding such kernel command-line option to the kernel
documentation, so let's fix that.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240828145045.309835-1-gpiccoli@igalia.com
2024-09-05 14:18:28 -06:00
Jonathan Corbet
d224338aa1 Linux 6.11-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmbUG7oeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG7LUH/26M4QJ5UGJHsehd
 bbHlE4or0jibFyMbUiYDOElqLITjCVH6mi3Kv3E7sfyLxSsglVRRNzLCTq/UgTf8
 E1L90q4wCySElzzIhH6cltuQdAhs7pRWs5BETByvIW+g+ayN0LZxUPbvB8yl/nOU
 Zx8flBEuM2isuRlnx+iRccbf2PxNadSkSYg2TlmZr8mfFKCiRxjU7x355Q3UcylQ
 b8S2jVgq69CSDF3IBOzwHZjdq5OceDsO8he0KcfSTvSgyFMcwhntAT397YEnFXnk
 KKjKPNCu3KqHtTxsi4Sc0wOxVcgctDv4OPethaL8yROQ7jdBTkvNpPT1yMf7bca8
 ZLpSo5Y=
 =TBcj
 -----END PGP SIGNATURE-----

Merge tag 'v6.11-rc6' into docs-mw

This is done primarily to get a docs build fix merged via another tree so
that "make htmldocs" stops failing.
2024-09-05 14:01:38 -06:00
Mike Yuan
e399257349 mm/memcontrol: respect zswap.writeback setting from parent cg too
Currently, the behavior of zswap.writeback wrt.  the cgroup hierarchy
seems a bit odd.  Unlike zswap.max, it doesn't honor the value from parent
cgroups.  This surfaced when people tried to globally disable zswap
writeback, i.e.  reserve physical swap space only for hibernation [1] -
disabling zswap.writeback only for the root cgroup results in subcgroups
with zswap.writeback=1 still performing writeback.

The inconsistency became more noticeable after I introduced the
MemoryZSwapWriteback= systemd unit setting [2] for controlling the knob.
The patch assumed that the kernel would enforce the value of parent
cgroups.  It could probably be workarounded from systemd's side, by going
up the slice unit tree and inheriting the value.  Yet I think it's more
sensible to make it behave consistently with zswap.max and friends.

[1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
[2] https://github.com/systemd/systemd/pull/31734

Link: https://lkml.kernel.org/r/20240823162506.12117-1-me@yhndnzj.com
Fixes: 501a06fe8e ("zswap: memcontrol: implement zswap writeback disabling")
Signed-off-by: Mike Yuan <me@yhndnzj.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 17:59:02 -07:00
Yicong Yang
d1c93d5c67 drivers/perf: hisi_pcie: Export supported Root Ports [bdf_min, bdf_max]
Currently users can get the Root Ports supported by the PCIe PMU by
"bus" sysfs attributes which indicates the PCIe bus number where
Root Ports are located. This maybe insufficient since Root Ports
supported by different PCIe PMUs may be located on the same PCIe bus.
So export the BDF range the Root Ports additionally.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240829090332.28756-4-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30 11:43:10 +01:00
Adrian Ratiu
41e8149c88
proc: add config & param to block forcing mem writes
This adds a Kconfig option and boot param to allow removing
the FOLL_FORCE flag from /proc/pid/mem write calls because
it can be abused.

The traditional forcing behavior is kept as default because
it can break GDB and some other use cases.

Previously we tried a more sophisticated approach allowing
distributions to fine-tune /proc/pid/mem behavior, however
that got NAK-ed by Linus [1], who prefers this simpler
approach with semantics also easier to understand for users.

Link: https://lore.kernel.org/lkml/CAHk-=wiGWLChxYmUA5HrT5aopZrB7_2VTa0NLZcxORgkUe5tEQ@mail.gmail.com/ [1]
Cc: Doug Anderson <dianders@chromium.org>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Link: https://lore.kernel.org/r/20240802080225.89408-1-adrian.ratiu@collabora.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30 08:19:43 +02:00
Borislav Petkov (AMD)
4015350525 Documentation/srso: Document a method for checking safe RET operates properly
Add a method to quickly verify whether safe RET operates properly on
a given system using perf tool.

Also, add a selftest which does the same thing.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240731160531.28640-1-bp@kernel.org
2024-08-27 09:16:35 +02:00
Jani Nikula
cd0403adea Documentation: admin-guide: direct people to bug trackers, if specified
Update bug reporting info in bug-hunting.rst to direct people to
driver/subsystem bug trackers, if explicitly specified with the
MAINTAINERS "B:" entry. Use the new get_maintainer.pl --bug option to
print the info.

Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240815113450.3397499-2-jani.nikula@intel.com
2024-08-26 16:10:12 -06:00
Thorsten Leemhuis
cbbdb6c625 docs: bug-bisect: rewrite to better match the other bisecting text
Rewrite the short document on bisecting kernel bugs. The new text
improves .config handling, brings a mention of 'git bisect skip', and
explains what to do after the bisection finished -- including trying a
revert to verify the result. The rewrite at the same time removes the
unrelated and outdated section on 'Devices not appearing' and replaces
some sentences about bug reporting with a pointer to the document
covering that topic in detail.

This overall brings the approach close to the one in the recently added
text Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst.
As those two texts serve a similar purpose for different audiences,
mention that document in the head of this one and outline when the
other might be the better one to follow.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Reviewed-by: Petr Tesarik <petr@tesarici.cz>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/74dc0137dcc3e2c05648e885a7bc31ffd39a0890.1724312119.git.linux@leemhuis.info
2024-08-26 15:34:51 -06:00
Krishna chaitanya chundru
96a37ec986 Documentation: dwc_pcie_pmu: Update bdf to sbdf
Update document to reflect the driver change to use sbdf instead
of bdf alone.

Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240816-dwc_pmu_fix-v2-2-198b8ab1077c@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-23 16:07:25 +01:00
Deven Bowers
ac6731870e documentation: add IPE documentation
Add IPE's admin and developer documentation to the kernel tree.

Co-developed-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-20 14:03:47 -04:00
Chen Ridong
d1a92d2d6c cgroup: update some statememt about delegation
The comment in cgroup_file_write is missing some interfaces, such as
'cgroup.threads'. All delegatable files are listed in
'/sys/kernel/cgroup/delegate', so update the comment in cgroup_file_write.
Besides, add a statement that files outside the namespace shouldn't be
visible from inside the delegated namespace.

tj: Reflowed text for consistency.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-19 12:16:17 -10:00
Paul E. McKenney
1dd01c0650 rcu: Summarize RCU CPU stall warnings during CSD-lock stalls
During CSD-lock stalls, the additional information output by RCU CPU
stall warnings is usually redundant, flooding the console for not good
reason.  However, this has been the way things work for a few years.
This commit therefore adds an rcutree.csd_lock_suppress_rcu_stall kernel
boot parameter that causes RCU CPU stall warnings to be abbreviated to
a single line when there is at least one CPU that has been stuck waiting
for CSD lock for more than five seconds.

To make this abbreviated message happen with decent probability:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 8 \
	--configs "2*TREE01" --kconfig "CONFIG_CSD_LOCK_WAIT_DEBUG=y" \
	--bootargs "csdlock_debug=1 rcutorture.stall_cpu=200 \
	rcutorture.stall_cpu_holdoff=120 rcutorture.stall_cpu_irqsoff=1 \
	rcutree.csd_lock_suppress_rcu_stall=1 \
	rcupdate.rcu_exp_cpu_stall_timeout=5000" --trust-make

[ paulmck: Apply kernel test robot feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15 00:10:50 +05:30
Paul E. McKenney
0ff92d145a doc: Remove RCU Tasks Rude asynchronous APIs
The call_rcu_tasks_rude() and rcu_barrier_tasks_rude() APIs are no longer.
This commit therefore removes them from the documentation.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14 16:45:07 +05:30
Paul E. McKenney
58cb321054 rcutorture: Add a stall_cpu_repeat module parameter
This commit adds an stall_cpu_repeat kernel, which is also the
rcutorture.stall_cpu_repeat boot parameter, to test repeated CPU stalls.
Note that only the first stall will pay attention to the stall_cpu_irqsoff
module parameter.  For the second and subsequent stalls, interrupts will
be enabled.  This is helpful when testing the interaction between RCU
CPU stall warnings and CSD-lock stall warnings.

Reported-by: Rik van Riel <riel@surriel.com>
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14 16:23:40 +05:30
Daniel Yang
86cfa9a85f Documentation: dm-crypt.rst warning + error fix
While building kernel documention using make htmldocs command, I was
getting unexpected indentation error. Single description was given for
two module parameters with wrong indentation. So, I corrected the
indentation of both parameters and the description.

Signed-off-by: Shibu kumar <shibukumar.bit@gmail.com>
Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 0d815e3400 ("dm-crypt: limit the size of encryption requests")
2024-08-13 16:36:39 +02:00
Steven Rostedt
a7050ca724 pstore/ramoops: Fix typo as there is no "reserver"
For some reason my finger always hits the 'r' after typing "reserve".
Fix the typo in the Documentation example.

Fixes: d9d814eebb ("pstore/ramoops: Add ramoops.mem_name= command line option")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240807170029.3c1ff651@gandalf.local.home
Signed-off-by: Kees Cook <kees@kernel.org>
2024-08-08 10:51:33 -07:00
Steve French
1b5487aefb smb3: fix setting SecurityFlags when encryption is required
Setting encryption as required in security flags was broken.
For example (to require all mounts to be encrypted by setting):

  "echo 0x400c5 > /proc/fs/cifs/SecurityFlags"

Would return "Invalid argument" and log "Unsupported security flags"
This patch fixes that (e.g. allowing overriding the default for
SecurityFlags  0x00c5, including 0x40000 to require seal, ie
SMB3.1.1 encryption) so now that works and forces encryption
on subsequent mounts.

Acked-by: Bharath SM <bharathsm@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-08 11:14:53 -05:00
Shibu Kumar
e9c7acd723 docs: dm-crypt: Removal of unexpected indentation error
Add the required indentation to fix this docs build error:

  Documentation/admin-guide/device-mapper/dm-crypt.rst:167: ERROR: Unexpected indentation.

Also split the documentation for read and write into separate blocks.

Signed-off-by: Shibu kumar shibukumar.bit@gmail.com
[jc: rewrote changelog]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240803183306.32425-1-shibukumar.bit@gmail.com
2024-08-07 13:22:24 -06:00
Sangmoon Kim
946c57e61d Documentation: kernel-parameters: add workqueue.panic_on_stall
The workqueue.panic_on_stall kernel parameter was added in commit
073107b39e ("workqueue: add cmdline parameter workqueue.panic_on_stall")
but not listed in the kernel-parameters doc. Add it there.

Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-07 06:16:14 -10:00
Tetsuo Handa
b88f55389a profiling: remove profile=sleep support
The kernel sleep profile is no longer working due to a recursive locking
bug introduced by commit 42a20f86dc ("sched: Add wrapper for get_wchan()
to keep task blocked")

Booting with the 'profile=sleep' kernel command line option added or
executing

  # echo -n sleep > /sys/kernel/profiling

after boot causes the system to lock up.

Lockdep reports

  kthreadd/3 is trying to acquire lock:
  ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70

  but task is already holding lock:
  ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370

with the call trace being

   lock_acquire+0xc8/0x2f0
   get_wchan+0x32/0x70
   __update_stats_enqueue_sleeper+0x151/0x430
   enqueue_entity+0x4b0/0x520
   enqueue_task_fair+0x92/0x6b0
   ttwu_do_activate+0x73/0x140
   try_to_wake_up+0x213/0x370
   swake_up_locked+0x20/0x50
   complete+0x2f/0x40
   kthread+0xfb/0x180

However, since nobody noticed this regression for more than two years,
let's remove 'profile=sleep' support based on the assumption that nobody
needs this functionality.

Fixes: 42a20f86dc ("sched: Add wrapper for get_wchan() to keep task blocked")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-08-04 13:36:28 -07:00
Waiman Long
ab03125268 cgroup: Show # of subsystem CSSes in cgroup.stat
Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
help manage different structures in various cgroup subsystems by being
an embedded element inside a larger structure like cpuset or mem_cgroup.

The /proc/cgroups file shows the number of cgroups for each of the
subsystems.  With cgroup v1, the number of CSSes is the same as the
number of cgroups.  That is not the case anymore with cgroup v2. The
/proc/cgroups file cannot show the actual number of CSSes for the
subsystems that are bound to cgroup v2.

So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
we can't tell by looking at /proc/cgroups which cgroup subsystems may
be responsible.

As cgroup v2 had deprecated the use of /proc/cgroups, the hierarchical
cgroup.stat file is now being extended to show the number of live and
dying CSSes associated with all the non-inhibited cgroup subsystems that
have been bound to cgroup v2. The number includes CSSes in the current
cgroup as well as in all the descendants underneath it.  This will help
us pinpoint which subsystems are responsible for the increasing number
of dying (nr_dying_descendants) cgroups.

The CSSes dying counts are stored in the cgroup structure itself
instead of inside the CSS as suggested by Johannes. This will allow
us to accurately track dying counts of cgroup subsystems that have
recently been disabled in a cgroup. It is now possible that a zero
subsystem number is coupled with a non-zero dying subsystem number.

The cgroup-v2.rst file is updated to discuss this new behavior.

With this patch applied, a sample output from root cgroup.stat file
was shown below.

	nr_descendants 56
	nr_subsys_cpuset 1
	nr_subsys_cpu 43
	nr_subsys_io 43
	nr_subsys_memory 56
	nr_subsys_perf_event 57
	nr_subsys_hugetlb 1
	nr_subsys_pids 56
	nr_subsys_rdma 1
	nr_subsys_misc 1
	nr_dying_descendants 30
	nr_dying_subsys_cpuset 0
	nr_dying_subsys_cpu 0
	nr_dying_subsys_io 0
	nr_dying_subsys_memory 30
	nr_dying_subsys_perf_event 0
	nr_dying_subsys_hugetlb 0
	nr_dying_subsys_pids 0
	nr_dying_subsys_rdma 0
	nr_dying_subsys_misc 0

Another sample output from system.slice/cgroup.stat was:

	nr_descendants 34
	nr_subsys_cpuset 0
	nr_subsys_cpu 32
	nr_subsys_io 32
	nr_subsys_memory 34
	nr_subsys_perf_event 35
	nr_subsys_hugetlb 0
	nr_subsys_pids 34
	nr_subsys_rdma 0
	nr_subsys_misc 0
	nr_dying_descendants 30
	nr_dying_subsys_cpuset 0
	nr_dying_subsys_cpu 0
	nr_dying_subsys_io 0
	nr_dying_subsys_memory 30
	nr_dying_subsys_perf_event 0
	nr_dying_subsys_hugetlb 0
	nr_dying_subsys_pids 0
	nr_dying_subsys_rdma 0
	nr_dying_subsys_misc 0

Note that 'debug' controller wasn't used to provide this information because
the controller is not recommended in productions kernels, also many of them
won't enable CONFIG_CGROUP_DEBUG by default.

Similar information could be retrieved with debuggers like drgn but that's
also not always available (e.g. lockdown) and the additional cost of runtime
tracking here is deemed marginal.

tj: Added Michal's paragraphs on why this is not added the debug controller
    to the commit message.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: Michal Koutný <mkoutny@suse.com>
Link: http://lkml.kernel.org/r/20240715150034.2583772-1-longman@redhat.com
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-31 07:00:02 -10:00
Benjamin Poirier
1b2255db3c Documentation: Add detailed explanation for 'N' taint flag
Every taint flag has an entry in the "More detailed explanation" section
except for the 'N' flag. That omission was probably just an oversight so
add an entry for that flag.

Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240717203521.514348-1-bpoirier@nvidia.com
2024-07-30 07:56:30 -06:00
Linus Torvalds
65ad409e63 more s390 updates for 6.11 merge window
- Fix KMSAN build breakage caused by the conflict between s390 and
   mm-stable trees
 
 - Add KMSAN page markers for ptdump
 
 - Add runtime constant support
 
 - Fix __pa/__va for modules under non-GPL licenses by exporting necessary
   vm_layout struct with EXPORT_SYMBOL to prevent linkage problems
 
 - Fix an endless loop in the CF_DIAG event stop in the CPU Measurement
   Counter Facility code when the counter set size is zero
 
 - Remove the PROTECTED_VIRTUALIZATION_GUEST config option and enable
   its functionality by default
 
 - Support allocation of multiple MSI interrupts per device and improve
   logging of architecture-specific limitations
 
 - Add support for lowcore relocation as a debugging feature to catch
   all null ptr dereferences in the kernel address space, improving
   detection beyond the current implementation's limited write access
   protection
 
 - Clean up and rework CPU alternatives to allow for callbacks and early
   patching for the lowcore relocation
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmajrGUACgkQjYWKoQLX
 FBgZjwf/X8suh1Gm2qO47hdGURusOKmEa6GjYaihKzOi2I5yAWVXGsAYX7QtXI4X
 fxbKyuGJIvq7LOrIojN1JOCPGkDRztgMddqDJI7WljuRiw6dcd4L5tbaPgCKEv3Q
 AQcoq+Aeg1L5xnuNFPdQXl6+Fy2lTFqJCkUl+uW05pGAn2R212dYG3HB41TpwOtJ
 Sv2R5+yD9TQCKnHyuCQqaGf7d6SQTcVeBj8zrqVmcyduNK+BYYMOwlJ/UTRzeZEX
 3DmQg/TdAkxXf0jZ+vrNILEfHlIvwDAhFjdoAXXL0TX4lx2cHLx9AiqNxYhUprsG
 0gutc/nLq2FxhqofoJ0z9TCdb1Ef7w==
 =skva
 -----END PGP SIGNATURE-----

Merge tag 's390-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Fix KMSAN build breakage caused by the conflict between s390 and
   mm-stable trees

 - Add KMSAN page markers for ptdump

 - Add runtime constant support

 - Fix __pa/__va for modules under non-GPL licenses by exporting
   necessary vm_layout struct with EXPORT_SYMBOL to prevent linkage
   problems

 - Fix an endless loop in the CF_DIAG event stop in the CPU Measurement
   Counter Facility code when the counter set size is zero

 - Remove the PROTECTED_VIRTUALIZATION_GUEST config option and enable
   its functionality by default

 - Support allocation of multiple MSI interrupts per device and improve
   logging of architecture-specific limitations

 - Add support for lowcore relocation as a debugging feature to catch
   all null ptr dereferences in the kernel address space, improving
   detection beyond the current implementation's limited write access
   protection

 - Clean up and rework CPU alternatives to allow for callbacks and early
   patching for the lowcore relocation

* tag 's390-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
  s390: Remove protvirt and kvm config guards for uv code
  s390/boot: Add cmdline option to relocate lowcore
  s390/kdump: Make kdump ready for lowcore relocation
  s390/entry: Make system_call() ready for lowcore relocation
  s390/entry: Make ret_from_fork() ready for lowcore relocation
  s390/entry: Make __switch_to() ready for lowcore relocation
  s390/entry: Make restart_int_handler() ready for lowcore relocation
  s390/entry: Make mchk_int_handler() ready for lowcore relocation
  s390/entry: Make int handlers ready for lowcore relocation
  s390/entry: Make pgm_check_handler() ready for lowcore relocation
  s390/entry: Add base register to CHECK_VMAP_STACK/CHECK_STACK macro
  s390/entry: Add base register to SIEEXIT macro
  s390/entry: Add base register to MBEAR macro
  s390/entry: Make __sie64a() ready for lowcore relocation
  s390/head64: Make startup code ready for lowcore relocation
  s390: Add infrastructure to patch lowcore accesses
  s390/atomic_ops: Disable flag outputs constraint for GCC versions below 14.2.0
  s390/entry: Move SIE indicator flag to thread info
  s390/nmi: Simplify ptregs setup
  s390/alternatives: Remove alternative facility list
  ...
2024-07-26 10:47:53 -07:00
Linus Torvalds
f646429524 parisc architecture fixes and updates for kernel v6.11-rc1:
- Add gettimeofday() and clock_gettime() vDSO functions
 - Enable PCI_MSI_ARCH_FALLBACKS to allow PCI to PCIe bridge adaptor
   with PCIe NVME card to function in parisc machines
 - Allow users to reduce kernel unaligned runtime warnings
 - minor code cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZqFfJgAKCRD3ErUQojoP
 X6YqAQDC6/8icgbscM6dP5m4+oQ4/Nf3qzKM12jt87sAuRUAxAD9G6uyzUxtw7xS
 qcRlVoGrc/SLI18JMi3zvs1sEPsicA8=
 =cYa2
 -----END PGP SIGNATURE-----

Merge tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:
 "The gettimeofday() and clock_gettime() syscalls are now available as
  vDSO functions, and Dave added a patch which allows to use NVMe cards
  in the PCI slots as fast and easy alternative to SCSI discs.

  Summary:

   - add gettimeofday() and clock_gettime() vDSO functions

   - enable PCI_MSI_ARCH_FALLBACKS to allow PCI to PCIe bridge adaptor
     with PCIe NVME card to function in parisc machines

   - allow users to reduce kernel unaligned runtime warnings

   - minor code cleanups"

* tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN
  parisc: Use max() to calculate parisc_tlb_flush_threshold
  parisc: Fix warning at drivers/pci/msi/msi.h:121
  parisc: Add 64-bit gettimeofday() and clock_gettime() vDSO functions
  parisc: Add 32-bit gettimeofday() and clock_gettime() vDSO functions
  parisc: Clean up unistd.h file
2024-07-25 12:37:42 -07:00
Helge Deller
cbade82334 parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN
Allow users to disable kernel warnings for unaligned memory
accesses from kernel via the /proc/sys/kernel/ignore-unaligned-usertrap
procfs entry.
That way users can disable those warnings in case they happen too
often.

Signed-off-by: Helge Deller <deller@gmx.de>
2024-07-24 02:04:05 +02:00
Sven Schnelle
035248a784 s390/alternatives: Remove noaltinstr option
The current Kernel doesn't boot without alternative patching on
z16 machines. To avoid such bugs in the future, remove the option
disable alternative patching.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-23 16:02:31 +02:00
Linus Torvalds
fbc90c042c - 875fa64577 ("mm/hugetlb_vmemmap: fix race with speculative PFN
walkers") is known to cause a performance regression
   (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff).
   Yu has a fix which I'll send along later via the hotfixes branch.
 
 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targetted at -stable kernels.
 
 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that.  This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches.  My bad.
 
 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"
 
 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability of
   cgroup writeback"
 
 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache index".
 
 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of the
   zeropage in MAP_SHARED mappings.  I don't see any runtime effects here -
   more a cleanup/understandability/maintainablity thing.
 
 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of
   higher addresses, for aarch64.  The (poorly named) series is
   "Restructure va_high_addr_switch".
 
 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".
 
 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the
   series "Enhance soft hwpoison handling and injection".
 
 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything.  Some landed in this pull.
 
 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has
   simplified migration's use of hardware-offload memory copying.
 
 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".
 
 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code.  This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.
 
 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.
 
 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP.  By default this
   is a no-op - users must opt in vis sysfs controls.  Dramatic
   improvements in pagefault latency are realized.
 
 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".
 
 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".
 
 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".
 
 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".
 
 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances.  A 10x speedup is seen in a microbenchmark.
 
   It does this by punting to aother CPU but I guess that's a win unless
   all CPUs are pegged.
 
 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".
 
 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.
 
 - Is anyone reading this stuff?  If so, email me!
 
 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.
 
 - DAMON user API centralization and simplificatio work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".
 
 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.
 
 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".
 
 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE".  It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.
 
 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.
 
 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large folio
   userspace copying.
 
 - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers.  From SeongJae Park.
 
 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.
 
 - David Hildenbrand sends along more cleanups, this time against the
   migration code.  The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".
 
 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code.  He addresses this in the series "mm: Fix various
   readahead quirks".
 
 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's self
   testing code.
 
 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code.  The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this.  The series is marked cc:stable.
 
 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.
 
 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion.  The series (which also makes the memcg-v1 code
   Kconfigurable) are
 
   "mm: memcg: separate legacy cgroup v1 code and put under config
   option" and
   "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1"
 
 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.
 
 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of excessive
   correctable memory errors.  In order to permit userspace to monitor and
   handle this situation.
 
 - Kefeng Wang's series "mm: migrate: support poison recover from migrate
   folio" teaches the kernel to appropriately handle migration from
   poisoned source folios rather than simply panicing.
 
 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.
 
 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory utilization.
 
 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare
   refcount increments.  So these paes can first be moved aside if they
   reside in the movable zone or a CMA block.
 
 - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps
   for much faster reading of vma information.  The series is "query VMAs
   from /proc/<pid>/maps".
 
 - In the series "mm: introduce per-order mTHP split counters" Lance Yang
   improves the kernel's presentation of developer information related to
   multisize THP splitting.
 
 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)".  This permits
   userspace to use all available huge page sizes.
 
 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and not
   very useful feature from slab fault injection.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA
 joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3
 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8=
 =z0Lf
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targetted at -stable kernels.

 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that. This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches. My
   bad.

 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"

 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability
   of cgroup writeback"

 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache
   index".

 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of
   the zeropage in MAP_SHARED mappings. I don't see any runtime effects
   here - more a cleanup/understandability/maintainablity thing.

 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
   of higher addresses, for aarch64. The (poorly named) series is
   "Restructure va_high_addr_switch".

 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".

 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
   the series "Enhance soft hwpoison handling and injection".

 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything. Some landed in this pull.

 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
   has simplified migration's use of hardware-offload memory copying.

 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".

 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code. This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.

 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.

 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP. By default this
   is a no-op - users must opt in vis sysfs controls. Dramatic
   improvements in pagefault latency are realized.

 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".

 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".

 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".

 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".

 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances. A 10x speedup is seen in a microbenchmark.

   It does this by punting to aother CPU but I guess that's a win unless
   all CPUs are pegged.

 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".

 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.

 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.

 - DAMON user API centralization and simplificatio work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".

 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.

 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".

 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.

 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.

 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large
   folio userspace copying.

 - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers. From SeongJae Park.

 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.

 - David Hildenbrand sends along more cleanups, this time against the
   migration code. The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".

 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code. He addresses this in the series "mm: Fix various
   readahead quirks".

 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's
   self testing code.

 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code. The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this. The series is marked cc:stable.

 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.

 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion. The series (which also makes the memcg-v1 code
   Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
   under config option" and "mm: memcg: put cgroup v1-specific memcg
   data under CONFIG_MEMCG_V1"

 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.

 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of
   excessive correctable memory errors. In order to permit userspace to
   monitor and handle this situation.

 - Kefeng Wang's series "mm: migrate: support poison recover from
   migrate folio" teaches the kernel to appropriately handle migration
   from poisoned source folios rather than simply panicing.

 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.

 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory
   utilization.

 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than
   bare refcount increments. So these paes can first be moved aside if
   they reside in the movable zone or a CMA block.

 - Andrii Nakryiko has added a binary ioctl()-based API to
   /proc/pid/maps for much faster reading of vma information. The series
   is "query VMAs from /proc/<pid>/maps".

 - In the series "mm: introduce per-order mTHP split counters" Lance
   Yang improves the kernel's presentation of developer information
   related to multisize THP splitting.

 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)". This permits
   userspace to use all available huge page sizes.

 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and
   not very useful feature from slab fault injection.

* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
  mm/mglru: fix ineffective protection calculation
  mm/zswap: fix a white space issue
  mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
  mm/hugetlb: fix possible recursive locking detected warning
  mm/gup: clear the LRU flag of a page before adding to LRU batch
  mm/numa_balancing: teach mpol_to_str about the balancing mode
  mm: memcg1: convert charge move flags to unsigned long long
  alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
  lib: reuse page_ext_data() to obtain codetag_ref
  lib: add missing newline character in the warning message
  mm/mglru: fix overshooting shrinker memory
  mm/mglru: fix div-by-zero in vmpressure_calc_level()
  mm/kmemleak: replace strncpy() with strscpy()
  mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
  mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
  mm: ignore data-race in __swap_writepage
  hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
  mm: shmem: rename mTHP shmem counters
  mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
  mm/migrate: putback split folios when numa hint migration fails
  ...
2024-07-21 17:15:46 -07:00
Linus Torvalds
2c9b351240 ARM:
* Initial infrastructure for shadow stage-2 MMUs, as part of nested
   virtualization enablement
 
 * Support for userspace changes to the guest CTR_EL0 value, enabling
   (in part) migration of VMs between heterogenous hardware
 
 * Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
   the protocol
 
 * FPSIMD/SVE support for nested, including merged trap configuration
   and exception routing
 
 * New command-line parameter to control the WFx trap behavior under KVM
 
 * Introduce kCFI hardening in the EL2 hypervisor
 
 * Fixes + cleanups for handling presence/absence of FEAT_TCRX
 
 * Miscellaneous fixes + documentation updates
 
 LoongArch:
 
 * Add paravirt steal time support.
 
 * Add support for KVM_DIRTY_LOG_INITIALLY_SET.
 
 * Add perf kvm-stat support for loongarch.
 
 RISC-V:
 
 * Redirect AMO load/store access fault traps to guest
 
 * perf kvm stat support
 
 * Use guest files for IMSIC virtualization, when available
 
 ONE_REG support for the Zimop, Zcmop, Zca, Zcf, Zcd, Zcb and Zawrs ISA
 extensions is coming through the RISC-V tree.
 
 s390:
 
 * Assortment of tiny fixes which are not time critical
 
 x86:
 
 * Fixes for Xen emulation.
 
 * Add a global struct to consolidate tracking of host values, e.g. EFER
 
 * Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
   bus frequency, because TDX.
 
 * Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
 
 * Clean up KVM's handling of vendor specific emulation to consistently act on
   "compatible with Intel/AMD", versus checking for a specific vendor.
 
 * Drop MTRR virtualization, and instead always honor guest PAT on CPUs
   that support self-snoop.
 
 * Update to the newfangled Intel CPU FMS infrastructure.
 
 * Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads
   '0' and writes from userspace are ignored.
 
 * Misc cleanups
 
 x86 - MMU:
 
 * Small cleanups, renames and refactoring extracted from the upcoming
   Intel TDX support.
 
 * Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't
   hold leafs SPTEs.
 
 * Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager
   page splitting, to avoid stalling vCPUs when splitting huge pages.
 
 * Bug the VM instead of simply warning if KVM tries to split a SPTE that is
   non-present or not-huge.  KVM is guaranteed to end up in a broken state
   because the callers fully expect a valid SPTE, it's all but dangerous
   to let more MMU changes happen afterwards.
 
 x86 - AMD:
 
 * Make per-CPU save_area allocations NUMA-aware.
 
 * Force sev_es_host_save_area() to be inlined to avoid calling into an
   instrumentable function from noinstr code.
 
 * Base support for running SEV-SNP guests.  API-wise, this includes
   a new KVM_X86_SNP_VM type, encrypting/measure the initial image into
   guest memory, and finalizing it before launching it.  Internally,
   there are some gmem/mmu hooks needed to prepare gmem-allocated pages
   before mapping them into guest private memory ranges.
 
   This includes basic support for attestation guest requests, enough to
   say that KVM supports the GHCB 2.0 specification.
 
   There is no support yet for loading into the firmware those signing
   keys to be used for attestation requests, and therefore no need yet
   for the host to provide certificate data for those keys.  To support
   fetching certificate data from userspace, a new KVM exit type will be
   needed to handle fetching the certificate from userspace. An attempt to
   define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle
   this was introduced in v1 of this patchset, but is still being discussed
   by community, so for now this patchset only implements a stub version
   of SNP Extended Guest Requests that does not provide certificate data.
 
 x86 - Intel:
 
 * Remove an unnecessary EPT TLB flush when enabling hardware.
 
 * Fix a series of bugs that cause KVM to fail to detect nested pending posted
   interrupts as valid wake eents for a vCPU executing HLT in L2 (with
   HLT-exiting disable by L1).
 
 * KVM: x86: Suppress MMIO that is triggered during task switch emulation
 
   Explicitly suppress userspace emulated MMIO exits that are triggered when
   emulating a task switch as KVM doesn't support userspace MMIO during
   complex (multi-step) emulation.  Silently ignoring the exit request can
   result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to
   userspace for some other reason prior to purging mmio_needed.
 
   See commit 0dc902267c ("KVM: x86: Suppress pending MMIO write exits if
   emulator detects exception") for more details on KVM's limitations with
   respect to emulated MMIO during complex emulator flows.
 
 Generic:
 
 * Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE,
   because the special casing needed by these pages is not due to just
   unmovability (and in fact they are only unmovable because the CPU cannot
   access them).
 
 * New ioctl to populate the KVM page tables in advance, which is useful to
   mitigate KVM page faults during guest boot or after live migration.
   The code will also be used by TDX, but (probably) not through the ioctl.
 
 * Enable halt poll shrinking by default, as Intel found it to be a clear win.
 
 * Setup empty IRQ routing when creating a VM to avoid having to synchronize
   SRCU when creating a split IRQCHIP on x86.
 
 * Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
   that arch code can use for hooking both sched_in() and sched_out().
 
 * Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
   truncating a bogus value from userspace, e.g. to help userspace detect bugs.
 
 * Mark a vCPU as preempted if and only if it's scheduled out while in the
   KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
   memory when retrieving guest state during live migration blackout.
 
 Selftests:
 
 * Remove dead code in the memslot modification stress test.
 
 * Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs.
 
 * Print the guest pseudo-RNG seed only when it changes, to avoid spamming the
   log for tests that create lots of VMs.
 
 * Make the PMU counters test less flaky when counting LLC cache misses by
   doing CLFLUSH{OPT} in every loop iteration.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaZQB0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNkZwf/bv2jiENaLFNGPe/VqTKMQ6PHQLMG
 +sNHx6fJPP35gTM8Jqf0/7/ummZXcSuC1mWrzYbecZm7Oeg3vwNXHZ4LquwwX6Dv
 8dKcUzLbWDAC4WA3SKhi8C8RV2v6E7ohy69NtAJmFWTc7H95dtIQm6cduV2osTC3
 OEuHe1i8d9umk6couL9Qhm8hk3i9v2KgCsrfyNrQgLtS3hu7q6yOTR8nT0iH6sJR
 KE5A8prBQgLmF34CuvYDw4Hu6E4j+0QmIqodovg2884W1gZQ9LmcVqYPaRZGsG8S
 iDdbkualLKwiR1TpRr3HJGKWSFdc7RblbsnHRvHIZgFsMQiimh4HrBSCyQ==
 =zepX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Initial infrastructure for shadow stage-2 MMUs, as part of nested
     virtualization enablement

   - Support for userspace changes to the guest CTR_EL0 value, enabling
     (in part) migration of VMs between heterogenous hardware

   - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1
     of the protocol

   - FPSIMD/SVE support for nested, including merged trap configuration
     and exception routing

   - New command-line parameter to control the WFx trap behavior under
     KVM

   - Introduce kCFI hardening in the EL2 hypervisor

   - Fixes + cleanups for handling presence/absence of FEAT_TCRX

   - Miscellaneous fixes + documentation updates

  LoongArch:

   - Add paravirt steal time support

   - Add support for KVM_DIRTY_LOG_INITIALLY_SET

   - Add perf kvm-stat support for loongarch

  RISC-V:

   - Redirect AMO load/store access fault traps to guest

   - perf kvm stat support

   - Use guest files for IMSIC virtualization, when available

  s390:

   - Assortment of tiny fixes which are not time critical

  x86:

   - Fixes for Xen emulation

   - Add a global struct to consolidate tracking of host values, e.g.
     EFER

   - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the
     effective APIC bus frequency, because TDX

   - Print the name of the APICv/AVIC inhibits in the relevant
     tracepoint

   - Clean up KVM's handling of vendor specific emulation to
     consistently act on "compatible with Intel/AMD", versus checking
     for a specific vendor

   - Drop MTRR virtualization, and instead always honor guest PAT on
     CPUs that support self-snoop

   - Update to the newfangled Intel CPU FMS infrastructure

   - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as
     it reads '0' and writes from userspace are ignored

   - Misc cleanups

  x86 - MMU:

   - Small cleanups, renames and refactoring extracted from the upcoming
     Intel TDX support

   - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages
     that can't hold leafs SPTEs

   - Unconditionally drop mmu_lock when allocating TDP MMU page tables
     for eager page splitting, to avoid stalling vCPUs when splitting
     huge pages

   - Bug the VM instead of simply warning if KVM tries to split a SPTE
     that is non-present or not-huge. KVM is guaranteed to end up in a
     broken state because the callers fully expect a valid SPTE, it's
     all but dangerous to let more MMU changes happen afterwards

  x86 - AMD:

   - Make per-CPU save_area allocations NUMA-aware

   - Force sev_es_host_save_area() to be inlined to avoid calling into
     an instrumentable function from noinstr code

   - Base support for running SEV-SNP guests. API-wise, this includes a
     new KVM_X86_SNP_VM type, encrypting/measure the initial image into
     guest memory, and finalizing it before launching it. Internally,
     there are some gmem/mmu hooks needed to prepare gmem-allocated
     pages before mapping them into guest private memory ranges

     This includes basic support for attestation guest requests, enough
     to say that KVM supports the GHCB 2.0 specification

     There is no support yet for loading into the firmware those signing
     keys to be used for attestation requests, and therefore no need yet
     for the host to provide certificate data for those keys.

     To support fetching certificate data from userspace, a new KVM exit
     type will be needed to handle fetching the certificate from
     userspace.

     An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS
     exit type to handle this was introduced in v1 of this patchset, but
     is still being discussed by community, so for now this patchset
     only implements a stub version of SNP Extended Guest Requests that
     does not provide certificate data

  x86 - Intel:

   - Remove an unnecessary EPT TLB flush when enabling hardware

   - Fix a series of bugs that cause KVM to fail to detect nested
     pending posted interrupts as valid wake eents for a vCPU executing
     HLT in L2 (with HLT-exiting disable by L1)

   - KVM: x86: Suppress MMIO that is triggered during task switch
     emulation

     Explicitly suppress userspace emulated MMIO exits that are
     triggered when emulating a task switch as KVM doesn't support
     userspace MMIO during complex (multi-step) emulation

     Silently ignoring the exit request can result in the
     WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace
     for some other reason prior to purging mmio_needed

     See commit 0dc902267c ("KVM: x86: Suppress pending MMIO write
     exits if emulator detects exception") for more details on KVM's
     limitations with respect to emulated MMIO during complex emulator
     flows

  Generic:

   - Rename the AS_UNMOVABLE flag that was introduced for KVM to
     AS_INACCESSIBLE, because the special casing needed by these pages
     is not due to just unmovability (and in fact they are only
     unmovable because the CPU cannot access them)

   - New ioctl to populate the KVM page tables in advance, which is
     useful to mitigate KVM page faults during guest boot or after live
     migration. The code will also be used by TDX, but (probably) not
     through the ioctl

   - Enable halt poll shrinking by default, as Intel found it to be a
     clear win

   - Setup empty IRQ routing when creating a VM to avoid having to
     synchronize SRCU when creating a split IRQCHIP on x86

   - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with
     a flag that arch code can use for hooking both sched_in() and
     sched_out()

   - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
     truncating a bogus value from userspace, e.g. to help userspace
     detect bugs

   - Mark a vCPU as preempted if and only if it's scheduled out while in
     the KVM_RUN loop, e.g. to avoid marking it preempted and thus
     writing guest memory when retrieving guest state during live
     migration blackout

  Selftests:

   - Remove dead code in the memslot modification stress test

   - Treat "branch instructions retired" as supported on all AMD Family
     17h+ CPUs

   - Print the guest pseudo-RNG seed only when it changes, to avoid
     spamming the log for tests that create lots of VMs

   - Make the PMU counters test less flaky when counting LLC cache
     misses by doing CLFLUSH{OPT} in every loop iteration"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  crypto: ccp: Add the SNP_VLEK_LOAD command
  KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops
  KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops
  KVM: x86: Replace static_call_cond() with static_call()
  KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
  x86/sev: Move sev_guest.h into common SEV header
  KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
  KVM: x86: Suppress MMIO that is triggered during task switch emulation
  KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro
  KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
  KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
  KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()
  KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level
  KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault"
  KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler
  KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
  KVM: Document KVM_PRE_FAULT_MEMORY ioctl
  mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE
  perf kvm: Add kvm-stat for loongarch64
  LoongArch: KVM: Add PV steal time support in guest side
  ...
2024-07-20 12:41:03 -07:00
Linus Torvalds
d2be38b9a5 - added support for Realtek RTL9302C
- added support for Mobileye EyeQ6H
 - added support for Mobileye EyeQ OLB system controller
 - improved r4k clocksource
 - added mode for emulating ieee754 NAN2008
 - rework for BMIPS CBR address handling
 - fixes for Loongson 2K1000
 - defconfig updates
 - cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmabf5oaHHRzYm9nZW5k
 QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHAYOQ//dgWc6RDS5vWKt14goHoR
 m3Qt63oHuxfGJsPCHdAqD4bAjxMa1eaRzbfXZ/cMrCSHsUo6bth8dmqFCDMjjWMT
 ifcCOCwXOf32NUTdm4mNLrKVUvCNeWUN6It8XBBF9r7seogvJPDpDZlEWUzYwfDE
 6e7MaaFIEMZN2Q5OAjb6PozTI0gQ3p3UAHVdvN4Z9jJxkYPzRqVostcFUL9M9iU6
 7OwGypIdZVSzB+6J6k0yv4rqNDei92SmlLjBD1+GK6uLdJG0JXiWn/XEMxOLyRP9
 kKyfpjCwOgAfbTnMoo1N2n1jkP1BqyAPHvGqF2HGpi5mFRW1i25WdcwvF/jImyes
 yQ/gLKt/y3sOqfssayDvK9acRkp0KQltpPfvWxBXM464+8+gKCdYPZ7+81AbXAiL
 Qx+bVVdE3HSoO9T06/b0Lpudue7eNU+jlaO8MLH778heT+5k+mlI/H0Ep7M5U7qO
 5V9xWlvLpceTa/gJ1cc9bUI5MG/2x+imw7COUcnv+wsWBJ3pGX4Jhwwe2hUn7ixd
 0lhrSrQi1ILkFd8gL2REoJ520RNUVfR8yDn7mNuYV1++zlGVb7EAt67v/J6Y1p8l
 9aQP/587oZvLAN2IBlovSzqvc6tHZlK6hO9d+ktqJood5NOjOWEGfT0RCm0eqiFF
 Er6qaWxjROZO1kiGjzo7v+4=
 =/6JH
 -----END PGP SIGNATURE-----

Merge tag 'mips_6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS updates from Thomas Bogendoerfer:

 - add support for Realtek RTL9302C

 - add support for Mobileye EyeQ6H

 - add support for Mobileye EyeQ OLB system controller

 - improve r4k clocksource

 - add mode for emulating ieee754 NAN2008

 - rework for BMIPS CBR address handling

 - fixes for Loongson 2K1000

 - defconfig updates

 - cleanups and fixes

* tag 'mips_6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (58 commits)
  MIPS: config: Add ip30_defconfig
  MIPS: config: lemote2f: Regenerate defconfig
  MIPS: config: generic: Add board-litex
  MIPS: config: Enable MSA and virtualization for MIPS64R6
  MIPS: Fix fallback march for SB1
  mips: dts: realtek: Add RTL9302C board
  mips: generic: add fdt fixup for Realtek reference board
  mips: select REALTEK_OTTO_TIMER for Realtek platforms
  dt-bindings: interrupt-controller: realtek,rtl-intc: Add rtl9300-intc
  dt-bindings: mips: realtek: Add rtl930x-soc compatible
  dt-bindings: vendor-prefixes: Add Cameo Communications
  mips: dts: realtek: add device_type property to cpu node
  mips: dts: realtek: use "serial" instead of "uart" in node name
  MIPS: Implement ieee754 NAN2008 emulation mode
  MIPS: lantiq: improve USB initialization
  MIPS: GIC: Generate redirect block accessors
  MIPS: CPS: Add a couple of multi-cluster utility functions
  MIPS: Octeron: remove source file executable bit
  MAINTAINERS: Mobileye: add OLB drivers and dt-bindings
  MIPS: mobileye: eyeq5: add OLB system-controller node
  ...
2024-07-20 09:03:36 -07:00
Linus Torvalds
3f386cb8ee pci-v6.11-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmaahiEUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwypg/+LSzrx0CyyXruwwkjuoMIzqXoEpxV
 SSdJv47E9rnJymQvd0RAeNyc1BPbtRcP1FdEvV/G1ovb8qJSOJgU22PSSiMQsQ0h
 2WGBl1ShubQDDLBdy1AggAsRJhIH4P4tWZ4k5Ftz6WZPWA1UcrDqmjN4d02UIYZb
 A3YYcBEIm6bvrixxy+xq/Ii7S9A2idikabDLLGXOMSliFHx0ehWDNXyQEBONlrDh
 rEHih21rPtOltVEdJl7yF+SIA467HI09NuXfTviHWnJ1hinFoSlEHIhz4j+i+r//
 xOj7iDqtk/UAIToVsxtwgOnElNwY6ab/h/t1AmSSxX4FUEV2TiS1YEpUfX7pByt+
 dytgvepjQyycC/ZHUtRZFZ6+1M0z+Vgb5c3+jXyPh8pQEPqmXt8+KYVIi/wychmJ
 Opo4xniiDoKHSZ4E0bg/wMbe9yVCjTpX0i0S7BbNa/TRjud6vAhXvgx/y092jsdg
 h4lU0ywNCgea/rZFHZYomPjncx9xJ+rtOaH+/dVQhCm/wuRHnj7tJGZnl5LfCWVw
 +yNOcExQaE+lRvKqp6mQvUva3+4UArAL2tnFC00tGd0emRLIvXrxY2lF1sqp9wCZ
 AJu65El4nnpFNU7vJR7x4X31BvcdquFEvfofPxPXbPz09N8hPRhkunKzgd5ftKZS
 mcxMfStvIFXiMEM=
 =vw2i
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Define PCIE_RESET_CONFIG_DEVICE_WAIT_MS for the generic 100ms
     required after reset before config access (Kevin Xie)

   - Define PCIE_T_RRS_READY_MS for the generic 100ms required after
     reset before config access (probably should be unified with
     PCIE_RESET_CONFIG_DEVICE_WAIT_MS) (Damien Le Moal)

  Resource management:

   - Rename find_resource() to find_resource_space() to be more
     descriptive (Ilpo Järvinen)

   - Export find_resource_space() for use by PCI core, which needs to
     learn whether there is available space for a bridge window (Ilpo
     Järvinen)

   - Prevent double counting of resources so window size doesn't grow on
     each remove/rescan cycle (Ilpo Järvinen)

   - Relax bridge window sizing algorithm so a device doesn't break
     simply because it was removed and rescanned (Ilpo Järvinen)

   - Evaluate the ACPI PRESERVE_BOOT_CONFIG _DSM in
     pci_register_host_bridge() (not acpi_pci_root_create()) so we can
     unify it with similar DT functionality (Vidya Sagar)

   - Extend use of DT "linux,pci-probe-only" property so it works
     per-host bridge as well as globally (Vidya Sagar)

   - Unify support for ACPI PRESERVE_BOOT_CONFIG _DSM and the DT
     "linux,pci-probe-only" property in pci_preserve_config() (Vidya
     Sagar)

  Driver binding:

   - Add devres infrastructure for managed request and map of partial
     BAR resources (Philipp Stanner)

   - Deprecate pcim_iomap_table() because uses like
     "pcim_iomap_table()[0]" have no good way to return errors (Philipp
     Stanner)

   - Add an always-managed pcim_request_region() for use instead of
     pci_request_region() and similar, which are sometimes managed
     depending on whether pcim_enable_device() has been called
     previously (Philipp Stanner)

   - Reimplement pcim_set_mwi() so it doesn't need to keep store MWI
     state (Philipp Stanner)

   - Add pcim_intx() for use instead of pci_intx(), which is sometimes
     managed depending on whether pcim_enable_device() has been called
     previously (Philipp Stanner)

   - Add managed pcim_iomap_range() to allow mapping of a partial BAR
     (Philipp Stanner)

   - Fix a devres mapping leak in drm/vboxvideo (Philipp Stanner)

  Error handling:

   - Add missing bridge locking in device reset path and add a warning
     for other possible lock issues (Dan Williams)

   - Fix use-after-free on concurrent DPC and hot-removal (Lukas Wunner)

  Power management:

   - Disable AER and DPC during suspend to avoid spurious wakeups if
     they share an interrupt with PME (Kai-Heng Feng)

  PCIe native device hotplug:

   - Detect if a device was removed or replaced during system sleep so
     we don't assume a new device is the one that used to be there
     (Lukas Wunner)

  Virtualization:

   - Add an ACS quirk for Broadcom BCM5760X multi-function NIC; it
     prevents transactions between functions even though it doesn't
     advertise ACS, so the functions can be attached individually via
     VFIO (Ajit Khaparde)

  Peer-to-peer DMA:

   - Add a "pci=config_acs=" kernel command-line parameter to relax
     default ACS settings to enable additional peer-to-peer
     configurations. Requires expert knowledge of topology and ACS
     operation (Vidya Sagar)

  Endpoint framework:

   - Remove unused struct pci_epf_group.type_group (Christophe JAILLET)

   - Fix error handling in vpci_scan_bus() and epf_ntb_epc_cleanup()
     (Dan Carpenter)

   - Make struct pci_epc_class constant (Greg Kroah-Hartman)

   - Remove unused pci_endpoint_test_bar_{readl,writel} functions
     (Jiapeng Chong)

   - Rename "BME" to "Bus Master Enable" (Manivannan Sadhasivam)

   - Rename struct pci_epc_event_ops.core_init() callback to epc_init()
     (Manivannan Sadhasivam)

   - Move DMA init to MHI .epc_init() callback for uniformity
     (Manivannan Sadhasivam)

   - Cancel EPF test delayed work when link goes down (Manivannan
     Sadhasivam)

   - Add struct pci_epc_event_ops.epc_deinit() callback for cleanup
     needed on fundamental reset (Manivannan Sadhasivam)

   - Add 64KB alignment to endpoint test to support Rockchip rk3588
     (Niklas Cassel)

   - Optimize endpoint test by using memcpy() instead of readl() (Niklas
     Cassel)

  Device tree bindings:

   - Add generic "ats-supported" property to advertise that a PCIe Root
     Complex supports ATS (Jean-Philippe Brucker)

  Amazon Annapurna Labs PCIe controller driver:

   - Validate IORESOURCE_BUS presence to avoid NULL pointer dereference
     (Aleksandr Mishin)

  Axis ARTPEC-6 PCIe controller driver:

   - Rename .cpu_addr_fixup() parameter to reflect that it is a PCI
     address, not a CPU address (Niklas Cassel)

  Freescale i.MX6 PCIe controller driver:

   - Convert to agnostic GPIO API (Andy Shevchenko)

  Freescale Layerscape PCIe controller driver:

   - Make struct mobiveil_rp_ops constant (Christophe JAILLET)

   - Use new generic dw_pcie_ep_linkdown() to handle link-down events
     (Manivannan Sadhasivam)

  HiSilicon Kirin PCIe controller driver:

   - Convert to agnostic GPIO API (Andy Shevchenko)

   - Use _scoped() iterator for OF children to ensure refcounts are
     decremented at loop exit (Javier Carrasco)

  Intel VMD host bridge driver:

   - Create sysfs "domain" symlink before downstream devices are exposed
     to userspace by pci_bus_add_devices() (Jiwei Sun)

  Loongson PCIe controller driver:

   - Enable MSI when LS7A is used with new CPUs that have integrated
     PCIe Root Complex, e.g., Loongson-3C6000, so downstream devices can
     use MSI (Huacai Chen)

  Microchip AXI PolarFlare PCIe controller driver:

   - Move pcie-microchip-host.c to a new PLDA directory (Minda Chen)

   - Factor PLDA generic items out to a common
     plda,xpressrich3-axi-common.yaml binding (Minda Chen)

   - Factor PLDA generic data structures and code out to shared
     pcie-plda.h, pcie-plda-host.c (Minda Chen)

   - Add PLDA generic interrupt handling with a .request_event_irq()
     callback for vendor-specific events (Minda Chen)

   - Add PLDA generic host init/deinit and map bus functions for use by
     vendor-specific drivers (Minda Chen)

   - Rework to use PLDA core (Minda Chen)

  Microsoft Hyper-V host bridge driver:

   - Return zero, not garbage, when reading PCI_INTERRUPT_PIN (Wei Liu)

  NVIDIA Tegra194 PCIe controller driver:

   - Remove unused struct tegra_pcie_soc (Dr. David Alan Gilbert)

   - Set 64KB inbound ATU alignment restriction (Jon Hunter)

  Qualcomm PCIe controller driver:

   - Make the MHI reg region mandatory for X1E80100, since all PCIe
     controllers have it (Abel Vesa)

   - Prevent use of uninitialized data and possible error pointer
     dereference (Dan Carpenter)

   - Return error, not success, if dev_pm_opp_find_freq_floor() fails
     (Dan Carpenter)

   - Add Operating Performance Points (OPP) support to scale performance
     state based on aggregate link bandwidth to improve SoC power
     efficiency (Krishna chaitanya chundru)

   - Vote for the CPU-PCIe ICC (interconnect) path to ensure it stays
     active even if other drivers don't vote for it (Krishna chaitanya
     chundru)

   - Use devm_clk_bulk_get_all() to get all the clocks from DT to avoid
     writing out all the clock names (Manivannan Sadhasivam)

   - Add DT binding and driver support for the SA8775P SoC (Mrinmay
     Sarkar)

   - Add HDMA support for the SA8775P SoC (Mrinmay Sarkar)

   - Override the SA8775P NO_SNOOP default to avoid possible memory
     corruption (Mrinmay Sarkar)

   - Make sure resources are disabled during PERST# assertion, even if
     the link is already disabled (Manivannan Sadhasivam)

   - Use new generic dw_pcie_ep_linkdown() to handle link-down events
     (Manivannan Sadhasivam)

   - Add DT and endpoint driver support for the SA8775P SoC (Mrinmay
     Sarkar)

   - Add Hyper DMA (HDMA) support for the SA8775P SoC and enable it in
     the EPF MHI driver (Mrinmay Sarkar)

   - Set PCIE_PARF_NO_SNOOP_OVERIDE to override the default NO_SNOOP
     attribute on the SA8775P SoC (both Root Complex and Endpoint mode)
     to avoid possible memory corruption (Mrinmay Sarkar)

  Renesas R-Car PCIe controller driver:

   - Demote WARN() to dev_warn_ratelimited() in rcar_pcie_wakeup() to
     avoid unnecessary backtrace (Marek Vasut)

   - Add DT and driver support for R-Car V4H (R8A779G0) host and
     endpoint. This requires separate proprietary firmware (Yoshihiro
     Shimoda)

  Rockchip PCIe controller driver:

   - Assert PERST# for 100ms after power is stable (Damien Le Moal)

   - Wait PCIE_T_RRS_READY_MS (100ms) after reset before starting
     configuration (Damien Le Moal)

   - Use GPIOD_OUT_LOW flag while requesting ep_gpio to fix a firmware
     crash on Qcom-based modems with Rockpro64 board (Manivannan
     Sadhasivam)

  Rockchip DesignWare PCIe controller driver:

   - Factor common parts of rockchip-dw-pcie DT binding to be shared by
     Root Complex and Endpoint mode (Niklas Cassel)

   - Add missing INTx signals to common DT binding (Niklas Cassel)

   - Add eDMA items to DT binding for Endpoint controller (Niklas
     Cassel)

   - Fix initial dw-rockchip PERST# GPIO value to prevent unnecessary
     short assert/deassert that causes issues with some WLAN controllers
     (Niklas Cassel)

   - Refactor dw-rockchip and add support for Endpoint mode (Niklas
     Cassel)

   - Call pci_epc_init_notify() and drop dw_pcie_ep_init_notify()
     wrapper (Niklas Cassel)

   - Add error messages in .probe() error paths to improve user
     experience (Uwe Kleine-König)

  Samsung Exynos PCIe controller driver:

   - Use bulk clock APIs to simplify clock setup (Shradha Todi)

  StarFive PCIe controller driver:

   - Add DT binding and driver support for the StarFive JH7110
     PLDA-based PCIe controller (Minda Chen)

  Synopsys DesignWare PCIe controller driver:

   - Add generic support for sending PME_Turn_Off when system suspends
     (Frank Li)

   - Fix incorrect interpretation of iATU slot 0 after PERST#
     assert/deassert (Frank Li)

   - Use msleep() instead of usleep_range() while waiting for link
     (Konrad Dybcio)

   - Refactor dw_pcie_edma_find_chip() to enable adding support for
     Hyper DMA (HDMA) (Manivannan Sadhasivam)

   - Enable drivers to supply the eDMA channel count since some can't
     auto detect this (Manivannan Sadhasivam)

   - Call pci_epc_init_notify() and drop dw_pcie_ep_init_notify()
     wrapper (Manivannan Sadhasivam)

   - Pass the eDMA mapping format directly from drivers instead of
     maintaining a capability for it (Manivannan Sadhasivam)

   - Add generic dw_pcie_ep_linkdown() to notify EPF drivers about
     link-down events and restore non-sticky DWC registers lost on link
     down (Manivannan Sadhasivam)

   - Add vendor-specific "apb" reg name, interrupt names, INTx names to
     generic binding (Niklas Cassel)

   - Enforce DWC restriction that 64-bit BARs must start with an
     even-numbered BAR (Niklas Cassel)

   - Consolidate args of dw_pcie_prog_outbound_atu() into a structure
     (Yoshihiro Shimoda)

   - Add support for endpoints to send Message TLPs, e.g., for INTx
     emulation (Yoshihiro Shimoda)

  TI DRA7xx PCIe controller driver:

   - Rename .cpu_addr_fixup() parameter to reflect that it is a PCI
     address, not a CPU address (Niklas Cassel)

  TI Keystone PCIe controller driver:

   - Validate IORESOURCE_BUS presence to avoid NULL pointer dereference
     (Aleksandr Mishin)

   - Work around AM65x/DRA80xM Errata #i2037 that corrupts TLPs and
     causes processor hangs by limiting Max_Read_Request_Size (MRRS) and
     Max_Payload_Size (MPS) (Kishon Vijay Abraham I)

   - Leave BAR 0 disabled for AM654x to fix a regression caused by
     6ab15b5e70 ("PCI: dwc: keystone: Convert .scan_bus() callback to
     use add_bus"), which caused a 45-second boot delay (Siddharth
     Vadapalli)

  Xilinx Versal CPM PCIe controller driver:

   - Fix overlapping bridge registers and 32-bit BAR addresses in DT
     binding (Thippeswamy Havalige)

  MicroSemi Switchtec management driver:

   - Make struct switchtec_class constant (Greg Kroah-Hartman)

  Miscellaneous:

   - Remove unused struct acpi_handle_node (Dr. David Alan Gilbert)

   - Add missing MODULE_DESCRIPTION() macros (Jeff Johnson)"

* tag 'pci-v6.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (154 commits)
  PCI: loongson: Enable MSI in LS7A Root Complex
  PCI: Extend ACS configurability
  PCI: Add missing bridge lock to pci_bus_lock()
  drm/vboxvideo: fix mapping leaks
  PCI: Add managed pcim_iomap_range()
  PCI: Remove legacy pcim_release()
  PCI: Add managed pcim_intx()
  PCI: vmd: Create domain symlink before pci_bus_add_devices()
  PCI: qcom: Prevent use of uninitialized data in qcom_pcie_suspend_noirq()
  PCI: qcom: Prevent potential error pointer dereference
  PCI: qcom: Fix missing error code in qcom_pcie_probe()
  PCI: Give pcim_set_mwi() its own devres cleanup callback
  PCI: Move struct pci_devres.pinned bit to struct pci_dev
  PCI: Remove struct pci_devres.enabled status bit
  PCI: Document hybrid devres hazards
  PCI: Add managed pcim_request_region()
  PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()
  PCI: Add managed partial-BAR request and map infrastructure
  PCI: Add devres helpers for iomap table
  PCI: Add and use devres helper for bit masks
  ...
2024-07-19 19:03:18 -07:00