Commit Graph

43938 Commits

Author SHA1 Message Date
Peter Newman
8da2b938eb x86/resctrl: Implement rename op for mon groups
To change the resources allocated to a large group of tasks, such as an
application container, a container manager must write all of the tasks'
IDs into the tasks file interface of the new control group. This is
challenging when the container's task list is always changing.

In addition, if the container manager is using monitoring groups to
separately track the bandwidth of containers assigned to the same
control group, when moving a container, it must first move the
container's tasks to the default monitoring group of the new control
group before it can move these tasks into the container's replacement
monitoring group under the destination control group. This is
undesirable because it makes bandwidth usage during the move
unattributable to the correct tasks and resets monitoring event counters
and cache usage information for the group.

Implement the rename operation only for resctrlfs monitor groups to
enable users to move a monitoring group from one control group to
another. This effects a change in resources allocated to all the tasks
in the monitoring group while otherwise leaving the monitoring data
intact.

Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20230419125015.693566-3-peternewman@google.com
2023-06-07 12:40:36 +02:00
Peter Newman
c45c06d4ae x86/resctrl: Factor rdtgroup lock for multi-file ops
rdtgroup_kn_lock_live() can only release a kernfs reference for a single
file before waiting on the rdtgroup_mutex, limiting its usefulness for
operations on multiple files, such as rename.

Factor the work needed to respectively break and unbreak active
protection on an individual file into rdtgroup_kn_{get,put}().

No functional change.

Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20230419125015.693566-2-peternewman@google.com
2023-06-07 12:15:18 +02:00
Shawn Wang
2997d94b5d x86/resctrl: Only show tasks' pid in current pid namespace
When writing a task id to the "tasks" file in an rdtgroup,
rdtgroup_tasks_write() treats the pid as a number in the current pid
namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows
the list of global pids from the init namespace, which is confusing and
incorrect.

To be more robust, let the "tasks" file only show pids in the current pid
namespace.

Fixes: e02737d5b8 ("x86/intel_rdt: Add tasks files")
Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/all/20230116071246.97717-1-shawnwang@linux.alibaba.com/
2023-05-30 20:57:39 +02:00
Linus Torvalds
f8b2507c26 A single fix for x86:
- Prevent a bogus setting for the number of HT siblings, which is caused
    by the CPUID evaluation trainwreck of X86. That recomputes the value
    for each CPU, so the last CPU "wins". That can cause completely bogus
    sibling values.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzBxMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodfbEACp+oRBD2zIcmf8kpakMxIbH2CkDAxl
 AgqGHYVKAvsqKPQZhYmOFEsk+0isMzssuVaub9gMeLRmpMQxUqv7K20pluvWmuQm
 h7MTYEAvCZpBU2BGB1mixp57tQ/gpLSB4uaJU2Q5hh9po6uImvalIdqThf7ArxgA
 mm3BhtbFObLGwRaV2981zdsEs8Cjs9RusnAOrX5LbBBb6ibcnrqWy7DAhySMA0zr
 7RfaGyG/T9z6/BgX54FBQ7aDNU/RkZ8YYQoMj0kAFXdM3V+sa3oPfMRQpACPHKm4
 kwskJfWVdMq06kgOD7N9+WrOfBAgIsmzasT6VhTuGAGNo4CiEOthLqXDHwf4NLhN
 hj0XReHVyBiQ1KuI2lGNJ71c30KRkh7SCdGNiEFtnlpteXnS4bRnzWtfMf/+2SCC
 WOziRRYRzuAvvoTwoQS1BYJrDAB9f5jOX8FTYbC12rEk/RPOB6t4iFNLUzid7u1H
 yPYphoftXlQlli7EajVc5pSZCftMHRCzWernptiPFU5tMvyagYppcg25JY54rquC
 CTJQqoa/HPAEZhBjgZElwO9QBgu+VyJUq3R/Z+LbkQuz5z+PVDfqeDuDWhaqbcz1
 WGtdGdNo4bqVhwqgwyHjgKCFe05mF6tPuotnYZE/VSkXUCvLkjGEVg8HQ7WiPhJd
 IYDqxkIhX5h1Mg==
 =l43p
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu fix from Thomas Gleixner:
 "A single fix for x86:

   - Prevent a bogus setting for the number of HT siblings, which is
     caused by the CPUID evaluation trainwreck of X86. That recomputes
     the value for each CPU, so the last CPU "wins". That can cause
     completely bogus sibling values"

* tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
2023-05-28 07:42:05 -04:00
Linus Torvalds
2d5438f4c6 A small set of perf fixes:
- Make the CHA discovery based on MSR readout to work around broken
    discovery tables in some SPR firmwares.
 
  - Prevent saving PEBS configuration which has software bits set that
    cause a crash when restored into the relevant MSR.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzBlETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofUsEAC6VMmscn6MDmCpO71+o80IhFUeN/Pa
 /Qu4kKCQmeT/gonMG+bKCYSNobgdi4a7KNDRZypTjzMORTDmS3lT4aTbzy3ry+eI
 7rUTolAo3HiY0ZwPhU/evrGDf0O2d/19nd9pXiLZFinH9WsMf7mKsXhAJvlq6SKu
 SliGCQlB7+0bicdYgBkJqWPUbKY7QvcCcXiPx1Ujxg7CbMTcuj7MLzu/TVYmwZYH
 Je/k+vBnhl54PF9nGiqNYZGBPh4cBgkpHEFGFrvwplBRBdwHrjapNxhIRcA2N66u
 mr+f7Cl6StOP16Bn4dzG9nHLVTDz5NVgsbtPStVTcOoIzfOhch7mOL+tu/pafXdt
 zRL7U6usfbQqt1rcBBtPoeFlEifQwHoz3ZGFSyIZn5IRfXtXlDJHyPxlRYMB5/uz
 WjR10GzE0tZFLetvW4PdlMHcwwrOZNU+crvADmCk4KzzVDip9QLbomiGfxHVHI2J
 Fjstd6ZaNkuDHRy/r76YfwRDDzQWPvtsaWRLzD4aFQQxYdHHZOabBittK9UvENvu
 MjLRO+1ymN52Ap6TdcK9ZMNpb8ol/Xn5aYivNRlHsUjJ8RxJj94sElIlNHLI2yoa
 hICdMcGSCd7H+FIkXdgT2O8OBxEsf139xSrG2oSTOZFCjkR1BSxtyqNY47MFVSiA
 KBtbnJMkde48pg==
 =Z3GU
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A small set of perf fixes:

   - Make the MSR-readout based CHA discovery work around broken
     discovery tables in some SPR firmwares.

   - Prevent saving PEBS configuration which has software bits set that
     cause a crash when restored into the relevant MSR"

* tag 'perf-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Correct the number of CHAs on SPR
  perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBS
2023-05-28 07:37:23 -04:00
Linus Torvalds
abbf7fa15b A set of unwinder and tooling fixes:
- Ensure that the stack pointer on x86 is aligned again so that the
     unwinder does not read past the end of the stack
 
   - Discard .note.gnu.property section which has a pointlessly different
     alignment than the other note sections. That confuses tooling of all
     sorts including readelf, libbpf and pahole.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzBW0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoczHD/4vCopMPBFsT5w5HhSTQSX5Kglx3UQL
 g+qa0Z/uJKLpMgGkBzWBQGXHH/UqyY4sk5T3DDOkVQjdSfD+zmJu4R44wAonEU+y
 SDW2MeBciyp1WtjwPNWNY9QgDUQj7Cr2dSLTVqBR9akladTcj7EKHApq0pgaqsbi
 MnmfXzPyH17PRfyW8FbsBjdQvhpkYZSZntQ0cU+W5VtsUtZvg2w4I8IU76ri9L0E
 OYBbk2zV8RabGVJBv79fnm4fXb78+Zkp/xvs5sTBRKRS+3T5FNNJjo8tNp1AtxdF
 eJMlOeY7PjSFyAL6+/+/uRqYNCnH4Rw2MlMqJJIhzKYCw7BFo0FAhi+mro63Z85b
 byGuriv1g7suOgOCK8IzTl7e1nTDP4YCZ0cMDDvcRceIiqR6Rrk3C/B5cBVz0Bng
 iFYFUJSlLT4G+tGZwupV2UUGsCwS5+vVCG/FVWes8Mz4g4zr2Dmrh6YMLzrcDshE
 eAY2yKMypD/JD9cJa4KOVJuFARaSDJDiBpsO1BjRC6MxLFaw1biB0mE2hkwn3LJe
 fLVvU/sWnIQ+dQoy7+GBzm7Pi5SmKBuLDsgOt/ocbCwfUwUQMGf+I7bNg+08UOOS
 dNzM6/agdd8rLZTS/Ma7EvnA367ZwVOb5COVMqwxAoKZbPkvgz1xvRGkFQCHYapx
 BwMILF9FIM0kvg==
 =TPwY
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull unwinder fixes from Thomas Gleixner:
 "A set of unwinder and tooling fixes:

   - Ensure that the stack pointer on x86 is aligned again so that the
     unwinder does not read past the end of the stack

   - Discard .note.gnu.property section which has a pointlessly
     different alignment than the other note sections. That confuses
     tooling of all sorts including readelf, libbpf and pahole"

* tag 'objtool-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/show_trace_log_lvl: Ensure stack pointer is aligned, again
  vmlinux.lds.h: Discard .note.gnu.property section
2023-05-28 07:33:29 -04:00
Linus Torvalds
4e893b5aa4 xen: branch for v6.4-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZHGVRAAKCRCAXGG7T9hj
 vqtJAQDizKasLE7tSnfs/FrZ/4xPaDLe3bQifMx2C1dtYCjRcAD/ciZSa1L0WzZP
 dpEZnlYRzsR3bwLktQEMQFOvlbh1SwE=
 =K860
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - a double free fix in the Xen pvcalls backend driver

 - a fix for a regression causing the MSI related sysfs entries to not
   being created in Xen PV guests

 - a fix in the Xen blkfront driver for handling insane input data
   better

* tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/pci/xen: populate MSI sysfs entries
  xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
  xen/blkfront: Only check REQ_FUA for writes
2023-05-27 09:42:56 -07:00
Linus Torvalds
47ee3f1dd9 x86: re-introduce support for ERMS copies for user space accesses
I tried to streamline our user memory copy code fairly aggressively in
commit adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory
copies"), in order to then be able to clean up the code and inline the
modern FSRM case in commit 577e6a7fd5 ("x86: inline the 'rep movs' in
user copies for the FSRM case").

We had reports [1] of that causing regressions earlier with blogbench,
but that turned out to be a horrible benchmark for that case, and not a
sufficient reason for re-instating "rep movsb" on older machines.

However, now Eric Dumazet reported [2] a regression in performance that
seems to be a rather more real benchmark, where due to the removal of
"rep movs" a TCP stream over a 100Gbps network no longer reaches line
speed.

And it turns out that with the simplified the calling convention for the
non-FSRM case in commit 427fda2c8a ("x86: improve on the non-rep
'copy_user' function"), re-introducing the ERMS case is actually fairly
simple.

Of course, that "fairly simple" is glossing over several missteps due to
having to fight our assembler alternative code.  This code really wanted
to rewrite a conditional branch to have two different targets, but that
made objtool sufficiently unhappy that this instead just ended up doing
a choice between "jump to the unrolled loop, or use 'rep movsb'
directly".

Let's see if somebody finds a case where the kernel memory copies also
care (see commit 68674f94ff: "x86: don't use REP_GOOD or ERMS for
small memory copies").  But Eric does argue that the user copies are
special because networking tries to copy up to 32KB at a time, if
order-3 pages allocations are possible.

In-kernel memory copies are typically small, unless they are the special
"copy pages at a time" kind that still use "rep movs".

Link: https://lore.kernel.org/lkml/202305041446.71d46724-yujie.liu@intel.com/ [1]
Link: https://lore.kernel.org/lkml/CANn89iKUbyrJ=r2+_kK+sb2ZSSHifFZ7QkPLDpAtkJ8v4WUumA@mail.gmail.com/ [2]
Reported-and-tested-by: Eric Dumazet <edumazet@google.com>
Fixes: adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-26 12:34:20 -07:00
Zhang Rui
edc0a2b595 x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
Traditionally, all CPUs in a system have identical numbers of SMT
siblings.  That changes with hybrid processors where some logical CPUs
have a sibling and others have none.

Today, the CPU boot code sets the global variable smp_num_siblings when
every CPU thread is brought up. The last thread to boot will overwrite
it with the number of siblings of *that* thread. That last thread to
boot will "win". If the thread is a Pcore, smp_num_siblings == 2.  If it
is an Ecore, smp_num_siblings == 1.

smp_num_siblings describes if the *system* supports SMT.  It should
specify the maximum number of SMT threads among all cores.

Ensure that smp_num_siblings represents the system-wide maximum number
of siblings by always increasing its value. Never allow it to decrease.

On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are
not updated in any cpu sibling map because the system is treated as an
UP system when probing Ecore CPUs.

Below shows part of the CPU topology information before and after the
fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
...
-/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
-/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
+/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
...
-/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
-/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
+/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21

Notice that the "before" 'package_cpus_list' has only one CPU.  This
means that userspace tools like lscpu will see a little laptop like
an 11-socket system:

-Core(s) per socket:  1
-Socket(s):           11
+Core(s) per socket:  16
+Socket(s):           1

This is also expected to make the scheduler do rather wonky things
too.

[ dhansen: remove CPUID detail from changelog, add end user effects ]

CC: stable@kernel.org
Fixes: bbb65d2d36 ("x86: use cpuid vector 0xb when available for detecting cpu topology")
Fixes: 95f3d39ccf ("x86/cpu/topology: Provide detect_extended_topology_early()")
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230323015640.27906-1-rui.zhang%40intel.com
2023-05-25 10:48:42 -07:00
Kan Liang
38776cc45e perf/x86/uncore: Correct the number of CHAs on SPR
The number of CHAs from the discovery table on some SPR variants is
incorrect, because of a firmware issue. An accurate number can be read
from the MSR UNC_CBO_CONFIG.

Fixes: 949b11381f ("perf/x86/intel/uncore: Add Sapphire Rapids server CHA support")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230508140206.283708-1-kan.liang@linux.intel.com
2023-05-24 22:19:41 +02:00
Maximilian Heyne
335b422346 x86/pci/xen: populate MSI sysfs entries
Commit bf5e758f02 ("genirq/msi: Simplify sysfs handling") reworked the
creation of sysfs entries for MSI IRQs. The creation used to be in
msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs.
Then it moved into __msi_domain_alloc_irqs which is an implementation of
domain_alloc_irqs. However, Xen comes with the only other implementation
of domain_alloc_irqs and hence doesn't run the sysfs population code
anymore.

Commit 6c796996ee ("x86/pci/xen: Fixup fallout from the PCI/MSI
overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info
but that doesn't actually have an effect because Xen uses it's own
domain_alloc_irqs implementation.

Fix this by making use of the fallback functions for sysfs population.

Fixes: bf5e758f02 ("genirq/msi: Simplify sysfs handling")
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230503131656.15928-1-mheyne@amazon.de
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-05-24 18:08:49 +02:00
Like Xu
3c845304d2 perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBS
After commit b752ea0c28 ("perf/x86/intel/ds: Flush PEBS DS when changing
PEBS_DATA_CFG"), the cpuc->pebs_data_cfg may save some bits that are not
supported by real hardware, such as PEBS_UPDATE_DS_SW. This would cause
the VMX hardware MSR switching mechanism to save/restore invalid values
for PEBS_DATA_CFG MSR, thus crashing the host when PEBS is used for guest.
Fix it by using the active host value from cpuc->active_pebs_data_cfg.

Fixes: b752ea0c28 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
Signed-off-by: Like Xu <likexu@tencent.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230517133808.67885-1-likexu@tencent.com
2023-05-23 10:01:13 +02:00
Linus Torvalds
ae8373a5ad Work around a TLB invalidation issue in recent hybrid CPUs
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRr6MMACgkQaDWVMHDJ
 krAaIA/+IIV2rijvxzlnN41RksfiuI8lmDXuOeAqFCFVAukm6/faiDHmXQC/4ipw
 JTdgq4JbB6wTZmYsijSMdwgWDt3BGXZAAXK7sSOXLNRkaf0EuPk6h59P7QXBAkRD
 A6ibjWPipV6/k/Mczm8o+WnTSHUln6CVz8yB13gF3eB6+QMO7yYznL4EJ4iUBdZC
 OWCukkI8zT07p1u7nqpOmTixXZEqSauC3qB9tg856w8UJoS1evOocgExZr2d8GT3
 Ex9Q+Hj8ULkM6tDeke9mfoDDEdHtHDkQIMH5L63siTc0LzvRIs/EZEnZGdZWNy9F
 zw6AaIShzJQ9Z9KTP/SRxQ3nvUh/UuI2z7KnZLLxHYqLZZVMNQtinK+ZDGlzvvqg
 xr3ZI2PXUXoQT8Mg0BzzpeLwuuSXS8DFyj63SErValRuyIQQs07MhIRFiU3abQtq
 A4nfXMzfCwfRn5ggm9HIPLKOtiw1ZlsQtDxmHULpXsUqGgBxEfmCF5f210TmkOhN
 yrNlDmNBr39j1xlUeFlJRXWm4badlo39G8s/70oqFTushvatfet7Gyt7fZseGhJL
 KNQCfxyCu8bYYmtYfuW6WTc8nq5sJmnTdWEjcT0ftbgDeYNtBjfKfhJ+8+q04FWl
 wBxu5nqfKMT1bAKIOLse9eVbaMXNuHJJkStQZAZP5/wUMoVvfHI=
 =TZgh
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Dave Hansen:
 "This works around and issue where the INVLPG instruction may miss
  invalidating kernel TLB entries in recent hybrid CPUs.

  I do expect an eventual microcode fix for this. When the microcode
  version numbers are known, we can circle back around and add them the
  model table to disable this workaround"

* tag 'x86_urgent_for_6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Avoid incomplete Global INVLPG flushes
2023-05-22 16:31:28 -07:00
Linus Torvalds
a35747c310 ARM:
* Plug a race in the stage-2 mapping code where the IPA and the PA
   would end up being out of sync
 
 * Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...)
 
 * FP/SVE/SME documentation update, in the hope that this field
   becomes clearer...
 
 * Add workaround for Apple SEIS brokenness to a new SoC
 
 * Random comment fixes
 
 x86:
 
 * add MSR_IA32_TSX_CTRL into msrs_to_save
 
 * fixes for XCR0 handling in SGX enclaves
 
 Generic:
 
 * Fix vcpu_array[0] races
 
 * Fix race between starting a VM and "reboot -f"
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRp0WIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPqVwf+OFayNPpURAFqfrOuISYW7hoCL24+
 sCtXyVv4Ei0np1vGekit2h/m8GmxO12xEBibcFeYj+YQItIqu9HvC08fRxAKaMeE
 N3p9iLuS1zcM3cEuZpg0r6QN+pKybttdadl70yho43CtagEM4FmB7dgyAo9AhyXk
 pZUaVfoO6beBQ/J6A6V/Q5xlue1LvHk1+K4rmNcYVTYn6ZOd+yYgvqng1nv5/h9b
 0HgW0aUWkEHAB67/sSnUUro707loMNTowsZlMCtgDk4Fzf8RwQ7qc8lClLk1UPjJ
 DHB6Hif9F0Q5mkrwn+c7xyVlKARaY6/FOshS2Q620q19+4fq5fUD9HgjrQ==
 =ARzH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Plug a race in the stage-2 mapping code where the IPA and the PA
     would end up being out of sync

   - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...)

   - FP/SVE/SME documentation update, in the hope that this field
     becomes clearer...

   - Add workaround for Apple SEIS brokenness to a new SoC

   - Random comment fixes

  x86:

   - add MSR_IA32_TSX_CTRL into msrs_to_save

   - fixes for XCR0 handling in SGX enclaves

  Generic:

   - Fix vcpu_array[0] races

   - Fix race between starting a VM and 'reboot -f'"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_save
  KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)
  KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for ECREATE
  KVM: Fix vcpu_array[0] races
  KVM: VMX: Fix header file dependency of asm/vmx.h
  KVM: Don't enable hardware after a restart/shutdown is initiated
  KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown
  KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations
  KVM: arm64: Clarify host SME state management
  KVM: arm64: Restructure check for SVE support in FP trap handler
  KVM: arm64: Document check for TIF_FOREIGN_FPSTATE
  KVM: arm64: Fix repeated words in comments
  KVM: arm64: Constify start/end/phys fields of the pgtable walker data
  KVM: arm64: Infer PA offset from VA in hyp map walker
  KVM: arm64: Infer the PA offset from IPA in stage-2 map walker
  KVM: arm64: Use the bitmap API to allocate bitmaps
  KVM: arm64: Slightly optimize flush_context()
2023-05-21 13:58:37 -07:00
Mingwei Zhang
b9846a698c KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_save
Add MSR_IA32_TSX_CTRL into msrs_to_save[] to explicitly tell userspace to
save/restore the register value during migration. Missing this may cause
userspace that relies on KVM ioctl(KVM_GET_MSR_INDEX_LIST) fail to port the
value to the target VM.

In addition, there is no need to add MSR_IA32_TSX_CTRL when
ARCH_CAP_TSX_CTRL_MSR is not supported in kvm_get_arch_capabilities(). So
add the checking in kvm_probe_msr_to_save().

Fixes: c11f83e062 ("KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20230509032348.1153070-1-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-21 04:05:51 -04:00
Sean Christopherson
275a87244e KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)
Drop KVM's manipulation of guest's CPUID.0x12.1 ECX and EDX, i.e. the
allowed XFRM of SGX enclaves, now that KVM explicitly checks the guest's
allowed XCR0 when emulating ECREATE.

Note, this could theoretically break a setup where userspace advertises
a "bad" XFRM and relies on KVM to provide a sane CPUID model, but QEMU
is the only known user of KVM SGX, and QEMU explicitly sets the SGX CPUID
XFRM subleaf based on the guest's XCR0.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230503160838.3412617-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-21 04:05:51 -04:00
Sean Christopherson
ad45413d22 KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for ECREATE
Explicitly check the vCPU's supported XCR0 when determining whether or not
the XFRM for ECREATE is valid.  Checking CPUID works because KVM updates
guest CPUID.0x12.1 to restrict the leaf to a subset of the guest's allowed
XCR0, but that is rather subtle and KVM should not modify guest CPUID
except for modeling true runtime behavior (allowed XFRM is most definitely
not "runtime" behavior).

Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230503160838.3412617-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-21 04:05:51 -04:00
Jacob Xu
3367eeab97 KVM: VMX: Fix header file dependency of asm/vmx.h
Include a definition of WARN_ON_ONCE() before using it.

Fixes: bb1fcc70d9 ("KVM: nVMX: Allow L1 to use 5-level page walks for nested EPT")
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jacob Xu <jacobhxu@google.com>
[reworded commit message; changed <asm/bug.h> to <linux/bug.h>]
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225012959.1554168-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-19 13:56:25 -04:00
Linus Torvalds
2d1bcbc6cd Probes fixes for 6.4-rc1:
- Initialize 'ret' local variables on fprobe_handler() to fix the smatch
   warning. With this, fprobe function exit handler is not working
   randomly.
 
 - Fix to use preempt_enable/disable_notrace for rethook handler to
   prevent recursive call of fprobe exit handler (which is based on
   rethook)
 
 - Fix recursive call issue on fprobe_kprobe_handler().
 
 - Fix to detect recursive call on fprobe_exit_handler().
 
 - Fix to make all arch-dependent rethook code notrace.
   (the arch-independent code is already notrace)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmRmKgQACgkQ2/sHvwUr
 PxvlCgf+OJk5O9IJlTgqDV6JNPsTzFS7qqyAyQmZW9Bj8STfWAIRxa0zeGbZE58K
 5LwgzAj+SqzYRwIvzzZ3xsA5j7f1Wj7wG0TQgmpnIW+hprwDrLsUhoZ5s1D/Ojel
 A4rAnqCrgnh5m5SenU2QCUngGKn004j4RASaZvRELDyvyIkBSqNhswCH8ZWGPror
 KuCu5AmEnFagYl0lmNL3H2aCITAg3QEK+fE6iR+lYsqfR3xbs4YAcqiylHBdY0wX
 ssK7LVdRmv7O6TxSj4P2ohDvLJP3eL9bVirsJpg0OVbqWJCs65T2rJJjXiKojYXf
 vSVWFJFK5oV98ZHfXTG9R7x0DEwc+g==
 =jO68
 -----END PGP SIGNATURE-----

Merge tag 'probes-fixes-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - Initialize 'ret' local variables on fprobe_handler() to fix the
   smatch warning. With this, fprobe function exit handler is not
   working randomly.

 - Fix to use preempt_enable/disable_notrace for rethook handler to
   prevent recursive call of fprobe exit handler (which is based on
   rethook)

 - Fix recursive call issue on fprobe_kprobe_handler()

 - Fix to detect recursive call on fprobe_exit_handler()

 - Fix to make all arch-dependent rethook code notrace (the
   arch-independent code is already notrace)"

* tag 'probes-fixes-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rethook, fprobe: do not trace rethook related functions
  fprobe: add recursion detection in fprobe_exit_handler
  fprobe: make fprobe_kprobe_handler recursion free
  rethook: use preempt_{disable, enable}_notrace in rethook_trampoline_handler
  tracing: fprobe: Initialize ret valiable to fix smatch error
2023-05-18 09:04:45 -07:00
Ze Gao
571a2a50a8 rethook, fprobe: do not trace rethook related functions
These functions are already marked as NOKPROBE to prevent recursion and
we have the same reason to blacklist them if rethook is used with fprobe,
since they are beyond the recursion-free region ftrace can guard.

Link: https://lore.kernel.org/all/20230517034510.15639-5-zegao@tencent.com/

Fixes: f3a112c0c4 ("x86,rethook,kprobes: Replace kretprobe with rethook on x86")
Signed-off-by: Ze Gao <zegao@tencent.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-05-18 07:08:01 +09:00
Dave Hansen
ce0b15d11a x86/mm: Avoid incomplete Global INVLPG flushes
The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address.  When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries.  (Note: Only kernel mappings set
Global=1.)

Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.

As a workaround, never enable PCIDs on affected processors.

I expect there to eventually be microcode mitigations to replace this
software workaround.  However, the exact version numbers where that
will happen are not known today.  Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.

Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2023-05-17 08:55:02 -07:00
Vernon Lovejoy
2e4be0d011 x86/show_trace_log_lvl: Ensure stack pointer is aligned, again
The commit e335bb51cc ("x86/unwind: Ensure stack pointer is aligned")
tried to align the stack pointer in show_trace_log_lvl(), otherwise the
"stack < stack_info.end" check can't guarantee that the last read does
not go past the end of the stack.

However, we have the same problem with the initial value of the stack
pointer, it can also be unaligned. So without this patch this trivial
kernel module

	#include <linux/module.h>

	static int init(void)
	{
		asm volatile("sub    $0x4,%rsp");
		dump_stack();
		asm volatile("add    $0x4,%rsp");

		return -EAGAIN;
	}

	module_init(init);
	MODULE_LICENSE("GPL");

crashes the kernel.

Fixes: e335bb51cc ("x86/unwind: Ensure stack pointer is aligned")
Signed-off-by: Vernon Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20230512104232.GA10227@redhat.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-05-16 06:31:04 -07:00
Linus Torvalds
ef21831c2e - Make sure the PEBS buffer is flushed before reprogramming the hardware
so that the correct record sizes are used
 
 - Update the sample size for AMD BRS events
 
 - Fix a confusion with using the same on-stack struct with different
   events in the event processing path
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRgzWIACgkQEsHwGGHe
 VUpdgw//a1toWyjwrIV1YMu8lEpsrPKpOqIFuDQcLSl1vsYrmTRJ47PI1j/ZTQeo
 HgNkEE6lxAa9h/lKAjlE/lACE6Hr59xnQmu0BdG/SS+hlhWkT+oKLEUWz5qD4MuE
 bWdpxwHOhMIFR1ASAMThy/mE9V4TKsI/tsd7lMXUo6/skDGCmCGIgRq//3NUB5fV
 0ivp5lv6NXFnUwS34Ot3fbWj/be7rr2vkYgN8WbwMAaEbpCIyseh6Tz+5ZRbENfP
 dMdh6ryuJ2BJ9BcDe9XlcEvPcaTvz7LVnzOVFz/AnBgtBTIOw/26xt17pgXBH7NK
 kpTKQTPp0mnt6ysnX5zYkeumKaxxqvVWaf18AQHkupj1HwggjiEFPnKK9KfslSy4
 1tcED/D3i5QLOx+A8lCtA4ACwGl0Cvwgvw98Gp9imLst/zmMKa4MK96BYCodirKJ
 iDKN5aFA6c3pKJ4KTE7N6KKFzwhslTrehTHAJIL7BiVw3aMGin6514OnMELZBzam
 /zud81OWAKywWWRSwg7wy+K8RGH0R6K5dhwFrrm2BMqAluMq+rX1pRY9pEsL6jDj
 bCl45L52IsXZBSz2JTwWHGTssPyeDIe157ICFDOBnIx08u4KzJ+Knxsbaq2Jjs3R
 9wm5H9yp/+q7//3XcEkdFjQwDVh2LJkY0QinH+6rPiAseBC9ukU=
 =OCba
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Make sure the PEBS buffer is flushed before reprogramming the
   hardware so that the correct record sizes are used

 - Update the sample size for AMD BRS events

 - Fix a confusion with using the same on-stack struct with different
   events in the event processing path

* tag 'perf_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG
  perf/x86: Fix missing sample size update on AMD BRS
  perf/core: Fix perf_sample_data not properly initialized for different swevents in perf_tp_event()
2023-05-14 07:56:51 -07:00
Linus Torvalds
011e33ee48 - Add the required PCI IDs so that the generic SMN accesses provided by
amd_nb.c work for drivers which switch to them. Add a PCI device ID
   to k10temp's table so that latter is loaded on such systems too
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRgx6IACgkQEsHwGGHe
 VUpzJQ//WL//vtzAyQyEQQ0HP5cUNwtts79rEWpNyHJD0nFymbOrR7ho97HTi7pu
 a+wc3p9gwTC626GJD9dFVZs9YPqXI/Q2BKndQ6vrct9eaVnxWiJ45B5yJYLCUppE
 AHAChrDZ8ErXQjV1jeHtkIZY1mSa5Q7slfn8KFLzY5gIWA/3ds2pmIDK4r2wqQUR
 xIn+kDaaprsu8vhffWtHKIj5n1zRkLwPNZZ/2rZbQ7NJHt0yqu4Ld/L6myC6cQVj
 cX1Z1Wg++T3rZHxChx69w/jYuCY2EeM5JHXEbQqUG1tnfkpInZqvyjz1FUs9iYOY
 NOv9hcod9jIRjGvMooKn7MDhVy3yCtyrR590tKVP4qj3Wb+wKKJtPECjMfCo6I0H
 nVIiQPMZAbbo9UVLVA9UickSxKUgNCvhANUcnYWEQp+DM8wYxbV35PT8lomNz6Cj
 3mKo/uqQPfGn3yUR1GIQagYSxHZJ1NTv50lPQOV0tZtfP2Gk/H4uT2w6ZFJKiJIV
 KTiE5vFW9VhbVye/VcHWyJ3+mB5SgClaA+t87zGpgJiPjfVWo2bnB6JkwZodEWi9
 V8c53Hq+NG78QnFYZR5WNDYQ96LYWfurhcuqol8Sr836Y9LaOCGEAIsPMvG7esB2
 y/GHRWdhkjOf5uNHduGoOVzlH95qW9mBtdFacVpaM/eocb1Zghk=
 =dn2p
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:

 - Add the required PCI IDs so that the generic SMN accesses provided by
   amd_nb.c work for drivers which switch to them. Add a PCI device ID
   to k10temp's table so that latter is loaded on such systems too

* tag 'x86_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hwmon: (k10temp) Add PCI ID for family 19, model 78h
  x86/amd_nb: Add PCI ID for family 19h model 78h
2023-05-14 07:44:48 -07:00
Borislav Petkov (AMD)
9a48d60467 x86/retbleed: Fix return thunk alignment
SYM_FUNC_START_LOCAL_NOALIGN() adds an endbr leading to this layout
(leaving only the last 2 bytes of the address):

  3bff <zen_untrain_ret>:
  3bff:       f3 0f 1e fa             endbr64
  3c03:       f6                      test   $0xcc,%bl

  3c04 <__x86_return_thunk>:
  3c04:       c3                      ret
  3c05:       cc                      int3
  3c06:       0f ae e8                lfence

However, "the RET at __x86_return_thunk must be on a 64 byte boundary,
for alignment within the BTB."

Use SYM_START instead.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-12 17:19:53 -05:00
Mario Limonciello
23a5b8bb02 x86/amd_nb: Add PCI ID for family 19h model 78h
Commit

  310e782a99 ("platform/x86/amd: pmc: Utilize SMN index 0 for driver probe")

switched to using amd_smn_read() which relies upon the misc PCI ID used
by DF function 3 being included in a table.  The ID for model 78h is
missing in that table, so amd_smn_read() doesn't work.

Add the missing ID into amd_nb, restoring s2idle on this system.

  [ bp: Simplify commit message. ]

Fixes: 310e782a99 ("platform/x86/amd: pmc: Utilize SMN index 0 for driver probe")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>  # pci_ids.h
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230427053338.16653-2-mario.limonciello@amd.com
2023-05-08 11:25:19 +02:00
Kan Liang
b752ea0c28 perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG
Several similar kernel warnings can be triggered,

  [56605.607840] CPU0 PEBS record size 0, expected 32, config 0 cpuc->record_size=208

when the below commands are running in parallel for a while on SPR.

  while true;
  do
	perf record --no-buildid -a --intr-regs=AX  \
		    -e cpu/event=0xd0,umask=0x81/pp \
		    -c 10003 -o /dev/null ./triad;
  done &

  while true;
  do
	perf record -o /tmp/out -W -d \
		    -e '{ld_blocks.store_forward:period=1000000, \
                         MEM_TRANS_RETIRED.LOAD_LATENCY:u:precise=2:ldlat=4}' \
		    -c 1037 ./triad;
  done

The triad program is just the generation of loads/stores.

The warnings are triggered when an unexpected PEBS record (with a
different config and size) is found.

A system-wide PEBS event with the large PEBS config may be enabled
during a context switch. Some PEBS records for the system-wide PEBS
may be generated while the old task is sched out but the new one
hasn't been sched in yet. When the new task is sched in, the
cpuc->pebs_record_size may be updated for the per-task PEBS events. So
the existing system-wide PEBS records have a different size from the
later PEBS records.

The PEBS buffer should be flushed right before the hardware is
reprogrammed. The new size and threshold should be updated after the
old buffer has been flushed.

Reported-by: Stephane Eranian <eranian@google.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230421184529.3320912-1-kan.liang@linux.intel.com
2023-05-08 10:58:27 +02:00
Namhyung Kim
90befef5a9 perf/x86: Fix missing sample size update on AMD BRS
It missed to convert a PERF_SAMPLE_BRANCH_STACK user to call the new
perf_sample_save_brstack() helper in order to update the dyn_size.
This affects AMD Zen3 machines with the branch-brs event.

Fixes: eb55b455ef ("perf/core: Add perf_sample_save_brstack() helper")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20230427030527.580841-1-namhyung@kernel.org
2023-05-08 10:58:26 +02:00
Linus Torvalds
b115d85a95 Locking changes in v6.4:
- Introduce local{,64}_try_cmpxchg() - a slightly more optimal
    primitive, which will be used in perf events ring-buffer code.
 
  - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation.
 
  - Misc cleanups/fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRUvUoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hlIhAArP33rTKi+HAndQ3UHW3XtmHRxEEQTfiE
 wvIoN89h58QW4DGMeAV4ltafbIPQAkI233Aogwz903L0qbDV0Ro4OU3XJembRuWl
 LeOADKwYyypXdOa8XICuY9aIP7e1/h0DF3ySs7inLcwK9JCyAIxnsVHYej+hsRXA
 kZoXN98T3TR1C0V9UQy4SU3HI1lC3tsG3R9Ti9TnYUg3ygVXhRE9lOQ4kv9lFPVz
 BNuj2Blj7KNiVaY9kehrhO54THI7NmsCVZO44Rcl48I0KAcFulAmFcNlE7GnR8Nj
 thj38pU6XAFVHXG8MYjgE+Al+PnK48NtJxexCtHyGvGG4D2aLzRMnkolxAUCcVuK
 G+UBsQm3ybjYgHgt1zuN6ehcpT+5tULkDH8JA7vrgZYaVgxHzsUaHgYfCCWKnmUY
 mPR6aImEmYZwZVNLskhe0HT4mq244bp+VnWlnJ6LZK7t/itenvDhqnj7KTi4Bfej
 lTHplOTitV/8uCEW8V4pX+YTEenVsIQmTc/G3iIabXP/6HzLffA3q4vyW6vKIErE
 pqrpuFA0Z4GB+pU0mJXt7+I7zscDVthwI055jDyQBjA7IcdVGm2MjQ6xcNRW5FYN
 UynvaEMocue4ZO4WdFsd1ZBUd9VfoNzGQspBw46DhCL1MEQBYv36SKQNjej/9aRr
 ilVwqnOWI2s=
 =mM0A
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - Introduce local{,64}_try_cmpxchg() - a slightly more optimal
   primitive, which will be used in perf events ring-buffer code

 - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation

 - Misc cleanups/fixes

* tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomic: Correct (cmp)xchg() instrumentation
  locking/x86: Define arch_try_cmpxchg_local()
  locking/arch: Wire up local_try_cmpxchg()
  locking/generic: Wire up local{,64}_try_cmpxchg()
  locking/atomic: Add generic try_cmpxchg{,64}_local() support
  locking/rwbase: Mitigate indefinite writer starvation
  locking/arch: Rename all internal __xchg() names to __arch_xchg()
2023-05-05 12:56:55 -07:00
Linus Torvalds
d5ed10bb80 Merge branch 'x86-uaccess-cleanup': x86 uaccess header cleanups
Merge my x86 uaccess updates branch.

The LAM ("Linear Address Masking") updates in this release made me
unhappy about how "access_ok()" was done, and it actually turned out to
have a couple of small bugs in it too.  This is my cleanup of the code:

 - use the sign bit of the __user pointer rather than masking the
   address and checking it against the TASK_SIZE range.

   We already did this part for the get/put_user() side, but
   'access_ok()' did the naïve "mask and range check" thing, which not
   only generates nasty code, but also ended up meaning that __access_ok
   itself didn't do a good job, and so copy_from_user_nmi() didn't get
   the check right.

 - move all the code that is 64-bit only into the 64-bit version of the
   header file, so that we don't unnecessarily pollute the shared x86
   code and make it look like LAM might work in 32-bit too.

 - fix a bug in the address masking (that doesn't end up mattering: in
   this case the fix was to just remove the buggy code entirely).

 - a couple of trivial cleanups and added commentary about the
   access_ok() rules.

* x86-uaccess-cleanup:
  x86-64: mm: clarify the 'positive addresses' user address rules
  x86: mm: remove 'sign' games from LAM untagged_addr*() macros
  x86: uaccess: move 32-bit and 64-bit parts into proper <asm/uaccess_N.h> header
  x86: mm: remove architecture-specific 'access_ok()' define
  x86-64: make access_ok() independent of LAM
2023-05-05 12:29:57 -07:00
Paolo Bonzini
29b38e7650 Fix a long-standing flaw in x86's TDP MMU where unloading roots on a vCPU can
result in the root being freed even though the root is completely valid and
 can be reused as-is (with a TLB flush).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmRP/3ESHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5J7kQAIg6v9UzM/qp7/d6C4laZLTWC2YlGhiI
 1ZrfLU3/gQPYnnxv8GzLZ1CaXDhku2IIdyl2AQe8sUEmold45EapAW32rw2127j1
 z4jW8x8dKYXUd1HGe823O0Rm+Ls6bGcXmHj8LaBCBIV6loBINeNfLXNllsO/yIcR
 PmagzEqkNsMW3mvutdqb9mFP8p+mBzQu5qHlMEUb4WOXBmL06teHjR3qo7hi9Kl0
 nM0ZvuvCLGvufoI0RESiq7mXPKBz3yvhFbkjrUgBKQ/rij2PMO8iyULsLfGY1iAI
 m60aBfQPLJIH0NgvNHXkQOW59COYaY+I8udZqZZNr2uVb5A8J+/rQFSG/BP1Ccsw
 mtJgZRD5WdplcAjYlZCcEgBmwjznjSOFGYaOrAp02dJlbPw2/Tjaj1GHMvMjEIME
 KLvWTsN6xB9K0OhiXFvo1N4FCJbfi+PJPK0qVG7UttPnziCwYqAeIhGk4Kj6SHsX
 P23HnDO8U/rCwRG2tuyZmbllpUXsX0q08wyGlp1UcKAbtD8PPGPyz8+I7YakKI97
 RddIAh2qh5hwHON1xe35VSQ8X0OPOK1UnkiGTuBDdfldzxXK7OCfKKVQ6hsnpV6e
 0a6nQc2Ni7/f5jThPo2cTaKz389ZpVE2j1DaTT8QXq5JuBcTzrNI6HImcJwPFTWP
 +kUxewuRaaog
 =pzJT
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-mmu-6.4-2' of https://github.com/kvm-x86/linux into HEAD

Fix a long-standing flaw in x86's TDP MMU where unloading roots on a vCPU can
result in the root being freed even though the root is completely valid and
can be reused as-is (with a TLB flush).
2023-05-05 06:12:36 -04:00
Linus Torvalds
342528ff00 This pull request contains the following changes for UML:
- Make stub data pages configurable
 - Make it harder to mix user and kernel code by accident
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmRSslcWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wVbGEADWL/4rcVNqaD/b9V3bUXDkNhpG
 81Jc9fXcx3OXV87GsxfadiQI05UaOhTAMWJlsdKUF0/IW5tSQSZc0QcDNxTs6aS1
 NDkWvjk8s8OpbqdmV0cNKFmaeginrH4AuASajvNleU0BxX4ziqw6BPadih+wGVGz
 iVEd/M5YH2yet33pvD7Wzh3KZOPlo2OOKBqXzE6Snc3yoOiIfN95U8Cl23nxzMI4
 F235LS5m+4+GWIMkYs6+hlWEa8SxSnsH/beQJxAybtetSHRxsDXOHlEpDI9olxYI
 +OFNqsFLDyhyjib2Snwt8HMES9VmMfUUwT7ZK1cChXN8Quix3lXAyz6rpocmdnL/
 X1tzqWb1bz+yqrVRjtN3AGbcav+sQp4UWXsoExVsiz3lzaXfrbPvRZFucH2+SStK
 Wadyy2v2rHZKNq9sDLUBf5GJ0C4LrBAyT15ADim2L5/K7pZyI7AKSPcdU0Xc8m5y
 kMswml7FLaI+gsuKOBl7jySaoNDiVZFvQjsxd1+hn5cBNOcyBSt2rSErDw3geJgn
 pFzVkLtklfxMpHMm+bwXONBZMrXOBPNgo0gSkVrCVzO68x6j/MaBGLFazSqOKxbc
 UcVULhiomjIae4AcTomzOQx7oM1Xs3XuPo0x2RgJ3gMlCCZKoDNuqDgB3ZmPP/gq
 cbbHHveXEx7aMOPz0A==
 =Qb+I
 -----END PGP SIGNATURE-----

Merge tag 'uml-for-linus-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull uml updates from Richard Weinberger:

 - Make stub data pages configurable

 - Make it harder to mix user and kernel code by accident

* tag 'uml-for-linus-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: make stub data pages size tweakable
  um: prevent user code in modules
  um: further clean up user_syms
  um: don't export printf()
  um: hostfs: define our own API boundary
  um: add __weak for exported functions
2023-05-03 19:02:03 -07:00
Linus Torvalds
798dec3304 x86-64: mm: clarify the 'positive addresses' user address rules
Dave Hansen found the "(long) addr >= 0" code in the x86-64 access_ok
checks somewhat confusing, and suggested using a helper to clarify what
the code is doing.

So this does exactly that: clarifying what the sign bit check is all
about, by adding a helper macro that makes it clear what it is testing.

This also adds some explicit comments talking about how even with LAM
enabled, any addresses with the sign bit will still GP-fault in the
non-canonical region just above the sign bit.

This is all what allows us to do the user address checks with just the
sign bit, and furthermore be a bit cavalier about accesses that might be
done with an additional offset even past that point.

(And yes, this talks about 'positive' even though zero is also a valid
user address and so technically we should call them 'non-negative'.  But
I don't think using 'non-negative' ends up being more understandable).

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03 10:37:22 -07:00
Linus Torvalds
1dbc0a9515 x86: mm: remove 'sign' games from LAM untagged_addr*() macros
The intent of the sign games was to not modify kernel addresses when
untagging them.  However, that had two issues:

 (a) it didn't actually work as intended, since the mask was calculated
     as 'addr >> 63' on an _unsigned_ address. So instead of getting a
     mask of all ones for kernel addresses, you just got '1'.

 (b) untagging a kernel address isn't actually a valid operation anyway.

Now, (a) had originally been true for both 'untagged_addr()' and the
remote version of it, but had accidentally been fixed for the regular
version of untagged_addr() by commit e0bddc19ba ("x86/mm: Reduce
untagged_addr() overhead for systems without LAM").  That one rewrote
the shift to be part of the alternative asm code, and in the process
changed the unsigned shift into a signed 'sar' instruction.

And while it is true that we don't want to turn what looks like a kernel
address into a user address by masking off the high bit, that doesn't
need these sign masking games - all it needs is that the mm context
'untag_mask' value has the high bit set.

Which it always does.

So simplify the code by just removing the superfluous (and in the case
of untagged_addr_remote(), still buggy) sign bit games in the address
masking.

Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03 10:37:22 -07:00
Linus Torvalds
b9bd9f605c x86: uaccess: move 32-bit and 64-bit parts into proper <asm/uaccess_N.h> header
The x86 <asm/uaccess.h> file has grown features that are specific to
x86-64 like LAM support and the related access_ok() changes.  They
really should be in the <asm/uaccess_64.h> file and not pollute the
generic x86 header.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03 10:37:22 -07:00
Linus Torvalds
6ccdc91d6a x86: mm: remove architecture-specific 'access_ok()' define
There's already a generic definition of 'access_ok()' in the
asm-generic/access_ok.h header file, and the only difference bwteen that
and the x86-specific one is the added check for WARN_ON_IN_IRQ().

And it turns out that the reason for that check is long gone: it used to
use a "user_addr_max()" inline function that depended on the current
thread, and caused problems in non-thread contexts.

For details, see commits 7c4788950b ("x86/uaccess, sched/preempt:
Verify access_ok() context") and in particular commit ae31fe51a3
("perf/x86: Restore TASK_SIZE check on frame pointer") about how and why
this came to be.

But that "current task" issue was removed in the big set_fs() removal by
Christoph Hellwig in commit 47058bb54b ("x86: remove address space
overrides using set_fs()").

So the reason for the test and the architecture-specific access_ok()
define no longer exists, and is actually harmful these days.  For
example, it led various 'copy_from_user_nmi()' games (eg using
__range_not_ok() instead, and then later converted to __access_ok() when
that became ok).

And that in turn meant that LAM was broken for the frame following
before this series, because __access_ok() used to not do the address
untagging.

Accessing user state still needs care in many contexts, but access_ok()
is not the place for this test.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org>
2023-05-03 10:37:22 -07:00
Linus Torvalds
6014bc2756 x86-64: make access_ok() independent of LAM
The linear address masking (LAM) code made access_ok() more complicated,
in that it now needs to untag the address in order to verify the access
range.  See commit 74c228d20a ("x86/uaccess: Provide untagged_addr()
and remove tags before address check").

We were able to avoid that overhead in the get_user/put_user code paths
by simply using the sign bit for the address check, and depending on the
GP fault if the address was non-canonical, which made it all independent
of LAM.

And we can do the same thing for access_ok(): simply check that the user
pointer range has the high bit clear.  No need to bother with any
address bit masking.

In fact, we can go a bit further, and just check the starting address
for known small accesses ranges: any accesses that overflow will still
be in the non-canonical area and will still GP fault.

To still make syzkaller catch any potentially unchecked user addresses,
we'll continue to warn about GP faults that are caused by accesses in
the non-canonical range.  But we'll limit that to purely "high bit set
and past the one-page 'slop' area".

We could probably just do that "check only starting address" for any
arbitrary range size: realistically all kernel accesses to user space
will be done starting at the low address.  But let's leave that kind of
optimization for later.  As it is, this already allows us to generate
simpler code and not worry about any tag bits in the address.

The one thing to look out for is the GUP address check: instead of
actually copying data in the virtual address range (and thus bad
addresses being caught by the GP fault), GUP will look up the page
tables manually.  As a result, the page table limits need to be checked,
and that was previously implicitly done by the access_ok().

With the relaxed access_ok() check, we need to just do an explicit check
for TASK_SIZE_MAX in the GUP code instead.  The GUP code already needs
to do the tag bit unmasking anyway, so there this is all very
straightforward, and there are no LAM issues.

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03 10:37:22 -07:00
Linus Torvalds
c8c655c34e s390:
* More phys_to_virt conversions
 
 * Improvement of AP management for VSIE (nested virtualization)
 
 ARM64:
 
 * Numerous fixes for the pathological lock inversion issue that
   plagued KVM/arm64 since... forever.
 
 * New framework allowing SMCCC-compliant hypercalls to be forwarded
   to userspace, hopefully paving the way for some more features
   being moved to VMMs rather than be implemented in the kernel.
 
 * Large rework of the timer code to allow a VM-wide offset to be
   applied to both virtual and physical counters as well as a
   per-timer, per-vcpu offset that complements the global one.
   This last part allows the NV timer code to be implemented on
   top.
 
 * A small set of fixes to make sure that we don't change anything
   affecting the EL1&0 translation regime just after having having
   taken an exception to EL2 until we have executed a DSB. This
   ensures that speculative walks started in EL1&0 have completed.
 
 * The usual selftest fixes and improvements.
 
 KVM x86 changes for 6.4:
 
 * Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled,
   and by giving the guest control of CR0.WP when EPT is enabled on VMX
   (VMX-only because SVM doesn't support per-bit controls)
 
 * Add CR0/CR4 helpers to query single bits, and clean up related code
   where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return
   as a bool
 
 * Move AMD_PSFD to cpufeatures.h and purge KVM's definition
 
 * Avoid unnecessary writes+flushes when the guest is only adding new PTEs
 
 * Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations
   when emulating invalidations
 
 * Clean up the range-based flushing APIs
 
 * Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single
   A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle
   changed SPTE" overhead associated with writing the entire entry
 
 * Track the number of "tail" entries in a pte_list_desc to avoid having
   to walk (potentially) all descriptors during insertion and deletion,
   which gets quite expensive if the guest is spamming fork()
 
 * Disallow virtualizing legacy LBRs if architectural LBRs are available,
   the two are mutually exclusive in hardware
 
 * Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
   after KVM_RUN, similar to CPUID features
 
 * Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES
 
 * Apply PMU filters to emulated events and add test coverage to the
   pmu_event_filter selftest
 
 x86 AMD:
 
 * Add support for virtual NMIs
 
 * Fixes for edge cases related to virtual interrupts
 
 x86 Intel:
 
 * Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
   not being reported due to userspace not opting in via prctl()
 
 * Fix a bug in emulation of ENCLS in compatibility mode
 
 * Allow emulation of NOP and PAUSE for L2
 
 * AMX selftests improvements
 
 * Misc cleanups
 
 MIPS:
 
 * Constify MIPS's internal callbacks (a leftover from the hardware enabling
   rework that landed in 6.3)
 
 Generic:
 
 * Drop unnecessary casts from "void *" throughout kvm_main.c
 
 * Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct
   size by 8 bytes on 64-bit kernels by utilizing a padding hole
 
 Documentation:
 
 * Fix goof introduced by the conversion to rST
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRNExkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNyjwf+MkzDael9y9AsOZoqhEZ5OsfQYJ32
 Im5ZVYsPRU2K5TuoWql6meIihgclCj1iIU32qYHa2F1WYt2rZ72rJp+HoY8b+TaI
 WvF0pvNtqQyg3iEKUBKPA4xQ6mj7RpQBw86qqiCHmlfNt0zxluEGEPxH8xrWcfhC
 huDQ+NUOdU7fmJ3rqGitCvkUbCuZNkw3aNPR8dhU8RAWrwRzP2hBOmdxIeo81WWY
 XMEpJSijbGpXL9CvM0Jz9nOuMJwZwCCBGxg1vSQq0xTfLySNMxzvWZC2GFaBjucb
 j0UOQ7yE0drIZDVhd3sdNslubXXU6FcSEzacGQb9aigMUon3Tem9SHi7Kw==
 =S2Hq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "s390:

   - More phys_to_virt conversions

   - Improvement of AP management for VSIE (nested virtualization)

  ARM64:

   - Numerous fixes for the pathological lock inversion issue that
     plagued KVM/arm64 since... forever.

   - New framework allowing SMCCC-compliant hypercalls to be forwarded
     to userspace, hopefully paving the way for some more features being
     moved to VMMs rather than be implemented in the kernel.

   - Large rework of the timer code to allow a VM-wide offset to be
     applied to both virtual and physical counters as well as a
     per-timer, per-vcpu offset that complements the global one. This
     last part allows the NV timer code to be implemented on top.

   - A small set of fixes to make sure that we don't change anything
     affecting the EL1&0 translation regime just after having having
     taken an exception to EL2 until we have executed a DSB. This
     ensures that speculative walks started in EL1&0 have completed.

   - The usual selftest fixes and improvements.

  x86:

   - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is
     enabled, and by giving the guest control of CR0.WP when EPT is
     enabled on VMX (VMX-only because SVM doesn't support per-bit
     controls)

   - Add CR0/CR4 helpers to query single bits, and clean up related code
     where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long"
     return as a bool

   - Move AMD_PSFD to cpufeatures.h and purge KVM's definition

   - Avoid unnecessary writes+flushes when the guest is only adding new
     PTEs

   - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s
     optimizations when emulating invalidations

   - Clean up the range-based flushing APIs

   - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a
     single A/D bit using a LOCK AND instead of XCHG, and skip all of
     the "handle changed SPTE" overhead associated with writing the
     entire entry

   - Track the number of "tail" entries in a pte_list_desc to avoid
     having to walk (potentially) all descriptors during insertion and
     deletion, which gets quite expensive if the guest is spamming
     fork()

   - Disallow virtualizing legacy LBRs if architectural LBRs are
     available, the two are mutually exclusive in hardware

   - Disallow writes to immutable feature MSRs (notably
     PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features

   - Overhaul the vmx_pmu_caps selftest to better validate
     PERF_CAPABILITIES

   - Apply PMU filters to emulated events and add test coverage to the
     pmu_event_filter selftest

   - AMD SVM:
       - Add support for virtual NMIs
       - Fixes for edge cases related to virtual interrupts

   - Intel AMX:
       - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if
         XTILE_DATA is not being reported due to userspace not opting in
         via prctl()
       - Fix a bug in emulation of ENCLS in compatibility mode
       - Allow emulation of NOP and PAUSE for L2
       - AMX selftests improvements
       - Misc cleanups

  MIPS:

   - Constify MIPS's internal callbacks (a leftover from the hardware
     enabling rework that landed in 6.3)

  Generic:

   - Drop unnecessary casts from "void *" throughout kvm_main.c

   - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the
     struct size by 8 bytes on 64-bit kernels by utilizing a padding
     hole

  Documentation:

   - Fix goof introduced by the conversion to rST"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
  KVM: s390: pci: fix virtual-physical confusion on module unload/load
  KVM: s390: vsie: clarifications on setting the APCB
  KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
  KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
  KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
  KVM: selftests: Test the PMU event "Instructions retired"
  KVM: selftests: Copy full counter values from guest in PMU event filter test
  KVM: selftests: Use error codes to signal errors in PMU event filter test
  KVM: selftests: Print detailed info in PMU event filter asserts
  KVM: selftests: Add helpers for PMC asserts in PMU event filter test
  KVM: selftests: Add a common helper for the PMU event filter guest code
  KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
  KVM: arm64: vhe: Drop extra isb() on guest exit
  KVM: arm64: vhe: Synchronise with page table walker on MMU update
  KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
  KVM: arm64: nvhe: Synchronise with page table walker on TLBI
  KVM: arm64: Handle 32bit CNTPCTSS traps
  KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
  KVM: arm64: vgic: Don't acquire its_lock before config_lock
  KVM: selftests: Add test to verify KVM's supported XCR0
  ...
2023-05-01 12:06:20 -07:00
Linus Torvalds
58390c8ce1 IOMMU Updates for Linux 6.4
Including:
 
 	- Convert to platform remove callback returning void
 
 	- Extend changing default domain to normal group
 
 	- Intel VT-d updates:
 	    - Remove VT-d virtual command interface and IOASID
 	    - Allow the VT-d driver to support non-PRI IOPF
 	    - Remove PASID supervisor request support
 	    - Various small and misc cleanups
 
 	- ARM SMMU updates:
 	    - Device-tree binding updates:
 	        * Allow Qualcomm GPU SMMUs to accept relevant clock properties
 	        * Document Qualcomm 8550 SoC as implementing an MMU-500
 	        * Favour new "qcom,smmu-500" binding for Adreno SMMUs
 
 	    - Fix S2CR quirk detection on non-architectural Qualcomm SMMU
 	      implementations
 
 	    - Acknowledge SMMUv3 PRI queue overflow when consuming events
 
 	    - Document (in a comment) why ATS is disabled for bypass streams
 
 	- AMD IOMMU updates:
 	    - 5-level page-table support
 	    - NUMA awareness for memory allocations
 
 	- Unisoc driver: Support for reattaching an existing domain
 
 	- Rockchip driver: Add missing set_platform_dma_ops callback
 
 	- Mediatek driver: Adjust the dma-ranges
 
 	- Various other small fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmRONeAACgkQK/BELZcB
 GuPmpw/8C9ruxQ0JU5rcDBXQGvos4gMmxlbELMrBpbbiTtdb35xchpKfdhnECGIF
 k2SrrcF40R/S82SyzNU/eZtGKirtcXvGFraUFgu/QdCcnnqpRHs+IJMXX2NJP+it
 +0wO1uiInt3CN1ERcR4F31cDKiWjDG8bvQVE5LIyiy4KrIU5ld2G91Fkaa0R13Au
 6H+/wKkcUC6OyaGE6wPx474xBkapT20vj5AIQuAWisXJJR0wbBon1sUTo/IRKsU+
 IkNxH0W+1PNImJ+crAdf/nkOlyqoChY4ww6cm07LrOsBLIsX5bCqXfL4HvKthElD
 MEgk2SN5kfjfR5Vf29W4hZVM1CT8VbhO41I7OzaZ6X6RU2PXoldPKlgKtZGeSKn1
 9bcMpSgB0BtbttvBevSkxTo5KHFozXS2DG3DFoMB3yFMme8Th0LrhBZ9oB7NIPNw
 ntMo4K75vviC6Vvzjy4Anj/+y+Zm3W6wDDP7F12O6WZLkK5s4hrSsHUm/MQnnKQP
 muJlG870RnSl73xUQZe3cuBxktXuJ3EHqqYIPE0npzvauu8hhWcis3opf2Y+U2s8
 aBCCIgp5kTKqjHLh2e4lNCKZf1/b/dhxRcRBQhpAIb8YsjMlIJyM+G8Jz6K6gBga
 5Ld+68UQ3oHJwoLV1HCFN8jbpQ9KZn1s9+h3yrYjRAcLNiFb3nU=
 =OvTo
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Convert to platform remove callback returning void

 - Extend changing default domain to normal group

 - Intel VT-d updates:
     - Remove VT-d virtual command interface and IOASID
     - Allow the VT-d driver to support non-PRI IOPF
     - Remove PASID supervisor request support
     - Various small and misc cleanups

 - ARM SMMU updates:
     - Device-tree binding updates:
         * Allow Qualcomm GPU SMMUs to accept relevant clock properties
         * Document Qualcomm 8550 SoC as implementing an MMU-500
         * Favour new "qcom,smmu-500" binding for Adreno SMMUs

     - Fix S2CR quirk detection on non-architectural Qualcomm SMMU
       implementations

     - Acknowledge SMMUv3 PRI queue overflow when consuming events

     - Document (in a comment) why ATS is disabled for bypass streams

 - AMD IOMMU updates:
     - 5-level page-table support
     - NUMA awareness for memory allocations

 - Unisoc driver: Support for reattaching an existing domain

 - Rockchip driver: Add missing set_platform_dma_ops callback

 - Mediatek driver: Adjust the dma-ranges

 - Various other small fixes and cleanups

* tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (82 commits)
  iommu: Remove iommu_group_get_by_id()
  iommu: Make iommu_release_device() static
  iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope()
  iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn)
  iommu/vt-d: Remove BUG_ON in map/unmap()
  iommu/vt-d: Remove BUG_ON when domain->pgd is NULL
  iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation
  iommu/vt-d: Remove BUG_ON on checking valid pfn range
  iommu/vt-d: Make size of operands same in bitwise operations
  iommu/vt-d: Remove PASID supervisor request support
  iommu/vt-d: Use non-privileged mode for all PASIDs
  iommu/vt-d: Remove extern from function prototypes
  iommu/vt-d: Do not use GFP_ATOMIC when not needed
  iommu/vt-d: Remove unnecessary checks in iopf disabling path
  iommu/vt-d: Move PRI handling to IOPF feature path
  iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path
  iommu/vt-d: Move iopf code from SVA to IOPF enabling path
  iommu/vt-d: Allow SVA with device-specific IOPF
  dmaengine: idxd: Add enable/disable device IOPF feature
  arm64: dts: mt8186: Add dma-ranges for the parent "soc" node
  ...
2023-04-30 13:00:38 -07:00
Uros Bizjak
5cd4c26841 locking/x86: Define arch_try_cmpxchg_local()
Define target specific arch_try_cmpxchg_local(). This
definition overrides the generic arch_try_cmpxchg_local()
fallback definition and enables target-specific
implementation of try_cmpxchg_local().

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230405141710.3551-5-ubizjak@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-29 09:09:23 +02:00
Uros Bizjak
d994f2c8e2 locking/arch: Wire up local_try_cmpxchg()
Implement target specific support for local_try_cmpxchg()
and local_cmpxchg() using typed C wrappers that call their
_local counterpart and provide additional checking of
their input arguments.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230405141710.3551-4-ubizjak@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-29 09:09:16 +02:00
Linus Torvalds
f20730efbd SMP cross-CPU function-call updates for v6.4:
- Remove diagnostics and adjust config for CSD lock diagnostics
 
  - Add a generic IPI-sending tracepoint, as currently there's no easy
    way to instrument IPI origins: it's arch dependent and for some
    major architectures it's not even consistently available.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK438RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jJ5Q/5AZ0HGpyqwdFK8GmGznyu5qjP5HwV9pPq
 gZQScqSy4tZEeza4TFMi83CoXSg9uJ7GlYJqqQMKm78LGEPomnZtXXC7oWvTA9M5
 M/jAvzytmvZloSCXV6kK7jzSejMHhag97J/BjTYhZYQpJ9T+hNC87XO6J6COsKr9
 lPIYqkFrIkQNr6B0U11AQfFejRYP1ics2fnbnZL86G/zZAc6x8EveM3KgSer2iHl
 KbrO+xcYyGY8Ef9P2F72HhEGFfM3WslpT1yzqR3sm4Y+fuMG0oW3qOQuMJx0ZhxT
 AloterY0uo6gJwI0P9k/K4klWgz81Tf/zLb0eBAtY2uJV9Fo3YhPHuZC7jGPGAy3
 JusW2yNYqc8erHVEMAKDUsl/1KN4TE2uKlkZy98wno+KOoMufK5MA2e2kPPqXvUi
 Jk9RvFolnWUsexaPmCftti0OCv3YFiviVAJ/t0pchfmvvJA2da0VC9hzmEXpLJVF
 25nBTV/1uAOrWvOpCyo3ElrC2CkQVkFmK5rXMDdvf6ib0Nid4vFcCkCSLVfu+ePB
 11mi7QYro+CcnOug1K+yKogUDmsZgV/u1kUwgQzTIpZ05Kkb49gUiXw9L2RGcBJh
 yoDoiI66KPR7PWQ2qBdQoXug4zfEEtWG0O9HNLB0FFRC3hu7I+HHyiUkBWs9jasK
 PA5+V7HcQRk=
 =Wp7f
 -----END PGP SIGNATURE-----

Merge tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP cross-CPU function-call updates from Ingo Molnar:

 - Remove diagnostics and adjust config for CSD lock diagnostics

 - Add a generic IPI-sending tracepoint, as currently there's no easy
   way to instrument IPI origins: it's arch dependent and for some major
   architectures it's not even consistently available.

* tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  trace,smp: Trace all smp_function_call*() invocations
  trace: Add trace_ipi_send_cpu()
  sched, smp: Trace smp callback causing an IPI
  smp: reword smp call IPI comment
  treewide: Trace IPIs sent via smp_send_reschedule()
  irq_work: Trace self-IPIs sent via arch_irq_work_raise()
  smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
  sched, smp: Trace IPIs sent via send_call_function_single_ipi()
  trace: Add trace_ipi_send_cpumask()
  kernel/smp: Make csdlock_debug= resettable
  locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging
  locking/csd_lock: Remove added data from CSD lock debugging
  locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-28 15:03:43 -07:00
Linus Torvalds
7c339778f9 Perf changes for v6.4:
- Add Intel Granite Rapids support
 
  - Add uncore events for Intel SPR IMC PMU
 
  - Fix perf IRQ throttling bug
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK3GoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hqghAAwZyNLY8oAu/izkp5DKML7okLa48nh1+K
 JWsM4GT16ldx6MBqxX4V7tiYfAeo6ydCkd/LbxPklAk4Fmcwt5HWotOEGHab7N7B
 iTOix481TIa+E7ERuDLU5vS0chtdE5CrVfmBwtkI4WEv0c8vBwQHvRzy6RaoGG+2
 XSqwFhkDG7l6N6rIz/V8zJKFo0hNPou0bllYJMM+A+BRdOthKhgXNjM6yuuKXgst
 TWxByDCoAG59uWvzq3WmKRLk/bJ4VC2lRDFpo01xtPvn1hA/Bs8MDF1940NG8uEh
 u7aTI1ZyaJJnnxaTk98r/aZIMYlHHWIIKnymtVFT52ZLeKARDt6MmX5ttec16Cd1
 a5IXuiWPkiSfGlmN7Q4sK89eg606WQ/i+eIKfNMG0ZruvKFiT0uD+gNUlQvE/4fk
 6OgJot/iF2B5+UWcBmEOMYV4jagSWLFkIQ3vGVT6E3Zp8gEa1/R3ek1aSbQeb8Gi
 kvMjcwthUNOgGIXrFKbdDTN/3seoNmrgIP2wetUzXhVaAOIYz6mpxIcXLc+vNiDh
 soTRtkLgNTkrVQ8AKp7snKNRLjazL6CqUd+RILH4tM+bshFnyeH6K4Eck+Fo//Dn
 XtEHry6nJ0ZPhsJPRwFHOoYCAuDQdy5DG7QQ07qXYW9DyJV7ZmefDJYOEjgcip6G
 f3/sUe7ekRA=
 =41Sa
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:

 - Add Intel Granite Rapids support

 - Add uncore events for Intel SPR IMC PMU

 - Fix perf IRQ throttling bug

* tag 'perf-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add events for Intel SPR IMC PMU
  perf/core: Fix hardlockup failure caused by perf throttle
  perf/x86/cstate: Add Granite Rapids support
  perf/x86/msr: Add Granite Rapids
  perf/x86/intel: Add Granite Rapids
2023-04-28 14:41:53 -07:00
Linus Torvalds
2aff7c706c Objtool changes for v6.4:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did
    this inconsistently follow this new, common convention, and fix all the fallout
    that objtool can now detect statically.
 
  - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity,
    split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it.
 
  - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code.
 
  - Generate ORC data for __pfx code
 
  - Add more __noreturn annotations to various kernel startup/shutdown/panic functions.
 
  - Misc improvements & fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK1x0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ghxQ/+IkCynMYtdF5OG9YwbcGJqsPSfOPMEcEM
 pUSFYg+gGPBDT/fJfcVSqvUtdnWbLC2kXt9yiswXz3X3J2nmNkBk5YKQftsNDcul
 TmKeqIIAK51XTncpegKH0EGnOX63oZ9Vxa8CTPdDlb+YF23Km2FoudGRI9F5qbUd
 LoraXqGYeiaeySkGyWmZVl6Uc8dIxnMkTN3H/oI9aB6TOrsi059hAtFcSaFfyemP
 c4LqXXCH7k2baiQt+qaLZ8cuZVG/+K5r2N2cmjO5kmJc6ynIaFnfMe4XxZLjp5LT
 /PulYI15bXkvSARKx5CRh/CDHMOx5Blw+ASO0RhWbdy0WH4ZhhcaVF5AeIpPW86a
 1LBcz97rMp72WmvKgrJeVO1r9+ll4SI6/YKGJRsxsCMdP3hgFpqntXyVjTFNdTM1
 0gH6H5v55x06vJHvhtTk8SR3PfMTEM2fRU5jXEOrGowoGifx+wNUwORiwj6LE3KQ
 SKUdT19RNzoW3VkFxhgk65ThK1S7YsJUKRoac3YdhttpqqqtFV//erenrZoR4k/p
 vzvKy68EQ7RCNyD5wNWNFe0YjeJl5G8gQ8bUm4Xmab7djjgz+pn4WpQB8yYKJLAo
 x9dqQ+6eUbw3Hcgk6qQ9E+r/svbulnAL0AeALAWK/91DwnZ2mCzKroFkLN7napKi
 fRho4CqzrtM=
 =NwEV
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Mark arch_cpu_idle_dead() __noreturn, make all architectures &
   drivers that did this inconsistently follow this new, common
   convention, and fix all the fallout that objtool can now detect
   statically

 - Fix/improve the ORC unwinder becoming unreliable due to
   UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
   and UNWIND_HINT_UNDEFINED to resolve it

 - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code

 - Generate ORC data for __pfx code

 - Add more __noreturn annotations to various kernel startup/shutdown
   and panic functions

 - Misc improvements & fixes

* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/hyperv: Mark hv_ghcb_terminate() as noreturn
  scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
  x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
  btrfs: Mark btrfs_assertfail() __noreturn
  objtool: Include weak functions in global_noreturns check
  cpu: Mark nmi_panic_self_stop() __noreturn
  cpu: Mark panic_smp_self_stop() __noreturn
  arm64/cpu: Mark cpu_park_loop() and friends __noreturn
  x86/head: Mark *_start_kernel() __noreturn
  init: Mark start_kernel() __noreturn
  init: Mark [arch_call_]rest_init() __noreturn
  objtool: Generate ORC data for __pfx code
  x86/linkage: Fix padding for typed functions
  objtool: Separate prefix code from stack validation code
  objtool: Remove superfluous dead_end_function() check
  objtool: Add symbol iteration helpers
  objtool: Add WARN_INSN()
  scripts/objdump-func: Support multiple functions
  context_tracking: Fix KCSAN noinstr violation
  objtool: Add stackleak instrumentation to uaccess safe list
  ...
2023-04-28 14:02:54 -07:00
Linus Torvalds
22b8cc3e78 Add support for new Linear Address Masking CPU feature. This is similar
to ARM's Top Byte Ignore and allows userspace to store metadata in some
 bits of pointers without masking it out before use.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRK/WIACgkQaDWVMHDJ
 krAL+RAAw33EhsWyYVkeAtYmYBKkGvlgeSDULtfJKe5bynJBTHkGKfM6RE9MSJIt
 5fHWaConGh8HNpy0Us1sDvd/aWcWRm5h7ZcCVD+R4qrgh/vc7ULzM+elXe5jzr4W
 cyuTckF2eW6SVrYg6fH5q+6Uy/moDtrdkLRvwRBf+AYeepB8gvSSH5XixKDNiVBE
 pjNy1xXVZQokqD4tjsFelmLttyacR5OabiE/aeVNoFYf9yTwfnN8N3T6kwuOoS4l
 Lp6NA+/0ux+oBlR+Is+JJG8Mxrjvz96yJGZYdR2YP5k3bMQtHAAjuq2w+GgqZm5i
 j3/E6KQepEGaCfC+bHl68xy/kKx8ik+jMCEcBalCC25J3uxbLz41g6K3aI890wJn
 +5ZtfcmoDUk9pnUyLxR8t+UjOSBFAcRSUE+FTjUH1qEGsMPK++9a4iLXz5vYVK1+
 +YCt1u5LNJbkDxE8xVX3F5jkXh0G01SJsuUVAOqHSNfqSNmohFK8/omqhVRrRqoK
 A7cYLtnOGiUXLnvjrwSxPNOzRrG+GAwqaw8gwOTaYogETWbTY8qsSCEVl204uYwd
 m8io9rk2ZXUdDuha56xpBbPE0JHL9hJ2eKCuPkfvRgJT9YFyTh+e0UdX20k+nDjc
 ang1S350o/Y0sus6rij1qS8AuxJIjHucG0GdgpZk3KUbcxoRLhI=
 =qitk
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 LAM (Linear Address Masking) support from Dave Hansen:
 "Add support for the new Linear Address Masking CPU feature.

  This is similar to ARM's Top Byte Ignore and allows userspace to store
  metadata in some bits of pointers without masking it out before use"

* tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outside
  x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVA
  selftests/x86/lam: Add test cases for LAM vs thread creation
  selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA test cases for linear-address masking
  selftests/x86/lam: Add inherit test cases for linear-address masking
  selftests/x86/lam: Add io_uring test cases for linear-address masking
  selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address masking
  selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking
  x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
  iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
  mm: Expose untagging mask in /proc/$PID/status
  x86/mm: Provide arch_prctl() interface for LAM
  x86/mm: Reduce untagged_addr() overhead for systems without LAM
  x86/uaccess: Provide untagged_addr() and remove tags before address check
  mm: Introduce untagged_addr_remote()
  x86/mm: Handle LAM on context switch
  x86: CPUID and CR3/CR4 flags for Linear Address Masking
  x86: Allow atomic MM_CONTEXT flags setting
  x86/mm: Rework address range check in get_user() and put_user()
2023-04-28 09:43:49 -07:00
Linus Torvalds
7b664cc38e * Do conditional __tdx_hypercall() 'output' processing via an
assembly macro argument rather than a runtime register.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRKvncACgkQaDWVMHDJ
 krDQsA/+OlDCexITqWgmd3rPN2ZUuyB+aV4MfoKppQuAuWD4I7vxD5qeqjRS2XTh
 E6SSzp43zEVhVo6Kv3UvPR/Tr9edUGn2KzIWmqd1bOwhgbEfd898gzbWuRmK6i8t
 qqweR1RMAL/COgPAlcrdpTLl2PCc9tLYpDnQ8WcAUqH4uoePpQyN3Za0J/dcKX7l
 8XexOAaco4Wz3ylD9npPcLo9ytvohg+exJtCNldN1l2j5xXdA2fTqEJYaUMp/+Nd
 Z1TTQ43QcT7dRknFojxdYfAkCqBfr8ccBAwV1mriahKWY/3xl35BqSeJVlma1tkm
 UzkTY1CFwKYRk24C/oQK7OQMYnyJ7Q1RhSrd91lQWVjaTcI/3DPUKiKKdwFXDv4C
 FUYvuJkanPVk3PyCZRvltdNvsXsifzx0RKZWLZ+3TQ2jtaMEDOzPgChq7a6WfpkQ
 HQPuVoENHvyHdUycQhtELUsaJ3AdnOM87XiQDcbNNiaPiOLB9C8dhSWMKoPsMehO
 oAiUQ7lW6po0lcELVSKib2ASVpXhOmlAxdRyZ50mhjrbpcxfBBGD3+KdFqZ4Gs1c
 8UyrQbjVq07Lx2fvdizvDpIcr4M7z0xBAhJeIegC6z86XpJq5uvin+vOLzFAfe16
 WGy6FiZtVpXp4fyqUY7GgQNqhk1b8h6EHKd9d/zCSPuH8/wT/6g=
 =hvDm
 -----END PGP SIGNATURE-----

Merge tag 'x86_tdx_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 tdx update from Dave Hansen:
 "The original tdx hypercall assembly code took two flags in %RSI to
  tweak its behavior at runtime. PeterZ recently axed one flag in commit
  e80a48bade ("x86/tdx: Remove TDX_HCALL_ISSUE_STI").

  Kill the other flag too and tweak the 'output' mode with an assembly
  macro instead. This results in elimination of one push/pop pair and
  overall easier to read assembly.

   - Do conditional __tdx_hypercall() 'output' processing via an
     assembly macro argument rather than a runtime register"

* tag 'x86_tdx_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tdx: Drop flags from __tdx_hypercall()
2023-04-28 09:36:09 -07:00
Linus Torvalds
e54debe657 * Improve AMX documentation along with example code
* Explicitly make some hardware constants part of the uabi
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRKkrkACgkQaDWVMHDJ
 krDtDBAAhWbKRK1rJJsz2GuliF3/f/cZwcNxGG+QGrYBl2F2ilOrmVwNYME2TvHD
 qQJHm8pU7vnDpnkZspqE0OoB6fbSa5qH3RfFhBFRziJFgN9mY0F0IJZeuH/EvJ/0
 7gkRMA3Fs41EESbAWhUTakvC6u3L06SUpUH2W8ixAcawZu+g/FksDXxE+eVVPZaQ
 Ztw17j6/m8W9bZ17HtyWK2vAepPlJhuXFPSAk7ox09ACwkqWAHO0/3RPcbc8HUZV
 lDyYeDhRELG1pai14GhTixRcgkdn4nnnNDmn13xpuwkpOh7FeZL/SoDmXtJ71CrJ
 I1YM1t9aB4ze2WDOo3mSKzU4efspGzAgIH26u19NQTmEp/9ppS+RaifXpt0r1yir
 ygOXkgk8l2qZPxryyL9ROU6b9cnPzsP9k3mWTtNJiJrx0CL73lWkA5KORb/Ezdnj
 kXAjTd4nUeCQJz+7PsnuvGqsT8/Dk1ugnHTu6Bn66U0hV0MNcx5G5m5HehDQBUmb
 TllHGJSGt/1AXIfBZ1p7GSrgCaq3NTzWNmcFxHS3bpC/pyGwszmdDBIS/pODfBfp
 0nG9cG8mte1KkhqjkSYTLtgarQEijs1NWrVnTUogg1kqtlvqZr8Zxun51YAW9Jt5
 zCGoB6W7EWVfJZBMHmVX7a4g21650mgte3YoAAyAwMJFtZG14ng=
 =GlmS
 -----END PGP SIGNATURE-----

Merge tag 'x86_fpu_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Dave Hansen:
 "There's no _actual_ kernel functionality here.

  This expands the documentation around AMX support including some code
  examples. The example code also exposed the fact that hardware
  architecture constants as part of the ABI, but there's no easy place
  that they get defined for apps. Adding them to a uabi header will
  eventually make life easier for consumers of the ABI.

  Summary:

   - Improve AMX documentation along with example code

   - Explicitly make some hardware constants part of the uabi"

* tag 'x86_fpu_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/x86: Explain the state component permission for guests
  Documentation/x86: Add the AMX enabling example
  x86/arch_prctl: Add AMX feature numbers as ABI constants
  Documentation/x86: Explain the purpose for dynamic features
2023-04-28 09:32:34 -07:00
Linus Torvalds
4980c176a7 Reduce redundant counter reads with resctrl refactoring
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRKki4ACgkQaDWVMHDJ
 krDf7w//eQjKw0dny5YZGOaHUNKMzJWQtzAnDJS0HuedWIar5+iCRiLh3zbxyKU0
 8IG/xkDHQ5Dd1V7mOyl1g4WdZ/rFmmepl2VtvYnfMs5x+U2sf6LttzzkXetbP+oj
 x5uvfa9Vx8Ad8unhgYa9KIIkg/x02ImyupPLw32R2a/cMTRoi+LJEGiiUAWFTCx6
 4ZCtryAKHDTgrbuOWTz46cEgil3ZQLBI/uvF3IKd7BegfpbXQq/iyXJhhD/hWfVw
 lqswuGZN+yVLTkyJ4EHxUXAJI1AuH327KZI1SgSTe8AKFiygx0ZOrkmeI6cXbKJO
 os22OdT+cwAI8OkblH+9rMAd4dmAnLw9o/rGylC9rzwyXmmRII5FJ6LrbWFvsHmh
 QrUTcRzBtHmwLfqUf60b4bXDmI2MMrN5PAxmvRsHbzSfzMHVJDPXG0IoGBhUPrjS
 QmZuCNjsaVIOOxaSm+EtfFMeRxmfTEc6e3YxEeykfjGqPph9o0YK6H3o/4MgupJe
 uik0scEqBzq2MXkOYv5dysiTb57QAR/Y+CWvZHJ1YcwFvjAQahqFSwQ+gItoHTDL
 Rec9Tq9cm0AG1mZL1fVWFPK+ECiFti3YvZoIZEtAyg6hOMoZsZvu+VWcHFRdgIGk
 5riWJE8MiVyzQGcvWFBFbaeYLm1+obGhJHMpMW476K3jr75y+RU=
 =xO/z
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resctrl update from Dave Hansen:
 "Reduce redundant counter reads with resctrl refactoring"

* tag 'x86_cache_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Avoid redundant counter read in __mon_event_count()
2023-04-28 09:30:51 -07:00
Linus Torvalds
682f7bbad2 - Unify duplicated __pa() and __va() definitions
- Simplify sysctl tables registration
 
 - Remove unused symbols
 
 - Correct function name in comment
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRKjI4ACgkQEsHwGGHe
 VUq85Q/9FbGNdHD2uX2KcpeWkEjYVlrhDudsG8e1JAMCjfSD0CCNhG7yU+6Jabfs
 LszYLwkNeHVpLUUOOtAnObqXpdcv2vML7/j6Cgg5aqdMDv3RwIgTti5tSkHr7s1A
 ejH0Qo/oYYt2OsJYkl+KuGhcaBmdpqEOIeOtV98vBtqgkRDCwdJhhMZeF0qgZ1kN
 r3bFdwy0KIiyI+EBYDXEsew/nI9oEuzoNgaOVIZCeOtHjtbgdl/kc7JgfDd0838D
 nsoNk1R8PVSl6RY30my7TKbFl7epWibinnD9M8NcyYpbLlfZKI7L60ZtQZ5Q49pz
 z+LtXTgeS/fjaFuM8LKkekGprpNiDClgygNini3QsmSb3kfb4ymxJLKbVuXziOLZ
 eYAE+xexCNUYXhmeamvPWjRP9cUgQc3TQD0IQFv/FO8M0gXBA4jTauyRrs+NNmVI
 G7W7T90x1XUu4fZDM/QZ2cn5qtdcRMZm4NcV0WY5OU/ZrrMmMNyGvDfrwLhFOSXi
 nOqzlJ9GNRVjhHsQhCG16B2y3guWmPGXyCvn6Ruuv7RQcm7oK4Rmq6bHuuqcAyaI
 R5z2pRib3AzPNgHUfMgDWuCa7D9jBimVJI/dG0bXG8DCnzaBXfYJn2ruvwvQlVLC
 4WqwdyUxR7k+vf1l0kQ5voGCLbXOcLFBfGP+7RRnEzlyCut2t74=
 =I3Mj
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:

 - Unify duplicated __pa() and __va() definitions

 - Simplify sysctl tables registration

 - Remove unused symbols

 - Correct function name in comment

* tag 'x86_cleanups_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Centralize __pa()/__va() definitions
  x86: Simplify one-level sysctl registration for itmt_kern_table
  x86: Simplify one-level sysctl registration for abi_table2
  x86/platform/intel-mid: Remove unused definitions from intel-mid.h
  x86/uaccess: Remove memcpy_page_flushcache()
  x86/entry: Change stale function name in comment to error_return()
2023-04-28 09:22:30 -07:00
Linus Torvalds
33afd4b763 Mainly singleton patches all over the place. Series of note are:
- updates to scripts/gdb from Glenn Washburn
 
 - kexec cleanups from Bjorn Helgaas
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr+6wAKCRDdBJ7gKXxA
 jn4NAP4u/hj/kR2dxYehcVLuQqJspCRZZBZlAReFJyHNQO6voAEAk0NN9rtG2+/E
 r0G29CJhK+YL0W6mOs8O1yo9J1rZnAM=
 =2CUV
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Mainly singleton patches all over the place.

  Series of note are:

   - updates to scripts/gdb from Glenn Washburn

   - kexec cleanups from Bjorn Helgaas"

* tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (50 commits)
  mailmap: add entries for Paul Mackerras
  libgcc: add forward declarations for generic library routines
  mailmap: add entry for Oleksandr
  ocfs2: reduce ioctl stack usage
  fs/proc: add Kthread flag to /proc/$pid/status
  ia64: fix an addr to taddr in huge_pte_offset()
  checkpatch: introduce proper bindings license check
  epoll: rename global epmutex
  scripts/gdb: add GDB convenience functions $lx_dentry_name() and $lx_i_dentry()
  scripts/gdb: create linux/vfs.py for VFS related GDB helpers
  uapi/linux/const.h: prefer ISO-friendly __typeof__
  delayacct: track delays from IRQ/SOFTIRQ
  scripts/gdb: timerlist: convert int chunks to str
  scripts/gdb: print interrupts
  scripts/gdb: raise error with reduced debugging information
  scripts/gdb: add a Radix Tree Parser
  lib/rbtree: use '+' instead of '|' for setting color.
  proc/stat: remove arch_idle_time()
  checkpatch: check for misuse of the link tags
  checkpatch: allow Closes tags with links
  ...
2023-04-27 19:57:00 -07:00