2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-23 12:43:55 +08:00
Commit Graph

39017 Commits

Author SHA1 Message Date
Vasily Averin
ec403e2ae0 memcg: enable accounting for ldt_struct objects
Each task can request own LDT and force the kernel to allocate up to 64Kb
memory per-mm.

There are legitimate workloads with hundreds of processes and there can be
hundreds of workloads running on large machines.  The unaccounted memory
can cause isolation issues between the workloads particularly on highly
utilized machines.

It makes sense to account for this objects to restrict the host's memory
consumption from inside the memcg-limited container.

Link: https://lkml.kernel.org/r/38010594-50fe-c06d-7cb0-d1f77ca422f3@virtuozzo.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yutian Yang <nglaive@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03 09:58:13 -07:00
Linus Torvalds
98d006eb49 - Prevent the amd/power module from being removed while in use
- Mark AMD IBS as not supporting content exclusion
 
 - Add a workaround for AMD erratum #1197 where IBS registers might
 not be restored properly after exiting CC6 state
 
 - Fix a potential truncation of a 32-bit variable due to shifting
 
 - Read the correct bits describing the number of configurable address
 ranges on Intel PT
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmEraSMACgkQEsHwGGHe
 VUrrYRAAiSc3qhgB8vCdDc0xSiPVYmgr4kiODkydQAzGR71u+1vTW08TBerjBcqH
 kTn0rgB5gsCIk38KRElgT13jPNN7OURxZA/DuCxSTLGRuf7lAJFeb/wcHED8uT+A
 fGntA11WQNLhKFHnd0iyFoQr2sUSJ2kQEakCTN9i/D+7uOtNvRRbLFrVQkbixWM+
 KEp5F0nTDbZbkVfH+zPUgHA/dre5BfZuBRekgts+VhxdqAdo4tPu0gfeAnEsapXR
 Ej3yT1mLHo+MgCviBd28LlGHVsCC6BR4+IRJohh6iYI0jvAIxJf3DGJUKgv+w/9O
 3wDUGb3K5tNhq/H0U0/Mo3bpT/spEGQob4NaCpWpmKFImn2Fa+90bG4G5iq016Sp
 eDnriDNzRILjZ+TUyAJHmpS6qzzjU2q0OCaPeDtXSsJ6MGvCWdHDimn5UDmDZ94X
 gTMGCEDMVaLrM4TJDbN9tRLDKIR5rP/a+lbebJB1LNQZNYICdXZ5V+ak84Ao/5Zh
 zdcmZ6RzVk1diureXzvdpeBjMNWN7jVyf6UIaXArc+GPY8t9amBieULfvx8xQDKn
 5gToTVq6TnNr+APLpa+lraDzPLmIbAy/WqBVcbUJpT6Te+KL/naxlmTzotWU+G6f
 2YpucklscCpIgngNEgreA+6rxBiPjs/j/g4r13uwhCj+ZGa0XWc=
 =r2tc
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Prevent the amd/power module from being removed while in use

 - Mark AMD IBS as not supporting content exclusion

 - Add a workaround for AMD erratum #1197 where IBS registers might not
   be restored properly after exiting CC6 state

 - Fix a potential truncation of a 32-bit variable due to shifting

 - Read the correct bits describing the number of configurable address
   ranges on Intel PT

* tag 'perf_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd/power: Assign pmu.module
  perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op
  perf/x86/amd/ibs: Work around erratum #1197
  perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32
  perf/x86/intel/pt: Fix mask of num_address_ranges
2021-08-29 10:36:32 -07:00
Linus Torvalds
072a276745 - Fix build error on RHEL where -Werror=maybe-uninitialized is set.
- Restore the firmware's IDT when calling EFI boot services and before
 ExitBootServices() has been called. This fixes a boot failure on what
 appears to be a tablet with 32-bit UEFI running a 64-bit kernel.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmErZfIACgkQEsHwGGHe
 VUoSCxAAkO6D6qAXfa0ryl5hYVXOx1we/LsKpoG/LPolFxg/QmadNRu4C3iP4D2y
 jYo3NfKWk7Nxk4DJmqrglkchws8TyNg6+uCT1z4gaZFlbC38Guy3tsFsmz9/Mw8o
 o5hzlL644k63X7pjtXtQgMxITu/CS1907p8nk8X6/DRQKw0YxB2ViPc8jrLYQvjO
 3MWvAs88WKlxghLX8kltt3+I3ZNLpZKfwi6ew7b2/flA5mTCVBXSt23Zv9JKgUI0
 5By9uKm6ek/OdIzFfO3rZC8bwAq4gEOP811/hVvd8hB4bGvvKijVF+/O1+/k7j88
 4VJsABk96oQXe+JQllsDXPj2qiv7By021psbHFCvDF5cvzvtHVwOYqUJ/vjPMADN
 nct3jSE3oYOLpCs0RvFhY4rBEt4bUFCPp0TZycxFf0NPb+gSR9tfyGoTWy0hhwPP
 FCrbxLkG9swoXtn3lMt2ORQ9LD6IwfAYYo35BlHkA331PvlLBj97vXwsJCi+D8kn
 vMhSbLNxaQJoS8MLmNiLNAO9jWY+cH7UfiAGa2Nvp4AqRk7HCpeeZMuyOYsbvLEp
 2LH5X+rRqZCia6W/U06dexob8oa3bw02c6Vi9bFSslOms+739NM3TBBDScl3OXeW
 moToLSahM6Qg9+XoDK5nvQxmGOXeIKEa/dlqSxIUA0feiTBeUvo=
 =8vyu
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Fix build error on RHEL where -Werror=maybe-uninitialized is set.

 - Restore the firmware's IDT when calling EFI boot services and before
   ExitBootServices() has been called. This fixes a boot failure on what
   appears to be a tablet with 32-bit UEFI running a 64-bit kernel.

* tag 'x86_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix a maybe-uninitialized build warning treated as error
  x86/efi: Restore Firmware IDT before calling ExitBootServices()
2021-08-29 10:26:00 -07:00
Kim Phillips
ccf2648341 perf/x86/amd/power: Assign pmu.module
Assign pmu.module so the driver can't be unloaded whilst in use.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210817221048.88063-4-kim.phillips@amd.com
2021-08-26 09:12:57 +02:00
Kim Phillips
f11dd0d805 perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op
Commit:

   2ff4025069 ("perf/core, arch/x86: Use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs")

neglected to do so.

Fixes: 2ff4025069 ("perf/core, arch/x86: Use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210817221048.88063-2-kim.phillips@amd.com
2021-08-26 08:58:02 +02:00
Kim Phillips
26db2e0c51 perf/x86/amd/ibs: Work around erratum #1197
Erratum #1197 "IBS (Instruction Based Sampling) Register State May be
Incorrect After Restore From CC6" is published in a document:

  "Revision Guide for AMD Family 19h Models 00h-0Fh Processors" 56683 Rev. 1.04 July 2021

  https://bugzilla.kernel.org/show_bug.cgi?id=206537

Implement the erratum's suggested workaround and ignore IBS samples if
MSRC001_1031 == 0.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210817221048.88063-3-kim.phillips@amd.com
2021-08-26 08:58:02 +02:00
Colin Ian King
0b3a8738b7 perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32
The u32 variable pci_dword is being masked with 0x1fffffff and then left
shifted 23 places. The shift is a u32 operation,so a value of 0x200 or
more in pci_dword will overflow the u32 and only the bottow 32 bits
are assigned to addr. I don't believe this was the original intent.
Fix this by casting pci_dword to a resource_size_t to ensure no
overflow occurs.

Note that the mask and 12 bit left shift operation does not need this
because the mask SNR_IMC_MMIO_MEM0_MASK and shift is always a 32 bit
value.

Fixes: ee49532b38 ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge")
Addresses-Coverity: ("Unintentional integer overflow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20210706114553.28249-1-colin.king@canonical.com
2021-08-26 08:58:02 +02:00
Xiaoyao Li
c53c6b7409 perf/x86/intel/pt: Fix mask of num_address_ranges
Per SDM, bit 2:0 of CPUID(0x14,1).EAX[2:0] reports the number of
configurable address ranges for filtering, not bit 1:0.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lkml.kernel.org/r/20210824040622.4081502-1-xiaoyao.li@intel.com
2021-08-25 15:42:31 +02:00
Babu Moger
527f721478 x86/resctrl: Fix a maybe-uninitialized build warning treated as error
The recent commit

  064855a690 ("x86/resctrl: Fix default monitoring groups reporting")

caused a RHEL build failure with an uninitialized variable warning
treated as an error because it removed the default case snippet.

The RHEL Makefile uses '-Werror=maybe-uninitialized' to force possibly
uninitialized variable warnings to be treated as errors. This is also
reported by smatch via the 0day robot.

The error from the RHEL build is:

  arch/x86/kernel/cpu/resctrl/monitor.c: In function ‘__mon_event_count’:
  arch/x86/kernel/cpu/resctrl/monitor.c:261:12: error: ‘m’ may be used
  uninitialized in this function [-Werror=maybe-uninitialized]
    m->chunks += chunks;
              ^~

The upstream Makefile does not build using '-Werror=maybe-uninitialized'.
So, the problem is not seen there. Fix the problem by putting back the
default case snippet.

 [ bp: note that there's nothing wrong with the code and other compilers
   do not trigger this warning - this is being done just so the RHEL compiler
   is happy. ]

Fixes: 064855a690 ("x86/resctrl: Fix default monitoring groups reporting")
Reported-by: Terry Bowman <Terry.Bowman@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/162949631908.23903.17090272726012848523.stgit@bmoger-ubuntu
2021-08-22 09:11:29 +02:00
Joerg Roedel
22aa45cb46 x86/efi: Restore Firmware IDT before calling ExitBootServices()
Commit

  79419e13e8 ("x86/boot/compressed/64: Setup IDT in startup_32 boot path")

introduced an IDT into the 32-bit boot path of the decompressor stub.
But the IDT is set up before ExitBootServices() is called, and some UEFI
firmwares rely on their own IDT.

Save the firmware IDT on boot and restore it before calling into EFI
functions to fix boot failures introduced by above commit.

Fixes: 79419e13e8 ("x86/boot/compressed/64: Setup IDT in startup_32 boot path")
Reported-by: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: stable@vger.kernel.org # 5.13+
Link: https://lkml.kernel.org/r/20210820125703.32410-1-joro@8bytes.org
2021-08-21 17:57:04 +02:00
Linus Torvalds
02a3715449 Two nested virtualization fixes for AMD processors.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmEabicUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroObjgf7BjKq8zyDbS1NyTW+FCaPQk5x4zy6
 0EI521qJNoWMU8p9O7B4EUFJsLr4Oq7mIanact6hPSmctdWa2CxGi/FRG5QQTIpQ
 8Tb2UyPPYn98OTLnM1SBfhuix4QnnYX73IRkklzCFE3Prg+XEjUoTzORVhhioC+k
 sD0cdYEJczsEQ9Boic+6LKNVGs7WHqsVWSruPoEPevZhvl5+RKrmbkJMxyB6G+xF
 EVLmxfuiU4BurRzACHBEghlAWbaqQUVHampCI0/ppH2Fb5RMviTJK0Xcbj04B7Gx
 NH6l5VHTBW7LXiAxcF+oNWLZlmhmWzsVSmw4P01ZuFXlohW3rtPm5WjUiw==
 =+7g8
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Two nested virtualization fixes for AMD processors"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656)
  KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653)
2021-08-16 06:23:26 -10:00
Maxim Levitsky
c7dfa40099 KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656)
If L1 disables VMLOAD/VMSAVE intercepts, and doesn't enable
Virtual VMLOAD/VMSAVE (currently not supported for the nested hypervisor),
then VMLOAD/VMSAVE must operate on the L1 physical memory, which is only
possible by making L0 intercept these instructions.

Failure to do so allowed the nested guest to run VMLOAD/VMSAVE unintercepted,
and thus read/write portions of the host physical memory.

Fixes: 89c8a4984f ("KVM: SVM: Enable Virtual VMLOAD VMSAVE feature")

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-16 09:48:37 -04:00
Maxim Levitsky
0f923e0712 KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653)
* Invert the mask of bits that we pick from L2 in
  nested_vmcb02_prepare_control

* Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr

This fixes a security issue that allowed a malicious L1 to run L2 with
AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled
AVIC to read/write the host physical memory at some offsets.

Fixes: 3d6368ef58 ("KVM: SVM: Add VMRUN handler")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-16 09:48:27 -04:00
Linus Torvalds
c4f14eac22 A set of fixes for PCI/MSI and x86 interrupt startup:
- Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked
    entries stay around e.g. when a crashkernel is booted.
 
  - Enforce masking of a MSI-X table entry when updating it, which mandatory
    according to speification
 
  - Ensure that writes to MSI[-X} tables are flushed.
 
  - Prevent invalid bits being set in the MSI mask register
 
  - Properly serialize modifications to the mask cache and the mask register
    for multi-MSI.
 
  - Cure the violation of the affinity setting rules on X86 during interrupt
    startup which can cause lost and stale interrupts. Move the initial
    affinity setting ahead of actualy enabling the interrupt.
 
  - Ensure that MSI interrupts are completely torn down before freeing them
    in the error handling case.
 
  - Prevent an array out of bounds access in the irq timings code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmEY5bcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaMvD/9KeK2430f4h/x/fQhHmHHIJOv3kqmB
 gRXX4RV+N/DfU9GSflbzPxY9l2SkydgpQjHeGnqpV7DYRIu84nVYAuWcWtPimHHy
 JxapniLlQv2GS+SIy9f1mmChH6VUPS05brHxKSqAQZvQIoZqza8vF3umZlV7eYF4
 uZFd86TCbDFsBxbsKmyV1FtQLo008EeEp8dtZ/1cZ9Fbp0M/mQkuu7aTNqY0qWwZ
 rAoGyE4PjDR+yf87XjE5z7hMs2vfUjiGXg7Kbp30NPKGcRyasb+SlHVKcvZKJIji
 Y0Bk/SOyqoj1Co3U+cEaWolB1MeGff4nP+Xx8xvyNklKxxs1+92Z7L1RElXIc0cL
 kmUehUSf5JuJ83B6ucAYbmnXKNw1XB00PaMy7iSxsYekTXJx+t0b+Rt6o0R3inWB
 xUWbIVmoL2uF1oOAb6mEc3wDNMBVkY33e9l2jD0PUPxKXZ730MVeojWJ8FGFiPOT
 9+aCRLjZHV5slVQAgLnlpcrseJLuUei6HLVwRXxv19Bz5L+HuAXUxWL9h74SRuE9
 14kH63aXSVDlcYyW7c3t8Lh6QjKAf7AIz0iG+u3n09IWyURd4agHuKOl5itileZB
 BK9NuRrNgmr2nEKG461Suc6GojLBXc1ih3ak+MG+O4iaLxnhapTjW3Weqr+OVXr+
 SrIjoxjpEk2ECA==
 =yf3u
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for PCI/MSI and x86 interrupt startup:

   - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked
     entries stay around e.g. when a crashkernel is booted.

   - Enforce masking of a MSI-X table entry when updating it, which
     mandatory according to speification

   - Ensure that writes to MSI[-X} tables are flushed.

   - Prevent invalid bits being set in the MSI mask register

   - Properly serialize modifications to the mask cache and the mask
     register for multi-MSI.

   - Cure the violation of the affinity setting rules on X86 during
     interrupt startup which can cause lost and stale interrupts. Move
     the initial affinity setting ahead of actualy enabling the
     interrupt.

   - Ensure that MSI interrupts are completely torn down before freeing
     them in the error handling case.

   - Prevent an array out of bounds access in the irq timings code"

* tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  driver core: Add missing kernel doc for device::msi_lock
  genirq/msi: Ensure deactivation on teardown
  genirq/timings: Prevent potential array overflow in __irq_timings_store()
  x86/msi: Force affinity setup before startup
  x86/ioapic: Force affinity setup before startup
  genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP
  PCI/MSI: Protect msi_desc::masked for multi-MSI
  PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown()
  PCI/MSI: Correct misleading comments
  PCI/MSI: Do not set invalid bits in MSI mask
  PCI/MSI: Enforce MSI[X] entry updates to be visible
  PCI/MSI: Enforce that MSI-X table entry is masked for update
  PCI/MSI: Mask all unused MSI-X entries
  PCI/MSI: Enable and mask MSI-X early
2021-08-15 06:49:40 -10:00
Linus Torvalds
b045b8cc86 - An objdump checker fix to ignore parenthesized strings in the objdump
version
 
 - Fix resctrl default monitoring groups reporting when new subgroups get
   created
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmEYyDsACgkQEsHwGGHe
 VUqYvw/7BhM3XR0xsjGnKAKTIofWNgpupqd/CIgcXwty9WKsJfLD0CMWnrvJXKi2
 NiyrGiJ3TKjgWajd7LAQzpVdq+YNgG4i5YY6Lvxc2VVgoccKQqpD0JfU9vT8m6cC
 kzSWV+dLs1ydhmgb+bxKqedrautaPjM7RN8/EAnv56mBUxlemD8WSx/rEnP9sgwF
 RE9teVSBuutMQj8lO238SJMN9AIF11Ti1ZIaHmuIKwjFTSLIETthE3o+Dhhq17gY
 vaP1uYFPlyh5tTJA0pa7wijoStPvZmdUzn5n2QQ5CJCkoDNXrmNEu7qS5SERbZBA
 U6jag/SNLwTkN2cA4Mmpb6HsA8r7vOhweovC9GgInnsyFiKAgZ1tUT7LbFQOUrhq
 QWQTrsews0xwhHLrv7r92mZf/W4cLoS0iEN9rinHiatb3Nr0/5ugDSgErw8scqLC
 JqjDCqy6Wm3NuRQhXoZfqid+WE/xN8BTfsbrQ7kuAqOV3NSVmm3K6XTSUTLtJ/C0
 x+Fj+W+4Q8UthQoW5WldsfnGLrKM4UjmXBQbM5o9fWW1L4gYIM6FD6uVqBZk5GAs
 bxuT4f1M/R3/5qdm9L69e4WPduyo53/+bJjmwA9DXLaKXvnFqkZikkV3+S3U+9/j
 pKhg+IfRZ4f1ymjH8sEwEsA037cmP3OzrIwF0vrQDIrWErL5h7Q=
 =Gg+d
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "Two fixes:

   - An objdump checker fix to ignore parenthesized strings in the
     objdump version

   - Fix resctrl default monitoring groups reporting when new subgroups
     get created"

* tag 'x86_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix default monitoring groups reporting
  x86/tools: Fix objdump version check again
2021-08-15 06:30:24 -10:00
Linus Torvalds
3e763ec791 ARM:
- Plug race between enabling MTE and creating vcpus
 
 - Fix off-by-one bug when checking whether an address range is RAM
 
 x86:
 
 - Fixes for the new MMU, especially a memory leak on hosts with <39
   physical address bits
 
 - Remove bogus EFER.NX checks on 32-bit non-PAE hosts
 
 - WAITPKG fix
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmEWjBwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPrMgf9EDBsRvD/Kids0kddaoAgM6qICdsH
 tQX/GdsmecUlU16Bkp21XeZif1ZKcJxCmx/dhYmid3woi9HuX5AreFTlLjlJDRxg
 +lJvboqTV0kk7PjaYkOaqd42RSg/BiSLZ+JVPpbW7CqeIr1lGG4yhIC/Nl7fCCto
 sCaY/NoxtraoG5+WZcRRP7XptQmMRckVZ9bimHHh8dKqMkosGx1hcGfj64aKmx4F
 2EVrrjr+an3mpMnwvUIgNw4xEj/jUCFebvGAROVEsrZzNTZ9UrwgT0HeA92XwQVQ
 93z7nqcBUKHH11rnbOvRESEJD9f6I9vCSaiqRROwmoqLY/Xi7jly7XeDcA==
 =Lj8B
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM:

   - Plug race between enabling MTE and creating vcpus

   - Fix off-by-one bug when checking whether an address range is RAM

  x86:

   - Fixes for the new MMU, especially a memory leak on hosts with <39
     physical address bits

   - Remove bogus EFER.NX checks on 32-bit non-PAE hosts

   - WAITPKG fix"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock
  KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs
  KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs
  KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF
  kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault
  KVM: x86: remove dead initialization
  KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernels
  KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation
  KVM: arm64: Fix race when enabling KVM_ARM_CAP_MTE
  KVM: arm64: Fix off-by-one in range_is_memory
2021-08-15 06:21:30 -10:00
Paolo Bonzini
6e949ddb0a Merge branch 'kvm-tdpmmu-fixes' into kvm-master
Merge topic branch with fixes for both 5.14-rc6 and 5.15.
2021-08-13 03:33:13 -04:00
Sean Christopherson
ce25681d59 KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock
Add yet another spinlock for the TDP MMU and take it when marking indirect
shadow pages unsync.  When using the TDP MMU and L1 is running L2(s) with
nested TDP, KVM may encounter shadow pages for the TDP entries managed by
L1 (controlling L2) when handling a TDP MMU page fault.  The unsync logic
is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
which runs with mmu_lock held for read, not write.

Lack of a critical section manifests most visibly as an underflow of
unsync_children in clear_unsync_child_bit() due to unsync_children being
corrupted when multiple CPUs write it without a critical section and
without atomic operations.  But underflow is the best case scenario.  The
worst case scenario is that unsync_children prematurely hits '0' and
leads to guest memory corruption due to KVM neglecting to properly sync
shadow pages.

Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
would functionally be ok.  Usurping the lock could degrade performance when
building upper level page tables on different vCPUs, especially since the
unsync flow could hold the lock for a comparatively long time depending on
the number of indirect shadow pages and the depth of the paging tree.

For simplicity, take the lock for all MMUs, even though KVM could fairly
easily know that mmu_lock is held for write.  If mmu_lock is held for
write, there cannot be contention for the inner spinlock, and marking
shadow pages unsync across multiple vCPUs will be slow enough that
bouncing the kvm_arch cacheline should be in the noise.

Note, even though L2 could theoretically be given access to its own EPT
entries, a nested MMU must hold mmu_lock for write and thus cannot race
against a TDP MMU page fault.  I.e. the additional spinlock only _needs_ to
be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
that is running with the TDP MMU enabled.  Holding mmu_lock for read also
prevents the indirect shadow page from being freed.  But as above, keep
it simple and always take the lock.

Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
effectively disable unsync behavior for nested TDP.  Write protecting leaf
shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
VMMs typically don't modify TDP entries, but the same may not hold true for
non-standard use cases and/or VMMs that are migrating physical pages (from
L1's perspective).

Alternative #2, the unsync logic could be made thread safe.  In theory,
simply converting all relevant kvm_mmu_page fields to atomics and using
atomic bitops for the bitmap would suffice.  However, (a) an in-depth audit
would be required, (b) the code churn would be substantial, and (c) legacy
shadow paging would incur additional atomic operations in performance
sensitive paths for no benefit (to legacy shadow paging).

Fixes: a2855afc7e ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210812181815.3378104-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13 03:32:14 -04:00
Sean Christopherson
0103098fb4 KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs
Set the min_level for the TDP iterator at the root level when zapping all
SPTEs to optimize the iterator's try_step_down().  Zapping a non-leaf
SPTE will recursively zap all its children, thus there is no need for the
iterator to attempt to step down.  This avoids rereading the top-level
SPTEs after they are zapped by causing try_step_down() to short-circuit.

In most cases, optimizing try_step_down() will be in the noise as the cost
of zapping SPTEs completely dominates the overall time.  The optimization
is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM
is zapping in response to multiple memslot updates when userspace is adding
and removing read-only memslots for option ROMs.  In that case, the task
doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock
for read and thus can be a noisy neighbor of sorts.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210812181414.3376143-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13 03:31:56 -04:00
Sean Christopherson
524a1e4e38 KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs
Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and
really zap all SPTEs in this case.  As is, zap_gfn_range() skips non-leaf
SPTEs whose range exceeds the range to be zapped.  If shadow_phys_bits is
not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level
paging, the "zap all" flows will skip top-level SPTEs whose range extends
beyond shadow_phys_bits and leak their SPs when the VM is destroyed.

Use the current upper bound (based on host.MAXPHYADDR) to detect that the
caller wants to zap all SPTEs, e.g. instead of using the max theoretical
gfn, 1 << (52 - 12).  The more precise upper bound allows the TDP iterator
to terminate its walk earlier when running on hosts with MAXPHYADDR < 52.

Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to
help future debuggers should KVM decide to leak SPTEs again.

The bug is most easily reproduced by running (and unloading!) KVM in a
VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped.

  =============================================================================
  BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------
  Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1)
  CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   dump_stack_lvl+0x45/0x59
   slab_err+0x95/0xc9
   __kmem_cache_shutdown.cold+0x3c/0x158
   kmem_cache_destroy+0x3d/0xf0
   kvm_mmu_module_exit+0xa/0x30 [kvm]
   kvm_arch_exit+0x5d/0x90 [kvm]
   kvm_exit+0x78/0x90 [kvm]
   vmx_exit+0x1a/0x50 [kvm_intel]
   __x64_sys_delete_module+0x13f/0x220
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: faaf05b00a ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210812181414.3376143-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13 03:31:46 -04:00
Sean Christopherson
18712c1370 KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF
Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF
in L2 or if the VM-Exit should be forwarded to L1.  The current logic fails
to account for the case where #PF is intercepted to handle
guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into
L1.  At best, L1 will complain and inject the #PF back into L2.  At
worst, L1 will eat the unexpected fault and cause L2 to hang on infinite
page faults.

Note, while the bug was technically introduced by the commit that added
support for the MAXPHYADDR madness, the shame is all on commit
a0c134347b ("KVM: VMX: introduce vmx_need_pf_intercept").

Fixes: 1dbf5d68af ("KVM: VMX: Add guest physical address check in EPT violation and misconfig")
Cc: stable@vger.kernel.org
Cc: Peter Shier <pshier@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210812045615.3167686-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13 03:20:58 -04:00
Junaid Shahid
85aa8889b8 kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault
When a nested EPT violation/misconfig is injected into the guest,
the shadow EPT PTEs associated with that address need to be synced.
This is done by kvm_inject_emulated_page_fault() before it calls
nested_ept_inject_page_fault(). However, that will only sync the
shadow EPT PTE associated with the current L1 EPTP. Since the ASID
is based on EP4TA rather than the full EPTP, so syncing the current
EPTP is not enough. The SPTEs associated with any other L1 EPTPs
in the prev_roots cache with the same EP4TA also need to be synced.

Signed-off-by: Junaid Shahid <junaids@google.com>
Message-Id: <20210806222229.1645356-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13 03:20:58 -04:00
Paolo Bonzini
375d1adebc Merge branch 'kvm-vmx-secctl' into kvm-master
Merge common topic branch for 5.14-rc6 and 5.15 merge window.
2021-08-13 03:20:18 -04:00
Paolo Bonzini
ffbe17cada KVM: x86: remove dead initialization
hv_vcpu is initialized again a dozen lines below, and at this
point vcpu->arch.hyperv is not valid.  Remove the initializer.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13 03:20:18 -04:00
Sean Christopherson
1383279c64 KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernels
Remove an ancient restriction that disallowed exposing EFER.NX to the
guest if EFER.NX=0 on the host, even if NX is fully supported by the CPU.
The motivation of the check, added by commit 2cc51560ae ("KVM: VMX:
Avoid saving and restoring msr_efer on lightweight vmexit"), was to rule
out the case of host.EFER.NX=0 and guest.EFER.NX=1 so that KVM could run
the guest with the host's EFER.NX and thus avoid context switching EFER
if the only divergence was the NX bit.

Fast forward to today, and KVM has long since stopped running the guest
with the host's EFER.NX.  Not only does KVM context switch EFER if
host.EFER.NX=1 && guest.EFER.NX=0, KVM also forces host.EFER.NX=0 &&
guest.EFER.NX=1 when using shadow paging (to emulate SMEP).  Furthermore,
the entire motivation for the restriction was made obsolete over a decade
ago when Intel added dedicated host and guest EFER fields in the VMCS
(Nehalem timeframe), which reduced the overhead of context switching EFER
from 400+ cycles (2 * WRMSR + 1 * RDMSR) to a mere ~2 cycles.

In practice, the removed restriction only affects non-PAE 32-bit kernels,
as EFER.NX is set during boot if NX is supported and the kernel will use
PAE paging (32-bit or 64-bit), regardless of whether or not the kernel
will actually use NX itself (mark PTEs non-executable).

Alternatively and/or complementarily, startup_32_smp() in head_32.S could
be modified to set EFER.NX=1 regardless of paging mode, thus eliminating
the scenario where NX is supported but not enabled.  However, that runs
the risk of breaking non-KVM non-PAE kernels (though the risk is very,
very low as there are no known EFER.NX errata), and also eliminates an
easy-to-use mechanism for stressing KVM's handling of guest vs. host EFER
across nested virtualization transitions.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210805183804.1221554-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13 03:20:17 -04:00
Babu Moger
064855a690 x86/resctrl: Fix default monitoring groups reporting
Creating a new sub monitoring group in the root /sys/fs/resctrl leads to
getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes
on the entire filesystem.

Steps to reproduce:

  1. mount -t resctrl resctrl /sys/fs/resctrl/

  2. cd /sys/fs/resctrl/

  3. cat mon_data/mon_L3_00/mbm_total_bytes
     23189832

  4. Create sub monitor group:
  mkdir mon_groups/test1

  5. cat mon_data/mon_L3_00/mbm_total_bytes
     Unavailable

When a new monitoring group is created, a new RMID is assigned to the
new group. But the RMID is not active yet. When the events are read on
the new RMID, it is expected to report the status as "Unavailable".

When the user reads the events on the default monitoring group with
multiple subgroups, the events on all subgroups are consolidated
together. Currently, if any of the RMID reads report as "Unavailable",
then everything will be reported as "Unavailable".

Fix the issue by discarding the "Unavailable" reads and reporting all
the successful RMID reads. This is not a problem on Intel systems as
Intel reports 0 on Inactive RMIDs.

Fixes: d89b737901 ("x86/intel_rdt/cqm: Add mon_data")
Reported-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Babu Moger <Babu.Moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311
Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu
2021-08-12 20:12:20 +02:00
Randy Dunlap
839ad22f75 x86/tools: Fix objdump version check again
Skip (omit) any version string info that is parenthesized.

Warning: objdump version 15) is older than 2.19
Warning: Skipping posttest.

where 'objdump -v' says:
GNU objdump (GNU Binutils; SUSE Linux Enterprise 15) 2.35.1.20201123-7.18

Fixes: 8bee738bb1 ("x86: Fix objdump version check in chkobjdump.awk for different formats.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210731000146.2720-1-rdunlap@infradead.org
2021-08-12 17:17:25 +02:00
Sean Christopherson
7b9cae027b KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation
Use the secondary_exec_controls_get() accessor in vmx_has_waitpkg() to
effectively get the controls for the current VMCS, as opposed to using
vmx->secondary_exec_controls, which is the cached value of KVM's desired
controls for vmcs01 and truly not reflective of any particular VMCS.

While the waitpkg control is not dynamic, i.e. vmcs01 will always hold
the same waitpkg configuration as vmx->secondary_exec_controls, the same
does not hold true for vmcs02 if the L1 VMM hides the feature from L2.
If L1 hides the feature _and_ does not intercept MSR_IA32_UMWAIT_CONTROL,
L2 could incorrectly read/write L1's virtual MSR instead of taking a #GP.

Fixes: 6e3ba4abce ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210810171952.2758100-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-10 13:32:09 -04:00
Thomas Gleixner
ff363f480e x86/msi: Force affinity setup before startup
The X86 MSI mechanism cannot handle interrupt affinity changes safely after
startup other than from an interrupt handler, unless interrupt remapping is
enabled. The startup sequence in the generic interrupt code violates that
assumption.

Mark the irq chips with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that
the default interrupt setting happens before the interrupt is started up
for the first time.

While the interrupt remapping MSI chip does not require this, there is no
point in treating it differently as this might spare an interrupt to a CPU
which is not in the default affinity mask.

For the non-remapping case go to the direct write path when the interrupt
is not yet started similar to the not yet activated case.

Fixes: 1840475676 ("genirq: Expose default irq affinity mask (take 3)")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.886722080@linutronix.de
2021-08-10 10:59:21 +02:00
Thomas Gleixner
0c0e37dc11 x86/ioapic: Force affinity setup before startup
The IO/APIC cannot handle interrupt affinity changes safely after startup
other than from an interrupt handler. The startup sequence in the generic
interrupt code violates that assumption.

Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that
the default interrupt setting happens before the interrupt is started up
for the first time.

Fixes: 1840475676 ("genirq: Expose default irq affinity mask (take 3)")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.832143400@linutronix.de
2021-08-10 10:59:21 +02:00
Linus Torvalds
74eedeba45 A set of perf fixes:
- Correct the permission checks for perf event which send SIGTRAP to a
    different process and clean up that code to be more readable.
 
  - Prevent an out of bound MSR access in the x86 perf code which happened
    due to an incomplete limiting to the actually available hardware
    counters.
 
  - Prevent access to the AMD64_EVENTSEL_HOSTONLY bit when running inside a
    guest.
 
  - Handle small core counter re-enabling correctly by issuing an ACK right
    before reenabling it to prevent a stale PEBS record being kept around.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmEPv6UTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYob8hD/wMmRLAoc/uvJIIICJ+IQVnnU8WToIS
 Qy1dAPpQMz6pQpRQor1AGpcP89IMnLVhZn84lsd+kw0/Lv630JbWsXvQ8jB2GPHn
 17XewPp4l4PDUgKaGEKIjPSjsmnZmzOLTYIy5gWOfA/h5EG/1D+ozvcRGDMaXWUw
 +65Pinaf2QKfjYZV11SVJMLF5zLYUxMc6vRag00WrcPxd+JO4eVeV36g0LTmhABW
 fOSDcBOSVrT2w9MYDpNmPvMh3dN2vlfhrEk10NBKslx8uk4t8sV/Jbs+48WhydKa
 zmdqthtjIekRUSxhiHJve70D9ngveCBSKQDp0Us2BWWxdnM0+HV6ozjuxO0julCH
 5tW4413fz2AoZJhWkTn3PE4nPG3apRCnL2B+jTFHHqCjKSkkrNDRJDOEUwasXjV5
 jn25DLhOq5ltkMrLFDTV/h2RZqU0fAMV2iwNSkjD3lVLgKt6B3/uSnvE9SXmaJjs
 njk/1LzeWwY+sk7YYXouPQ2STEDCKvOJGYZSS5pFA03mVaQgfuJxpyHKH+7nj9tV
 k0FLDLMmSucYIWBq0iapa8cR69e0ZIE48hSNR3AOIIOVh3LusmA4HkogOAQG7kdZ
 P2nKQUdN+SR8rL9KQRauP63J508fg0kkXNgSAm1lFWBDnFKt6shkkHGcL+5PzxJW
 1Bjx2wc52Ww84A==
 =hhv+
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2021-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A set of perf fixes:

   - Correct the permission checks for perf event which send SIGTRAP to
     a different process and clean up that code to be more readable.

   - Prevent an out of bound MSR access in the x86 perf code which
     happened due to an incomplete limiting to the actually available
     hardware counters.

   - Prevent access to the AMD64_EVENTSEL_HOSTONLY bit when running
     inside a guest.

   - Handle small core counter re-enabling correctly by issuing an ACK
     right before reenabling it to prevent a stale PEBS record being
     kept around"

* tag 'perf-urgent-2021-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Apply mid ACK for small core
  perf/x86/amd: Don't touch the AMD64_EVENTSEL_HOSTONLY bit inside the guest
  perf/x86: Fix out of bound MSR access
  perf: Refactor permissions check into perf_check_permission()
  perf: Fix required permissions if sigtrap is requested
2021-08-08 11:46:13 -07:00
Linus Torvalds
4972bb90c3 Kbuild fixes for v5.14 (2nd)
- Correct the Extended Regular Expressions in tools
 
  - Adjust scripts/checkversion.pl for the current Kbuild
 
  - Unset sub_make_done for 'make install' to make DKMS working again
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmEOl2QVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsG8ikP/1LjJPbB3PjFtVU24TD78z4ztfHq
 xCtwOWzcP5FMXcb5sQWGc0UjwlJq3+meIm8rcRqJfKSUSRxrtUyKm9llwK0sFezF
 GfC84AGKNwJwCAAoxZ7bpqmlQw7HnIGsrk9mzkw/NWa19nUMm3D4Oaek3KMdumdV
 BYPmm4AzTuyXah4a1ZZxmR/47WRty37jIBELAkpQyqhgFrxz420weewEMUiL53cv
 ipaXSluD1v+9ezJi5VBtsedC5TTUYPsqPfmwGaI6QNX4rT/kNNxj1e478JnAtkPM
 CKAbR/0pswWOvlYiDpdVTVmmyigYznCRsuwOwBp0DVYLlshVnCItKiv1rrhUHpED
 1m5jExu5NFoFNhHYTPoOxAj34AlQ7PQAN+M14cklvv1DNrtunR5QbVPEj7PMEd0W
 O5orQag4OqQQ2hqz/q57+FhX3QjijcGHwutjfxb/wfY84+q+e4QjaVITzpD92Yvq
 6j/FhDBE9Z0ZaznF1zgxghM72995n8HW6ZkCGDg6etfUi2aSeeNxFle1OYAtRtmp
 ZCefhAnPsUVjTvwzOZ/43ukUjW20o4uR/I/25MFVdQbFGDnCYpbC/RJDyJK2VxqY
 yznpY6sI9LbpbzxwXUzB5DyAosaExOi1iUJ0NK2YZ47lp7RVAxpvWBEr1VT7m37W
 WF7TETMdF5IV9lFn
 =M+2q
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Correct the Extended Regular Expressions in tools

 - Adjust scripts/checkversion.pl for the current Kbuild

 - Unset sub_make_done for 'make install' to make DKMS work again

* tag 'kbuild-fixes-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: cancel sub_make_done for the install target to fix DKMS
  scripts: checkversion: modernize linux/version.h search strings
  mips: Fix non-POSIX regexp
  x86/tools/relocs: Fix non-POSIX regexp
2021-08-07 10:03:02 -07:00
Kan Liang
acade63799 perf/x86/intel: Apply mid ACK for small core
A warning as below may be occasionally triggered in an ADL machine when
these conditions occur:

 - Two perf record commands run one by one. Both record a PEBS event.
 - Both runs on small cores.
 - They have different adaptive PEBS configuration (PEBS_DATA_CFG).

  [ ] WARNING: CPU: 4 PID: 9874 at arch/x86/events/intel/ds.c:1743 setup_pebs_adaptive_sample_data+0x55e/0x5b0
  [ ] RIP: 0010:setup_pebs_adaptive_sample_data+0x55e/0x5b0
  [ ] Call Trace:
  [ ]  <NMI>
  [ ]  intel_pmu_drain_pebs_icl+0x48b/0x810
  [ ]  perf_event_nmi_handler+0x41/0x80
  [ ]  </NMI>
  [ ]  __perf_event_task_sched_in+0x2c2/0x3a0

Different from the big core, the small core requires the ACK right
before re-enabling counters in the NMI handler, otherwise a stale PEBS
record may be dumped into the later NMI handler, which trigger the
warning.

Add a new mid_ack flag to track the case. Add all PMI handler bits in
the struct x86_hybrid_pmu to track the bits for different types of
PMUs.  Apply mid ACK for the small cores on an Alder Lake machine.

The existing hybrid() macro has a compile error when taking address of
a bit-field variable. Add a new macro hybrid_bit() to get the
bit-field value of a given PMU.

Fixes: f83d2f91d2 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Link: https://lkml.kernel.org/r/1627997128-57891-1-git-send-email-kan.liang@linux.intel.com
2021-08-06 14:25:15 +02:00
Linus Torvalds
97fcc07be8 Mostly bugfixes; plus, support for XMM arguments to Hyper-V hypercalls
now obeys KVM_CAP_HYPERV_ENFORCE_CPUID.  Both the XMM arguments feature
 and KVM_CAP_HYPERV_ENFORCE_CPUID are new in 5.14, and each did not know
 of the other.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmELlMoUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOIigf+O1JWYyfVR16qpVnkI9voa0j7aPKd
 0v3Qd0ybEbBqOl2QjYFe+aJa42d8bhiMGRG7UNNxYPmW6MukxM3rWZ0wd8IvlktW
 KMR/+2jPw8v8B+M44ty9y30cdW65EGxY8PXPrzSXSF9wT5v0fV8s14Yzxkc1edKM
 X3OVJk1AhlRaGN9PyGl3MJxdvQG7Ci9Ep500i5vhAsdb2Azu9TzfQRVUUJ8c2BWj
 PEI31Z+E0//G0oU/MiHZ5bsYCboiciiHHJxpdFpxVYN7rQ4sOxJswEsO+PpD00K3
 HsZXuTXCRQyZfjbI5QlwNWpU6mZrAb3T8GVNBOP+0g3+fDNyrZzJBiCzbQ==
 =bI5h
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Mostly bugfixes; plus, support for XMM arguments to Hyper-V hypercalls
  now obeys KVM_CAP_HYPERV_ENFORCE_CPUID.

  Both the XMM arguments feature and KVM_CAP_HYPERV_ENFORCE_CPUID are
  new in 5.14, and each did not know of the other"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds
  KVM: selftests: fix hyperv_clock test
  KVM: SVM: improve the code readability for ASID management
  KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB
  KVM: Do not leak memory for duplicate debugfs directories
  KVM: selftests: Test access to XMM fast hypercalls
  KVM: x86: hyper-v: Check if guest is allowed to use XMM registers for hypercall input
  KVM: x86: Introduce trace_kvm_hv_hypercall_done()
  KVM: x86: hyper-v: Check access to hypercall before reading XMM registers
  KVM: x86: accept userspace interrupt only if no event is injected
2021-08-05 11:23:09 -07:00
H. Nikolaus Schaller
fa953adfad x86/tools/relocs: Fix non-POSIX regexp
Trying to run a cross-compiled x86 relocs tool on a BSD based
HOSTCC leads to errors like

  VOFFSET arch/x86/boot/compressed/../voffset.h - due to: vmlinux
  CC      arch/x86/boot/compressed/misc.o - due to: arch/x86/boot/compressed/../voffset.h
  OBJCOPY arch/x86/boot/compressed/vmlinux.bin - due to: vmlinux
  RELOCS  arch/x86/boot/compressed/vmlinux.relocs - due to: vmlinux
empty (sub)expressionarch/x86/boot/compressed/Makefile:118: recipe for target 'arch/x86/boot/compressed/vmlinux.relocs' failed
make[3]: *** [arch/x86/boot/compressed/vmlinux.relocs] Error 1

It turns out that relocs.c uses patterns like

	"something(|_end)"

This is not valid syntax or gives undefined results according
to POSIX 9.5.3 ERE Grammar

	https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

It seems to be silently accepted by the Linux regexp() implementation
while a BSD host complains.

Such patterns can be replaced by a transformation like

	"(|p1|p2)" -> "(p1|p2)?"

Fixes: fd95281530 ("x86-32, relocs: Whitelist more symbols for ld bug workaround")
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-08-05 20:51:11 +09:00
Sean Christopherson
d5aaad6f83 KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds
Take a signed 'long' instead of an 'unsigned long' for the number of
pages to add/subtract to the total number of pages used by the MMU.  This
fixes a zero-extension bug on 32-bit kernels that effectively corrupts
the per-cpu counter used by the shrinker.

Per-cpu counters take a signed 64-bit value on both 32-bit and 64-bit
kernels, whereas kvm_mod_used_mmu_pages() takes an unsigned long and thus
an unsigned 32-bit value on 32-bit kernels.  As a result, the value used
to adjust the per-cpu counter is zero-extended (unsigned -> signed), not
sign-extended (signed -> signed), and so KVM's intended -1 gets morphed to
4294967295 and effectively corrupts the counter.

This was found by a staggering amount of sheer dumb luck when running
kvm-unit-tests on a 32-bit KVM build.  The shrinker just happened to kick
in while running tests and do_shrink_slab() logged an error about trying
to free a negative number of objects.  The truly lucky part is that the
kernel just happened to be a slightly stale build, as the shrinker no
longer yells about negative objects as of commit 18bb473e50 ("mm:
vmscan: shrink deferred objects proportional to priority").

 vmscan: shrink_slab: mmu_shrink_scan+0x0/0x210 [kvm] negative objects to delete nr=-858993460

Fixes: bc8a3d8925 ("kvm: mmu: Fix overflow on kvm mmu page limit calculation")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210804214609.1096003-1-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-05 03:33:56 -04:00
Mingwei Zhang
bb2baeb214 KVM: SVM: improve the code readability for ASID management
KVM SEV code uses bitmaps to manage ASID states. ASID 0 was always skipped
because it is never used by VM. Thus, in existing code, ASID value and its
bitmap postion always has an 'offset-by-1' relationship.

Both SEV and SEV-ES shares the ASID space, thus KVM uses a dynamic range
[min_asid, max_asid] to handle SEV and SEV-ES ASIDs separately.

Existing code mixes the usage of ASID value and its bitmap position by
using the same variable called 'min_asid'.

Fix the min_asid usage: ensure that its usage is consistent with its name;
allocate extra size for ASID 0 to ensure that each ASID has the same value
with its bitmap position. Add comments on ASID bitmap allocation to clarify
the size change.

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Marc Orr <marcorr@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Alper Gun <alpergun@google.com>
Cc: Dionna Glaze <dionnaglaze@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vipin Sharma <vipinsh@google.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Message-Id: <20210802180903.159381-1-mizhang@google.com>
[Fix up sev_asid_free to also index by ASID, as suggested by Sean
 Christopherson, and use nr_asids in sev_cpu_init. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-04 09:43:03 -04:00
Like Xu
df51fe7ea1 perf/x86/amd: Don't touch the AMD64_EVENTSEL_HOSTONLY bit inside the guest
If we use "perf record" in an AMD Milan guest, dmesg reports a #GP
warning from an unchecked MSR access error on MSR_F15H_PERF_CTLx:

  [] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000110076) at rIP: 0xffffffff8106ddb4 (native_write_msr+0x4/0x20)
  [] Call Trace:
  []  amd_pmu_disable_event+0x22/0x90
  []  x86_pmu_stop+0x4c/0xa0
  []  x86_pmu_del+0x3a/0x140

The AMD64_EVENTSEL_HOSTONLY bit is defined and used on the host,
while the guest perf driver should avoid such use.

Fixes: 1018faa6cf ("perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM disabled")
Signed-off-by: Like Xu <likexu@tencent.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Liam Merwick <liam.merwick@oracle.com>
Link: https://lkml.kernel.org/r/20210802070850.35295-1-likexu@tencent.com
2021-08-04 15:16:34 +02:00
Peter Zijlstra
f4b4b45652 perf/x86: Fix out of bound MSR access
On Wed, Jul 28, 2021 at 12:49:43PM -0400, Vince Weaver wrote:
> [32694.087403] unchecked MSR access error: WRMSR to 0x318 (tried to write 0x0000000000000000) at rIP: 0xffffffff8106f854 (native_write_msr+0x4/0x20)
> [32694.101374] Call Trace:
> [32694.103974]  perf_clear_dirty_counters+0x86/0x100

The problem being that it doesn't filter out all fake counters, in
specific the above (erroneously) tries to use FIXED_BTS. Limit the
fixed counters indexes to the hardware supplied number.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Like Xu <likexu@tencent.com>
Link: https://lkml.kernel.org/r/YQJxka3dxgdIdebG@hirez.programming.kicks-ass.net
2021-08-04 15:16:33 +02:00
Sean Christopherson
179c6c27bf KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB
Use the raw ASID, not ASID-1, when nullifying the last used VMCB when
freeing an SEV ASID.  The consumer, pre_sev_run(), indexes the array by
the raw ASID, thus KVM could get a false negative when checking for a
different VMCB if KVM manages to reallocate the same ASID+VMCB combo for
a new VM.

Note, this cannot cause a functional issue _in the current code_, as
pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and
last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is
guaranteed to mismatch on the first VMRUN.  However, prior to commit
8a14fe4f0c ("kvm: x86: Move last_cpu into kvm_vcpu_arch as
last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the
last_cpu variable.  Thus it's theoretically possible that older versions
of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID
and VMCB exactly match those of a prior VM.

Fixes: 70cd94e60c ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-04 06:02:09 -04:00
Vitaly Kuznetsov
4e62aa96d6 KVM: x86: hyper-v: Check if guest is allowed to use XMM registers for hypercall input
TLFS states that "Availability of the XMM fast hypercall interface is
indicated via the “Hypervisor Feature Identification” CPUID Leaf
(0x40000003, see section 2.4.4) ... Any attempt to use this interface
when the hypervisor does not indicate availability will result in a #UD
fault."

Implement the check for 'strict' mode (KVM_CAP_HYPERV_ENFORCE_CPUID).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210730122625.112848-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-03 06:16:40 -04:00
Vitaly Kuznetsov
f5714bbb5b KVM: x86: Introduce trace_kvm_hv_hypercall_done()
Hypercall failures are unusual with potentially far going consequences
so it would be useful to see their results when tracing.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210730122625.112848-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-03 06:16:40 -04:00
Vitaly Kuznetsov
2e2f1e8d04 KVM: x86: hyper-v: Check access to hypercall before reading XMM registers
In case guest doesn't have access to the particular hypercall we can avoid
reading XMM registers.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Siddharth Chandrasekaran <sidcha@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210730122625.112848-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-03 06:16:40 -04:00
Linus Torvalds
c7d1022326 Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi (mac80211)
and netfilter trees.
 
 Current release - regressions:
 
  - mac80211: fix starting aggregation sessions on mesh interfaces
 
 Current release - new code bugs:
 
  - sctp: send pmtu probe only if packet loss in Search Complete state
 
  - bnxt_en: add missing periodic PHC overflow check
 
  - devlink: fix phys_port_name of virtual port and merge error
 
  - hns3: change the method of obtaining default ptp cycle
 
  - can: mcba_usb_start(): add missing urb->transfer_dma initialization
 
 Previous releases - regressions:
 
  - set true network header for ECN decapsulation
 
  - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and LRO
 
  - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811 PHY
 
  - sctp: fix return value check in __sctp_rcv_asconf_lookup
 
 Previous releases - always broken:
 
  - bpf:
        - more spectre corner case fixes, introduce a BPF nospec
          instruction for mitigating Spectre v4
        - fix OOB read when printing XDP link fdinfo
        - sockmap: fix cleanup related races
 
  - mac80211: fix enabling 4-address mode on a sta vif after assoc
 
  - can:
        - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
        - j1939: j1939_session_deactivate(): clarify lifetime of
               session object, avoid UAF
        - fix number of identical memory leaks in USB drivers
 
  - tipc:
        - do not blindly write skb_shinfo frags when doing decryption
        - fix sleeping in tipc accept routine
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEEWm8ACgkQMUZtbf5S
 Irv84A//V/nn9VRdpDpmodwBWVEc9SA00M/nmziRBLwRyG+fRMtnePY4Ha40TPbh
 LL6orth08hZKOjVmMc6Ea4EjZbV5E3iAKtAnaX6wi1HpEXVxKtFYnWxu9ydwTEd9
 An1fltDtWYkNi3kiq7il+Tp1/yZAQ+NYv5zQZCWJ47kkN3jkjULdAEBqODA2A6Ul
 0PQgS1rKzXukE19PlXDuaNuEekhTiEfaTwzHjdBJZkj1toGJGfHsvdQ/YJjixzB9
 44SjE4PfxIaMWP0BVaD6hwzaVQhaZETXhZZufdIDdQd7sDbmd6CPODX6mXfLEq4u
 JaWylgobsK+5ScHE6siVI+ZlW7stq9l1Ynm10ADiwsZVzKEoP745484aEFOLO6Z+
 Ln/IqDQCP/yJQmnl2i0+TfqVDh6BKYoIfUUK/+nzHw4Otycy0m3kj4P+74aYfjOv
 Q+cUgbXUemcrpq6wGUK+zK0NyNHVILvdPDnHPMMypwqPk18y5ZmFvaJAVUPSavD9
 N7t9LoLyGwK3i/Ir4l+JJZ1KgAv1+TbmyNBWvY1Yk/r/vHU3nBPIv26s7YarNAwD
 094vJEJ0+mqO4h+Xj1Nc7HEBFi46JfpN2L8uYoM7gpwziIRMdmpXVLmpEk43WmFi
 UMwWJWqabPEXaozC2UFcFLSk+jS7DiD+G5eG+Fd5HecmKzd7RI0=
 =sKPI
 -----END PGP SIGNATURE-----

Merge tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi
  (mac80211) and netfilter trees.

  Current release - regressions:

   - mac80211: fix starting aggregation sessions on mesh interfaces

  Current release - new code bugs:

   - sctp: send pmtu probe only if packet loss in Search Complete state

   - bnxt_en: add missing periodic PHC overflow check

   - devlink: fix phys_port_name of virtual port and merge error

   - hns3: change the method of obtaining default ptp cycle

   - can: mcba_usb_start(): add missing urb->transfer_dma initialization

  Previous releases - regressions:

   - set true network header for ECN decapsulation

   - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and
     LRO

   - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811
     PHY

   - sctp: fix return value check in __sctp_rcv_asconf_lookup

  Previous releases - always broken:

   - bpf:
       - more spectre corner case fixes, introduce a BPF nospec
         instruction for mitigating Spectre v4
       - fix OOB read when printing XDP link fdinfo
       - sockmap: fix cleanup related races

   - mac80211: fix enabling 4-address mode on a sta vif after assoc

   - can:
       - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
       - j1939: j1939_session_deactivate(): clarify lifetime of session
         object, avoid UAF
       - fix number of identical memory leaks in USB drivers

   - tipc:
       - do not blindly write skb_shinfo frags when doing decryption
       - fix sleeping in tipc accept routine"

* tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
  gve: Update MAINTAINERS list
  can: esd_usb2: fix memory leak
  can: ems_usb: fix memory leak
  can: usb_8dev: fix memory leak
  can: mcba_usb_start(): add missing urb->transfer_dma initialization
  can: hi311x: fix a signedness bug in hi3110_cmd()
  MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
  bpf: Fix leakage due to insufficient speculative store bypass mitigation
  bpf: Introduce BPF nospec instruction for mitigating Spectre v4
  sis900: Fix missing pci_disable_device() in probe and remove
  net: let flow have same hash in two directions
  nfc: nfcsim: fix use after free during module unload
  tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
  sctp: fix return value check in __sctp_rcv_asconf_lookup
  nfc: s3fwrn5: fix undefined parameter values in dev_err()
  net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32
  net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
  net/mlx5: Unload device upon firmware fatal error
  net/mlx5e: Fix page allocation failure for ptp-RQ over SF
  net/mlx5e: Fix page allocation failure for trap-RQ over SF
  ...
2021-07-30 16:01:36 -07:00
Linus Torvalds
f6c5971bb7 libata-5.14-2021-07-30
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmEEEyMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgppZeEADdqROLANHp21UFSPyqllHumXVrCK3jXk9d
 ZHahUqT+xQqYZ3BC0hyP7vYuq+FWpr5Rumk6nah46JRv8RnvEHLOjkBqravGl6SV
 Zw2qvGe2R7LueBshsbG9m79D0cR2hcrMj2DYvsNIriTxkDVIo2wReaAg3V/vaep6
 +kpvcjEFB9G4K/ypG2qPJnZ2TCoBmi/iJK5wTbQOpPAxQJxBCJGffBLXg/Olfy74
 k6Oovp0bQWTEziAXNlgawn/Tiwav617/eZgz4ZxgnqzeVD1jJK8bPSf+O1UbNH6z
 lmULEdrc7fMTDgTbv5mElmxtXv+Ba5WZnZgzBFASt1BgvW/BSRNhs191T9Mq4U4L
 gLWDL/oRPhnCOP/AYQVhXzaV98hlOD+UBH3zypbBsCuWLGgDOoZOqjYyTOk+9PwB
 0LFEZr5i/ZAQmgvtYSOH8u9NowhfOThVDhvfWmoD6ByoF0rPeVyPUUr0P910aVwW
 R2JkHKdixqCvyxIZqxwWfTjzApn8fzBGlcY6skMeXbh5pDo9F5HL/QbkKedoUpbj
 fcbklkr/Aggz3pLWq49RqeTtUZiFnolOtUpz09sojA75BxBV0Aa11FYf8JNSKUx+
 8RWLIT80PIxKiPV7Ym4ZG9qJKfzob7Oq/XwKxtReKCnfFcGdF2imroajggvawsmS
 8UtOqwsHjg==
 =m5TP
 -----END PGP SIGNATURE-----

Merge tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block

Pull libata fixlets from Jens Axboe:

 - A fix for PIO highmem (Christoph)

 - Kill HAVE_IDE as it's now unused (Lukas)

* tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
  arch: Kconfig: clean up obsolete use of HAVE_IDE
  libata: fix ata_pio_sector for CONFIG_HIGHMEM
2021-07-30 10:56:47 -07:00
Lukas Bulwahn
094121ef81 arch: Kconfig: clean up obsolete use of HAVE_IDE
The arch-specific Kconfig files use HAVE_IDE to indicate if IDE is
supported.

As IDE support and the HAVE_IDE config vanishes with commit b7fb14d3ac
("ide: remove the legacy ide driver"), there is no need to mention
HAVE_IDE in all those arch-specific Kconfig files.

The issue was identified with ./scripts/checkkconfigsymbols.py.

Fixes: b7fb14d3ac ("ide: remove the legacy ide driver")
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210728182115.4401-1-lukas.bulwahn@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-30 08:19:09 -06:00
Paolo Bonzini
fa7a549d32 KVM: x86: accept userspace interrupt only if no event is injected
Once an exception has been injected, any side effects related to
the exception (such as setting CR2 or DR6) have been taked place.
Therefore, once KVM sets the VM-entry interruption information
field or the AMD EVENTINJ field, the next VM-entry must deliver that
exception.

Pending interrupts are processed after injected exceptions, so
in theory it would not be a problem to use KVM_INTERRUPT when
an injected exception is present.  However, DOSEMU is using
run->ready_for_interrupt_injection to detect interrupt windows
and then using KVM_SET_SREGS/KVM_SET_REGS to inject the
interrupt manually.  For this to work, the interrupt window
must be delayed after the completion of the previous event
injection.

Cc: stable@vger.kernel.org
Reported-by: Stas Sergeev <stsp2@yandex.ru>
Tested-by: Stas Sergeev <stsp2@yandex.ru>
Fixes: 71cc849b70 ("KVM: x86: Fix split-irqchip vs interrupt injection window request")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-30 07:53:02 -04:00
Linus Torvalds
7e96bf4762 ARM:
- Fix MTE shared page detection
 
 - Enable selftest's use of PMU registers when asked to
 
 s390:
 
 - restore 5.13 debugfs names
 
 x86:
 
 - fix sizes for vcpu-id indexed arrays
 
 - fixes for AMD virtualized LAPIC (AVIC)
 
 - other small bugfixes
 
 Generic:
 
 - access tracking performance test
 
 - dirty_log_perf_test command line parsing fix
 
 - Fix selftest use of obsolete pthread_yield() in favour of sched_yield()
 
 - use cpu_relax when halt polling
 
 - fixed missing KVM_CLEAR_DIRTY_LOG compat ioctl
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmECvOwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMjuAf/ZdJx7RKRQxMHG4jHGDtOIQq3qxds
 2uJsFZS3MWkphSOJ+mbomdXTOCHvhPbJlr5TXaSxGnasmAAl+mDk2qVT0tH6638m
 r6M+fu4X0RYvFz54Qnf96V0/elE6ee8rtteXD8WVKQ/XzE3odk1EOqbe7CBDx7yo
 A3SzO8eSBzxamKo22fmE3MR5LVVAcN9wNsCb88XGDTUkTbYl+w597r6zg83rMMlL
 gwD4f9+NYX6h88BVVwLUkWotUrD/5rRGpRVVEZk5eZKvFGzpukk15dfv0PA9347O
 AOM0i/PgnA+Qw6ZsTetWPjD8eFcXDBurGF1tIkyo4X8VogQG0wFIHxbezQ==
 =ZgK/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix MTE shared page detection

   - Enable selftest's use of PMU registers when asked to

  s390:

   - restore 5.13 debugfs names

  x86:

   - fix sizes for vcpu-id indexed arrays

   - fixes for AMD virtualized LAPIC (AVIC)

   - other small bugfixes

  Generic:

   - access tracking performance test

   - dirty_log_perf_test command line parsing fix

   - Fix selftest use of obsolete pthread_yield() in favour of
     sched_yield()

   - use cpu_relax when halt polling

   - fixed missing KVM_CLEAR_DIRTY_LOG compat ioctl"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: add missing compat KVM_CLEAR_DIRTY_LOG
  KVM: use cpu_relax when halt polling
  KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl
  KVM: SVM: tweak warning about enabled AVIC on nested entry
  KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated
  KVM: s390: restore old debugfs names
  KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized
  KVM: selftests: Introduce access_tracking_perf_test
  KVM: selftests: Fix missing break in dirty_log_perf_test arg parsing
  x86/kvm: fix vcpu-id indexed array sizes
  KVM: x86: Check the right feature bit for MSR_KVM_ASYNC_PF_ACK access
  docs: virt: kvm: api.rst: replace some characters
  KVM: Documentation: Fix KVM_CAP_ENFORCE_PV_FEATURE_CPUID name
  KVM: nSVM: Swap the parameter order for svm_copy_vmrun_state()/svm_copy_vmloadsave_state()
  KVM: nSVM: Rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state()
  KVM: arm64: selftests: get-reg-list: actually enable pmu regs in pmu sublist
  KVM: selftests: change pthread_yield to sched_yield
  KVM: arm64: Fix detection of shared VMAs on guest fault
2021-07-29 09:42:09 -07:00
David S. Miller
fc16a5322e Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-07-29

The following pull-request contains BPF updates for your *net* tree.

We've added 9 non-merge commits during the last 14 day(s) which contain
a total of 20 files changed, 446 insertions(+), 138 deletions(-).

The main changes are:

1) Fix UBSAN out-of-bounds splat for showing XDP link fdinfo, from Lorenz Bauer.

2) Fix insufficient Spectre v4 mitigation in BPF runtime, from Daniel Borkmann,
   Piotr Krysiuk and Benedict Schlueter.

3) Batch of fixes for BPF sockmap found under stress testing, from John Fastabend.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29 00:53:32 +01:00
Daniel Borkmann
f5e81d1117 bpf: Introduce BPF nospec instruction for mitigating Spectre v4
In case of JITs, each of the JIT backends compiles the BPF nospec instruction
/either/ to a machine instruction which emits a speculation barrier /or/ to
/no/ machine instruction in case the underlying architecture is not affected
by Speculative Store Bypass or has different mitigations in place already.

This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence'
instruction for mitigation. In case of arm64, we rely on the firmware mitigation
as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled,
it works for all of the kernel code with no need to provide any additional
instructions here (hence only comment in arm64 JIT). Other archs can follow
as needed. The BPF nospec instruction is specifically targeting Spectre v4
since i) we don't use a serialization barrier for the Spectre v1 case, and
ii) mitigation instructions for v1 and v4 might be different on some archs.

The BPF nospec is required for a future commit, where the BPF verifier does
annotate intermediate BPF programs with speculation barriers.

Co-developed-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-07-29 00:20:56 +02:00