Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
to reads of MSR_PLATFORM_INFO.
Disabling access to reads of this MSR gives userspace the control to "expose"
this platform-dependent information to guests in a clear way. As it exists
today, guests that read this MSR would get unpopulated information if userspace
hadn't already set it (and prior to this patch series, only the CPUID faulting
information could have been populated). This existing interface could be
confusing if guests don't handle the potential for incorrect/incomplete
information gracefully (e.g. zero reported for base frequency).
Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow userspace to set turbo bits in MSR_PLATFORM_INFO. Previously, only
the CPUID faulting bit was settable. But now any bit in
MSR_PLATFORM_INFO would be settable. This can be used, for example, to
convey frequency information about the platform on which the guest is
running.
Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to section "Checks on VMX Controls" in Intel SDM vol 3C, the
following check needs to be enforced on vmentry of L2 guests:
If the 'enable VPID' VM-execution control is 1, the value of the
of the VPID VM-execution control field must not be 0000H.
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to section "Checks on VMX Controls" in Intel SDM vol 3C,
the following check needs to be enforced on vmentry of L2 guests:
- Bits 5:0 of the posted-interrupt descriptor address are all 0.
- The posted-interrupt descriptor address does not set any bits
beyond the processor's physical-address width.
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In case L1 do not intercept L2 HLT or enter L2 in HLT activity-state,
it is possible for a vCPU to be blocked while it is in guest-mode.
According to Intel SDM 26.6.5 Interrupt-Window Exiting and
Virtual-Interrupt Delivery: "These events wake the logical processor
if it just entered the HLT state because of a VM entry".
Therefore, if L1 enters L2 in HLT activity-state and L2 has a pending
deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU
should be waken from the HLT state and injected with the interrupt.
In addition, if while the vCPU is blocked (while it is in guest-mode),
it receives a nested posted-interrupt, then the vCPU should also be
waken and injected with the posted interrupt.
To handle these cases, this patch enhances kvm_vcpu_has_events() to also
check if there is a pending interrupt in L2 virtual APICv provided by
L1. That is, it evaluates if there is a pending virtual interrupt for L2
by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1
Evaluation of Pending Interrupts.
Note that this also handles the case of nested posted-interrupt by the
fact RVI is updated in vmx_complete_nested_posted_interrupt() which is
called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() ->
kvm_vcpu_running() -> vmx_check_nested_events() ->
vmx_complete_nested_posted_interrupt().
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VMX cannot be enabled under SMM, check it when CR4 is set and when nested
virtualization state is restored.
This should fix some WARNs reported by syzkaller, mostly around
alloc_shadow_vmcs.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The functions
kvm_load_guest_fpu()
kvm_put_guest_fpu()
are only used locally, make them static. This requires also that both
functions are moved because they are used before their implementation.
Those functions were exported (via EXPORT_SYMBOL) before commit
e5bb40251a ("KVM: Drop kvm_{load,put}_guest_fpu() exports").
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These structures are going to be used from KVM code so let's make
their names reflect their Hyper-V origin.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A VMX preemption timer value of '0' is guaranteed to cause a VMExit
prior to the CPU executing any instructions in the guest. Use the
preemption timer (if it's supported) to trigger immediate VMExit
in place of the current method of sending a self-IPI. This ensures
that pending VMExit injection to L1 occurs prior to executing any
instructions in the guest (regardless of nesting level).
When deferring VMExit injection, KVM generates an immediate VMExit
from the (possibly nested) guest by sending itself an IPI. Because
hardware interrupts are blocked prior to VMEnter and are unblocked
(in hardware) after VMEnter, this results in taking a VMExit(INTR)
before any guest instruction is executed. But, as this approach
relies on the IPI being received before VMEnter executes, it only
works as intended when KVM is running as L0. Because there are no
architectural guarantees regarding when IPIs are delivered, when
running nested the INTR may "arrive" long after L2 is running e.g.
L0 KVM doesn't force an immediate switch to L1 to deliver an INTR.
For the most part, this unintended delay is not an issue since the
events being injected to L1 also do not have architectural guarantees
regarding their timing. The notable exception is the VMX preemption
timer[1], which is architecturally guaranteed to cause a VMExit prior
to executing any instructions in the guest if the timer value is '0'
at VMEnter. Specifically, the delay in injecting the VMExit causes
the preemption timer KVM unit test to fail when run in a nested guest.
Note: this approach is viable even on CPUs with a broken preemption
timer, as broken in this context only means the timer counts at the
wrong rate. There are no known errata affecting timer value of '0'.
[1] I/O SMIs also have guarantees on when they arrive, but I have
no idea if/how those are emulated in KVM.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Use a hook for SVM instead of leaving the default in x86.c - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Provide a singular location where the VMX preemption timer bit is
set/cleared so that future usages of the preemption timer can ensure
the VMCS bit is up-to-date without having to modify unrelated code
paths. For example, the preemption timer can be used to force an
immediate VMExit. Cache the status of the timer to avoid redundant
VMREAD and VMWRITE, e.g. if the timer stays armed across multiple
VMEnters/VMExits.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A VMX preemption timer value of '0' at the time of VMEnter is
architecturally guaranteed to cause a VMExit prior to the CPU
executing any instructions in the guest. This architectural
definition is in place to ensure that a previously expired timer
is correctly recognized by the CPU as it is possible for the timer
to reach zero and not trigger a VMexit due to a higher priority
VMExit being signalled instead, e.g. a pending #DB that morphs into
a VMExit.
Whether by design or coincidence, commit f4124500c2 ("KVM: nVMX:
Fully emulate preemption timer") special cased timer values of '0'
and '1' to ensure prompt delivery of the VMExit. Unlike '0', a
timer value of '1' has no has no architectural guarantees regarding
when it is delivered.
Modify the timer emulation to trigger immediate VMExit if and only
if the timer value is '0', and document precisely why '0' is special.
Do this even if calibration of the virtual TSC failed, i.e. VMExit
will occur immediately regardless of the frequency of the timer.
Making only '0' a special case gives KVM leeway to be more aggressive
in ensuring the VMExit is injected prior to executing instructions in
the nested guest, and also eliminates any ambiguity as to why '1' is
a special case, e.g. why wasn't the threshold for a "short timeout"
set to 10, 100, 1000, etc...
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page()
This patch is to fix the commit.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I run into the following error
testing/selftests/kvm/dirty_log_test.c:285: undefined reference to `pthread_create'
testing/selftests/kvm/dirty_log_test.c:297: undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
my gcc version is gcc version 4.8.4
"-pthread" would work everywhere
Signed-off-by: Lei Yang <Lei.Yang@windriver.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Here is the code path which shows kvm_mmu_setup() is invoked after
kvm_mmu_create(). Since kvm_mmu_setup() is only invoked in this code path,
this means the root_hpa and prev_roots are guaranteed to be invalid. And
it is not necessary to reset it again.
kvm_vm_ioctl_create_vcpu()
kvm_arch_vcpu_create()
vmx_create_vcpu()
kvm_vcpu_init()
kvm_arch_vcpu_init()
kvm_mmu_create()
kvm_arch_vcpu_setup()
kvm_mmu_setup()
kvm_init_mmu()
This patch set reset_roots to false in kmv_mmu_setup().
Fixes: 50c28f21d0
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm should not attempt to read guest PDPTEs when CR0.PG = 0 and
CR4.PAE = 1.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When VMX is used with flexpriority disabled (because of no support or
if disabled with module parameter) MMIO interface to lAPIC is still
available in x2APIC mode while it shouldn't be (kvm-unit-tests):
PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
FAIL: apic_disable: *0xfee00030: 50014
The issue appears because we basically do nothing while switching to
x2APIC mode when APIC access page is not used. apic_mmio_{read,write}
only check if lAPIC is disabled before proceeding to actual write.
When APIC access is virtualized we correctly manipulate with VMX controls
in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes
in x2APIC mode so there's no issue.
Disabling MMIO interface seems to be easy. The question is: what do we
do with these reads and writes? If we add apic_x2apic_mode() check to
apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will
go to userspace. When lAPIC is in kernel, Qemu uses this interface to
inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This
somehow works with disabled lAPIC but when we're in xAPIC mode we will
get a real injected MSI from every write to lAPIC. Not good.
The simplest solution seems to be to just ignore writes to the region
and return ~0 for all reads when we're in x2APIC mode. This is what this
patch does. However, this approach is inconsistent with what currently
happens when flexpriority is enabled: we allocate APIC access page and
create KVM memory region so in x2APIC modes all reads and writes go to
this pre-allocated page which is, btw, the same for all vCPUs.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Two fixes for KVM on POWER machines. Both of these relate to memory
corruption and host crashes seen when transparent huge pages are
enabled. The first fixes a host crash that can occur when a DMA
mapping is removed by the guest and the page mapped was part of a
transparent huge page; the second fixes corruption that could occur
when a hypervisor page fault for a radix guest is being serviced at
the same time that the backing page is being collapsed or split.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJbmFIyAAoJEJ2a6ncsY3Gf+L8H/jRQ0ONUpv2xrgirXdPmfuVv
xIVejn5chiygpo3ZY2YkRGjqMoX8usA5pDQONk9duoc48FedSjjmurfAkSA8NESI
y6DSRGB6pir/reP/7tBVk0eeeMBjbYnHPA7KfI8ijK424VmRpCT5stiUm7gQvSEm
LSRUSLwWKfCCjU78HVtiTuK865WZifrOCy6wiNEl79F1K6T1A+LeGaKrcDLjeK/Q
GsNSbwBK37BOvcsm0W1xrlnCmYtR/nVrhjTFMc5noBuc4znQd3wxitgiInFsOH5V
LUWL6IStFkbGKSxVZuJilkhVF58AAisrJnwvlZsjrExWYf1J42kbyvVoURt0O8I=
=blkZ
-----END PGP SIGNATURE-----
Merge tag 'kvm-ppc-fixes-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
Second set of PPC KVM fixes for 4.19
Two fixes for KVM on POWER machines. Both of these relate to memory
corruption and host crashes seen when transparent huge pages are
enabled. The first fixes a host crash that can occur when a DMA
mapping is removed by the guest and the page mapped was part of a
transparent huge page; the second fixes corruption that could occur
when a hypervisor page fault for a radix guest is being serviced at
the same time that the backing page is being collapsed or split.
- more fallout from the hugetlbfs enablement
- bugfix for vma handling
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJbm68DAAoJEBF7vIC1phx8i3cP/R3UwXGrTq3u5+78PfUu+Nvm
zTSgHID25/pCsytvRcK7SSJQt+h53RjuiQlOcrtSJZvq3Btt23gl+pFltqvWGZDM
PYgy+LaaaoIXNeOJqwkwcnNsq4usQ0N1jQ8EgNUxEc2EBkffA6mKcTS0lWDm47AE
Yxsyz0RsvwcbYCOj10s1b4D9h97+ZkqAC2RDnafzQOb54j/aOH7blsStNDXfdkte
xHJ7mELOwAWre2QhD7OeTX1aCjBfKUJzWK2sPxm7qt5nEtH18o9F6j2Y/d8dnWg8
kqthyyb+nH0aHgv3G+7sKBOp1Ra7ftHAVrrbh3m9hceaW5+VuEhWSN6zVZKVdWta
4bjiWa9I1gKS/Wz479ztdbpt+koRAYSg5CLHAD7YsBR3hHtnAZwuRO38S4Xg+Tfb
Hp4Lhf8gnq9yQqrcyfKWQKtU9mb84oNtEw/wQPUvr2tZyeoB5NbnVy5UDa9V1DRS
FnF1/MIZ/MszkwbNEnhk+WWFy41h+uduizfXwQ/YO/xciEEdM2TaDHuLIpBegiSO
4kFfHRSMxqbwNhQSsJI8dIFpSHqXrtn2p6JTsWdi94B0QBYaNm4XMNpgEZIs5CsE
aNGhQa+IZ5jk6YsGMx4mqAmV2jN9bMEoKR/SaOvrSkPvAO9QxZP4V6W72YGoi6le
RZK8Z+HGxg6qpeaAiG0t
=TT85
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-master-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixes for 4.19
- more fallout from the hugetlbfs enablement
- bugfix for vma handling
The Code of Conflict is not achieving its implicit goal of fostering
civility and the spirit of 'be excellent to each other'. Explicit
guidelines have demonstrated success in other projects and other areas
of the kernel.
Here is a Code of Conduct statement for the wider kernel. It is based
on the Contributor Covenant as described at www.contributor-covenant.org
From this point forward, we should abide by these rules in order to help
make the kernel community a welcoming environment to participate in.
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Olof Johansson <olof@lxom.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 fixes from Ingol Molnar:
"Misc fixes:
- EFI crash fix
- Xen PV fixes
- do not allow PTI on 2-level 32-bit kernels for now
- documentation fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/APM: Fix build warning when PROC_FS is not enabled
Revert "x86/mm/legacy: Populate the user page-table with user pgd's"
x86/efi: Load fixmap GDT in efi_call_phys_epilog() before setting %cr3
x86/xen: Disable CPU0 hotplug for Xen PV
x86/EISA: Don't probe EISA bus for Xen PV guests
x86/doc: Fix Documentation/x86/earlyprintk.txt
Pull scheduler fixes from Ingo Molnar:
"Misc fixes: various scheduler metrics corner case fixes, a
sched_features deadlock fix, and a topology fix for certain NUMA
systems"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix kernel-doc notation warning
sched/fair: Fix load_balance redo for !imbalance
sched/fair: Fix scale_rt_capacity() for SMT
sched/fair: Fix vruntime_normalized() for remote non-migration wakeup
sched/pelt: Fix update_blocked_averages() for RT and DL classes
sched/topology: Set correct NUMA topology type
sched/debug: Fix potential deadlock when writing to sched_features
Fix build warning in apm_32.c when CONFIG_PROC_FS is not enabled:
../arch/x86/kernel/apm_32.c:1643:12: warning: 'proc_apm_show' defined but not used [-Wunused-function]
static int proc_apm_show(struct seq_file *m, void *v)
Fixes: 3f3942aca6 ("proc: introduce proc_create_single{,_data}")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jiri Kosina <jikos@kernel.org>
Link: https://lkml.kernel.org/r/be39ac12-44c2-4715-247f-4dcc3c525b8b@infradead.org
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlucKDwACgkQiiy9cAdy
T1EhEgwAqgVXTujce2UVPtaFY/MaGmaIwAimh+aYOCAADxLYJHkjtRzHd5PQgf+L
n55R2hFcMeelWxOMEb/aRmxIKLk8fmJYVWClM2+S7Z79M3GHexDbMS8+oDZnzCTB
EknvaTbi+vOt4HGABkbJ/jiQCgonmeobon30gLWaYa3XGeYc7ZV5gR+EXL9xSdvh
+I+x3rSDpm8fQ5njkB7RKgfB+ha4NQqZ6dXlYQzcb0vMO3/lhQ56Ypgn95Jlu5UW
pcLxUFE1do+JeGvIU1it2SCRJ5499g180Rxucl7X1xFBQ44Qss9QOeWkFTZ8784V
PIYVZMTUqO0Km4H22qXD8lIY5GjDuAXLYM3AddFkJpbKaw6g++ZsUXSfoA5zRIDn
10edaPK/hDIQeFaV+ySTN5g/Qh3YFnmY4kDL3t3CRZDe4+DTW/+cmrF3sGkhZDQt
+nDo0JxJsjNnJW6loB5Lb76lygvsng01owSsYSAChjhYwBvCDgp9/D85pDQ/oYkl
HKD9tiF3
=YNSx
-----END PGP SIGNATURE-----
Merge tag '4.19-rc3-smb3-cifs' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Fixes for four CIFS/SMB3 potential pointer overflow issues, one minor
build fix, and a build warning cleanup"
* tag '4.19-rc3-smb3-cifs' of git://git.samba.org/sfrench/cifs-2.6:
cifs: read overflow in is_valid_oplock_break()
cifs: integer overflow in in SMB2_ioctl()
CIFS: fix wrapping bugs in num_entries()
cifs: prevent integer overflow in nxt_dir_entry()
fs/cifs: require sha512
fs/cifs: suppress a string overflow warning
Stable bugfixes:
- v4.17+: Fix a tracepoint Oops in initiate_file_draining()
- v4.17+: Fix a tracepoint Oops in initiate_file_draining()
- v4.11+: Fix an infinite loop on I/O
Other fixes:
- Return errors if a waiting layoutget is killed
- Don't open code clearing of delegation state
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlucGhMACgkQ18tUv7Cl
QOsMCA/9ET0bbzus25DWcnbT10bpEdtW4p6dR/2ztGqUGe0FVyCyVT70jBKFbnAL
cO1pqElKLj7TMPhTsxK63dDxXGELXCqDtmsHELaD8jf0h6270KAPariJBQ5+N0ud
5U5CXswW/zbQekE9GMTDQtAAGBzfht33PavFt2+5oVYTAQ5K6Pwvq2qMoifQxMlk
wjtVjypz0QjBy5bHBO6XGxX58JIc23EwA0/KDS4cU3vkwDXmEZcVYIUdqJF4gtCz
JdJdnT4b9ebtbdHENx8rkot3L1VSx6JfW9pPMvxLjxn8IG1rj7zQXtc7kpnoF8RY
WVGWuf4rn7Zo7YXf11SXNebMYgrljx5/0KcmUtSgSmCCqVUmY1e/a69e0fhkKfxn
/W/+fYIdC1wG0JXtrCN8eJbGropYj3B8Ln5TBV4LN91hFMI8Lx/4D1lKLoK7RNLJ
3Q6VmKIhKVMHgK6eAivWyN7X/WzIvAj37a2ix2xSjENUIP+l3ePAekc4f5XGMe8O
wx6wxgvVSomVrsM565XGcjw+LXUyzlXowS6JhR+Zn6fYmbDkPk0ZMC5HBFidakkB
YxO7aWjBvyYinOrBMWODeWatt4q50bI6YtQpxxWyDIdRXWbvQDjBILveTh6/sRGQ
KbA3r6XC/DSMri4Mo/r+92fbBEYV2otnkkYraz04VgBVYYgiJts=
=C83+
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
"These are a handful of fixes for problems that Trond found. Patch #1
and #3 have the same name, a second issue was found after applying the
first patch.
Stable bugfixes:
- v4.17+: Fix tracepoint Oops in initiate_file_draining()
- v4.11+: Fix an infinite loop on I/O
Other fixes:
- Return errors if a waiting layoutget is killed
- Don't open code clearing of delegation state"
* tag 'nfs-for-4.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Don't open code clearing of delegation state
NFSv4.1 fix infinite loop on I/O.
NFSv4: Fix a tracepoint Oops in initiate_file_draining()
pNFS: Ensure we return the error if someone kills a waiting layoutget
NFSv4: Fix a tracepoint Oops in initiate_file_draining()
modifies CC_FLAGS_FTRACE. The issue is that it breaks the dependencies
and causes "make targz-pkg" to rebuild the entire world.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW5w7nhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qqoiAQCXoIkvallChR0KJW9zO5pfAjsLUT0d
CiMZ7Gins9GV0gD/WkByVRzXYqSc+pY435L1LK5ZROTRtQo3o/Zynzgo4ww=
=LtIq
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"This fixes an issue with the build system caused by a change that
modifies CC_FLAGS_FTRACE. The issue is that it breaks the dependencies
and causes "make targz-pkg" to rebuild the entire world"
* tag 'trace-v4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/Makefile: Fix handling redefinition of CC_FLAGS_FTRACE
- Fix a regression on systems having a DT without any phandles which
happens on a PowerMac G3.
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAlucCXYQHHJvYmhAa2Vy
bmVsLm9yZwAKCRD6+121jbxhw7FoD/90jvgqPgZOtCPy6nmOMua87E/UxwEYTm40
gbiu6426IGnNAWI7A7dluTMIjsFqZU2AwUzbnxLkdVV96tpzTObVswoSvDFPZqFz
RFVlsAWUmKR4bdm+tC4qJ6p39WpwzM5HPTrTf7xbRahMWGBL0YKKkatEnmKQqthZ
YDJiueJObsFO3AUiewQgyN87209oMO7vGDHPFe1ojGxOp/T4b7bkQe9UaIrbzLxl
oxU+jBHb/E9mQQHsyP11ic/yQRjBxFeEcWWrjVd5q+rLGhwCWKPOHsXu11Vt4438
m3gnYup9QixDvlXfPgPBnK+AALj07d+15wP7UrAEIFG0fwzh9Vm/Bas2x+VjbO2y
LTFaSquXagg98m6TQ21Bpan8iSgGhLRBWEqfVTTniYWORz0qRZuRssEjS+HRdjEM
t3WisK0x7TSe/wS7B6A+wUVYTIDQ5RbSbrWu6h0G26Q+D/pZgZJC0gRRPJFfDa9Z
ofnEc3rqeCqn+uUS1UuLETtV7icKn06B6FkSspFHHAoCSjMWZdBRx7NXlIH1pjPi
VHzronPAORbaLVqNNlEkfESRt6Ii0uRFVFebjLrya4RD/EMvGpDfG8Qme40cLEJa
+hiZc7E9Fo9ERiqheDMrOSuDgSDca976j1o7eTOoN+K9mECW7A0z/H6ciA9ujUys
yEj5SK3/MQ==
=mlwN
-----END PGP SIGNATURE-----
Merge tag 'devicetree-fixes-for-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree fix from Rob Herring:
"One regression for a 20 year old PowerMac:
- Fix a regression on systems having a DT without any phandles which
happens on a PowerMac G3"
* tag 'devicetree-fixes-for-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: fix phandle cache creation for DTs with no phandles
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW5ve3AAKCRCAXGG7T9hj
vkcPAQDdcQzJRiQfV/+Trx/KpEYOA4TLu+NfNeBAt6q0eLvU+wEAt85ryAiUPEZQ
57BzEuKBYKo3DZ7bz5TDv2tIKNZ/5AE=
=s5Kl
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.19c-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"This contains some minor cleanups and fixes:
- a new knob for controlling scrubbing of pages returned by the Xen
balloon driver to the Xen hypervisor to address a boot performance
issue seen in large guests booted pre-ballooned
- a fix of a regression in the gntdev driver which made it impossible
to use fully virtualized guests (HVM guests) with a 4.19 based dom0
- a fix in Xen cpu hotplug functionality which could be triggered by
wrong admin commands (setting number of active vcpus to 0)
One further note: the patches have all been under test for several
days in another branch. This branch has been rebased in order to avoid
merge conflicts"
* tag 'for-linus-4.19c-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/gntdev: fix up blockable calls to mn_invl_range_start
xen: fix GCC warning and remove duplicate EVTCHN_ROW/EVTCHN_COL usage
xen: avoid crash in disable_hotplug_cpu
xen/balloon: add runtime control for scrubbing ballooned out pages
xen/manage: don't complain about an empty value in control/sysrq node
- don't allocate memory in platform_setup as the memory allocator is not
initialized at that point yet;
- remove unnecessary ifeq KBUILD_SRC from arch/xtensa/Makefile;
- enable SG chaining in arch/xtensa/Kconfig.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEK2eFS5jlMn3N6xfYUfnMkfg/oEQFAlucAIETHGpjbXZia2Jj
QGdtYWlsLmNvbQAKCRBR+cyR+D+gRNAAD/wME84VqljbhQpzUNmYTb7d5Pst8I3Z
Mwbx4DqDaCAPu8trhMUtSeV6QBal9xxlyl0b7KWrkb5uX1L9ZdaanMC1W9be6f7B
DiwKLrGOhnv0kyLZ4/Rrni6GN90TUo1/2UKyJfaNKJoaqojbLyOY8vGtdtxcQlvT
bQPQ1wPRdfiRTDZkW6dygFF6tY2VT6ao3NqapLhvFUqsm7pzKW5Y1jVkY7PTo8t5
Qk1XyXdCdeoTUqvP12mAe4mst0yGelLPj8phTkXt7OZUXaubky+Rp/enSLhcJdpu
HJKFC7lInh/C6IroINFYErndckSK19mOTm2SUq6R5saR8R/6OMniilaVJBvfm1BH
HhO/6Gzy6K9XQ0VscNAClxBEU7AKbalvHyfI8WcQC+y5Zqpa9nSk9jG1V9U4mLs/
cnL0w+qJ3nh9mJBB/+14S7GwXITu0DlZEOxhsW6xblr/nfmFmvyFSug7i+WAu2Y3
Y4KD9tM7+tBnztggblUPQQm+orPAfpXZDZkGU+e7du0BWZikwcjWeIar2GtFGODz
vtxKIIZyvuAOY4M6Ld+2U9GTcIdOhc+orzJFuNAqbzPnez+0chjFp8OYCj/Fdgj2
YhbiYcMxJRAFdrbcWhHMlmXOCkUrvKyDY+bpx9TzNXyZp0qb290uGshCxM9/q7CR
UB/JZ/up28v5jw==
=YJfc
-----END PGP SIGNATURE-----
Merge tag 'xtensa-20180914' of git://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa fixes and cleanups from Max Filippov:
- don't allocate memory in platform_setup as the memory allocator is
not initialized at that point yet;
- remove unnecessary ifeq KBUILD_SRC from arch/xtensa/Makefile;
- enable SG chaining in arch/xtensa/Kconfig.
* tag 'xtensa-20180914' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: enable SG chaining in Kconfig
xtensa: remove unnecessary KBUILD_SRC ifeq conditional
xtensa: ISS: don't allocate memory in platform_setup
- Fix ioport_map() mapping the wrong physical address for some I/O BARs
- Remove direct use of "asm goto", since some compilers don't like that
- Ensure kimage_voffset is always present in vmcoreinfo PT_NOTE
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJbm68LAAoJELescNyEwWM0Zt4H/2nhsMtmQCLNXvA9gd1SEBba
Jrw4J+apYQBG7F7aY5mUZrNtCXECeNr1ukyxTEDMuSB0oTKWTFsQ52Sz4chsjl5u
JdNzWRq8LtkWbxxmnzTcYvjHZeDfDx/GQqAHJp047NBI4+6GxXVuEbrPPFDvQuE4
xvMhGpIg9R/jfkhmRKCstalp6aQPDr+Glz/Z+HwyXtuu8A5Q8uCUrlTflHXNVsrn
5GtR69eTseFb/UJtLbi7mesKP/awMm8wBub4wSs/5zKLucgBkTdOCWOjPtQZ6u7F
7hwTBk4ZLihdw2fMNOHGf8iZgPvXV+zVfDPtZWfJGlFJpX8FefbRDVZkSGxXHVw=
=dLyK
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The trickle of arm64 fixes continues to come in.
Nothing that's the end of the world, but we've got a fix for PCI IO
port accesses, an accidental naked "asm goto" and a fix to the
vmcoreinfo PT_NOTE merged this time around which we'd like to get
sorted before it becomes ABI.
- Fix ioport_map() mapping the wrong physical address for some I/O
BARs
- Remove direct use of "asm goto", since some compilers don't like
that
- Ensure kimage_voffset is always present in vmcoreinfo PT_NOTE"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
asm-generic: io: Fix ioport_map() for !CONFIG_GENERIC_IOMAP && CONFIG_INDIRECT_PIO
arm64: kernel: arch_crash_save_vmcoreinfo() should depend on CONFIG_CRASH_CORE
arm64: jump_label.h: use asm_volatile_goto macro instead of "asm goto"
Add a helper for the case when the nfs4 open state has been set to use
a delegation stateid, and we want to revert to using the open stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The previous fix broke recovery of delegated stateids because it assumes
that if we did not mark the delegation as suspect, then the delegation has
effectively been revoked, and so it removes that delegation irrespectively
of whether or not it is valid and still in use. While this is "mostly
harmless" for ordinary I/O, we've seen pNFS fail with LAYOUTGET spinning
in an infinite loop while complaining that we're using an invalid stateid
(in this case the all-zero stateid).
What we rather want to do here is ensure that the delegation is always
correctly marked as needing testing when that is the case. So we want
to close the loophole offered by nfs4_schedule_stateid_recovery(),
which marks the state as needing to be reclaimed, but not the
delegation that may be backing it.
Fixes: 0e3d3e5df0 ("NFSv4.1 fix infinite loop on IO BAD_STATEID error")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Now that the value of 'ino' can be NULL or an ERR_PTR(), we need to
change the test in the tracepoint.
Fixes: ce5624f7e6 ("NFSv4: Return NFS4ERR_DELAY when a layout fails...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If someone interrupts a wait on one or more outstanding layoutgets in
pnfs_update_layout() then return the ERESTARTSYS/EINTR error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Now that the value of 'ino' can be NULL or an ERR_PTR(), we need to
change the test in the tracepoint.
Fixes: ce5624f7e6 ("NFSv4: Return NFS4ERR_DELAY when a layout fails...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
- Fix mic_x100_dma to use devm_kzalloc to ensure memory is freed after
use by device manage dmaengine release function added in merge window
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbm6G4AAoJEHwUBw8lI4NHrPAP/0RzdRL3UaV7fN5b7CHEmQCF
nQyLBZWXxNOxlz7JkzRK6s1RAw1D4BECXoqSp3rX6KimIKvN/8r7rSPqim/z1j5Q
9s/nr0+8KXyfVrvKJtc0zXZEwz9ChnDchewgkvg+vNJ0Imf854/BFGDznSNGC6L3
bwpfNtZt8u/mY1zse8oqM7kIxjlBYo4pQCcQW/kOGcMFKGzaV3b2vEbOPMnuH3hu
LdEGBsH9TieJh+2T+uLsqkReRqKhI0Jt91j7eAIaFZaEwOz/YM9Qj8P5YbnKeM8c
W9uV7GRoih2MXlSBU41Xi4H4XQ06yC5oMsp5pT2r2mtfNSCtGbvg3IFog3IR/JNh
qOGu2IJ/z1WdVskffTxtJpX0ECC30jekOH4gzQ3DjGYvqfO2qZJHkwU9b6As5me6
2Ab/6wm8v4piw60ubhYrRYGyDJDqB/bmw0NNclUh8crSBZmTQ7RtWoNuiqTCUkb/
K4EGCkSAfhBf8U6zc817sWCpcwFk56C94Q5gycUis1uYObt/UG8rGguxPP60lJzZ
R3n85Bm8jaL+w2T7HYjtc3aoKQ6GGS2nybMYB6nD3d7llUhncT8VAcouDpswJkSE
iqMx55Yf6EjsbDszzvCBLJlhYtuzg9SjljwsS8wWWSMY9gkDjHRUzyNErRM6KNx8
jFFkd5tHT/xipuG7UyuA
=IrHr
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-fix-4.19-rc4' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fix from Vinod Koul:
"Fix the mic_x100_dma driver to use devm_kzalloc for driver memory, so
that it is freed properly when it unregisters from dmaengine using
managed API"
* tag 'dmaengine-fix-4.19-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: mic_x100_dma: use devm_kzalloc to fix an issue
Here are a number of small USB driver fixes for -rc4.
The usual suspects of gadget, xhci, and dwc2/3 are in here, along with
some reverts of reported problem changes, and a number of build
documentation warning fixes. Full details are in the shortlog.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW5uaDQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykkNACdGUPpicLrl0xEeFxbKiBcSW8DC/IAoNBFFBxr
/dWpSzXtgbjIMCTu73yx
=+EZm
-----END PGP SIGNATURE-----
Merge tag 'usb-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for -rc4.
The usual suspects of gadget, xhci, and dwc2/3 are in here, along with
some reverts of reported problem changes, and a number of build
documentation warning fixes. Full details are in the shortlog.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
Revert "cdc-acm: implement put_char() and flush_chars()"
usb: Change usb_of_get_companion_dev() place to usb/common
usb: xhci: fix interrupt transfer error happened on MTK platforms
usb: cdc-wdm: Fix a sleep-in-atomic-context bug in service_outstanding_interrupt()
usb: misc: uss720: Fix two sleep-in-atomic-context bugs
usb: host: u132-hcd: Fix a sleep-in-atomic-context bug in u132_get_frame()
usb: Avoid use-after-free by flushing endpoints early in usb_set_interface()
linux/mod_devicetable.h: fix kernel-doc missing notation for typec_device_id
usb/typec: fix kernel-doc notation warning for typec_match_altmode
usb: Don't die twice if PCI xhci host is not responding in resume
usb: mtu3: fix error of xhci port id when enable U3 dual role
usb: uas: add support for more quirk flags
USB: Add quirk to support DJI CineSSD
usb: typec: fix kernel-doc parameter warning
usb/dwc3/gadget: fix kernel-doc parameter warning
USB: yurex: Check for truncation in yurex_read()
USB: yurex: Fix buffer over-read in yurex_write()
usb: host: xhci-plat: Iterate over parent nodes for finding quirks
xhci: Fix use after free for URB cancellation on a reallocated endpoint
USB: add quirk for WORLDE Controller KS49 or Prodipe MIDI 49C USB controller
...
Here are 3 small HVC tty driver fixes to resolve a reported regression
from 4.19-rc1.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW5uZQA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yk0hQCgovZ/LlZeyPMn+54L19ONek2J/FcAnR3sRu9+
LuQxS9hOZijNt+capNs+
=MBA9
-----END PGP SIGNATURE-----
Merge tag 'tty-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fixes from Greg KH:
"Here are three small HVC tty driver fixes to resolve a reported
regression from 4.19-rc1.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: hvc: hvc_write() fix break condition
tty: hvc: hvc_poll() fix read loop batching
tty: hvc: hvc_poll() fix read loop hang
Here are a few small staging and iio driver fixes for -rc4.
Nothing major, just a few small bugfixes for some reported issues, and a
MAINTAINERS file update for the fbtft drivers. We also re-enable the
building of the erofs filesystem as the patcheset that was causing it to
break never got merged in the -rc1 cycle, so there's no reason it can't
be turned back on for now. The problem that was previously there is now
being handled in that other tree at the moment, so it will not hit us
again in the future.
All of these patches have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW5uYBg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylXbgCggwl5qtveSqJKHggCYY1S4/jd9qQAnAoWc8j7
A351tsmd6c5vujWBaLr5
=a2vY
-----END PGP SIGNATURE-----
Merge tag 'staging-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO driver fixes from Greg KH:
"Here are a few small staging and iio driver fixes for -rc4.
Nothing major, just a few small bugfixes for some reported issues, and
a MAINTAINERS file update for the fbtft drivers.
We also re-enable the building of the erofs filesystem as the XArray
patches that were causing it to break never got merged in the -rc1
cycle, so there's no reason it can't be turned back on for now. The
problem that was previously there is now being handled in the Xarray
tree at the moment, so it will not hit us again in the future.
All of these patches have been in linux-next with no reported issues"
* tag 'staging-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vboxvideo: Change address of scanout buffer on page-flip
staging: vboxvideo: Fix IRQs no longer working
staging: gasket: TODO: re-implement using UIO
staging/fbtft: Update TODO and mailing lists
staging: erofs: rename superblock flags (MS_xyz -> SB_xyz)
iio: imu: st_lsm6dsx: take into account ts samples in wm configuration
Revert "iio: temperature: maxim_thermocouple: add MAX31856 part"
Revert "staging: erofs: disable compiling temporarile"
MAINTAINERS: Switch a maintainer for drivers/staging/gasket
staging: wilc1000: revert "fix TODO to compile spi and sdio components in single module"
Here are a small handful of char/misc driver fixes for 4.19-rc4.
All of them are simple, resolving reported problems in a few drivers.
Full details are in the shortlog.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW5uYvQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yl4MQCgkSLYti3Cu2A90S/H0DKTyW8+21oAnj3wfuUn
sgFdJOwM1PNB5Y/Cfyvl
=NcTu
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a small handful of char/misc driver fixes for 4.19-rc4.
All of them are simple, resolving reported problems in a few drivers.
Full details are in the shortlog.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
firmware: Fix security issue with request_firmware_into_buf()
vmbus: don't return values for uninitalized channels
fpga: dfl: fme: fix return value check in in pr_mgmt_init()
misc: hmc6352: fix potential Spectre v1
Tools: hv: Fix a bug in the key delete code
misc: ibmvsm: Fix wrong assignment of return code
android: binder: fix the race mmap and alloc_new_buf_locked
mei: bus: need to unlink client before freeing
mei: bus: fix hw module get/put balance
mei: fix use-after-free in mei_cl_write
mei: ignore not found client in the enumeration
This reverts commit 1f40a46cf4.
It turned out that this patch is not sufficient to enable PTI on 32 bit
systems with legacy 2-level page-tables. In this paging mode the huge-page
PTEs are in the top-level page-table directory, where also the mirroring to
the user-space page-table happens. So every huge PTE exits twice, in the
kernel and in the user page-table.
That means that accessed/dirty bits need to be fetched from two PTEs in
this mode to be safe, but this is not trivial to implement because it needs
changes to generic code just for the sake of enabling PTI with 32-bit
legacy paging. As all systems that need PTI should support PAE anyway,
remove support for PTI when 32-bit legacy paging is used.
Fixes: 7757d607c6 ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: https://lkml.kernel.org/r/1536922754-31379-1-git-send-email-joro@8bytes.org
Patch series "mmu_notifiers follow ups".
Tetsuo has noticed some fallouts from 93065ac753 ("mm, oom: distinguish
blockable mode for mmu notifiers"). One of them has been fixed and picked
up by AMD/DRM maintainer [1]. XEN issue is fixed by patch 1. I have also
clarified expectations about blockable semantic of invalidate_range_end.
Finally the last patch removes MMU_INVALIDATE_DOES_NOT_BLOCK which is no
longer used nor needed.
[1] http://lkml.kernel.org/r/20180824135257.GU29735@dhcp22.suse.cz
This patch (of 3):
93065ac753 ("mm, oom: distinguish blockable mode for mmu notifiers") has
introduced blockable parameter to all mmu_notifiers and the notifier has
to back off when called in !blockable case and it could block down the
road.
The above commit implemented that for mn_invl_range_start but both
in_range checks are done unconditionally regardless of the blockable mode
and as such they would fail all the time for regular calls. Fix this by
checking blockable parameter as well.
Once we are there we can remove the stale TODO. The lock has to be
sleepable because we wait for completion down in gnttab_unmap_refs_sync.
Link: http://lkml.kernel.org/r/20180827112623.8992-2-mhocko@kernel.org
Fixes: 93065ac753 ("mm, oom: distinguish blockable mode for mmu notifiers")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
This patch removes duplicate macro useage in events_base.c.
It also fixes gcc warning:
variable ‘col’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Joshua Abraham <j.abraham1776@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The command 'xl vcpu-set 0 0', issued in dom0, will crash dom0:
BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 65 Comm: xenwatch Not tainted 4.19.0-rc2-1.ga9462db-default #1 openSUSE Tumbleweed (unreleased)
Hardware name: Intel Corporation S5520UR/S5520UR, BIOS S5500.86B.01.00.0050.050620101605 05/06/2010
RIP: e030:device_offline+0x9/0xb0
Code: 77 24 00 e9 ce fe ff ff 48 8b 13 e9 68 ff ff ff 48 8b 13 e9 29 ff ff ff 48 8b 13 e9 ea fe ff ff 90 66 66 66 66 90 41 54 55 53 <f6> 87 d8 02 00 00 01 0f 85 88 00 00 00 48 c7 c2 20 09 60 81 31 f6
RSP: e02b:ffffc90040f27e80 EFLAGS: 00010203
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8801f3800000 RSI: ffffc90040f27e70 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff820e47b3 R09: 0000000000000000
R10: 0000000000007ff0 R11: 0000000000000000 R12: ffffffff822e6d30
R13: dead000000000200 R14: dead000000000100 R15: ffffffff8158b4e0
FS: 00007ffa595158c0(0000) GS:ffff8801f39c0000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002d8 CR3: 00000001d9602000 CR4: 0000000000002660
Call Trace:
handle_vcpu_hotplug_event+0xb5/0xc0
xenwatch_thread+0x80/0x140
? wait_woken+0x80/0x80
kthread+0x112/0x130
? kthread_create_worker_on_cpu+0x40/0x40
ret_from_fork+0x3a/0x50
This happens because handle_vcpu_hotplug_event is called twice. In the
first iteration cpu_present is still true, in the second iteration
cpu_present is false which causes get_cpu_device to return NULL.
In case of cpu#0, cpu_online is apparently always true.
Fix this crash by checking if the cpu can be hotplugged, which is false
for a cpu that was just removed.
Also check if the cpu was actually offlined by device_remove, otherwise
leave the cpu_present state as it is.
Rearrange to code to do all work with device_hotplug_lock held.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Scrubbing pages on initial balloon down can take some time, especially
in nested virtualization case (nested EPT is slow). When HVM/PVH guest is
started with memory= significantly lower than maxmem=, all the extra
pages will be scrubbed before returning to Xen. But since most of them
weren't used at all at that point, Xen needs to populate them first
(from populate-on-demand pool). In nested virt case (Xen inside KVM)
this slows down the guest boot by 15-30s with just 1.5GB needed to be
returned to Xen.
Add runtime parameter to enable/disable it, to allow initially disabling
scrubbing, then enable it back during boot (for example in initramfs).
Such usage relies on assumption that a) most pages ballooned out during
initial boot weren't used at all, and b) even if they were, very few
secrets are in the guest at that time (before any serious userspace
kicks in).
Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also
enabled by default), controlling default value for the new runtime
switch.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
When guest receives a sysrq request from the host it acknowledges it by
writing '\0' to control/sysrq xenstore node. This, however, make xenstore
watch fire again but xenbus_scanf() fails to parse empty value with "%c"
format string:
sysrq: SysRq : Emergency Sync
Emergency Sync complete
xen:manage: Error -34 reading sysrq code in control/sysrq
Ignore -ERANGE the same way we already ignore -ENOENT, empty value in
control/sysrq is totally legal.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The !CONFIG_GENERIC_IOMAP version of ioport_map uses MMIO_UPPER_LIMIT to
prevent users from making I/O accesses outside the expected I/O range -
however it erroneously treats MMIO_UPPER_LIMIT as a mask which is
contradictory to its other users.
The introduction of CONFIG_INDIRECT_PIO, which subtracts an arbitrary
amount from IO_SPACE_LIMIT to form MMIO_UPPER_LIMIT, results in ioport_map
mangling the given port rather than capping it.
We address this by aligning more closely with the CONFIG_GENERIC_IOMAP
implementation of ioport_map by using the comparison operator and
returning NULL where the port exceeds MMIO_UPPER_LIMIT. Though note that
we preserve the existing behavior of masking with IO_SPACE_LIMIT such that
we don't break existing buggy drivers that somehow rely on this masking.
Fixes: 5745392e0c ("PCI: Apply the new generic I/O management on PCI IO hosts")
Reported-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>