math_state_restore() assumes it is called with irqs disabled,
but this is not true if the caller is __restore_xstate_sig().
This means that if ia32_fxstate == T and __copy_from_user()
fails, __restore_xstate_sig() returns with irqs disabled too.
This triggers:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
dump_stack
___might_sleep
? _raw_spin_unlock_irqrestore
__might_sleep
down_read
? _raw_spin_unlock_irqrestore
print_vma_addr
signal_fault
sys32_rt_sigreturn
Change __restore_xstate_sig() to call set_used_math()
unconditionally. This avoids enabling and disabling interrupts
in math_state_restore(). If copy_from_user() fails, we can
simply do fpu_finit() by hand.
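For illustration, the resulting error path looks roughly like this (a sketch; the buffer and size variable names are illustrative, not necessarily the exact ones in __restore_xstate_sig()):
    if (__copy_from_user(&fpu->state->fxsave, buf_fx, state_size)) {
            /* Re-init the FPU state by hand instead of taking the
             * math_state_restore() path with its irq games: */
            fpu_finit(fpu);
            err = -1;
    }
    set_used_math();        /* now unconditional */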
[ Note: this is only the first step. math_state_restore() should
not check used_math(), it should set this flag. While
init_fpu() should simply die. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The kernel crypto API logic requires the caller to provide the
length of (ciphertext || authentication tag) as cryptlen for the
AEAD decryption operation. Thus, the cipher implementation must
calculate the size of the plaintext output itself and cannot simply use
cryptlen.
The RFC4106 GCM decryption operation tries to write cryptlen bytes of memory
to req->dst. As the destination buffer for decryption only needs to hold
the plaintext, but cryptlen refers to the input buffer holding
(ciphertext || authentication tag), the assumed destination
buffer length in the RFC4106 GCM operation is too large. This
patch simply uses the already calculated plaintext size.
In addition, this patch fixes the offset calculation of the AAD buffer
pointer: as mentioned before, cryptlen already includes the size of the
tag, so the tag length must not be added again. With that addition, the AAD
would be written beyond the already allocated buffer.
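In other words, the intended arithmetic is roughly (a sketch; variable names are illustrative):
    /* cryptlen = len(ciphertext || tag), so the plaintext length is: */
    unsigned long tempCipherLen = (unsigned long)(req->cryptlen - auth_tag_len);

    /* Write only tempCipherLen bytes into req->dst, and compute the AAD
     * pointer offset without adding the tag length a second time. */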
Note, this fixes a kernel crash that can be triggered from user space
via AF_ALG(aead) -- simply use the libkcapi test application
from [1] and update it to use rfc4106-gcm-aes.
Using [1], the changes were tested using CAVS vectors to demonstrate
that the crypto operation still delivers the right results.
[1] http://www.chronox.de/libkcapi.html
CC: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If data is read from the PIC with an invalid access size, the returned data
stays uninitialized even though success is returned.
Fix this by always initializing the data.
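A minimal sketch of such a fix in the PIC read path (assuming a picdev_read()-style handler; details may differ):
    if (len != 1) {
            memset(val, 0, len);    /* don't leak uninitialized data */
            pr_pic_unimpl("non byte read\n");
            return 0;
    }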
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
On a platform in ACPI Hardware-reduced mode, the legacy PIC and
PIT may not be initialized even though they may be present in
silicon. Touching these legacy components causes unexpected
results on the system.
On the Bay Trail-T (ASUS-T100) platform, touching these legacy
components blocks the platform hardware's low idle power state (S0ix)
during system suspend. So we should bypass them in ACPI hardware-reduced
mode.
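A sketch of the kind of check involved, assuming the existing null_legacy_pic fallback (the exact hook point may differ):
    /* In early ACPI setup: don't touch the legacy PIC/PIT when the
     * firmware declares ACPI hardware-reduced mode. */
    if (acpi_gbl_reduced_hardware)
            legacy_pic = &null_legacy_pic;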
Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Li Aubrey <aubrey.li@linux.intel.com>
Cc: <alan@linux.intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/54FFF81C.20703@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
POWER supports irqfds but forgot to advertise them. Some userspace does
not check for the capability, but other programs do check it; those work
on x86 and s390 but not on POWER.
To avoid that other architectures in the future make the same mistake, let
common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE.
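Roughly, the common check_extension path then looks like this (sketch):
    /* kvm_vm_ioctl_check_extension_generic(), sketch: */
    switch (arg) {
    #ifdef CONFIG_HAVE_KVM_IRQFD
    case KVM_CAP_IRQFD:
    case KVM_CAP_IRQFD_RESAMPLE:
    #endif
            return 1;       /* advertised by common code, not per arch */
    default:
            break;
    }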
Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 297e21053a
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Prepare for the removal of 'usersp', by simplifying PER_CPU(old_rsp) usage:
- use it only as temp storage
- store the userspace stack pointer immediately in pt_regs->sp
on syscall entry, instead of using it later, on syscall exit.
- change C code to use pt_regs->sp only, instead of PER_CPU(old_rsp)
and task->thread.usersp.
FIXUP/RESTORE_TOP_OF_STACK are simplified as well.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425926364-9526-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Before this patch, R11 was saved in pt_regs->r11.
Which looks natural, but requires messy shuffling to/from iret
frame whenever ptrace or e.g. sys_iopl() wants to modify flags -
because that's how this register is used by SYSCALL/SYSRET.
This patch saves R11 in pt_regs->flags, and uses that value for
the SYSRET64 instruction. Shuffling is eliminated.
FIXUP/RESTORE_TOP_OF_STACK are simplified.
stub_iopl is no longer needed: pt_regs->flags needs no fixing up.
Testing shows that syscall fast path is ~54.3 ns before
and after the patch (on 2.7 GHz Sandy Bridge CPU).
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425926364-9526-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The one in do_debug() is probably harmless, but better safe than sorry.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d67deaa9df5458363623001f252d1aee3215d014.1425948056.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
By the nature of the TEST operation, it is often possible to test
a narrower part of the operand:
"testl $3, mem" -> "testb $3, mem",
"testq $3, %rcx" -> "testb $3, %cl"
This results in shorter instructions, because the TEST instruction
has no sign-extending byte-immediate forms, unlike other ALU ops.
Note that this change does not create any LCP (Length-Changing Prefix)
stalls. Those happen when a 0x66 prefix is added for 16-bit
immediates, which changes such TEST instructions:
[test_opcode] [modrm] [imm32]
to:
[0x66] [test_opcode] [modrm] [imm16]
where [imm16] has a *different length* now: 2 bytes instead of 4.
This confuses the decoder and slows down execution.
REX prefixes were carefully designed to almost never hit this case:
adding a REX prefix does not change instruction length, except for the
MOVABS and MOV [addr],RAX instructions.
This patch does not add instructions which would use a 0x66 prefix,
code changes in assembly are:
-48 f7 07 01 00 00 00 testq $0x1,(%rdi)
+f6 07 01 testb $0x1,(%rdi)
-48 f7 c1 01 00 00 00 test $0x1,%rcx
+f6 c1 01 test $0x1,%cl
-48 f7 c1 02 00 00 00 test $0x2,%rcx
+f6 c1 02 test $0x2,%cl
-41 f7 c2 01 00 00 00 test $0x1,%r10d
+41 f6 c2 01 test $0x1,%r10b
-48 f7 c1 04 00 00 00 test $0x4,%rcx
+f6 c1 04 test $0x4,%cl
-48 f7 c1 08 00 00 00 test $0x8,%rcx
+f6 c1 08 test $0x8,%cl
Linus further notes:
"There are no stalls from using 8-bit instruction forms.
Now, changing from 64-bit or 32-bit 'test' instructions to 8-bit ones
*could* cause problems if it ends up having forwarding issues, so that
instead of just forwarding the result, you end up having to wait for
it to be stable in the L1 cache (or possibly the register file). The
forwarding from the store buffer is simplest and most reliable if the
read is done at the exact same address and the exact same size as the
write that gets forwarded.
But that's true only if:
(a) the write was very recent and is still in the write queue. I'm
not sure that's the case here anyway.
(b) on at least most Intel microarchitectures, you have to test a
different byte than the lowest one (so forwarding a 64-bit write
to a 8-bit read ends up working fine, as long as the 8-bit read
is of the low 8 bits of the written data).
A very similar issue *might* show up for registers too, not just
memory writes, if you use 'testb' with a high-byte register (where
instead of forwarding the value from the original producer it needs to
go through the register file and then shifted). But it's mainly a
problem for store buffers.
But afaik, the way Denys changed the test instructions, neither of the
above issues should be true.
The real problem for store buffer forwarding tends to be "write 8
bits, read 32 bits". That can be really surprisingly expensive,
because the read ends up having to wait until the write has hit the
cacheline, and we might talk tens of cycles of latency here. But
"write 32 bits, read the low 8 bits" *should* be fast on pretty much
all x86 chips, afaik."
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425675332-31576-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I broke 32-bit kernels. The implementation of sp0 was correct
as far as I can tell, but sp0 was much weirder on x86_32 than I
realized. It has the following issues:
- Init's sp0 is inconsistent with everything else's: non-init tasks
are offset by 8 bytes. (I have no idea why, and the comment is unhelpful.)
- vm86 does crazy things to sp0.
Fix it up by replacing this_cpu_sp0() with
current_top_of_stack() and using a new percpu variable to track
the top of the stack on x86_32.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")
Link: http://lkml.kernel.org/r/d09dbe270883433776e0cbee3c7079433349e96d.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The change:
75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")
had the unintended side effect of changing the return value of
current_thread_info() during part of the context switch process.
Change it back.
This has no effect as far as I can tell -- it's just for
consistency.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9fcaa47dd8487db59eed7a3911b6ae409476763e.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'pm+acpi-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes for recent regressions (ACPI resources management,
suspend-to-idle), stable-candidate fixes (ACPI backlight), fixes
related to the wakeup IRQ management changes made in v3.18, other
fixes (suspend-to-idle, cpufreq ppc driver) and a couple of cleanups
(suspend-to-idle, generic power domains, ACPI backlight).
Specifics:
- Fix ACPI resources management problems introduced by the recent
rework of the code in question (Jiang Liu) and a build issue
introduced by those changes (Joachim Nilsson).
- Fix a recent suspend-to-idle regression on systems where entering
idle states causes local timers to stop, prevent suspend-to-idle
from crashing in restricted configurations (no cpuidle driver,
cpuidle disabled etc.) and clean up the idle loop somewhat while at
it (Rafael J Wysocki).
- Fix build problem in the cpufreq ppc driver (Geert Uytterhoeven).
- Allow the ACPI backlight driver module to be loaded if ACPI is
disabled which helps the i915 driver in those configurations
(stable-candidate) and change the code to help debug unusual use
cases (Chris Wilson).
- Wakeup IRQ management changes in v3.18 caused some drivers on the
at91 platform to trigger a warning from the IRQ core related to an
unexpected combination of interrupt action handler flags. However,
on at91 a timer IRQ is shared with some other devices (including
system wakeup ones) and that leads to the unusual combination of
flags in question.
To make it possible to avoid the warning introduce a new interrupt
action handler flag (which can be used by drivers to indicate the
special case to the core) and rework the problematic at91 drivers
to use it and work as expected during system suspend/resume. From
Boris Brezillon, Rafael J Wysocki and Mark Rutland.
- Clean up the generic power domains subsystem's debugfs interface
(Kevin Hilman)"
* tag 'pm+acpi-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
genirq / PM: describe IRQF_COND_SUSPEND
tty: serial: atmel: rework interrupt and wakeup handling
watchdog: at91sam9: request the irq with IRQF_NO_SUSPEND
cpuidle / sleep: Use broadcast timer for states that stop local timer
clk: at91: implement suspend/resume for the PMC irqchip
rtc: at91rm9200: rework wakeup and interrupt handling
rtc: at91sam9: rework wakeup and interrupt handling
PM / wakeup: export pm_system_wakeup symbol
genirq / PM: Add flag for shared NO_SUSPEND interrupt lines
ACPI / video: Propagate the error code for acpi_video_register
ACPI / video: Load the module even if ACPI is disabled
PM / Domains: cleanup: rename gpd -> genpd in debugfs interface
cpufreq: ppc: Add missing #include <asm/smp.h>
x86/PCI/ACPI: Relax ACPI resource descriptor checks to work around BIOS bugs
x86/PCI/ACPI: Ignore resources consumed by host bridge itself
cpuidle: Clean up fallback handling in cpuidle_idle_call()
cpuidle / sleep: Do sanity checks in cpuidle_enter_freeze() too
idle / sleep: Avoid excessive disabling and enabling interrupts
PCI: versatile: Update for list_for_each_entry() API change
genirq / PM: better describe IRQF_NO_SUSPEND semantics
On gcc5 the kernel does not link:
ld: .eh_frame_hdr table[4] FDE at 0000000000000648 overlaps table[5] FDE at 0000000000000670.
This is because prior GCC versions always emitted NOPs for ALIGN
directives, but gcc5 started omitting them.
.LSTARTFDEDLSI1 says:
/* HACK: The dwarf2 unwind routines will subtract 1 from the
return address to get an address in the middle of the
presumed call instruction. Since we didn't get here via
a call, we need to include the nop before the real start
to make up for it. */
.long .LSTART_sigreturn-1-. /* PC-relative start address */
But commit 69d0627a7f ("x86 vDSO: reorder vdso32 code") from 2.6.25
replaced .org __kernel_vsyscall+32,0x90 by ALIGN right before
__kernel_sigreturn.
Of course, ALIGN need not generate any NOPs there. In particular, gcc5
collapses vclock_gettime.o and int80.o together with no NOPs generated
for the ALIGN.
So fix this by emitting at least a single NOP at that point, and then
aligning the function, possibly with more NOPs.
Kudos for reporting and diagnosing should go to Richard.
Reported-by: Richard Biener <rguenther@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425543211-12542-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This has nothing to do with the init thread or the initial
anything. It's just the CPU's TSS.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a0bd5e26b32a2e1f08ff99017d0997118fbb2485.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The INIT_TSS is unnecessary. Just define the initial TSS where
'cpu_tss' is defined.
While we're at it, merge the 32-bit and 64-bit definitions. The
only syntactic change is that 32-bit kernels were computing sp0
as long, but now they compute it as unsigned long.
Verified by objdump: the contents and relocations of
.data..percpu..shared_aligned are unchanged on 32-bit and 64-bit
kernels.
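The merged definition then looks roughly like this (sketch; initializers abbreviated):
    __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
            .x86_tss = {
                    .sp0 = (unsigned long)&init_stack + sizeof(init_stack),
    #ifdef CONFIG_X86_32
                    .ss0 = __KERNEL_DS,
                    .ss1 = __KERNEL_CS,
                    .io_bitmap_base = INVALID_IO_BITMAP_OFFSET,
    #endif
            },
    };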
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8fc39fa3f6c5d635e93afbdd1a0fe0678a6d7913.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It has nothing to do with init -- there's only one TSS per cpu.
Other names considered include:
- current_tss: Confusing because we never switch the tss.
- singleton_tss: Too long.
This patch was generated with 's/init_tss/cpu_tss/g'. Followup
patches will fix INIT_TSS and INIT_TSS_IST by hand.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The ia32 sysenter code loaded the top of the kernel stack into
rsp by loading kernel_stack and then adjusting it. It can be
simplified to just read sp0 directly.
This requires the addition of a new asm-offsets entry for sp0.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/88ff9006163d296a0665338585c36d9bfb85235d.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This will make modifying the semantics of kernel_stack easier.
The change to ist_begin_non_atomic() is necessary because sp0 no
longer points to the same THREAD_SIZE-aligned region as RSP;
it's one byte too high for that. At Denys' suggestion, rather
than offsetting it, just check explicitly that we're in the
correct range ending at sp0. This has the added benefit that we
no longer assume that the thread stack is aligned to
THREAD_SIZE.
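The explicit range check is along these lines (sketch; helper names as used at this point in the series):
    /* The thread stack need not be THREAD_SIZE-aligned, so check the
     * range ending at sp0 instead of masking with THREAD_SIZE - 1: */
    BUG_ON((unsigned long)(this_cpu_sp0() - current_stack_pointer()) >=
           THREAD_SIZE);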
Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ef8254ad414cbb8034c9a56396eeb24f5dd5b0de.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We currently store references to the top of the kernel stack in
multiple places: kernel_stack (with an offset) and
init_tss.x86_tss.sp0 (no offset). The latter is defined by
hardware and is a clean canonical way to find the top of the
stack. Add an accessor so we can start using it.
This needs minor paravirt tweaks. On native, sp0 defines the
top of the kernel stack and is therefore always correct. On Xen
and lguest, the hypervisor tracks the top of the stack, but we
want to start reading sp0 in the kernel. Fixing this is simple:
just update our local copy of sp0 as well as the hypervisor's
copy on task switches.
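The accessor itself is trivial (sketch):
    static inline unsigned long this_cpu_sp0(void)
    {
            return this_cpu_read_stable(init_tss.x86_tss.sp0);
    }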
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* acpi-resources:
x86/PCI/ACPI: Relax ACPI resource descriptor checks to work around BIOS bugs
x86/PCI/ACPI: Ignore resources consumed by host bridge itself
PCI: versatile: Update for list_for_each_entry() API change
Pull x86 fixes from Ingo Molnar:
"Misc fixes: EFI fixes, an Intel Quark fix, an asm fix and an FPU
handling fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu/xsaves: Fix improper uses of __ex_table
x86/intel/quark: Select COMMON_CLK
x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization
firmware: dmi_scan: Fix dmi_len type
efi/libstub: Fix boundary checking in efi_high_alloc()
firmware: dmi_scan: Fix dmi scan to handle "End of Table" structure
Commit:
f31a9f7c71 ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area")
introduced alternative instructions for XSAVES/XRSTORS and commit:
adb9d526e9 ("x86/xsaves: Add xsaves and xrstors support for booting time")
added support for the XSAVES/XRSTORS instructions at boot time.
Unfortunately both failed to properly protect them against faulting:
The 'xstate_fault' macro will use the closest label named '1'
backward and that ends up in the .altinstr_replacement section
rather than in .text. This means that the kernel will never find
in the __ex_table the .text address where this instruction might
fault, leading to serious problems if userspace manages to
trigger the fault.
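For reference, a correct fixup pattern keeps the '1:' label (the potentially faulting instruction) in .text, so the __ex_table entry names an address that can actually fault. A generic sketch with illustrative operand names, not the exact xsaves code:
    int err = 0;
    asm volatile("1: .byte 0x48, 0x0f, 0xc7, 0x2f\n"  /* xsaves64 (%rdi) */
                 "2:\n"
                 ".section .fixup,\"ax\"\n"
                 "3: movl $-1, %[err]\n"
                 "   jmp 2b\n"
                 ".previous\n"
                 _ASM_EXTABLE(1b, 3b)   /* '1:' must resolve into .text */
                 : [err] "+r" (err)
                 : "D" (buf), "a" (lmask), "d" (hmask)
                 : "memory");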
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Jamie Iles <jamie.iles@oracle.com>
[ Improved the changelog, fixed some whitespace noise. ]
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Allan Xavier <mr.a.xavier@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: adb9d526e9 ("x86/xsaves: Add xsaves and xrstors support for booting time")
Fixes: f31a9f7c71 ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 8bbc2a135b ("x86/intel/quark: Add Intel Quark
platform support") introduced minimal support for the Intel Quark
SoC, which allows the core parts of the SoC to be used. However, the SPI,
I2C, and GPIO drivers can't be selected in the kernel configuration
because they depend on COMMON_CLK. This patch adds a COMMON_CLK
selection to the platform definition to allow users to choose those drivers.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Ong, Boon Leong <boon.leong.ong@intel.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Darren Hart <dvhart@linux.intel.com>
Fixes: 8bbc2a135b ("x86/intel/quark: Add Intel Quark platform support")
Link: http://lkml.kernel.org/r/1425569044-2867-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
the related state make sense for 'ret_from_sys_call'. This is
entirely the wrong check. TS_COMPAT would make a little more
sense, but there's really no point in keeping this optimization
at all.
This fixes a return to the wrong user CS if we came from int
0x80 in a 64-bit task.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
[ Backported from tip:x86/asm. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As early_trap_init() doesn't use IST, replace
set_intr_gate_ist() and set_system_intr_gate_ist() with their
standard counterparts.
set_intr_gate() requires a trace_debug symbol which we don't
have and won't use. This patch separates set_intr_gate() into two
parts, and uses the base version in early_trap_init().
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: <dave.hansen@linux.intel.com>
Cc: <lizefan@huawei.com>
Cc: <masami.hiramatsu.pt@hitachi.com>
Cc: <oleg@redhat.com>
Cc: <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425010789-13714-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
the related state make sense for 'ret_from_sys_call'. This is
entirely the wrong check. TS_COMPAT would make a little more
sense, but there's really no point in keeping this optimization
at all.
This fixes a return to the wrong user CS if we came from int
0x80 in a 64-bit task.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Avoid redundant load of %r11 (it is already loaded a few
instructions before).
Also simplify %rsp restoration, instead of two steps:
add $0x80, %rsp
mov 0x18(%rsp), %rsp
we can do a simplified single step to restore user-space RSP:
mov 0x98(%rsp), %rsp
and get the same result.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[ Clarified the changelog. ]
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1aef69b346a6db0d99cdfb0f5ba83e8c985e27d7.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Constants such as SS+8 or SS+8-RIP are mysterious.
In most cases, SS+8 is just meant to be SIZEOF_PTREGS,
SS+8-RIP is RIP's offset in the iret frame.
This patch changes some of these constants to be less
mysterious.
No code changes (verified with objdump).
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1d20491384773bd606e23a382fac23ddb49b5178.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use of a small macro - one with conditional expansion - does
more harm than good. It obfuscates code, with minimal code
reuse.
For example, because of obfuscation it's not obvious that
in 'ia32_sysenter_target', we can optimize loading of r9 -
currently it is loaded with a detour through ebp.
This patch folds the IA32_ARG_FIXUP macro into its callers.
No code changes.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/4da092094cd78734384ac31e0d4ec1d8f69145a2.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch does a lot of cleanup in comments and formatting,
but it does not change any code:
- Rename 'save_paranoid' to 'paranoid_entry': this makes naming
similar to its "non-paranoid" sibling, 'error_entry',
and to its counterpart, 'paranoid_exit'.
- Use the same CFI annotation atop 'paranoid_entry' and 'error_entry'.
- Fix irregular indentation of assembler operands.
- Add/fix comments on top of 'paranoid_entry' and 'error_entry'.
- Remove stale comment about "oldrax".
- Make comments about "no swapgs" flag in ebx more prominent.
- Deindent wrongly indented top-level comment atop 'paranoid_exit'.
- Indent wrongly deindented comment inside 'error_entry'.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/4640f9fcd5ea46eb299b1cd6d3f5da3167d2f78d.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For some odd reason, these two functions are at the very top of
the file. "save_paranoid"'s caller is approximately in the middle
of it; move 'save_paranoid' there. Move 'ret_from_fork' to be right
after the fork/exec helpers.
This is a pure block move, nothing is changed in the function
bodies.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/6446bbfe4094532623a5b83779b7015fec167a9d.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
SYSCALL/SYSRET and SYSENTER/SYSEXIT have weird semantics.
Moreover, they differ in 32- and 64-bit mode.
What is saved? What is not? Is rsp set? Are interrupts disabled?
People tend to not remember these details well enough.
This patch adds comments which explain in detail
what registers are modified by each of these instructions.
The comments are placed immediately before corresponding
entry and exit points.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/a94b98b63527797c871a81402ff5060b18fa880a.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ARGOFFSET is zero now, removing it changes no code.
A few macros lost "offset" parameter, since it is always zero
now too.
No code changes - verified with objdump.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/8689f937622d9d2db0ab8be82331fa15e4ed4713.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 64-bit entry code was using six stack slots less by not
saving/restoring registers which are callee-preserved according
to the C ABI, and was not allocating space for them.
Only when syscalls needed a complete "struct pt_regs" was
the complete area allocated and filled in.
As an additional twist, on interrupt entry a "slightly less
truncated pt_regs" trick is used, to make nested interrupt
stacks easier to unwind.
This proved to be a source of significant obfuscation and subtle
bugs. For example, 'stub_fork' had to pop the return address,
extend the struct, save registers, and push return address back.
Ugly. 'ia32_ptregs_common' pops return address and "returns" via
jmp insn, throwing a wrench into CPU return stack cache.
This patch changes the code to always allocate a complete
"struct pt_regs" on the kernel stack. The saving of registers
is still done lazily.
"Partial pt_regs" trick on interrupt stack is retained.
Macros which manipulate "struct pt_regs" on stack are reworked:
- ALLOC_PT_GPREGS_ON_STACK allocates the structure.
- SAVE_C_REGS saves to it those registers which are clobbered
by C code.
- SAVE_EXTRA_REGS saves to it all other registers.
- Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros
reverse it.
'ia32_ptregs_common', 'stub_fork' and friends lost their ugly dance
with the return pointer.
LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
instead of magic numbers.
'error_entry' and 'save_paranoid' now use SAVE_C_REGS +
SAVE_EXTRA_REGS instead of having it open-coded yet again.
Patch was run-tested: 64-bit executables, 32-bit executables,
strace works.
Timing tests did not show measurable difference in 32-bit
and 64-bit syscalls.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1423778052-21038-2-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/b89763d354aa23e670b9bdf3a40ae320320a7c2e.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the last fix of this nature, a few more instances have crept
in. Fix them up. No object code changes (constants have the same
value).
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1423778052-21038-1-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/f5e1c4084319a42e5f14d41e2d638949ce66bc08.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is a preparatory patch for change in "struct pt_regs"
handling in entry_64.S.
trace_hardirqs*() thunks were (ab)using a part of the
'pt_regs' handling code, namely the SAVE_ARGS/RESTORE_ARGS
macros, to save/restore registers across C function calls.
Since SAVE_ARGS is going to be changed, open-code
register saving/restoring here.
Incidentally, this removes a bit of dead code:
one SAVE_ARGS was used just to emit a CFI annotation,
but it also generated unreachable assembly instructions.
Take a page from thunk_32.S and use push/pop instructions
instead of movq, they are far shorter:
1 or 2 bytes versus 5, and no need for instructions to adjust %rsp:
text data bss dec hex filename
333 40 0 373 175 thunk_64_movq.o
104 40 0 144 90 thunk_64_push_pop.o
[ This is ugly as sin, but we'll fix up the ugliness in the next
patch. I see no point in reordering patches just to avoid an
ugly intermediate state. --Andy ]
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1420927210-19738-4-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/4c979ad604f0f02c5ade3b3da308b53eabd5e198.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When parsing resources for a PCI host bridge, we should ignore resources
consumed by the host bridge itself and only report window resources available
to child PCI buses.
Fixes: 593669c2ac (x86/PCI/ACPI: Use common ACPI resource interfaces ...)
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge tag 'alternatives_padding' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/asm
Pull alternative instructions framework improvements from Borislav Petkov:
"A more involved rework of the alternatives framework to be able to
pad instructions and thus make using the alternatives macros more
straightforward and without having to figure out old and new instruction
sizes but have the toolchain figure that out for us.
Furthermore, it optimizes JMPs used so that fetch and decode can be
relieved with smaller versions of the JMPs, where possible.
Some stats:
x86_64 defconfig:
Alternatives sites total: 2478
Total padding added (in Bytes): 6051
The padding is currently done for:
X86_FEATURE_ALWAYS
X86_FEATURE_ERMS
X86_FEATURE_LFENCE_RDTSC
X86_FEATURE_MFENCE_RDTSC
X86_FEATURE_SMAP
This is with the latest version of the patchset. Of course, on each
machine the alternatives sites actually being patched are a proper
subset of the total number."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The check against lastcomm is racy, and the message it produces
isn't necessary. vm86 support can be disabled on a 32-bit
kernel also, and doesn't have this message. Switch to
sys_ni_syscall instead.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425439896-8322-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Combine the 32-bit syscall tables into one file.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425439896-8322-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
compat_ni_syscall() does the same thing as sys_ni_syscall().
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425439896-8322-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In commit b4eef9b36d, we started to use hwapic_isr_update() != NULL
instead of kvm_apic_vid_enabled(vcpu->kvm). This didn't work because
SVM had it defined, and the "apicv" path in apic_{set,clear}_isr() does not
change apic->isr_count, because it should always be 1. The initial
value of apic->isr_count was based on kvm_apic_vid_enabled(vcpu->kvm),
which is always 0 for SVM, so KVM could have injected interrupts when it
shouldn't.
Fix it by implicitly setting SVM's hwapic_isr_update to NULL and by making
the initial isr_count depend on hwapic_isr_update() for good measure.
Fixes: b4eef9b36d ("kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv")
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Pull x86 fixes from Ingo Molnar:
"A CR4-shadow 32-bit init fix, plus two typo fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too
x86/platform/intel-mid: Fix trivial printk message typo in intel_mid_arch_setup()
x86/cpu/intel: Fix trivial typo in intel_tlb_table[]
Pull perf fixes from Ingo Molnar:
"Two kprobes fixes and a handful of tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Make sparc64 arch point to sparc
perf symbols: Define EM_AARCH64 for older OSes
perf top: Fix SIGBUS on sparc64
perf tools: Fix probing for PERF_FLAG_FD_CLOEXEC flag
perf tools: Fix pthread_attr_setaffinity_np build error
perf tools: Define _GNU_SOURCE on pthread_attr_setaffinity_np feature check
perf bench: Fix order of arguments to memcpy_alloc_mem
kprobes/x86: Check for invalid ftrace location in __recover_probed_insn()
kprobes/x86: Use 5-byte NOP when the code might be modified by ftrace
Commit:
1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")
added a shadow CR4 such that reads and writes that do not
modify the CR4 execute much faster than always reading the
register itself.
The change modified cpu_init() in common.c, so that the
shadow CR4 gets initialized before anything uses it.
Unfortunately, there are two cpu_init()s in common.c: one
for 64-bit and one for 32-bit. The commit only added
the shadow init to the 64-bit path, but the 32-bit path
needs the init too.
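The fix itself is a one-liner in the 32-bit cpu_init() (sketch):
    /* Mirror what the 64-bit cpu_init() already does: */
    cr4_init_shadow();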
Link: http://lkml.kernel.org/r/20150227125208.71c36402@gandalf.local.home
Fixes: 1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150227145019.2bdd4354@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 054954eb05 ("xen: switch to
linear virtual mapped sparse p2m list") introduced an error.
During initialization of the p2m list a p2m identity area mapped by
a complete identity pmd entry has to be split up into smaller chunks
sometimes, if a non-identity pfn is introduced in this area.
If this non-identity pfn is not at index 0 of a p2m page, the newly
needed p2m page is initialized with wrong identity entries, as the
identity pfns don't start with the value corresponding to index 0
but with the initial non-identity pfn. This results in weird, wrong
mappings.
Correct the wrong initialization by starting with the correct pfn.
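A sketch of the corrected initialization, assuming the p2m_init_identity() helper:
    /* Start at the pfn corresponding to index 0 of the new p2m page,
     * not at the first non-identity pfn: */
    p2m_init_identity(p2m, pfn & ~(P2M_PER_PAGE - 1));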
Cc: stable@vger.kernel.org # 3.19
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Before this patch early_trap_init() installs DEBUG_STACK for
X86_TRAP_BP and X86_TRAP_DB. However, DEBUG_STACK doesn't work
correctly until cpu_init() <-- trap_init().
This patch passes 0 to set_intr_gate_ist() and
set_system_intr_gate_ist() instead of DEBUG_STACK to let it use
same stack as kernel, and installs DEBUG_STACK for them in
trap_init().
As the kernel runs at ring 0 between early_trap_init() and
trap_init(), there is no chance to get a bad stack before
trap_init().
As NMI is also enabled in trap_init(), we don't need to care
about is_debug_stack() and related things used in
arch/x86/kernel/nmi.c.
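In code, this amounts to roughly (sketch):
    /* early_trap_init(): use the kernel stack (IST index 0) for now. */
    set_intr_gate_ist(X86_TRAP_DB, &debug, 0);
    set_system_intr_gate_ist(X86_TRAP_BP, &int3, 0);

    /* trap_init(): cpu_init() has set up the IST stacks by now. */
    set_intr_gate_ist(X86_TRAP_DB, &debug, DEBUG_STACK);
    set_system_intr_gate_ist(X86_TRAP_BP, &int3, DEBUG_STACK);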
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: <dave.hansen@linux.intel.com>
Cc: <lizefan@huawei.com>
Cc: <luto@amacapital.net>
Cc: <oleg@redhat.com>
Link: http://lkml.kernel.org/r/1424929779-13174-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
CLONE_SETTLS is expected to write a TLS entry in the GDT for
32-bit callers and to set FSBASE for 64-bit callers.
The correct check is is_ia32_task(), which returns true in the
context of a 32-bit syscall. TIF_IA32 is set if the task itself
has a 32-bit personality, which is not the same thing.
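The corrected check in copy_thread() looks roughly like this (sketch):
    if (clone_flags & CLONE_SETTLS) {
    #ifdef CONFIG_IA32_EMULATION
            if (is_ia32_task())     /* 32-bit syscall context, not TIF_IA32 */
                    err = do_set_thread_area(p, -1,
                                    (struct user_desc __user *)childregs->si, 0);
            else
    #endif
                    err = do_arch_prctl(p, ARCH_SET_FS, childregs->r8);
    }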
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/45e2d0d695393d76406a0c7225b82c76223e0cc5.1424822291.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The ability for modified CS and/or SS to be useful has nothing
to do with TIF_IA32. Similarly, if there's an exploit involving
changing CS or SS, it's exploitable with or without a TIF_IA32
check.
So just delete the check.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/71c7ab36456855d11ae07edd4945a7dfe80f9915.1424822291.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'stable/for-linus-4.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen bugfixes from David Vrabel:
"Xen regression and bug fixes for 4.0-rc1
- Fix two regressions introduced in 4.0-rc1 affecting PV/PVH guests
in certain configurations.
- Prevent pvscsi frontends bypassing backend checks.
- Allow privcmd hypercalls to be preempted even on kernel with
voluntary preemption. This fixes soft-lockups with long running
toolstack hypercalls (e.g., when creating/destroying large
domains)"
* tag 'stable/for-linus-4.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: Initialize cr4 shadow for 64-bit PV(H) guests
xen-scsiback: mark pvscsi frontend request consumed only after last read
x86/xen: allow privcmd hypercalls to be preempted
x86/xen: Make sure X2APIC_ENABLE bit of MSR_IA32_APICBASE is not set
This has been broken for a long time: it broke first in 2.6.35, then was
almost fixed in 2.6.36 but this one-liner slipped through the cracks.
The bug shows up as an infinite loop in Windows 7 (and newer) boot on
32-bit hosts without EPT.
Windows uses CMPXCHG8B to write to page tables, which causes a
page fault if running without EPT; the emulator is then called from
kvm_mmu_page_fault. The loop then happens if the higher 4 bytes are
not 0; the common case for this is that the NX bit (bit 63) is 1.
Fixes: 6550e1f165
Fixes: 16518d5ada
Cc: stable@vger.kernel.org # 2.6.35+
Reported-by: Erik Rull <erik.rull@rdsoftware.de>
Tested-by: Erik Rull <erik.rull@rdsoftware.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
'apic' is not defined if !CONFIG_X86_64 && !CONFIG_X86_LOCAL_APIC.
Posted interrupts make no sense without CONFIG_SMP, and
CONFIG_X86_LOCAL_APIC will be set with it.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")
introduced CR4 shadows.
These shadows are initialized in early boot code. The commit missed
the initialization for 64-bit PV(H) guests; this patch adds it.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Hypercalls submitted by user space tools via the privcmd driver can
take a long time (potentially many 10s of seconds) if the hypercall
has many sub-operations.
A fully preemptible kernel may deschedule such a task in any upcall
called from a hypercall continuation.
However, in a kernel with voluntary or no preemption, hypercall
continuations in Xen allow event handlers to be run but the task
issuing the hypercall will not be descheduled until the hypercall is
complete and the ioctl returns to user space. These long running
tasks may also trigger the kernel's soft lockup detection.
Add xen_preemptible_hcall_begin() and xen_preemptible_hcall_end() to
bracket hypercalls that may be preempted. Use these in the privcmd
driver.
When returning from an upcall, call xen_maybe_preempt_hcall(), which
adds a schedule point if the current task was within a preemptible
hypercall.
Since _cond_resched() can move the task to a different CPU, clear and
set xen_in_preemptible_hcall around the call.
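The upcall-return hook then looks roughly like this (sketch):
    asmlinkage __visible void xen_maybe_preempt_hcall(void)
    {
            if (unlikely(__this_cpu_read(xen_in_preemptible_hcall) &&
                         should_resched())) {
                    /* Clear the flag across _cond_resched(): the task
                     * may resume on a different CPU. */
                    __this_cpu_write(xen_in_preemptible_hcall, false);
                    _cond_resched();
                    __this_cpu_write(xen_in_preemptible_hcall, true);
            }
    }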
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Commit d524165cb8 ("x86/apic: Check x2apic early") tests the X2APIC_ENABLE
bit of MSR_IA32_APICBASE when CONFIG_X86_X2APIC is off and panics
the kernel when this bit is set.
Xen's PV guests will pass this MSR read to the hypervisor, which will
return its version of the MSR, where this bit might be set. Make sure
we clear it before returning the MSR value to the caller.
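A sketch of the filter, assuming a wrapper around native_read_msr_safe():
    static u64 xen_read_msr_safe(unsigned int msr, int *err)
    {
            u64 val = native_read_msr_safe(msr, err);

            /* The hypervisor may report X2APIC_ENABLE; hide it: */
            if (msr == MSR_IA32_APICBASE)
                    val &= ~X2APIC_ENABLE;
            return val;
    }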
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Make it execute the ERMS version if support is present and we're in the
forward memmove() part, and remove the unfolded alternatives section
definition.
Signed-off-by: Borislav Petkov <bp@suse.de>
Make alternatives replace single JMPs instead of whole memset functions,
thus decreasing the number of instructions copied during patching time
at boot.
While at it, make it use the REP_GOOD version by default which means
alternatives NOP out the JMP to the other versions, as REP_GOOD is set
by default on the majority of relevant x86 processors.
Signed-off-by: Borislav Petkov <bp@suse.de>
This is based on a patch originally by hpa.
With the current improvements to the alternatives, we can simply use %P1
as a mem8 operand constraint and rely on the toolchain to generate the
proper instruction sizes. For example, on 32-bit, where we use an empty
old instruction we get:
apply_alternatives: feat: 6*32+8, old: (c104648b, len: 4), repl: (c195566c, len: 4)
c104648b: alt_insn: 90 90 90 90
c195566c: rpl_insn: 0f 0d 4b 5c
...
apply_alternatives: feat: 6*32+8, old: (c18e09b4, len: 3), repl: (c1955948, len: 3)
c18e09b4: alt_insn: 90 90 90
c1955948: rpl_insn: 0f 0d 08
...
apply_alternatives: feat: 6*32+8, old: (c1190cf9, len: 7), repl: (c1955a79, len: 7)
c1190cf9: alt_insn: 90 90 90 90 90 90 90
c1955a79: rpl_insn: 0f 0d 0d a0 d4 85 c1
all with the proper padding done depending on the size of the
replacement instruction the compiler generates.
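As a usage sketch, modeled on the kernel's prefetch helpers (the helper
name is made up; the %P1 numbering assumes alternative_input()'s dummy
first operand):
/* Hedged sketch: %P1 lets the toolchain pick the properly sized memory
 * operand for both the old and the replacement instruction. */
static inline void my_prefetchw(const void *x)
{
	alternative_input("prefetch %P1",	/* 3DNow! prefetch */
			  "prefetchw %P1",	/* 0f 0d /1 */
			  X86_FEATURE_3DNOWPREFETCH,
			  "m" (*(const char *)x));
}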
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
... now that we have it.
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Borislav Petkov <bp@suse.de>
Move clear_page() up so that we can get 2-byte forward JMPs when
patching:
apply_alternatives: feat: 3*32+16, old: (ffffffff8130adb0, len: 5), repl: (ffffffff81d0b859, len: 5)
ffffffff8130adb0: alt_insn: 90 90 90 90 90
recompute_jump: new_displ: 0x0000003e
ffffffff81d0b859: rpl_insn: eb 3e 66 66 90
even though the compiler generated 5-byte JMPs which we padded with 5
NOPs.
Also, make the REP_GOOD version be the default as the majority of
machines set REP_GOOD. This way we get to save ourselves the JMP:
old insn VA: 0xffffffff813038b0, CPU feat: X86_FEATURE_REP_GOOD, size: 5, padlen: 0
clear_page:
ffffffff813038b0 <clear_page>:
ffffffff813038b0: e9 0b 00 00 00 jmpq ffffffff813038c0
repl insn: 0xffffffff81cf0e92, size: 0
old insn VA: 0xffffffff813038b0, CPU feat: X86_FEATURE_ERMS, size: 5, padlen: 0
clear_page:
ffffffff813038b0 <clear_page>:
ffffffff813038b0: e9 0b 00 00 00 jmpq ffffffff813038c0
repl insn: 0xffffffff81cf0e92, size: 5
ffffffff81cf0e92: e9 69 2a 61 ff jmpq ffffffff81303900
ffffffff813038b0 <clear_page>:
ffffffff813038b0: e9 69 2a 61 ff jmpq ffffffff8091631e
Signed-off-by: Borislav Petkov <bp@suse.de>
... and drop the unfolded version. No need for ASM_NOP3 anymore either as
the alternatives do the proper padding at build time and insert proper
NOPs at boot time.
There should be no apparent operational change from this patch.
Signed-off-by: Borislav Petkov <bp@suse.de>
... instead of the semi-version with the spelled out sections.
What is more, make the REP_GOOD version be the default copy_page()
version as the majority of the relevant x86 CPUs do set
X86_FEATURE_REP_GOOD. Thus, copy_page gets compiled to:
ffffffff8130af80 <copy_page>:
ffffffff8130af80: e9 0b 00 00 00 jmpq ffffffff8130af90 <copy_page_regs>
ffffffff8130af85: b9 00 02 00 00 mov $0x200,%ecx
ffffffff8130af8a: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
ffffffff8130af8d: c3 retq
ffffffff8130af8e: 66 90 xchg %ax,%ax
ffffffff8130af90 <copy_page_regs>:
...
and after the alternatives have run, the JMP to the old, unrolled
version gets NOPed out:
ffffffff8130af80 <copy_page>:
ffffffff8130af80: 66 66 90 xchg %ax,%ax
ffffffff8130af83: 66 90 xchg %ax,%ax
ffffffff8130af85: b9 00 02 00 00 mov $0x200,%ecx
ffffffff8130af8a: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
ffffffff8130af8d: c3 retq
On modern uarches, those NOPs are cheaper than the previous
unconditional JMP.
Signed-off-by: Borislav Petkov <bp@suse.de>
Alternatives now allow for an empty old instruction. In this case we go
and pad the space with NOPs at assembly time. However, there are
optimal, longer NOPs which should be used instead. Do that at patching
time by adding alt_instr.padlen-sized NOPs at the old instruction
address.
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Up until now, users of relative JMPs in alternatives had to pay
attention to how the relative offset gets computed so that the jump
target is still correct. And, as is the case for near CALLs (opcode
e8), we still have to go and readjust the offset at patching time.
What is more, the static_cpu_has_safe() facility had to generate
5-byte JMPs since we couldn't rely on the compiler to emit properly
sized ones, so we had to force the longest variant. Worse than that,
it would sometimes generate a replacement JMP which is longer than
the original one, thus overwriting the beginning of the next
instruction at patching time.
So, in order to alleviate all that and make using JMPs more
straightforward, we pad the original instruction in an alternative
block with NOPs at build time, should the replacement(s) be longer.
This way, alternatives users don't have to pay special attention to
keeping original and replacement instruction sizes in sync: the
assembler simply adds padding where needed and does nothing otherwise.
As a second aspect, we go and recompute JMPs at patching time so that we
can try to make 5-byte JMPs into two-byte ones if possible. If not, we
still have to recompute the offsets as the replacement JMP gets put far
away in the .altinstr_replacement section leading to a wrong offset if
copied verbatim.
For example, on a locally generated kernel image
old insn VA: 0xffffffff810014bd, CPU feat: X86_FEATURE_ALWAYS, size: 2
__switch_to:
ffffffff810014bd: eb 21 jmp ffffffff810014e0
repl insn: size: 5
ffffffff81d0b23c: e9 b1 62 2f ff jmpq ffffffff810014f2
gets corrected to a 2-byte JMP:
apply_alternatives: feat: 3*32+21, old: (ffffffff810014bd, len: 2), repl: (ffffffff81d0b23c, len: 5)
alt_insn: e9 b1 62 2f ff
recompute_jumps: next_rip: ffffffff81d0b241, tgt_rip: ffffffff810014f2, new_displ: 0x00000033, ret len: 2
converted to: eb 33 90 90 90
and a 5-byte JMP:
old insn VA: 0xffffffff81001516, CPU feat: X86_FEATURE_ALWAYS, size: 2
__switch_to:
ffffffff81001516: eb 30 jmp ffffffff81001548
repl insn: size: 5
ffffffff81d0b241: e9 10 63 2f ff jmpq ffffffff81001556
gets shortened into a two-byte one:
apply_alternatives: feat: 3*32+21, old: (ffffffff81001516, len: 2), repl: (ffffffff81d0b241, len: 5)
alt_insn: e9 10 63 2f ff
recompute_jumps: next_rip: ffffffff81d0b246, tgt_rip: ffffffff81001556, new_displ: 0x0000003e, ret len: 2
converted to: eb 3e 90 90 90
... and so on.
This leads to a net win of around
40ish replacements * 3 bytes savings =~ 120 bytes of I$
on an AMD guest which means some savings of precious instruction cache
bandwidth. The padding of the shorter 2-byte JMPs consists of
single-byte NOPs, which smart microarchitectures discard at decode
time, thus freeing up execution bandwidth.
Signed-off-by: Borislav Petkov <bp@suse.de>
Up until now we have always paid attention to making sure the length of
the new instruction replacing the old one is less than or equal to
the length of the old instruction. If the new instruction is longer, at
the time it replaces the old instruction it will overwrite the beginning
of the next instruction in the kernel image and cause your pants to
catch fire.
So instead of having to pay attention, teach the alternatives framework
to pad shorter old instructions with NOPs at build time - but only in the
case when
len(old instruction(s)) < len(new instruction(s))
and add nothing in the >= case. (In that case we do add_nops() when
patching).
This way the alternatives user shouldn't have to care about instruction
sizes and can simply use the macros.
Add asm ALTERNATIVE* flavor macros too, while at it.
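A minimal usage sketch of the C-side macro; the clflushopt example is
purely illustrative and not part of this patch:
/* Hedged sketch: with build-time NOP padding, the caller need not care
 * that "clflushopt" is one byte longer than "clflush". */
static inline void flush_cacheline(const void *addr)
{
	asm volatile(ALTERNATIVE("clflush %0", "clflushopt %0",
				 X86_FEATURE_CLFLUSHOPT)
		     : "+m" (*(volatile char *)addr));
}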
Also, we need to save the pad length in a separate struct alt_instr
member for NOP optimization and the way to do that reliably is to carry
the pad length instead of trying to detect whether we're looking at
single-byte NOPs or at pathological instruction offsets like e9 90 90 90
90, for example, which is a valid instruction.
Thanks to Michael Matz for the great help with toolchain questions.
Signed-off-by: Borislav Petkov <bp@suse.de>
Make it pass __func__ implicitly. Also, dump info about each
replacement we're doing. Fix up comments and style while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull Intel Quark SoC support from Ingo Molnar:
"This adds support for Intel Quark X1000 SoC boards, used in the low
power 32-bit x86 Intel Galileo microcontroller board intended for the
Arduino space.
There's been some preparatory core x86 patches for Quark CPU quirks
merged already, but this rounds it all up and adds Kconfig enablement.
It's a clean hardware enablement addition tree at this point"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel/quark: Fix simple_return.cocci warnings
x86/intel/quark: Fix ptr_ret.cocci warnings
x86/intel/quark: Add Intel Quark platform support
x86/intel/quark: Add Isolated Memory Regions for Quark X1000
Pull locking fixes from Ingo Molnar:
"Two fixes: the paravirt spin_unlock() corruption/crash fix, and an
rtmutex NULL dereference crash fix"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/spinlocks/paravirt: Fix memory corruption on unlock
locking/rtmutex: Avoid a NULL pointer dereference on deadlock
Pull misc x86 fixes from Ingo Molnar:
"This contains:
- EFI fixes
- a boot printout fix
- ASLR/kASLR fixes
- intel microcode driver fixes
- other misc fixes
Most of the linecount comes from an EFI revert"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch
x86/microcode/intel: Handle truncated microcode images more robustly
x86/microcode/intel: Guard against stack overflow in the loader
x86, mm/ASLR: Fix stack randomization on 64-bit systems
x86/mm/init: Fix incorrect page size in init_memory_mapping() printks
x86/mm/ASLR: Propagate base load address calculation
Documentation/x86: Fix path in zero-page.txt
x86/apic: Fix the devicetree build in certain configs
Revert "efi/libstub: Call get_memory_map() to obtain map and desc sizes"
x86/efi: Avoid triple faults during EFI mixed mode calls
Pull x86 uprobe/kprobe fixes from Ingo Molnar:
"This contains two uprobes fixes, an uprobes comment update and a
kprobes fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes/x86: Mark 2 bytes NOP as boostable
uprobes/x86: Fix 2-byte opcode table
uprobes/x86: Fix 1-byte opcode tables
uprobes/x86: Add comment with insn opcodes, mnemonics and why we dont support them
Pull rcu fix and x86 irq fix from Ingo Molnar:
- Fix a bug that caused an RCU warning splat.
- Two x86 irq related fixes: a hotplug crash fix and an ACPI IRQ
registry fix.
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Clear need_qs flag to prevent splat
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable()
x86/irq: Fix regression caused by commit b568b8601f
__recover_probed_insn() should always be called from an address
where an instruction starts. The check for ftrace_location()
might help to discover a potential inconsistency.
This patch adds a WARN_ON() when the inconsistency is detected.
It also adds handling of the situation when the original code
cannot be recovered.
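A sketch of the check (the surrounding function context is an
assumption):
/* Hedged sketch: an address inside an ftrace location but not at its
 * start means the instruction boundaries are inconsistent; warn and
 * report the code as unrecoverable. */
unsigned long faddr = ftrace_location(addr);

if (WARN_ON(faddr && faddr != addr))
	return 0UL;	/* caller treats 0 as "unrecoverable" */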
Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Ananth NMavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1424441250-27146-3-git-send-email-pmladek@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
can_probe() checks if the given address points to the beginning
of an instruction. It analyzes all the instructions from the
beginning of the function until the given address. The code
might be modified by another Kprobe. In this case, the current
code is read into a buffer, the int3 breakpoint is replaced by the
saved opcode in the buffer, and can_probe() analyzes the buffer
instead.
There is a bug: __recover_probed_insn() tries to restore
the original code even for Kprobes using the ftrace framework,
but in this case the opcode is not stored. See the difference
between arch_prepare_kprobe() and arch_prepare_kprobe_ftrace().
The opcode is stored by arch_copy_kprobe() only from
arch_prepare_kprobe().
This patch makes Kprobes use the ideal 5-byte NOP when the
code can be modified by ftrace. It is the original instruction,
see ftrace_make_nop() and ftrace_nop_replace().
Note that we always need to use the NOP for ftrace locations.
Kprobes do not block ftrace and the instruction might get
modified at anytime. It might even be in an inconsistent state
because it is modified step by step using the int3 breakpoint.
The patch also fixes indentation of the touched comment.
Note that I found this problem when playing with Kprobes. I did
it on x86_64 with gcc-4.8.3 that supported -mfentry. I modified
samples/kprobes/kprobe_example.c and added offset 5 to put
the probe right after the fentry area:
 static struct kprobe kp = {
 	.symbol_name	= "do_fork",
+	.offset		= 5,
 };
Then I was able to load kprobe_example before jprobe_example
but not the other way around:
$> modprobe jprobe_example
$> modprobe kprobe_example
modprobe: ERROR: could not insert 'kprobe_example': Invalid or incomplete multibyte or wide character
It did not make much sense and debugging pointed to the bug
described above.
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth NMavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1424441250-27146-2-git-send-email-pmladek@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit f47233c2d3 ("x86/mm/ASLR: Propagate base load address
calculation") causes PAGE_SIZE redefinition warnings for UML
subarch builds. This is caused by added includes that were
leftovers from previous patch versions and are not actually
needed (especially the page_types.h include in module.c). Drop
those stray includes.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1502201017240.28769@pobox.suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add support for the new pcommit (persistent commit) instruction.
This instruction was announced in the document "Intel
Architecture Instruction Set Extensions Programming Reference"
with reference number 319433-022:
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
The pcommit instruction ensures that data that has been flushed
from the processor's cache hierarchy with clwb, clflushopt or
clflush is accepted to memory and is durable on the DIMM. The
primary use case for this is persistent memory.
This function shows how to properly use clwb/clflushopt/clflush
and pcommit with appropriate fencing:
void flush_and_commit_buffer(void *vaddr, unsigned int size)
{
	void *vend = vaddr + size - 1;

	for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
		clwb(vaddr);

	/* Flush any possible final partial cacheline */
	clwb(vend);

	/*
	 * sfence to order clwb/clflushopt/clflush cache flushes
	 * mfence via mb() also works
	 */
	wmb();

	/* pcommit and the required sfence for ordering */
	pcommit_sfence();
}
After this function completes, the data pointed to by vaddr
has been accepted to memory and will be durable if vaddr
points to persistent memory.
Pcommit must always be ordered by an mfence or sfence, so to
help simplify things we include both the pcommit and the
required sfence in the alternatives generated by
pcommit_sfence(). The other option is to keep them separated,
but on platforms that don't support pcommit this would then turn
into:
void flush_and_commit_buffer(void *vaddr, unsigned int size)
{
	void *vend = vaddr + size - 1;

	for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
		clwb(vaddr);

	/* Flush any possible final partial cacheline */
	clwb(vend);

	/*
	 * sfence to order clwb/clflushopt/clflush cache flushes
	 * mfence via mb() also works
	 */
	wmb();

	nop();	/* from pcommit(), via alternatives */

	/*
	 * sfence to order pcommit
	 * mfence via mb() also works
	 */
	wmb();
}
This is still correct, but now you've got two fences separated
by only a nop. With the commit and the fence together in
pcommit_sfence() you avoid the final unneeded fence.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1424367448-24254-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if
_PAGE_PRESENT is clear. Make pte_protnone() and pmd_protnone() check
for this.
This fixes a 64-bit Xen PV guest regression introduced by 8a0516ed8b
("mm: convert p[te|md]_numa users to p[te|md]_protnone_numa"). Any
userspace process would endlessly fault.
In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL set
by the hypervisor. This meant that any fault on a present userspace
entry (e.g., a write to a read-only mapping) would be misinterpreted as
a NUMA hinting fault and the fault would not be correctly handled,
resulting in the access endlessly faulting.
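The idea, sketched (close to, but not necessarily byte-for-byte, the
applied change):
/* Hedged sketch: interpret _PAGE_PROTNONE only when _PAGE_PRESENT is
 * clear, since the bit doubles as _PAGE_GLOBAL on present entries. */
static inline int pte_protnone(pte_t pte)
{
	return (pte_flags(pte) & (_PAGE_PROTNONE | _PAGE_PRESENT))
		== _PAGE_PROTNONE;
}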
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kbuild updates from Michal Marek:
- several cleanups in kbuild
- serialize multiple *config targets so that 'make defconfig kvmconfig'
works
- The cc-ifversion macro got support for an else-branch
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild,gcov: simplify kernel/gcov/Makefile more
kbuild: allow cc-ifversion to have the argument for false condition
kbuild,gcov: simplify kernel/gcov/Makefile
kbuild,gcov: remove unnecessary workaround
kbuild: do not add $(call ...) to invoke cc-version or cc-fullversion
kbuild: fix cc-ifversion macro
kbuild: drop $(version_h) from MRPROPER_FILES
kbuild: use mixed-targets when two or more config targets are given
kbuild: remove redundant line from bounds.h/asm-offsets.h
kbuild: merge bounds.h and asm-offsets.h rules
kbuild: Drop support for clean-rule
We do not check the bounds of the input data containing the microcode
before copying a struct microcode_intel_header from it. A specially
crafted microcode could cause the kernel to read invalid memory and
lead to a denial-of-service.
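Sketched, the missing sanity check amounts to (names and error handling
are illustrative, not the literal diff):
/* Hedged sketch: bail out before reading header fields if the
 * remaining blob cannot even hold the header. */
if (leftover < sizeof(struct microcode_intel_header))
	break;	/* truncated image: stop scanning */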
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1422964824-22056-3-git-send-email-quentin.casasnovas@oracle.com
[ Made error message differ from the next one and flipped comparison. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
mc_saved_tmp is a fixed-size array allocated on the stack; we need to
make sure mc_saved_count stays within its bounds, otherwise we overflow
the stack in _save_mc(). A specially crafted microcode header could lead
to a kernel crash or potentially arbitrary code execution in the kernel.
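Sketched, the guard amounts to (illustrative names, not the literal
diff):
/* Hedged sketch: refuse to append past the end of the fixed-size
 * on-stack array. */
if (mc_saved_count >= ARRAY_SIZE(mc_saved_tmp))
	return mc_saved_count;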
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1422964824-22056-1-git-send-email-quentin.casasnovas@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull ASLR and kASLR fixes from Borislav Petkov:
- Add a global flag announcing KASLR state so that relevant code can
make informed decisions based on its setting. (Jiri Kosina)
- Fix a stack randomization entropy decrease bug. (Hector Marco-Gisbert)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The issue is that the stack for processes is not properly randomized on
64-bit architectures, due to an integer overflow.
The affected function is randomize_stack_top() in file
"fs/binfmt_elf.c":
static unsigned long randomize_stack_top(unsigned long stack_top)
{
	unsigned int random_variable = 0;

	if ((current->flags & PF_RANDOMIZE) &&
	    !(current->personality & ADDR_NO_RANDOMIZE)) {
		random_variable = get_random_int() & STACK_RND_MASK;
		random_variable <<= PAGE_SHIFT;
	}
#ifdef CONFIG_STACK_GROWSUP
	return PAGE_ALIGN(stack_top) + random_variable;
#else
	return PAGE_ALIGN(stack_top) - random_variable;
#endif
}
Note that it declares the "random_variable" variable as "unsigned int".
The result of the shift
random_variable <<= PAGE_SHIFT;
is up to 34 bits wide: 22 bits from STACK_RND_MASK (which is 0x3fffff
on x86_64) plus PAGE_SHIFT (which is 12 on x86_64). The two leftmost
bits are therefore dropped when the result is stored back into the
32-bit "random_variable", which would need to be at least 34 bits wide
to hold the (22+12)-bit result.
These two dropped bits have an impact on the entropy of the process
stack. Concretely, the total stack entropy is reduced by a factor of
four: from 2^30 to 2^28 (one fourth of the expected entropy).
This patch restores the entropy by correcting the types involved
in the operations in the functions randomize_stack_top() and
stack_maxrandom_size().
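The corrected randomize_stack_top(), sketched (widening the variable to
unsigned long, in the spirit of the applied fix):
static unsigned long randomize_stack_top(unsigned long stack_top)
{
	/* Hedged sketch: unsigned long keeps all 34 bits of the
	 * shifted random value on 64-bit. */
	unsigned long random_variable = 0;

	if ((current->flags & PF_RANDOMIZE) &&
	    !(current->personality & ADDR_NO_RANDOMIZE)) {
		random_variable = (unsigned long)get_random_int();
		random_variable &= STACK_RND_MASK;
		random_variable <<= PAGE_SHIFT;
	}
#ifdef CONFIG_STACK_GROWSUP
	return PAGE_ALIGN(stack_top) + random_variable;
#else
	return PAGE_ALIGN(stack_top) - random_variable;
#endif
}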
The successful fix can be tested with:
$ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack]
7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack]
7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack]
7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack]
...
Once corrected, the leading bytes should be between 7ffc and 7fff,
rather than always being 7fff.
Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Ismael Ripoll <iripoll@upv.es>
[ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: CVE-2015-1593
Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
Signed-off-by: Borislav Petkov <bp@suse.de>