2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-29 15:43:59 +08:00
Commit Graph

5060 Commits

Author SHA1 Message Date
Yinghai Lu
4c66a73f07 x86: sparse_irq: fix typo in debug print out
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:13 +02:00
Russ Anderson
fc8c2d763b x86: Add sysfs entries for UV v4
Create /sys/firmware/sgi_uv sysfs entries for partition_id and coherence_id.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:13 +02:00
Russ Anderson
922402f15a x86: Add UV partition call v4
Add a bios call to return partitioning related info.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:13 +02:00
Russ Anderson
7f5942329e x86: Add UV bios call infrastructure v4
Add the EFI callback function and associated wrapper code.
Initialize SAL system table entry info at boot time.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:13 +02:00
Russ Anderson
a50f70b175 x86: Add UV EFI table entry v4
Look for a UV entry in the EFI tables.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:13 +02:00
Ingo Molnar
37762b6ffb x86, UV: add uv_setup_irq() and uv_teardown_irq() functions, v3, fix
fix:

 arch/x86/kernel/uv_irq.c: In function 'uv_ack_apic':
 arch/x86/kernel/uv_irq.c:26: error: implicit declaration of function 'ack_APIC_irq'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:13 +02:00
Dean Nelson
4173a0e737 x86, UV: add uv_setup_irq() and uv_teardown_irq() functions, v3
Provide a means for UV interrupt MMRs to be setup with the message to be sent
when an MSI is raised.

Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:12 +02:00
Venki Pallipadi
5f79f2f2ad hpet: clean up warning
Fix the below compile warnings due to recent HPET MSI changes

arch/x86/kernel/hpet.c:48: warning: 'hpet_devs' defined but not used
arch/x86/kernel/hpet.c:50: warning: 'per_cpu__cpu_hpet_dev' defined but not used

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:12 +02:00
Yinghai Lu
c81bba49a1 x86: print out irq nr for msi/ht, v3
v2: fix hpet compiling error
v3: Bjorn want to use dev_printk instead

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:12 +02:00
Cyrill Gorcunov
c1370b49cc x86: io-apic - interrupt remapping fix
Clean up obscure for() cycle with straight while() form

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:12 +02:00
Yinghai Lu
7564676813 x86: irq no should not use hex in /proc/interrupts
Arjan van de Ven noticed that we changed IRQ numbers from decimal
to hex in /proc/interrupts - that can break user-space utilities
like irqbalanced.

Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:12 +02:00
Jack Steiner
8da077d6f3 x86, uv: fix ordering of calls to uv_system_init & uv_cpu_init
Fix problem caused by reordering of the calls to uv_cpu_init() &
uv_system_init. Originally, uv_cpu_init() was called AFTER uv_system_init.
This order was recently broken as a side-effect of other patches.

With this patch, initialization of cpu 0 is now done by the system_init
call.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:11 +02:00
Cyrill Gorcunov
5ffa4eb222 x86: io-apic - interrupt remapping fix
Interrupt remapping could lead to NULL dereference in case of
kzalloc failed and memory leak in other way. So fix the
both cases.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:11 +02:00
Yinghai Lu
9d98598d2f sparseirq: remove some debug print out
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:11 +02:00
Cyrill Gorcunov
56ffa1a028 x86: io-apic - do not use KERN_DEBUG marker too much, fix
Yinghai Lu reported:

| >  0 add_pin_to_irq: irq 15 --> apic 0 pin 15
| > IOAPIC[0]: Set routing entry (8-15 -> 0x3f -> IRQ 15 Mode:0 Active:0)
| >  8-16 8-17 8-18 8-19 8-20 8-21 8-22 8-23 (apicid-pin) not connected
| >  9-0 9-1 9-2 9-3 9-4 9-5 9-6 9-7 9-8 9-9 9-10 9-11 9-12 9-13 9-14 9-15
| > 9-16 9-17 9-18 9-19 9-20 9-21 9-22 9-23 (apicid-pin) not connected
| >
|
| only first one not connected at first, and ...

here is a quick fix for this.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:10 +02:00
Cyrill Gorcunov
b189892de4 x86: apic - fix unused vars warning in calibrate_APIC_clock
If we don't have CONFIG_X86_PM_TIMER=y compiler warns about
unused variables. Move PM timer based calibration into a
separate function and make the code cleaner and the compiler
happy as well.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:10 +02:00
Cyrill Gorcunov
08ad776e3c x86: apic - skip writting ESR register if we dont have on
On 82489DX we don't have ESR register so we should not
write it.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:10 +02:00
Cyrill Gorcunov
9df08f1095 x86: apic - lapic_setup_esr does not handle esr_disable - fix it
lapic_setup_esr doesn't handle esr_disable inquire.
The error brought in during unification process.
Fix it.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:10 +02:00
Yinghai Lu
823b259b80 x86: print out apic id in hex format
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:09 +02:00
Cyrill Gorcunov
87783be4c2 x86: io-apic - get rid of __DO_ACTION macro
Replace __DO_ACTION macro with io_apic_modify_irq function.
This allow us to 'grep' definitions being hided by
__DO_ACTION macro:

	__unmask_IO_APIC_irq
	__mask_IO_APIC_irq
	__mask_and_edge_IO_APIC_irq
	__unmask_and_level_IO_APIC_irq

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:09 +02:00
Steven Noonan
ba374c9bae x86: fix HPET compiler error when not using CONFIG_PCI_MSI
Added dummy function for hpet_setup_msi_irq().

Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:09 +02:00
Venki Pallipadi
f0ed4e695f x86: using HPET in MSI mode and setting up per CPU HPET timers, fix
On Sat, Sep 06, 2008 at 06:03:53AM -0700, Ingo Molnar wrote:
>
> it crashes two testsystems, the fault on a NULL pointer in hpet init,
> with:
>
> initcall print_all_ICs+0x0/0x520 returned 0 after 26 msecs
> calling  hpet_late_init+0x0/0x1c0
> BUG: unable to handle kernel NULL pointer dereference at 000000000000008c
> IP: [<ffffffff80d228be>] hpet_late_init+0xfe/0x1c0
> PGD 0
> Oops: 0000 [1] SMP
> CPU 0
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.27-rc5 #29725
> RIP: 0010:[<ffffffff80d228be>]  [<ffffffff80d228be>] hpet_late_init+0xfe/0x1c0
> RSP: 0018:ffff88003fa07dd0  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
> RDX: ffffc20000000160 RSI: 0000000000000000 RDI: 0000000000000003
> RBP: ffff88003fa07e90 R08: 0000000000000000 R09: ffff88003fa07dd0
> R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003fa07dd0
> R13: 0000000000000002 R14: ffffc20000000000 R15: 000000006f57e511
> FS:  0000000000000000(0000) GS:ffffffff80cf6a80(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 000000000000008c CR3: 0000000000201000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 1, threadinfo ffff88003fa06000, task ffff88003fa08000)
> Stack:  00000000fed00000 ffffc20000000000 0000000100000003 0000000800000002
>  0000000000000000 0000000000000000 0000000000000000 0000000000000000
>  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Call Trace:
>  [<ffffffff80d227c0>] ? hpet_late_init+0x0/0x1c0
>  [<ffffffff80209045>] do_one_initcall+0x45/0x190
>  [<ffffffff80296f39>] ? register_irq_proc+0x19/0xe0
>  [<ffffffff80d0d140>] ? early_idt_handler+0x0/0x73
>  [<ffffffff80d0dabc>] kernel_init+0x14c/0x1b0
>  [<ffffffff80942ac1>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff8020dbd9>] child_rip+0xa/0x11
>  [<ffffffff8020ceee>] ? restore_args+0x0/0x30
>  [<ffffffff80d0d970>] ? kernel_init+0x0/0x1b0
>  [<ffffffff8020dbcf>] ? child_rip+0x0/0x11
> Code: 20 48 83 c1 01 48 39 f1 75 e3 44 89 e8 4c 8b 05 29 29 22 00 31 f6 48 8d 78 01 66 66 90 89 f0 48 8d 04 80 48 c1 e0 05 4a 8d 0c 00 <f6> 81 8c 00 00 00 08 74 26 8b 81 80 00 00 00 8b 91 88 00 00 00
> RIP  [<ffffffff80d228be>] hpet_late_init+0xfe/0x1c0
>  RSP <ffff88003fa07dd0>
> CR2: 000000000000008c
> Kernel panic - not syncing: Fatal exception

There was one code path, with CONFIG_PCI_MSI disabled, where we were accessing
hpet_devs without initialization. That resulted in the above crash. The change
below adds a check for hpet_devs.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:09 +02:00
Cyrill Gorcunov
2a554fb132 x86: io-apic - do not use KERN_DEBUG marker too much
Do not use KERN_DEBUG several times on the same line being printed.
Introduced by mine previous patch, sorry.

Reported-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:08 +02:00
Yinghai Lu
79c09698ca x86: lapic address print out like io apic addr
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:08 +02:00
Cyrill Gorcunov
3c2cbd2490 x86: io-apic - code style cleaning for setup_IO_APIC_irqs
By changing printout form we are able to shrink (and clean up) code a bit.

Former printout example:

	init IO_APIC IRQs
	 IO-APIC (apicid-pin) 1-1, 1-2, 1-3 not connected.
	 IO-APIC (apicid-pin) 2-1, 2-2, 2-3 not connected.

New printout example:

	init IO_APIC IRQs
	 1-1 1-2 1-3 (apicid-pin) not connected
	 2-1 2-2 2-3 (apicid-pin) not connected

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:08 +02:00
venkatesh.pallipadi@intel.com
26afe5f2fb x86: HPET_MSI Initialise per-cpu HPET timers
Initialize a per CPU HPET MSI timer when possible. We retain the HPET
timer 0 (IRQ 0) and timer 1 (IRQ 8) as is when legacy mode is being used. We
setup the remaining HPET timers as per CPU MSI based timers. This per CPU
timer will eliminate the need for timer broadcasting with IRQ 0 when there
is non-functional LAPIC timer across CPU deep C-states.

If there are more CPUs than number of available timers, CPUs that do not
find any timer to use will continue using LAPIC and IRQ 0 broadcast.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:08 +02:00
Ingo Molnar
4588c1f035 x86: HPET_MSI Basic HPET_MSI setup code, cleanups
small style cleanups.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:07 +02:00
venkatesh.pallipadi@intel.com
58ac1e76ce x86: HPET_MSI Basic HPET_MSI setup code
Basic HPET MSI setup code. Routines to perform basic MSI read write
in HPET memory map and setting up irq_chip for HPET MSI.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:07 +02:00
venkatesh.pallipadi@intel.com
b40d575bf0 x86: HPET_MSI Refactor code in preparation for HPET_MSI
Preparatory patch before the actual HPET MSI changes. Sets up hpet_set_mode
and hpet_next_event for the MSI related changes. Just the code
refactoring and should be zero functional change.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:07 +02:00
Cyrill Gorcunov
ac54a6c937 x86: io-apic - declare irq_cfg_lock for SPARSE_IRQ only
We use irq_cfg_lock lock in SPARSE_IRQ only context so
move it under #ifdef and compiler will be happy.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:06 +02:00
Cyrill Gorcunov
676f4a920b x86: io-apic - use ARRAY_SIZE macro
Make the code width a bit shorter with ARRAY_SIZE macro.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:06 +02:00
Yinghai Lu
02c1df199c x86: print out if acpi want physical flat of all
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:05 +02:00
Yinghai Lu
a11b5abef5 x2apic: fix reserved APIC register accesses in print_local_APIC()
APIC_ARBPRI is a reserved register for XAPIC and beyond.
APIC_RRR is a reserved register except for 82489DX, APIC for Pentium processors.
APIC_EOI is a write only register.
APIC_DFR is reserved in x2apic mode.

Access to these registers in x2apic will result in #GP fault. Fix these
apic register accesses.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:04 +02:00
Cyrill Gorcunov
1dd6ba2e17 x86: apic - unify smp_spurious/error_interrupt declaration
According to entry_64.S we do pass pt_regs pointer
into interrupt handlers but don't use them. So we
safely may merge the declarations.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:04 +02:00
Yinghai Lu
e492c5ae85 x86: let 64 bit to use 32 bit calibrate_apic_clock
Use the 32-bit APIC calibration code - it's more mature.

Signed-of-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:03 +02:00
Yinghai Lu
0f611ffaea x86: rename apic_32.c and apic_64.c to apic.c
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:03 +02:00
Yinghai Lu
6c15822752 x86: apic copy apic_64.c to apic_32.c
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:03 +02:00
Yinghai Lu
2f04fa888d x86: apic copy calibrate_APIC_clock to each other in apic_32/64.c
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:03 +02:00
Yinghai Lu
dc1528dd86 x86: apic unify smp_spurious/error_interrupt
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:03 +02:00
Yinghai Lu
773763df7d x86: merge header files in apic_xx.c
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:02 +02:00
Yinghai Lu
be7a656fe1 x86: copy detect_init_APIC to the other
Signed-off-by: Yinghai Lu <yhlu.kernel@mgail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:02 +02:00
Yinghai Lu
fa2bd35a8d x86: merge APIC_init_uniprocessor
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:02 +02:00
Yinghai Lu
f28c0ae21d x86: make apic_32/64.c more like
except x2apic, detec_init_APIC, and calibrating_APIC_clock

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:02 +02:00
Yinghai Lu
3491998dd5 x86: add hard_smp_prossor_id with MACRO in io_apic_xx.c
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:02 +02:00
Yinghai Lu
49899eacce x86: use HAVE_X2APIC in apic_64.c
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:01 +02:00
Yinghai Lu
b3c5117050 x86: apic_xx.c order variables
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:01 +02:00
Cyrill Gorcunov
6460bc73aa x86: apic - unify smp_apic_timer_interrupt
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:01 +02:00
Cyrill Gorcunov
457cc52d46 x86: apic_32.c should use __cpuinit section
All callers are __init or __cpuinit so there is no need
to hold this code without CPU_HOTPLUG being set.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:01 +02:00
Cyrill Gorcunov
89c38c2867 x86: apic - unify setup_local_APIC
- remove useless read of APIC_LVR
- wrap with preempt_disable/enable
- check for integrated APIC just in place

v2: fix by Yinghai Lu.
	fix lapic_is_integrated using
	let 64-bit too have pic_mode

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:01 +02:00
Cyrill Gorcunov
80e5609cab x86: apic_64.c - add sanity check for spurious vector definition
Do not check for SPUTIOUS_APIC_VECTOR definition twice.
Check it once - is what we need.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:00 +02:00
Cyrill Gorcunov
920fa7a507 x86: apic - unify setup_apicpmtimer
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:00 +02:00
Cyrill Gorcunov
7c37e48b51 x86: apic - introduce get_physical_broadcast for 64bit
We don't really use it now on 64bit mode but
could reserve it for future.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:00 +02:00
Cyrill Gorcunov
db4b5525ca x86: apic_64.c - setup_APIC_timer has to be __cpuinit function
There is no need to hold this code if CPU_HOTPLUG is not
defined.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:53:00 +02:00
Alok Kataria
2699574b3c x86: VMI, initialize IRQ vector
Initialize vector_irq for the vmi used vector, to point to correct irq.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:59 +02:00
Yinghai Lu
052c0bff9b x86: fix probe_nr_irqs for xen
otherwise Xen is _completely_ unusable with 5 or more VCPUs.
(when !CONFIG_HAVE_SPARSE_IRQ).

based on Alex Nixon's patch.

also add +1 offset after redir_entries

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:59 +02:00
Yinghai Lu
a2d332fa34 x86: fix 32-bit ioapic lockup with sparseirqs
Missed two lines when copying.

Fix panic on one of Ingo's machines that need to adjust ioapic id when
acpi off/ 32bit.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:59 +02:00
Yinghai Lu
e89eb43863 x86: sparse_irq needs spin_lock in allocations
Suresh Siddha noticed that we should have a spinlock around it.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:59 +02:00
Ingo Molnar
0c425cec64 warning: fix arch x86 kernel io_apic c
fix warning:

  arch/x86/kernel/io_apic.c: In function ‘print_local_APIC’:
  arch/x86/kernel/io_apic.c:1786: warning: format ‘%08x’ expects type ‘unsigned int’, but argument 2 has type ‘u64’
  arch/x86/kernel/io_apic.c:1787: warning: format ‘%08x’ expects type ‘unsigned int’, but argument 2 has type ‘u64’

By creating uniform behavior on 32-bit and 64-bit and printing out the ICR
value in two 32-bit words.

Code has changed:

   text	   data	    bss	    dec	    hex	filename
  22901	  19650	  17040	  59591	   e8c7	io_apic.o.before
  22899	  19650	  17040	  59589	   e8c5	io_apic.o.after

Due to the 32-bit cast narrowing the printed out value on 64-bit.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:58 +02:00
Alex Nixon
bf9d3cf73e xen: Fix bug `do_IRQ: cannot handle IRQ -1 vector 0x6 cpu 1'
Following commit 9c3f2468d8339866d9ef6a25aae31a8909c6be0d, do_IRQ()
looks up the IRQ number in the per-cpu variable vector_irq.

This commit makes Xen initialise an identity vector_irq map for both X86_32 and X86_64.

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:58 +02:00
Yinghai Lu
9d6a4d0823 x86: probe nr_irqs even only mptable is used
for !CONFIG_HAVE_SPARSE_IRQ

fix:

 In file included from arch/x86/kernel/early-quirks.c:18:
 include/asm/io_apic.h: In function 'probe_nr_irqs':
 include/asm/io_apic.h:209: error: 'NR_IRQS' undeclared (first use in this function)
 include/asm/io_apic.h:209: error: (Each undeclared identifier is reported only once
 include/asm/io_apic.h:209: error: for each function it appears in.)

v2: fix by Ingo

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:58 +02:00
Yinghai Lu
8f09cd20a2 x86: make HAVE_SPARSE_IRQ support selectable
Ingo said sparse_irq is some intrusive. need to make it selectable

to make it simple, remove irq_desc as parameter in some functions.
(ack, eoi, set_affinity).
may need to make member if irq_chip to take irq_desc, or struct irq later.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:57 +02:00
Yinghai Lu
ffd5aae781 x86: print local APIC of APs one by one
instead of print that of all APs at the time

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:57 +02:00
Yinghai Lu
29ccbbf232 x86: remove first_free_entry/pin_map_size
no user now

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:57 +02:00
Yinghai Lu
3eb2cce84b x86: unify ack_apic_edge
use code in 64 to replace
	move_native_irq(irq, desc);
in 32 bit

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:56 +02:00
Yinghai Lu
4e738e2f30 x86: unify mask_IO_APIC_irq
use MACRO for 32 bit too

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:56 +02:00
Yinghai Lu
c691cc8452 io_apic: make 32 bit have io_apic resource in /proc/iomem
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:56 +02:00
Yinghai Lu
26d347c2c0 rename io_apic_64.c and io_apic_32.c to io_apic.c
The two files are now line by line equal. (sans a printk)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:55 +02:00
Ingo Molnar
54168ed7f2 x86: make io_apic_32.c the same as io_apic_64.c
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:55 +02:00
Yinghai Lu
047c8fdb87 x86: make io_apic_64.c and io_apic_32.c the same
all the same except INTR_REMAPPING related and ioapic io resource.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:55 +02:00
Yinghai Lu
aa45f97b1b x86: remove ioapic_force
no user left.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:55 +02:00
Yinghai Lu
f876d213a5 x86: make 64 handle sis_apic_bug like the 32 bit
do we have 64bit system with sis chipset?

[ mingo@elte.hu: nope, the problem chipset was 32-bit only.
                 The code symmetry is good nevertheless. ]

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:55 +02:00
Yinghai Lu
d4057bdb6a x86: make headers files the same in io_apic_xx.c
also make no_timer_check to be global on 64 bit, because vmi_32 is using that.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:54 +02:00
Yinghai Lu
efa2559f65 x86: order variables in io_apic_xx.c
move first_system_vector to apic_64.c.

also add #ifdef CONFIG_INTR_REMAP to prepare 32 bit to use
same file.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:54 +02:00
Yinghai Lu
8ea5371baa x86: ordering functions in io_apic_64.c
try to make functions have the same order between 32-bit and 64-bit.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:54 +02:00
Yinghai Lu
1d02519242 x86: ordering functions in io_apic_32.c
prepare for unification:

try to make functions be of the same order to io_apic_64.c.

v2: add calling setup_msi_irq back to arch_setup_msi_irq

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:54 +02:00
Yinghai Lu
d83e94acd9 x86, io-apic: remove union about dest for log/phy
let user decide the meaning of the bits.

This unifies the 32-bit and 64-bit io-apic code a bit.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:53 +02:00
Yinghai Lu
7a959cff72 x86: add debug info for 32bit sparse_irq
so could figure out bugs where we get an interrupt, but vector_irq is
not initialized yet.

Signed-off-by: Yinghai Lu  <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:53 +02:00
Yinghai Lu
497c9a195d x86: make 32bit support per_cpu vector
so we can merge io_apic_32.c and io_apic_64.c

v2: Use cpu_online_map as target cpus for bigsmp, just like 64-bit is doing.

Also remove some unused TARGET_CPUS macro.

v3: need to check if desc is null in smp_irq_move_cleanup

also migration needs to reset vector too, so copy __target_IO_APIC_irq
from 64bit.

(the duplication will go away once the two files are unified.)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:53 +02:00
Yinghai Lu
199751d715 x86: make 32 bit to use sparse_irq
but actually irq still needs to be less than NR_IRQS, because
interrupt[NR_IRQS] in entry.S.

need to enable per_cpu vector...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:53 +02:00
Yinghai Lu
0f978f4505 x86: make 32bit to use irq_2_pin in irq_cfg
so it is more like 64 bit.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:53 +02:00
Yinghai Lu
da51a82131 x86: make 32bit use irq_cfg_alloc, etc
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:52 +02:00
Yinghai Lu
a1420f395d x86: add irq_cfg for 32bit
it only contains vector ...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:52 +02:00
Yinghai Lu
8b8e8c1bf7 x86: remove irqbalance in kernel for 32 bit
This has been deprecated for years, the user space irqbalanced utility
works better with numa, has configurable policies, etc...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmai.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:52 +02:00
Yinghai Lu
6d50bc2683 x86: use 28 bits irq NR for pci msi/msix and ht
also print out irq no in /proc/interrups and /proc/stat in hex, so could
tell bus/dev/func.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:52 +02:00
Yinghai Lu
52b17329d6 x86_64: make /proc/interrupts work with dyn irq_desc
loop with irq_desc list

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:51 +02:00
Yinghai Lu
a2f9f43858 x86_64: separate irq_cfgx from irq_cfgx_free
so later don't need to compare with -1U

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:51 +02:00
Yinghai Lu
cb5bc83225 x86_64: rename irq_desc/irq_desc_alloc
change names:

          irq_desc() ==> irq_desc_alloc
	__irq_desc() ==> irq_desc

Also split a few of the uses in lowlevel x86 code.

v2: need to check if desc is null in smp_irq_move_cleanup

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:51 +02:00
Yinghai Lu
1d5f6b36c4 x86: check with without_new in show_interrupts
so we don't get new one that we don't need it.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:51 +02:00
Yinghai Lu
46926b67fc generic: add irq_desc in function in parameter
So we could remove some duplicated calling to irq_desc

v2: make sure irq_desc in  init/main.c is not used without generic_hardirqs

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:50 +02:00
Yinghai Lu
7d94f7ca40 irq: remove >= nr_irqs checking with config_have_sparse_irq
remove irq limit checks - nr_irqs is dynamic and we expand anytime.

v2: fix checking about result irq_cfg_without_new, so could use msi again
v3: use irq_desc_without_new to check irq is valid

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:50 +02:00
Yinghai Lu
46b8214d12 x86, ioapic: replace loop with nr_irqs with for_each_irq_icfg
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:33 +02:00
Yinghai Lu
2c6927a38f irq: replace loop with nr_irqs with for_each_irq_desc
There are a handful of loops that go from 0 to nr_irqs and use
get_irq_desc() on them. These would allocate all the irq_desc
entries, regardless of the need for them.

Use the smarter for_each_irq_desc() iterator that will only iterate
over the present ones.

v2: make sure arch without GENERIC_HARDIRQS work too

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:33 +02:00
Yinghai Lu
7f95ec9e4c x86: move kstat_irqs from kstat to irq_desc
based on Eric's patch ...

together mold it with dyn_array for irq_desc, will allcate kstat_irqs for
nr_irq_desc alltogether if needed. -- at that point nr_cpus is known already.

v2: make sure system without generic_hardirqs works they don't have irq_desc
v3: fix merging
v4: [mingo@elte.hu] fix typo

[ mingo@elte.hu ] irq: build fix

fix:

 arch/x86/xen/spinlock.c: In function 'xen_spin_lock_slow':
 arch/x86/xen/spinlock.c:90: error: 'struct kernel_stat' has no member named 'irqs'

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:32 +02:00
Yinghai Lu
e5a53714ac x86: put irq_2_pin pointer into irq_cfg
preallocate 32 irq_2_pin entries, and use get_one_free_irq_2_pin() to get
one more and link to irq_cfg if needed.

so don't waste one where no irq is enabled.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:31 +02:00
Yinghai Lu
3ac2de48ed x86: add irq_cfg in io_apic_64.c
preallocate size is 32, and if it is not enough, irq_cfg will more
via alloc_bootmem() or kzalloc(). (depending on how early we are in
system setup)

v2: fix typo about size of init_one_irq_cfg ... should use sizeof(struct irq_cfg)
v3: according to Eric, change get_irq_cfg() to irq_cfg()
v4: squash add irq_cfg_alloc in

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:30 +02:00
Yinghai Lu
08678b0841 generic: sparse irqs: use irq_desc() together with dyn_array, instead of irq_desc[]
add CONFIG_HAVE_SPARSE_IRQ to for use condensed array.
Get rid of irq_desc[] array assumptions.

Preallocate 32 irq_desc, and irq_desc() will try to get more.

( No change in functionality is expected anywhere, except the odd build
  failure where we missed a code site or where a crossing commit itroduces
  new irq_desc[] usage. )

v2: according to Eric, change get_irq_desc() to irq_desc()

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:29 +02:00
Yinghai Lu
71f521bbaf x86, irq: get nr_irqs from madt
Until now, NR_IRQS was derived from black magic defines that had to
be "large enough" to both accomodate NR_CPUS and MAX_NR_IO_APICs.

This resulted in a way too large irq_desc[] array on most x86 systems.
Especially with larger CPU masks, the size of irq_desc can spiral out
of control quickly.

So be smarter about it and use precise allocation instead: determine the
default maximum possible IRQ number from the ACPI MADT. Use a minimum limit
of at least 32 IRQs for broken BIOSes.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:08 +02:00
Ingo Molnar
a84488c213 irq: sparse irqs, fix #3
fix non-APIC UP build:

 arch/x86/kernel/built-in.o: In function `setup_arch':
 : undefined reference to `pin_map_size'
 arch/x86/kernel/built-in.o: In function `setup_arch':
 : undefined reference to `first_free_entry'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:08 +02:00
Yinghai Lu
301e619020 x86: use dyn_array in io_apic_xx.c
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:08 +02:00
Yinghai Lu
0799e432ac x86: use nr_irqs
also add first_free_entry and pin_map_size, which were NR_IRQS derived
constants.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:05 +02:00
Yinghai Lu
6da55c3e8d x86: enable dyn_array support
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:04 +02:00
Yinghai Lu
1f8ff037a8 x86: alloc dyn_array all together
so could spare some memory with small alignment in bootmem

also tighten the alignment checking, and make print out less debug info.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:03 +02:00
Yinghai Lu
1f3fcd4b1a add per_cpu_dyn_array support
allow dyn-array in per_cpu area, allocated dynamically.

usage:

|  /* in .h */
| struct kernel_stat {
|        struct cpu_usage_stat   cpustat;
|        unsigned int *irqs;
| };
|
|  /* in .c */
| DEFINE_PER_CPU(struct kernel_stat, kstat);
|
| DEFINE_PER_CPU_DYN_ARRAY_ADDR(per_cpu__kstat_irqs, per_cpu__kstat.irqs, sizeof(unsigned int), nr_irqs, sizeof(unsigned long), NULL);

after setup_percpu()/per_cpu_alloc_dyn_array(), the dyn_array in
per_cpu area is ready to use.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:03 +02:00
Yinghai Lu
fe648be019 x86: add after_bootmem flag for 32bit
to prepare to use dyn_array support etc.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-16 16:52:02 +02:00
Pavel Vasilyev
1339c367a8 fix CONFIG_MMCONFIG=n build warning
arch/x86/kernel/acpi/boot.c💯 warning: 'acpi_mcfg_64bit_base_addr' defined
but not used

http://bugzilla.kernel.org/show_bug.cgi?id=11743

Signed-off-by: Pavel Vasilyev <linuxoid@tochka.ru>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-10-15 17:33:48 -04:00
Robert Richter
5a289395bf Merge branch 'oprofile/x86-oprofile-for-tip' into oprofile/oprofile-for-tip
Conflicts:
	arch/x86/oprofile/op_model_ppro.c
2008-10-15 22:19:41 +02:00
Suravee Suthikulpanit
5f87dfb79f x86/oprofile: add the logic for enabling additional IBS bits
This patch adds the logic for enabling additional IBS control bits :
* IBS-Fetch IbsRandEn bit (bit 57)
* IBS-Op IbsOpCntCtl bit (bit 19)

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-15 20:56:56 +02:00
Robert Richter
69046d4304 x86/oprofile: reordering functions in nmi_int.c
No functional changes. The intension is to remove static function
declarations.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-15 20:56:53 +02:00
Robert Richter
25ad2913ca oprofile: more whitespace fixes
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-15 20:55:51 +02:00
Robert Richter
c92960fccb oprofile: whitespace fixes
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-15 20:47:41 +02:00
Robert Richter
ccd755c2d9 OProfile: Rename IBS sysfs dir into "ibs_op"
The new name is now more close to those used in the spec.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-15 20:47:38 +02:00
Robert Richter
2d55a47882 OProfile: Rework string handling in setup_ibs_files()
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-15 20:47:34 +02:00
Robert Richter
e2fee2761a OProfile: Rework oprofile_add_ibs_sample() function
Code looks much more cleaner now.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-15 20:47:31 +02:00
Xiantao Zhang
3de42dc094 KVM: Separate irq ack notification out of arch/x86/kvm/irq.c
Moving irq ack notification logic as common, and make
it shared with ia64 side.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 14:25:35 +02:00
Xiantao Zhang
8a98f6648a KVM: Move device assignment logic to common code
To share with other archs, this patch moves device assignment
logic to common parts.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 14:25:33 +02:00
Zhang xiantao
371c01b28e KVM: Device Assignment: Move vtd.c from arch/x86/kvm/ to virt/kvm/
Preparation for kvm/ia64 VT-d support.

Signed-off-by: Zhang xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 14:25:32 +02:00
Marcelo Tosatti
83dbc83a0d KVM: VMX: enable invlpg exiting if EPT is disabled
Manually disabling EPT via module option fails to re-enable INVLPG
exiting.

Reported-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 14:25:31 +02:00
Jan Kiszka
1b10bf31a5 KVM: x86: Silence various LAPIC-related host kernel messages
KVM-x86 dumps a lot of debug messages that have no meaning for normal
operation:
 - INIT de-assertion is ignored
 - SIPIs are sent and received
 - APIC writes are unaligned or < 4 byte long
   (Windows Server 2003 triggers this on SMP)

Degrade them to true debug messages, keeping the host kernel log clean
for real problems.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:30 +02:00
Weidong Han
e5fcfc821a KVM: Device Assignment: Map mmio pages into VT-d page table
Assigned device could DMA to mmio pages, so also need to map mmio pages
into VT-d page table.

Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:29 +02:00
Marcelo Tosatti
e48258009d KVM: PIC: enhance IPI avoidance
The PIC code makes little effort to avoid kvm_vcpu_kick(), resulting in
unnecessary guest exits in some conditions.

For example, if the timer interrupt is routed through the IOAPIC, IRR
for IRQ 0 will get set but not cleared, since the APIC is handling the
acks.

This means that everytime an interrupt < 16 is triggered, the priority
logic will find IRQ0 pending and send an IPI to vcpu0 (in case IRQ0 is
not masked, which is Linux's case).

Introduce a new variable isr_ack to represent the IRQ's for which the
guest has been signalled / cleared the ISR. Use it to avoid more than
one IPI per trigger-ack cycle, in addition to the avoidance when ISR is
set in get_priority().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:28 +02:00
Marcelo Tosatti
582801a95d KVM: MMU: add "oos_shadow" parameter to disable oos
Subject says it all.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:27 +02:00
Marcelo Tosatti
0074ff63eb KVM: MMU: speed up mmu_unsync_walk
Cache the unsynced children information in a per-page bitmap.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:26 +02:00
Marcelo Tosatti
4731d4c7a0 KVM: MMU: out of sync shadow core
Allow guest pagetables to go out of sync.  Instead of emulating write
accesses to guest pagetables, or unshadowing them, we un-write-protect
the page table and allow the guest to modify it at will.  We rely on
invlpg executions to synchronize individual ptes, and will synchronize
the entire pagetable on tlb flushes.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:25 +02:00
Marcelo Tosatti
6844dec694 KVM: MMU: mmu_convert_notrap helper
Need to convert shadow_notrap_nonpresent -> shadow_trap_nonpresent when
unsyncing pages.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:24 +02:00
Marcelo Tosatti
0738541396 KVM: MMU: awareness of new kvm_mmu_zap_page behaviour
kvm_mmu_zap_page will soon zap the unsynced children of a page. Restart
list walk in such case.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:23 +02:00
Marcelo Tosatti
ad8cfbe3ff KVM: MMU: mmu_parent_walk
Introduce a function to walk all parents of a given page, invoking a handler.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:22 +02:00
Marcelo Tosatti
a7052897b3 KVM: x86: trap invlpg
With pages out of sync invlpg needs to be trapped. For now simply nuke
the entry.

Untested on AMD.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:21 +02:00
Marcelo Tosatti
0ba73cdadb KVM: MMU: sync roots on mmu reload
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:20 +02:00
Marcelo Tosatti
e8bc217aef KVM: MMU: mode specific sync_page
Examine guest pagetable and bring the shadow back in sync. Caller is responsible
for local TLB flush before re-entering guest mode.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:19 +02:00
Marcelo Tosatti
38187c830c KVM: MMU: do not write-protect large mappings
There is not much point in write protecting large mappings. This
can only happen when a page is shadowed during the window between
is_largepage_backed and mmu_lock acquision. Zap the entry instead, so
the next pagefault will find a shadowed page via is_largepage_backed and
fallback to 4k translations.

Simplifies out of sync shadow.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:18 +02:00
Marcelo Tosatti
a378b4e64c KVM: MMU: move local TLB flush to mmu_set_spte
Since the sync page path can collapse flushes.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:17 +02:00
Marcelo Tosatti
1e73f9dd88 KVM: MMU: split mmu_set_spte
Split the spte entry creation code into a new set_spte function.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:16 +02:00
Marcelo Tosatti
93a423e704 KVM: MMU: flush remote TLBs on large->normal entry overwrite
It is necessary to flush all TLB's when a large spte entry is
overwritten with a normal page directory pointer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:15 +02:00
Harvey Harrison
a08546001c x86: pvclock: fix shadowed variable warning
arch/x86/kernel/pvclock.c:102:6: warning: symbol 'tsc_khz' shadows an earlier one
include/asm/tsc.h:18:21: originally declared here

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:14 +02:00
Gleb Natapov
af2152f545 KVM: don't enter guest after SIPI was received by a CPU
The vcpu should process pending SIPI message before entering guest mode again.
kvm_arch_vcpu_runnable() returns true if the vcpu is in SIPI state, so
we can't call it here.

Signed-off-by: Gleb Natapov <gleb@qumranet.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:09 +02:00
Harvey Harrison
2259e3a7a6 KVM: x86.c make kvm_load_realmode_segment static
Noticed by sparse:
arch/x86/kvm/x86.c:3591:5: warning: symbol 'kvm_load_realmode_segment' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:07 +02:00
Marcelo Tosatti
4c2155ce81 KVM: switch to get_user_pages_fast
Convert gfn_to_pfn to use get_user_pages_fast, which can do lockless
pagetable lookups on x86. Kernel compilation on 4-way guest is 3.7%
faster on VMX.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:06 +02:00
Amit Shah
bfadaded0d KVM: Device Assignment: Free device structures if IRQ allocation fails
When an IRQ allocation fails, we free up the device structures and
disable the device so that we can unregister the device in the
userspace and not expose it to the guest at all.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:04 +02:00
Ben-Ami Yassour
62c476c7c7 KVM: Device Assignment with VT-d
Based on a patch by: Kay, Allen M <allen.m.kay@intel.com>

This patch enables PCI device assignment based on VT-d support.
When a device is assigned to the guest, the guest memory is pinned and
the mapping is updated in the VT-d IOMMU.

[Amit: Expose KVM_CAP_IOMMU so we can check if an IOMMU is present
and also control enable/disable from userspace]

Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Amit Shah <amit.shah@qumranet.com>

Acked-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 14:25:04 +02:00
Ingo Molnar
6b2ada8210 Merge branches 'core/softlockup', 'core/softirq', 'core/resources', 'core/printk' and 'core/misc' into core-v28-for-linus 2008-10-15 12:48:44 +02:00
Guillaume Thouvenin
aa3a816b6d KVM: x86 emulator: Use DstAcc for 'and'
For instruction 'and al,imm' we use DstAcc instead of doing
the emulation directly into the instruction's opcode.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:16:14 +02:00
Guillaume Thouvenin
8a9fee67fb KVM: x86 emulator: Add cmp al, imm and cmp ax, imm instructions (ocodes 3c, 3d)
Add decode entries for these opcodes; execution is already implemented.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:16:14 +02:00
Guillaume Thouvenin
9c9fddd0e7 KVM: x86 emulator: Add DstAcc operand type
Add DstAcc operand type. That means that there are 4 bits now for
DstMask.

"In the good old days cpus would have only one register that was able to
 fully participate in arithmetic operations, typically called A for
 Accumulator.  The x86 retains this tradition by having special, shorter
 encodings for the A register (like the cmp opcode), and even some
 instructions that only operate on A (like mul).

 SrcAcc and DstAcc would accommodate these instructions by decoding A
 into the corresponding 'struct operand'."
  -- Avi Kivity

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:16:14 +02:00
Sheng Yang
defed7ed92 x86: Move FEATURE_CONTROL bits to msr-index.h
For MSR_IA32_FEATURE_CONTROL is already there.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:16:14 +02:00
Sheng Yang
9ea542facb KVM: VMX: Rename IA32_FEATURE_CONTROL bits
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:16:14 +02:00
Avi Kivity
ef46f18ea0 KVM: x86 emulator: fix jmp r/m64 instruction
jmp r/m64 doesn't require the rex.w prefix to indicate the operand size
is 64 bits.  Set the Stack attribute (even though it doesn't involve the
stack, really) to indicate this.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:27 +02:00
Jan Kiszka
4b92fe0c9d KVM: VMX: Cleanup stalled INTR_INFO read
Commit 1c0f4f5011829dac96347b5f84ba37c2252e1e08 left a useless access
of VM_ENTRY_INTR_INFO_FIELD in vmx_intr_assist behind. Clean this up.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:26 +02:00
Marcelo Tosatti
9c3e4aab5a KVM: x86: unhalt vcpu0 on reset
Since "KVM: x86: do not execute halted vcpus", HLT by vcpu0 before system
reset by the IO thread will hang the guest.

Mark vcpu as runnable in such case.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:26 +02:00
Mohammed Gamal
d19292e457 KVM: x86 emulator: Add call near absolute instruction (opcode 0xff/2)
Add call near absolute instruction.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:26 +02:00
Marcelo Tosatti
d76901750a KVM: x86: do not execute halted vcpus
Offline or uninitialized vcpu's can be executed if requested to perform
userspace work.

Follow Avi's suggestion to handle halted vcpu's in the main loop,
simplifying kvm_emulate_halt(). Introduce a new vcpu->requests bit to
indicate events that promote state from halted to running.

Also standardize vcpu wake sites.

Signed-off-by: Marcelo Tosatti <mtosatti <at> redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:26 +02:00
Mohammed Gamal
a6a3034cb9 KVM: x86 emulator: Add in/out instructions (opcodes 0xe4-0xe7, 0xec-0xef)
The patch adds in/out instructions to the x86 emulator.

The instruction was encountered while running the BIOS while using
the invalid guest state emulation patch.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:25 +02:00
Avi Kivity
fa89a81766 KVM: Add statistics for guest irq injections
These can help show whether a guest is making progress or not.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:25 +02:00
Sheng Yang
d40a1ee485 KVM: MMU: Modify kvm_shadow_walk.entry to accept u64 addr
EPT is 4 level by default in 32pae(48 bits), but the addr parameter
of kvm_shadow_walk->entry() only accept unsigned long as virtual
address, which is 32bit in 32pae. This result in SHADOW_PT_INDEX()
overflow when try to fetch level 4 index.

Fix it by extend kvm_shadow_walk->entry() to accept 64bit addr in
parameter.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:25 +02:00
Mohammed Gamal
fb4616f431 KVM: x86 emulator: Add std and cld instructions (opcodes 0xfc-0xfd)
This adds the std and cld instructions to the emulator.

Encountered while running the BIOS with invalid guest
state emulation enabled.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:25 +02:00
Joerg Roedel
a89c1ad270 KVM: add MC5_MISC msr read support
Currently KVM implements MC0-MC4_MISC read support. When booting Linux this
results in KVM warnings in the kernel log when the guest tries to read
MC5_MISC. Fix this warnings with this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:24 +02:00
Avi Kivity
48d1503949 KVM: SVM: No need to unprotect memory during event injection when using npt
No memory is protected anyway.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:24 +02:00
Avi Kivity
3201b5d9f0 KVM: MMU: Fix setting the accessed bit on non-speculative sptes
The accessed bit was accidentally turned on in a random flag word, rather
than, the spte itself, which was lucky, since it used the non-EPT compatible
PT_ACCESSED_MASK.

Fix by turning the bit on in the spte and changing it to use the portable
accessed mask.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:24 +02:00
Avi Kivity
171d595d3b KVM: MMU: Flush tlbs after clearing write permission when accessing dirty log
Otherwise, the cpu may allow writes to the tracked pages, and we lose
some display bits or fail to migrate correctly.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:24 +02:00
Avi Kivity
2245a28fe2 KVM: MMU: Add locking around kvm_mmu_slot_remove_write_access()
It was generally safe due to slots_lock being held for write, but it wasn't
very nice.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:24 +02:00
Avi Kivity
bc2d429979 KVM: MMU: Account for npt/ept/realmode page faults
Now that two-dimensional paging is becoming common, account for tdp page
faults.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Mohammed Gamal
a5e2e82b8b KVM: x86 emulator: Add mov r, imm instructions (opcodes 0xb0-0xbf)
The emulator only supported one instance of mov r, imm instruction
(opcode 0xb8), this adds the rest of these instructions.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Avi Kivity
acee3c04e8 KVM: Allocate guest memory as MAP_PRIVATE, not MAP_SHARED
There is no reason to share internal memory slots with fork()ed instances.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Avi Kivity
abb9e0b8e3 KVM: MMU: Convert the paging mode shadow walk to use the generic walker
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Avi Kivity
140754bc80 KVM: MMU: Convert direct maps to use the generic shadow walker
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Avi Kivity
3d000db568 KVM: MMU: Add generic shadow walker
We currently walk the shadow page tables in two places: direct map (for
real mode and two dimensional paging) and paging mode shadow.  Since we
anticipate requiring a third walk (for invlpg), it makes sense to have
a generic facility for shadow walk.

This patch adds such a shadow walker, walks the page tables and calls a
method for every spte encountered.  The method can examine the spte,
modify it, or even instantiate it.  The walk can be aborted by returning
nonzero from the method.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:23 +02:00
Avi Kivity
6c41f428b7 KVM: MMU: Infer shadow root level in direct_map()
In all cases the shadow root level is available in mmu.shadow_root_level,
so there is no need to pass it as a parameter.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:22 +02:00
Avi Kivity
6e37d3dc3e KVM: MMU: Unify direct map 4K and large page paths
The two paths are equivalent except for one argument, which is already
available.  Merge the two codepaths.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:22 +02:00
Avi Kivity
135f8c2b07 KVM: MMU: Move SHADOW_PT_INDEX to mmu.c
It is not specific to the paging mode, so can be made global (and reusable).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:22 +02:00
Avi Kivity
6eb06cb286 KVM: x86 emulator: remove bad ByteOp specifier from NEG descriptor
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:22 +02:00
roel kluin
41afa02587 KVM: x86 emulator: remove duplicate SrcImm
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Avi Kivity
f4bbd9aaaa KVM: Load real mode segments correctly
Real mode segments to not reference the GDT or LDT; they simply compute
base = selector * 16.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Avi Kivity
a16b20da87 KVM: VMX: Change segment dpl at reset to 3
This is more emulation friendly, if not 100% correct.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Avi Kivity
5706be0daf KVM: VMX: Change cs reset state to be a data segment
Real mode cs is a data segment, not a code segment.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Harvey Harrison
ee032c993e KVM: make irq ack notifier functions static
sparse says:

arch/x86/kvm/x86.c:107:32: warning: symbol 'kvm_find_assigned_dev' was not declared. Should it be static?
arch/x86/kvm/i8254.c:225:6: warning: symbol 'kvm_pit_ack_irq' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Amit Shah
29c8fa32c5 KVM: Use kvm_set_irq to inject interrupts
... instead of using the pic and ioapic variants

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:21 +02:00
Amit Shah
94c935a1ee KVM: SVM: Fix typo
Fix typo in as-yet unused macro definition.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Mohammed Gamal
a89a8fb93b KVM: VMX: Modify mode switching and vmentry functions
This patch modifies mode switching and vmentry function in order to
drive invalid guest state emulation.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Mohammed Gamal
ea953ef0ca KVM: VMX: Add invalid guest state handler
This adds the invalid guest state handler function which invokes the x86
emulator until getting the guest to a VMX-friendly state.

[avi: leave atomic context if scheduling]
[guillaume: return to atomic context correctly]

Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Mohammed Gamal
04fa4d3211 KVM: VMX: Add module parameter and emulation flag.
The patch adds the module parameter required to enable emulating invalid
guest state, as well as the emulation_required flag used to drive
emulation whenever needed.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Mohammed Gamal
648dfaa7df KVM: VMX: Add Guest State Validity Checks
This patch adds functions to check whether guest state is VMX compliant.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Amit Shah
6762b7299a KVM: Device assignment: Check for privileges before assigning irq
Even though we don't share irqs at the moment, we should ensure
regular user processes don't try to allocate system resources.

We check for capability to access IO devices (CAP_SYS_RAWIO) before
we request_irq on behalf of the guest.

Noticed by Avi.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Avi Kivity
dc7404cea3 KVM: Handle spurious acks for PIT interrupts
Spurious acks can be generated, for example if the PIC is being reset.
Handle those acks gracefully rather than flooding the log with warnings.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Marcelo Tosatti
85428ac7c3 KVM: fix i8259 reset irq acking
The irq ack during pic reset has three problems:

- Ignores slave/master PIC, using gsi 0-8 for both.
- Generates an ACK even if the APIC is in control.
- Depends upon IMR being clear, which is broken if the irq was masked
at the time it was generated.

The last one causes the BIOS to hang after the first reboot of
Windows installation, since PIT interrupts stop.

[avi: fix check whether pic interrupts are seen by cpu]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Avi Kivity
ecfc79c700 KVM: VMX: Use interrupt queue for !irqchip_in_kernel
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Marcelo Tosatti
29415c37f0 KVM: set debug registers after "schedulable" section
The vcpu thread can be preempted after the guest_debug_pre() callback,
resulting in invalid debug registers on the new vcpu.

Move it inside the non-preemptable section.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Sheng Yang
464d17c8b7 KVM: VMX: Clean up magic number 0x66 in init_rmode_tss
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Dave Hansen
6ad18fba05 KVM: Reduce stack usage in kvm_pv_mmu_op()
We're in a hot path.  We can't use kmalloc() because
it might impact performance.  So, we just stick the buffer that
we need into the kvm_vcpu_arch structure.  This is used very
often, so it is not really a waste.

We also have to move the buffer structure's definition to the
arch-specific x86 kvm header.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Dave Hansen
b772ff362e KVM: Reduce stack usage in kvm_arch_vcpu_ioctl()
[sheng: fix KVM_GET_LAPIC using wrong size]

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Dave Hansen
f0d662759a KVM: Reduce kvm stack usage in kvm_arch_vm_ioctl()
On my machine with gcc 3.4, kvm uses ~2k of stack in a few
select functions.  This is mostly because gcc fails to
notice that the different case: statements could have their
stack usage combined.  It overflows very nicely if interrupts
happen during one of these large uses.

This patch uses two methods for reducing stack usage.
1. dynamically allocate large objects instead of putting
   on the stack.
2. Use a union{} member for all of the case variables. This
   tricks gcc into combining them all into a single stack
   allocation. (There's also a comment on this)

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Ben-Ami Yassour
4d5c5d0fe8 KVM: pci device assignment
Based on a patch from: Amit Shah <amit.shah@qumranet.com>

This patch adds support for handling PCI devices that are assigned to
the guest.

The device to be assigned to the guest is registered in the host kernel
and interrupt delivery is handled.  If a device is already assigned, or
the device driver for it is still loaded on the host, the device
assignment is failed by conveying a -EBUSY reply to the userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest.
The VT-d extension is required to enable the device to perform DMA.
Another alternative is PVDMA.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Glauber Costa
0293615f3f x86: KVM guest: use paravirt function to calculate cpu khz
We're currently facing timing problems in guests that do
calibration under heavy load, and then the load vanishes.
This means we'll have a much lower lpj than we actually should,
and delays end up taking less time than they should, which is a
nasty bug.

Solution is to pass on the lpj value from host to guest, and have it
preset.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:17 +02:00
Glauber Costa
3807f345b2 x86: paravirt: factor out cpu_khz to common code
KVM intends to use paravirt code to calibrate khz. Xen
current code will do just fine. So as a first step, factor out
code to pvclock.c.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:17 +02:00
Marcelo Tosatti
3cf57fed21 KVM: PIT: fix injection logic and count
The PIT injection logic is problematic under the following cases:

1) If there is a higher priority vector to be delivered by the time
kvm_pit_timer_intr_post is invoked ps->inject_pending won't be set.
This opens the possibility for missing many PIT event injections (say if
guest executes hlt at this point).

2) ps->inject_pending is racy with more than two vcpus. Since there's no locking
around read/dec of pt->pending, two vcpu's can inject two interrupts for a single
pt->pending count.

Fix 1 by using an irq ack notifier: only reinject when the previous irq
has been acked. Fix 2 with appropriate locking around manipulation of
pending count and irq_ack by the injection / ack paths.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:17 +02:00
Marcelo Tosatti
f52447261b KVM: irq ack notification
Based on a patch from: Ben-Ami Yassour <benami@il.ibm.com>
which was based on a patch from: Amit Shah <amit.shah@qumranet.com>

Notify IRQ acking on PIC/APIC emulation. The previous patch missed two things:

- Edge triggered interrupts on IOAPIC
- PIC reset with IRR/ISR set should be equivalent to ack (LAPIC probably
needs something similar).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
CC: Amit Shah <amit.shah@qumranet.com>
CC: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Avi Kivity
564f15378f KVM: Add irq ack notifier list
This can be used by kvm subsystems that are interested in when
interrupts are acked, for example time drift compensation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Alexander Graf
b5e2fec0eb KVM: Ignore DEBUGCTL MSRs with no effect
Netware writes to DEBUGCTL and reads from the DEBUGCTL and LAST*IP MSRs
without further checks and is really confused to receive a #GP during that.
To make it happy we should just make them stubs, which is exactly what SVM
already does.

Writes to DEBUGCTL that are vendor-specific are resembled to behave as if the
virtual CPU does not know them.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Avi Kivity
313dbd49dc KVM: VMX: Avoid vmwrite(HOST_RSP) when possible
Usually HOST_RSP retains its value across guest entries.  Take advantage
of this and avoid a vmwrite() when this is so.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Avi Kivity
80e31d4f61 KVM: SVM: Unify register save/restore across 32 and 64 bit hosts
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Avi Kivity
c801949ddf KVM: VMX: Unify register save/restore across 32 and 64 bit hosts
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Jan Kiszka
77ab6db0a1 KVM: VMX: Reinject real mode exception
As we execute real mode guests in VM86 mode, exception have to be
reinjected appropriately when the guest triggered them. For this purpose
the patch adopts the real-mode injection pattern used in vmx_inject_irq
to vmx_queue_exception, additionally taking care that the IP is set
correctly for #BP exceptions. Furthermore it extends
handle_rmode_exception to reinject all those exceptions that can be
raised in real mode.

This fixes the execution of himem.exe from FreeDOS and also makes its
debug.com work properly.

Note that guest debugging in real mode is broken now. This has to be
fixed by the scheduled debugging infrastructure rework (will be done
once base patches for QEMU have been accepted).

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Jan Kiszka
19bd8afdc4 KVM: Consolidate XX_VECTOR defines
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Avi Kivity
7edd0ce058 KVM: Consolidate PIC isr clearing into a function
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Mohammed Gamal
60bd83a125 KVM: VMX: Remove redundant check in handle_rmode_exception
Since checking for vcpu->arch.rmode.active is already done whenever we
call handle_rmode_exception(), checking it inside the function is redundant.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
f7d9238f5d KVM: VMX: Move interrupt post-processing to vmx_complete_interrupts()
Instead of looking at failed injections in the vm entry path, move
processing to the exit path in vmx_complete_interrupts().  This simplifes
the logic and removes any state that is hidden in vmx registers.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
937a7eaef9 KVM: Add a pending interrupt queue
Similar to the exception queue, this hold interrupts that have been
accepted by the virtual processor core but not yet injected.

Not yet used.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
35920a3569 KVM: VMX: Fix pending exception processing
The vmx code assumes that IDT-Vectoring can only be set when an exception
is injected due to the exception in question.  That's not true, however:
if the exception is injected correctly, and later another exception occurs
but its delivery is blocked due to a fault, then we will incorrectly assume
the first exception was not delivered.

Fix by unconditionally dequeuing the pending exception, and requeuing it
(or the second exception) if we see it in the IDT-Vectoring field.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
26eef70c3e KVM: Clear exception queue before emulating an instruction
If we're emulating an instruction, either it will succeed, in which case
any previously queued exception will be spurious, or we will requeue the
same exception.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
668f612fa0 KVM: VMX: Move nmi injection failure processing to vm exit path
Instead of processing nmi injection failure in the vm entry path, move
it to the vm exit path (vm_complete_interrupts()).  This separates nmi
injection from nmi post-processing, and moves the nmi state from the VT
state into vcpu state (new variable nmi_injected specifying an injection
in progress).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
cf393f7566 KVM: Move NMI IRET fault processing to new vmx_complete_interrupts()
Currently most interrupt exit processing is handled on the entry path,
which is confusing.  Move the NMI IRET fault processing to a new function,
vmx_complete_interrupts(), which is called on the vmexit path.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Avi Kivity
5b5c6a5a60 KVM: MMU: Simplify kvm_mmu_zap_page()
The twisty maze of conditionals can be reduced.

[joerg: fix tlb flushing]

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Avi Kivity
31aa2b44af KVM: MMU: Separate the code for unlinking a shadow page from its parents
Place into own function, in preparation for further cleanups.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Amit Shah
867767a365 KVM: Introduce kvm_set_irq to inject interrupts in guests
This function injects an interrupt into the guest given the kvm struct,
the (guest) irq number and the interrupt level.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Marcelo Tosatti
5fdbf9765b KVM: x86: accessors for guest registers
As suggested by Avi, introduce accessors to read/write guest registers.
This simplifies the ->cache_regs/->decache_regs interface, and improves
register caching which is important for VMX, where the cost of
vmcs_read/vmcs_write is significant.

[avi: fix warnings]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:13:57 +02:00
Sheng Yang
ca60dfbb69 KVM: VMX: Rename misnamed msr bits
MSR_IA32_FEATURE_LOCKED is just a bit in fact, which shouldn't be prefixed with
MSR_.  So is MSR_IA32_FEATURE_VMXON_ENABLED.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:13:57 +02:00
Bjorn Helgaas
758a7f7bb8 x86: register a platform RTC device if PNP doesn't describe it
Most if not all x86 platforms have an RTC device, but sometimes the RTC
is not exposed as a PNP0b00/PNP0b01/PNP0b02 device in PNPBIOS or ACPI:

    http://bugzilla.kernel.org/show_bug.cgi?id=11580
    https://bugzilla.redhat.com/show_bug.cgi?id=451188

It's best if we can discover the RTC via PNP because then we know
which flavor of device it is, where it lives, and which IRQ it uses.

But if we can't, we should register a platform device using the
compiled-in RTC_PORT/RTC_IRQ resource assumptions.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Reported-by: Rik Theys <rik.theys@esat.kuleuven.be>
Reported-by: shr_msn@yahoo.com.tw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-14 16:30:14 -07:00
Anders Kaseorg
8b27386a9c ftrace: make ftrace_test_p6nop disassembler-friendly
Commit 4c3dc21b136f8cb4b72afee16c3ba7e961656c0b in tip introduced the
5-byte NOP ftrace_test_p6nop:

   jmp . + 5
   .byte 0x00, 0x00, 0x00

This is not friendly to disassemblers because an odd number of 0x00s
ends in the middle of an instruction boundary.  This changes the 0x00s
to 1-byte NOPs (0x90).

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:39:29 +02:00
Frédéric Weisbecker
ac2b86fdef x86/ftrace: use uaccess in atomic context
With latest -tip I get this bug:

[   49.439988] in_atomic():0, irqs_disabled():1
[   49.440118] INFO: lockdep is turned off.
[   49.440118] Pid: 2814, comm: modprobe Tainted: G        W 2.6.27-rc7 #4
[   49.440118]  [<c01215e1>] __might_sleep+0xe1/0x120
[   49.440118]  [<c01148ea>] ftrace_modify_code+0x2a/0xd0
[   49.440118]  [<c01148a2>] ? ftrace_test_p6nop+0x0/0xa
[   49.440118]  [<c016e80e>] __ftrace_update_code+0xfe/0x2f0
[   49.440118]  [<c01148a2>] ? ftrace_test_p6nop+0x0/0xa
[   49.440118]  [<c016f190>] ftrace_convert_nops+0x50/0x80
[   49.440118]  [<c016f1d6>] ftrace_init_module+0x16/0x20
[   49.440118]  [<c015498b>] load_module+0x185b/0x1d30
[   49.440118]  [<c01767a0>] ? find_get_page+0x0/0xf0
[   49.440118]  [<c02463c0>] ? sprintf+0x0/0x30
[   49.440118]  [<c034e012>] ? mutex_lock_interruptible_nested+0x1f2/0x350
[   49.440118]  [<c0154eb3>] sys_init_module+0x53/0x1b0
[   49.440118]  [<c0352340>] ? do_page_fault+0x0/0x740
[   49.440118]  [<c0104012>] syscall_call+0x7/0xb
[   49.440118]  =======================

It is because ftrace_modify_code() calls copy_to_user and
copy_from_user.
These functions have been inserted after guessing that there
couldn't be any race condition but copy_[to/from]_user might
sleep and __ftrace_update_code is called with local_irq_saved.

These function have been inserted since this commit:
d5e92e8978fd2574e415dc2792c5eb592978243d:
"ftrace: x86 use copy from user function"

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:38:16 +02:00
Harvey Harrison
37a52f5ef1 x86: suppress trivial sparse signedness warnings
Could just as easily change the three casts to cast to the correct
type...this patch changes the type of ftrace_nop instead.

Supresses sparse warnings:

 arch/x86/kernel/ftrace.c:157:14: warning: incorrect type in assignment (different signedness)
 arch/x86/kernel/ftrace.c:157:14:    expected long *static [toplevel] ftrace_nop
 arch/x86/kernel/ftrace.c:157:14:    got unsigned long *<noident>
 arch/x86/kernel/ftrace.c:161:14: warning: incorrect type in assignment (different signedness)
 arch/x86/kernel/ftrace.c:161:14:    expected long *static [toplevel] ftrace_nop
 arch/x86/kernel/ftrace.c:161:14:    got unsigned long *<noident>
 arch/x86/kernel/ftrace.c:165:14: warning: incorrect type in assignment (different signedness)
 arch/x86/kernel/ftrace.c:165:14:    expected long *static [toplevel] ftrace_nop
 arch/x86/kernel/ftrace.c:165:14:    got unsigned long *<noident>

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:38:14 +02:00
Pekka Paalanen
4427414170 mmiotrace: remove left-over marker cruft
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:37:17 +02:00
Pekka Paalanen
9e57fb35d7 x86 mmiotrace: implement mmiotrace_printk()
Offer mmiotrace users a function to inject markers from inside the kernel.
This depends on the trace_vprintk() patch.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:37:11 +02:00
Pekka Paalanen
bbe5c7830c x86 mmiotrace: fix a rare memory leak
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:37:01 +02:00
Steven Rostedt
6f93fc076a ftrace: x86 use copy to and from user functions
The modification of code is performed either by kstop_machine, before
SMP starts, or on module code before the module is executed. There is
no reason to do the modifications from assembly. The copy to and from
user functions are sufficient and produces cleaner and easier to read
code.

Thanks to Benjamin Herrenschmidt for suggesting the idea.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:36:03 +02:00
Steven Rostedt
732f3ca7d4 ftrace: use only 5 byte nops for x86
Mathieu Desnoyers revealed a bug in the original code. The nop that is
used to relpace the mcount caller can be a two part nop. This runs the
risk where a process can be preempted after executing the first nop, but
before the second part of the nop.

The ftrace code calls kstop_machine to keep multiple CPUs from executing
code that is being modified, but it does not protect against a task preempting
in the middle of a two part nop.

If the above preemption happens and the tracer is enabled, after the
kstop_machine runs, all those nops will be calls to the trace function.
If the preempted process that was preempted between the two nops is executed
again, it will execute half of the call to the trace function, and this
might crash the system.

This patch instead uses what both the latest Intel and AMD spec suggests.
That is the P6_NOP5 sequence of "0x0f 0x1f 0x44 0x00 0x00".

Note, some older CPUs and QEMU might fault on this nop, so this nop
is executed with fault handling first. If it detects a fault, it will then
use the code "0x66 0x66 0x66 0x66 0x90". If that faults, it will then
default to a simple "jmp 1f; .byte 0x00 0x00 0x00; 1:". The jmp is
not optimal but will do if the first two can not be executed.

TODO: Examine the cpuid to determine the nop to use.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:35:01 +02:00
Steven Rostedt
0a37605c22 ftrace: x86 mcount stub
x86 now sets up the mcount locations through the build and no longer
needs to record the ip when the function is executed. This patch changes
the initial mcount to simply return. There's no need to do any other work.
If the ftrace start up test fails, the original mcount will be what everything
will use, so having this as fast as possible is a good thing.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:34:58 +02:00
Steven Rostedt
e4b2b88661 ftrace: enable using mcount recording on x86
Enable the use of the __mcount_loc infrastructure on x86_64 and i386.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:34:54 +02:00
Ingo Molnar
8b1fa1d7b2 ftrace: mark lapic_wd_event() notrace
it can be called in the NMI path:

[    0.645999] calling  ftrace_dynamic_init+0x0/0xd6
[    0.647521] ------------[ cut here ]------------
[    0.647521] WARNING: at kernel/trace/ftrace.c:348 ftrace_record_ip+0x4e/0x252()
[    0.647521] Modules linked in:
[    0.647521] Pid: 15, comm: kstop1 Not tainted 2.6.27-rc1-tip #22686
[    0.647521]
[    0.647521] Call Trace:
[    0.647521]  <NMI>  [<ffffffff8024593f>] warn_on_slowpath+0x5d/0x84
[    0.647521]  [<ffffffff80220b99>] ? lapic_wd_event+0xb/0x5c
[    0.647521]  [<ffffffff80287b3b>] ftrace_record_ip+0x4e/0x252
[    0.647521]  [<ffffffff80211274>] mcount_call+0x5/0x31
[    0.647521]  [<ffffffff80220b9e>] ? lapic_wd_event+0x10/0x5c
[    0.647521]  [<ffffffff8083f3ec>] nmi_watchdog_tick+0x19d/0x1ad
[    0.647521]  [<ffffffff8083e875>] default_do_nmi+0x75/0x1e3
[    0.647521]  [<ffffffff8083f0b3>] do_nmi+0x5d/0x94
[    0.647521]  [<ffffffff8083e2d2>] nmi+0xa2/0xc2
[    0.647521]  [<ffffffff802b48c3>] ? check_bytes_and_report+0x11/0xcc
[    0.647521]  <<EOE>>  [<ffffffff80211274>] ? mcount_call+0x5/0x31
[    0.647521]  [<ffffffff802b49df>] check_object+0x61/0x1b0
[    0.647521]  [<ffffffff802b502a>] __slab_free+0x169/0x2ae
[    0.647521]  [<ffffffff80242dbf>] ? __cleanup_sighand+0x25/0x27
[    0.647521]  [<ffffffff80242dbf>] ? __cleanup_sighand+0x25/0x27
[    0.647521]  [<ffffffff802b60cd>] kmem_cache_free+0x85/0xb9
[    0.647521]  [<ffffffff80242dbf>] __cleanup_sighand+0x25/0x27
[    0.647521]  [<ffffffff80247b3d>] release_task+0x256/0x339
[    0.647521]  [<ffffffff802490b4>] do_exit+0x764/0x7ef
[    0.647521]  [<ffffffff8027624c>] __xchg+0x0/0x38
[    0.647521]  [<ffffffff8027619a>] ? stop_cpu+0x0/0xb2
[    0.647521]  [<ffffffff8027619a>] ? stop_cpu+0x0/0xb2
[    0.647521]  [<ffffffff8025922f>] kthread+0x4e/0x7b
[    0.647521]  [<ffffffff80212979>] child_rip+0xa/0x11
[    0.647521]  [<ffffffff80211c17>] ? restore_args+0x0/0x30
[    0.647521]  [<ffffffff802283a5>] ? native_load_tls+0x14/0x2e
[    0.647521]  [<ffffffff802591e1>] ? kthread+0x0/0x7b
[    0.647521]  [<ffffffff8021296f>] ? child_rip+0x0/0x11
[    0.647521]
[    0.647521] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.672032] initcall ftrace_dynamic_init+0x0/0xd6 returned 0 after 19 msecs

also mark it no-kprobes while at it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:34:36 +02:00
Pekka Paalanen
611b159768 x86: fix mmiotrace 8-bit register decoding
When SIL, DIL, BPL or SPL registers were used in MMIO, the datum
was extracted from AH, BH, CH, or DH, which are incorrect.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: "Vegard Nossum" <vegard.nossum@gmail.com>
Cc: "Steven Rostedt" <srostedt@redhat.com>
Cc: proski@gnu.org
Cc: "Pekka Enberg"
	<penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:33:50 +02:00
Andi Kleen
59512900ba oprofile: discover counters for op ppro too
Discover number of counters for all family 6 models even when not
in arch perfmon mode.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-13 19:25:11 +02:00
Andi Kleen
b991702884 oprofile: Implement Intel architectural perfmon support
Newer Intel CPUs (Core1+) have support for architectural
events described in CPUID 0xA. See the IA32 SDM Vol3b.18 for details.

The advantage of this is that it can be done without knowing about
the specific CPU, because the CPU describes by itself what
performance events are supported. This is only a fallback
because only a limited set of 6 events are supported.
This allows to do profiling on Nehalem and on Atom systems
(later not tested)

This patch implements support for that in oprofile's Intel
Family 6 profiling module. It also has the advantage of supporting
an arbitary number of events now as reported by the CPU.
Also allow arbitary counter widths >32bit while we're at it.

Requires a patched oprofile userland to support the new
architecture.

v2: update for latest oprofile tree
    remove force_arch_perfmon

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-13 19:25:09 +02:00
Andi Kleen
f645f64064 oprofile: Don't report Nehalem as core_2
This essentially reverts Linus' earlier 4b9f12a377
commit. Nehalem is not core_2, so it shouldn't be reported as such.
However with the earlier arch perfmon patch it will fall back to
arch perfmon mode now, so there is no need to fake it as core_2.
The only drawback is that Linus will need to patch the arch perfmon
support into his oprofile binary now, but I think he can do that.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-13 19:25:06 +02:00
Andi Kleen
5d4488027d oprofile: drop const in num counters field
allow to modify it at runtime

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-10-13 19:25:04 +02:00
Linus Torvalds
244dc4e54b Merge git://git.infradead.org/users/dwmw2/random-2.6
* git://git.infradead.org/users/dwmw2/random-2.6:
  Fix autoloading of MacBook Pro backlight driver.
  Automatic MODULE_ALIAS() for DMI match tables.
  Remove asm/a.out.h files for all architectures without a.out support.
  Introduce HAVE_AOUT symbol to remove hard-coded arch list for BINFMT_AOUT
  Remove redundant CONFIG_ARCH_SUPPORTS_AOUT
  S390: Update comments about why we don't use <asm-generic/statfs.h>
  SPARC: Use <asm-generic/statfs.h>
  PowerPC: Use <asm-generic/statfs.h>
  PARISC: Use <asm-generic/statfs.h>
  x86_64: Use <asm-generic/statfs.h>
  IA64: Use <asm-generic/statfs.h>
  ARM: Use <asm-generic/statfs.h>
  Make <asm-generic/statfs.h> suitable for 64-bit platforms.
  Define and use PCI_DEVICE_ID_MARVELL_88ALP01_CCIC for CAFÉ camera driver
  [MTD] [NAND] Define and use PCI_DEVICE_ID_MARVELL_88ALP01_NAND for CAFÉ
  Use PCI_DEVICE_ID_88ALP01 for CAFÉ chip, rather than PCI_DEVICE_ID_CAFE.
  EFS: Don't set f_fsid in statfs().
2008-10-13 09:59:14 -07:00
David Woodhouse
e758936e02 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	include/asm-x86/statfs.h
2008-10-13 17:13:56 +01:00
Ingo Molnar
3a1dfe6eef x86/mm: unify init task OOM handling
Linus noticed that the "again:" versus "survive:" OOM logic for
the init task was arbitrarily different.

The 64-bit codepath is the better one, because it correctly re-lookups
the vma after having dropped the ->mmap_sem.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-13 18:11:13 +02:00
Linus Torvalds
891cffbd6b x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault
Arjan reported a spike in the following bug pattern in v2.6.27:

   http://www.kerneloops.org/searchweek.php?search=lock_page

which happens because hwclock started triggering warnings due to
a (correct) might_sleep() check in the MM code.

The warning occurs because hwclock uses this dubious sequence of
code to run "atomic" code:

  static unsigned long
  atomic(const char *name, unsigned long (*op)(unsigned long),
         unsigned long arg)
  {
    unsigned long v;
    __asm__ volatile ("cli");
    v = (*op)(arg);
    __asm__ volatile ("sti");
    return v;
  }

Then it pagefaults in that "atomic" section, triggering the warning.

There is no way the kernel could provide "atomicity" in this path,
a page fault is a cannot-continue machine event so the kernel has to
wait for the page to be filled in.

Even if it was just a minor fault we'd have to take locks and might have
to spend quite a bit of time with interrupts disabled - not nice to irq
latencies in general.

So instead just enable interrupts in the pagefault path unconditionally
if we come from user-space, and handle the fault.

Also, while touching this code, unify some trivial parts of the x86
VM paths at the same time.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 17:46:39 +02:00
Ingo Molnar
c00193f9f0 Merge branches 'oprofile-v2' and 'timers/hpet' into x86/core-v4 2008-10-13 14:18:42 +02:00
Ingo Molnar
accba5f396 Merge branch 'linus' into oprofile-v2
Conflicts:
	arch/x86/kernel/apic_32.c
	arch/x86/oprofile/nmi_int.c
	include/linux/pci_ids.h
2008-10-13 11:05:51 +02:00
Ingo Molnar
c493756e2a Merge branch 'linus' into oprofile
Conflicts:
	arch/x86/kernel/apic_32.c
	include/linux/pci_ids.h
2008-10-13 10:52:30 +02:00
Yinghai Lu
c1a2f4b108 x86: change early_ioremap to use slots instead of nesting
so we could remove the requirement that one needs to call
early_iounmap() in exactly reverse order of early_ioremap().

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:34:23 +02:00
Jan Beulich
79aa10dd9f x86: adjust dependencies for CONFIG_X86_CMOV
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:50 +02:00
Alexander van Heukelum
6a2ae2d9f9 dumpstack: x86: various small unification steps, fix
After "dumpstack: x86: various small unification steps", the
assembler gives the following compile error. The error is in
dumpstack_64.c.

{standard input}: Assembler messages:
{standard input}:720: Error: Incorrect register `%rbx' used with `l' suffix
{standard input}:1340: Error: Incorrect register `%r12' used with `l' suffix

Indeed the suffix in get_bp() was wrong.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:49 +02:00
Thomas Gleixner
cb48bb5999 x86: remove additional_cpus
remove remainder of additional_cpus logic. We now just listen to the
disabled_cpus value like we did for years. disabled_cpus is always >=
0 so no need for an extra check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:48 +02:00
Ingo Molnar
b807305059 x86: remove additional_cpus configurability
additional_cpus=<x> parameter is dangerous and broken: for example
if we boot additional_cpus=-2 on a stock dual-core system it will
crash the box on bootup.

So reduce the maze of code a bit by removingthe user-configurability
angle.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:47 +02:00
Thomas Gleixner
649c6653fa x86: improve UP kernel when CPU-hotplug and SMP is enabled
num_possible_cpus() can be > 1 when disabled CPUs have been accounted.

Disabled CPUs are not in the cpu_present_map, so we can use
num_present_cpus() as a safe indicator to switch to UP alternatives.

Reported-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:46 +02:00
Alexander van Heukelum
8a541665b9 dumpstack: x86: various small unification steps
- define STACKSLOTS_PER_LINE and use it
 - define get_bp macro to hide the %%ebp/%%rbp difference
 - i386: check task==NULL in dump_trace, like x86_64
 - i386: show_trace(NULL, ...) uses current automatically
 - x86_64: use [#%d] for die_counter, like i386
 - whitespace and comments

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:45 +02:00
Alexander van Heukelum
802a67de0c dumpstack: i386: make kstack= an early boot-param and add oops=panic
- make kstack= and early_param
 - add oops=panic, setting panic_on_oops

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:44 +02:00
Alexander van Heukelum
ca0a816403 dumpstack: x86: use log_lvl and unify trace formatting
- x86: Write log_lvl strings if available
 - start raw stack dumps on new line
 - i386: Remove extra indentation for raw stack dumps

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:43 +02:00
Alexander van Heukelum
2ac53721f3 dumptrace: x86: consistently include loglevel, print stack switch
- i386 and x86_64: always printk the 'data' parameter
 - i386: announce stack switch (irq -> normal)
 - i386: check if there is a stack switch before announcing it

There is a warning that 'context' might come out corrupt in early
boot. If this is true it should be fixed, not worked around.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:42 +02:00
Alexander van Heukelum
3a18512db0 dumpstack: x86: add "end" parameter to valid_stack_ptr and print_context_stack
- Add "end" parameter to valid_stack_ptr and print_context_stack
 - use sizeof(long) as the size of a word on the stack

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:41 +02:00
Alexander van Heukelum
161827903b dumpstack: x86: make printk_address equal
- x86_64: use %p to print an address
 - make i386-version the same as the above

The result should be the same on x86_64; on i386 the
output only changes if CONFIG_KALLSYMS is turned off,
in which case the address is printed twice.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:40 +02:00
Alexander van Heukelum
dd6e4eba1c dumpstack: x86: move die_nmi to dumpstack_32.c
For some reason die_nmi is still defined in traps.c for
i386, but is found in dumpstack_64.c for x86_64. Move it
to dumpstack_32.c

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:39 +02:00
Alexander van Heukelum
8728861b4f traps: x86: finalize unification of traps.c
traps_32.c and traps_64.c are now equal. Move one to traps.c,
delete the other one and change the Makefile

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:29 +02:00
Alexander van Heukelum
081f75bbdc traps: x86: make traps_32.c and traps_64.c equal
Use CONFIG_X86_64/CONFIG_X86_32 to condtionally compile the
parts needed for x86_64 or i386 only.

Runs a small userspace for a number of minimal configurations
and boots the defconfigs.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:28 +02:00
Alexander van Heukelum
c1d518c842 traps: x86: various noop-changes preparing for unification of traps_xx.c
- reordering include files
 - whitespace changes
 - comment changes
 - removed unused bad_intr()
 - make default_do_nmi static

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:27 +02:00
Alexander van Heukelum
a5ae2330a5 traps: x86_64: use task_pid_nr(tsk) instead of tsk->pid in do_general_protection
Use task_pid_nr(tsk) instead of tsk->pid in do_general_protection.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:26 +02:00
Alexander van Heukelum
7970479c48 traps: i386: expand clear_mem_error and remove from mach_traps.h
This is the last user of clear_mem_error, which is defined
only on i386. Expand the inline function and remove it from
include/asm-x86/mach-default/mach_traps.h

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:25 +02:00
Alexander van Heukelum
1c9af8a9f4 traps: x86_64: make io_check_error equal to the one on i386
Make io_check_error equal to the one on i386.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:24 +02:00
Alexander van Heukelum
4915a35e35 traps: i386: use preempt_conditional_sti/cli in do_int3
Use preempt_conditional_sti/cli in do_int3, like on x86_64.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:23 +02:00
Alexander van Heukelum
091d30c8f7 traps: x86_64: make math_state_restore more like i386
- rename variable me -> tsk
 - get thread and tsk like i386
 - expand used_math()
 - copy comment

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:22 +02:00
Alexander van Heukelum
699d2937d4 traps: x86: converge trap_init functions
- set_system_gate on i386 is really set_system_trap_gate
 - set_system_gate on x86_64 is really set_system_intr_gate
 - ist=0 means no special stack switch is done:
	- introduce STACKFAULT_STACK, DOUBLEFAULT_STACK, NMI_STACK,
		DEBUG_STACK and MCE_STACK as on x86_64.
	- use the _ist variants with XXX_STACK set to zero
 - remove set_system_gate

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

traps: x86: correct copy/paste bug: a trap is a GATE_TRAP

Fix copy/paste/forgot-to-edit bug in desc.h.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:22 +02:00
Alexander van Heukelum
3d2a71a596 x86, traps: converge do_debug handlers
Make the x86_64-version and the i386-version of do_debug
more similar.

 - introduce preempt_conditional_sti/cli to i386. The preempt-count
	is now elevated during the trap handler, like on x86_64. It
	does not run on a separate stack, however.
 - replace an open-coded "send_sigtrap"
 - copy some comments

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:21 +02:00
Alexander van Heukelum
e407d62088 x86, traps: introduce dotraplinkage
Mark the exception handlers with "dotraplinkage" to hide the
calling convention differences between i386 and x86_64.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:20 +02:00
Alexander van Heukelum
ae82157b3d x86, traps, i386: factor out lazy io-bitmap copy
x86_64 does not do the lazy io-bitmap dance. Putting it in
its own function makes i386's do_general_protection look
much more like x86_64's.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:19 +02:00
Alexander van Heukelum
a28680b4b8 x86, traps: split out math_error and simd_math_error
Split out math_error from do_coprocessor_error and simd_math_error
from do_simd_coprocessor_error, like on i386. While at it, add the
"error_code" parameter to do_coprocessor_error, do_simd_coprocessor_error
and do_spurious_interrupt_bug.

This does not change the generated code, but brings the declarations in
line with all the other trap handlers.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:18 +02:00
Alexander van Heukelum
6fcbede3fd x86_64: split out dumpstack code from traps_64.c
The dumpstack code is logically quite independent from the
hardware traps. Split it out into its own file.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:17 +02:00
Alexander van Heukelum
2bc5f927d4 i386: split out dumpstack code from traps_32.c
The dumpstack code is logically quite independent from the
hardware traps. Split it out into its own file.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:16 +02:00
Vegard Nossum
af5c2bd16a x86: fix virt_addr_valid() with CONFIG_DEBUG_VIRTUAL=y, v2
virt_addr_valid() calls __pa(), which calls __phys_addr(). With
CONFIG_DEBUG_VIRTUAL=y, __phys_addr() will kill the kernel if the
address *isn't* valid. That's clearly wrong for virt_addr_valid().

We also incorporate the debugging checks into virt_addr_valid().

Signed-off-by: Vegard Nossum <vegardno@ben.ifi.uio.no>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:15 +02:00
Chuck Ebbert
7f2f49a582 x86: allow number of additional hotplug CPUs to be set at compile time, V2
x86: allow number of additional hotplug CPUs to be set at compile time, V2

The default number of additional CPU IDs for hotplugging is determined
by asking ACPI or mptables how many "disabled" CPUs there are in the
system, but many systems get this wrong so that e.g. a uniprocessor
machine gets an extra CPU allocated and never switches to single CPU
mode.

And sometimes CPU hotplugging is enabled only for suspend/hibernate
anyway, so the additional CPU IDs are not wanted. Allow the number
to be set to zero at compile time.

Also, force the number of extra CPUs to zero if hotplugging is disabled
which allows removing some conditional code.

Tested on uniprocessor x86_64 that ACPI claims has a disabled processor,
with CPU hotplugging configured.

("After" has the number of additional CPUs set to 0)
Before: NR_CPUS: 512, nr_cpu_ids: 2, nr_node_ids 1
After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1

[Changed the name of the option and the prompt according to Ingo's
 suggestion.]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:14 +02:00
Krzysztof Helt
94f6bac105 x86: do not allow to optimize flag_is_changeable_p() (rev. 2)
The flag_is_changeable_p() is used by
has_cpuid_p() which can return different results
in the code sequence below:

 if (!have_cpuid_p())
      identify_cpu_without_cpuid(c);

  /* cyrix could have cpuid enabled via c_identify()*/
  if (!have_cpuid_p())
      return;

Otherwise, the gcc 3.4.6 optimizes these two calls
into one which make the code not working correctly.

Cyrix cpus have the CPUID instruction enabled before
the second call to the have_cpuid_p() but
it is not detected due to the gcc optimization.
Thus the ARR registers (mtrr like) are not detected
on such a cpu.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:13 +02:00
Pekka Enberg
e2ce07c804 x86: __show_registers() and __show_regs() API unification
Currently the low-level function to dump user-passed registers on i386 is
called __show_registers() whereas on x86-64 it's called __show_regs(). Unify
the API to simplify porting of kmemcheck to x86-64.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:33:04 +02:00
Jack Steiner
1e0b5d00b2 x86, UV: new UV genapic functions for x2apic
Add functions that use the infrastructure added by the x2apic code. These
functions were originally stubbed out since the UV code went into the
tree prior to the x2apic code.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:53 +02:00
Chuck Ebbert
14adf855ba x86: move prefill_possible_map calling early, fix, V2
Commit 4a701737 ("x86: move prefill_possible_map calling early, fix")
is the wrong fix: prefill_possible_map() needs to be available
even when CONFIG_HOTPLUG_CPU is not set. A followon patch will do that.

Fix this correctly by making prefill_possible_map() available even when
CONFIG_HOTPLUG_CPU is not set. The function is needed so that
the number of possible CPUs can be determined.

Tested on uniprocessor machine with CPU hotplug disabled.

From boot log:
  Before: NR_CPUS: 512, nr_cpu_ids: 512, nr_node_ids 1
  After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:50 +02:00
Krzysztof Helt
69d45dd1c3 x86: merge winchip-2 and winchip-2a cpu choices
The Winchip-2 and Winchip-2A cpu choices select the
same options for kernel and compiler.

Merge them to save few bytes and reduce confusion.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:48 +02:00
Jack Steiner
d2f904bb9a x86, uv: fix for size of hub mappings
Fix the size of the mappings of UV hub registers. Size must
be a function of the maximum node number within the SSI.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:46 +02:00
Yinghai Lu
9b658f6f8b x86: cleanup, remove extra ifdef
also change two functions to static.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:44 +02:00
Alexander van Heukelum
3c1326f8a6 traps: i386: make do_trap more like x86_64
This patch hardcodes which traps should be forwarded to
handle_vm86_trap in do_trap. This allows to remove the
vm86 parameter from the i386-version of do_trap, which
makes the DO_VM86_ERROR and DO_VM86_ERROR_INFO macros
unnecessary.

x86_64 part is whitespace only.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:34 +02:00
Alexander van Heukelum
69c89b5bf7 traps: x86: remove trace_hardirqs_fixup from pagefault handler
The last use of trace_hardirqs_fixup is unnecessary, because the
trap is taken with interrupt off on i386 as well as x86_64, and
the irq-tracer is notified of this from the assembly code.

trace_hardirqs_fixup and trace_hardirqs_fixup_flags are removed
from include/asm-x86/irqflags.h as they are no longer used.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:04 +02:00
Alexander van Heukelum
a491503e4d traps: x86_64: remove trace_hardirqs_fixup from debug handler
All exceptions are taken via interrupt gates. TRACE_IRQS_OFF
is called just before entering the C code, so the irq state
is known to the irq tracer at this point. No need to call
trace_hardirqs_fixup.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:02 +02:00
Alexander van Heukelum
8b1c870f19 traps: x86_64: remove trace_hardirqs_fixup from int3 handler
All exceptions are taken via interrupt gates. TRACE_IRQS_OFF
is called just before entering the C code, so the irq state
is known to the irq tracer at this point. No need to call
trace_hardirqs_fixup.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:22:00 +02:00
Alexander van Heukelum
4b986a3652 traps: x86_64: remove trace_hardirqs_fixup from DO_ERROR_INFO macro
All exceptions are taken via interrupt gates. TRACE_IRQS_OFF
is called just before entering the C code, so the irq state
is known to the irq tracer at this point. No need to call
trace_hardirqs_fixup.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:58 +02:00
Alexander van Heukelum
7e61a79324 traps: x86_64: add TRACE_IRQS_OFF in paranoidentry macro
Add TRACE_IRQS_OFF just before entering the C code.

All exceptions are taken via interrupt gates. If irq tracing is
enabled, it should be notified as soon as possible. Interrupts
are only (conditionally) re-enabled in C code.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:55 +02:00
Alexander van Heukelum
6b11d4ef3e traps: x86_64: add TRACE_IRQS_OFF in error_entry
Add TRACE_IRQS_OFF just before entering the C code.

All exceptions are taken via interrupt gates. If irq tracing is
enabled, it should be notified as soon as possible. Interrupts
are only (conditionally) re-enabled in C code.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:53 +02:00
Jack Steiner
2e42060c19 x86, uv: add early detection of UV system types
Portions of the ACPI code needs to know if a system is a UV system prior
to genapic initialization. This patch adds a call early_acpi_boot_init()
so that the apic type is discovered earlier.

V2 of the patch adding fixes from Yinghai Lu.
Much cleaner and smaller.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:51 +02:00
Glauber Costa
e04d645f32 x86: move vgetcpu mode probing to cpu detection
Take it out of time initialization and move it to
cpu detection time.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:49 +02:00
Glauber Costa
33c053d0ae x86: wrap MCA_bus test around an ifdef
Only test for MCA_bus if support for MCA is compiled in.
Also, for x86_64, write the code inside the conditional
for consistency with i386. It won't bite us, since it'll
probably never select CONFIG_MCA anyway.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:47 +02:00
Glauber Costa
2c460d0b68 x86: replace hardcoded number
Replace "4" in time_32.c code by sizeof(long).
This way, it can work on x86_64 too.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:44 +02:00
Glauber Costa
461ebd1095 x86: rename timer_event_interrupt to timer_interrupt
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:42 +02:00
Glauber Costa
780209af71 x86: make init_ISA_irqs nonstatic
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:40 +02:00
Glauber Costa
2f97435e57 x86: factor out irq initialization for x86_64
Provide apic_intr_init and smp_intr_init functions.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:38 +02:00
Glauber Costa
2ff298372d x86: bind irq0 irq data to cpu0
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:36 +02:00
Glauber Costa
8de0b8a7ea x86: use user_mode_vm instead of user_mode
For x86_64, it does not really matter. But makes the
code equal to i386.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:34 +02:00
Glauber Costa
3927fa9e4b x86: use frame pointer information on x86_64 profile_pc
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:29 +02:00
Glauber Costa
097a0788df x86: set bp field in pt_regs properly
Save rbp twice: One is for marking the stack frame, as usual (already
there), and the other, to fill pt_regs properly. This is because bx
comes right before the last saved register in that structure, and not
bp. If the base pointer were in the place bx is today, this would not
be needed.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:27 +02:00
Glauber Costa
2c44e66843 x86: coalesce tests
Coalesce v8086_mode and user_mode into a single
user_mode_vm() test.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:25 +02:00
Glauber Costa
cf4cfb225a x86: use user_mode macro
Instead of using SEGMENT_IS_KERNEL_CODE, use the
"user_mode" macro, which can play the same role. Delete
the former, since it now lacks any user.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:23 +02:00
Yinghai Lu
bd32a8cfa8 x86: cpu don't print duplicated vendor string
Some CPUs have vendor string in the middle of model_id instead of beginning

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:21 +02:00
Jan Beulich
606ee44dbb x86: make mm/gup.c more virtualization friendly
Since pte_flags() is much cheaper than pte_val() in some virtualized
environments (namely, Xen), use the former whereever possible.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: "Nick Piggin" <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:18 +02:00
Jan Beulich
5e72d9e485 x86-64: fix combining of regions in init_memory_mapping()
When nr_range gets decremented, the same slot must be considered for
coalescing with its new successor again.

The issue is apparently pretty benign to native code, but surfaces as a
boot time crash in our forward ported Xen tree (where the page table
setup overall works differently than in native).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:16 +02:00
Cyrill Gorcunov
59ef48a58e x86: smpboot - check if we have ESR register in wakeup_secondary_cpu
We should check if we have ESR register before reading from it.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:14 +02:00
Ingo Molnar
3e6de5a393 x86: print out EBDA/lowmem address
it's useful for debugging purposes to know the location of the EBDA.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:10 +02:00
Yinghai Lu
a73aaedd95 x86: check dsdt before find oem table for es7000, v2
v2: use __acpi_unmap_table()

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:07 +02:00
Jeremy Fitzhardinge
a32ad46267 x86-64: don't check for map replacement
The check prevents flags on mappings from being changed, which is not
desireable.  There's no need to check for replacing a mapping, and
x86-32 does not do this check.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:05 +02:00
Jeremy Fitzhardinge
88b4c14696 x86: use early_memremap() in setup.c
The remappings in setup.c are all just ordinary memory, so use
early_memremap() rather than early_ioremap().

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:03 +02:00
Jeremy Fitzhardinge
1494177942 x86: add early_memremap()
early_ioremap() is also used to map normal memory when constructing
the linear memory mapping.  However, since we sometimes need to be able
to distinguish between actual IO mappings and normal memory mappings,
add a early_memremap() call, which maps with PAGE_KERNEL (as opposed
to PAGE_KERNEL_IO for early_ioremap()), and use it when constructing
pagetables.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:21:01 +02:00
Jeremy Fitzhardinge
be43d72835 x86: add _PAGE_IOMAP pte flag for IO mappings
Use one of the software-defined PTE bits to indicate that a mapping is
intended for an IO address.  On native hardware this is irrelevent,
since a physical address is a physical address.  But in a virtual
environment, physical addresses are also virtualized, so there needs
to be some way to distinguish between pseudo-physical addresses and
actual hardware addresses; _PAGE_IOMAP indicates this intent.

By default, __supported_pte_mask masks out _PAGE_IOMAP, so it doesn't
even appear in the final pagetable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:56 +02:00
Alexander van Heukelum
07bb2f6236 i386: trace_hardirqs_fixup should now not be necessary: irqs are off.
The exception handlers in entry_32.S should now all call
TRACE_IRQS_OFF before calling the C code. The calls to
trace_hardirqs_fixup should now be unnecessary. Remove them.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:54 +02:00
Alexander van Heukelum
a790392faa i386: add TRACE_IRQS_OFF for the exception 3 (int3)
At this point interrupts are off, so let's inform the tracing
code of that fact before calling into C.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:52 +02:00
Alexander van Heukelum
e0c7317557 i386: add TRACE_IRQS_OFF for the nmi
At this point interrupts are off, so let's inform the tracing
code of that fact before calling into C.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:49 +02:00
Alexander van Heukelum
43024a8a5d i386: add TRACE_IRQS_OFF for exception 1 (debug)
At this point interrupts are off, so let's inform the tracing
code of that fact before calling into C.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:47 +02:00
Alexander van Heukelum
85cea51d7e i386: add TRACE_IRQS_OFF to entry_32.S in 'error_code'
Many exceptions use the same code path via the label 'error_code'
in entry_32.S. At this point interrupts are off, so let's inform
the tracing code of that fact before calling into C.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:45 +02:00
Alexander van Heukelum
f8e0870f58 i386: remove temporary DO_TRAP macros, expanding the last one used
Only one use of the DO_TRAP macros remains. Expand that one and
remove the macros now.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:43 +02:00
Alexander van Heukelum
b939bde278 i386: convert hardware exception 19 to an interrupt gate
Handle SIMD coprocessor exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:40 +02:00
Alexander van Heukelum
eb642f6208 i386: convert hardware exception 18 to an interrupt gate
Handle machine check exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:38 +02:00
Alexander van Heukelum
5feedfd401 i386: convert hardware exception 17 to an interrupt gate
Handle alignment check exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:36 +02:00
Alexander van Heukelum
252d28fe65 i386: convert hardware exception 16 to an interrupt gate
Handle coprocessor error exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:33 +02:00
Alexander van Heukelum
cf81978d5f i386: convert hardware exception 15 to an interrupt gate
Handle exception 15 with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:31 +02:00
Alexander van Heukelum
c6df0d71be i386: convert hardware exception 13 to an interrupt gate
Handle general protection exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:29 +02:00
Alexander van Heukelum
f5ca81878b i386: convert hardware exception 12 to an interrupt gate
Handle stack segment exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:27 +02:00
Alexander van Heukelum
36d936c798 i386: convert hardware exception 11 to an interrupt gate
Handle segment not present exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:24 +02:00
Alexander van Heukelum
6bf77bf939 i386: convert hardware exception 10 to an interrupt gate
Handle invalid TSS exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:22 +02:00
Alexander van Heukelum
51bc1ed606 i386: convert hardware exception 9 to an interrupt gate
Handle coprocessor segment overrun exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:20 +02:00
Alexander van Heukelum
7643e9b936 i386: convert hardware exception 7 to an interrupt gate
Handle no coprocessor exception with interrupt initially off.

device_not_available in entry_32.S calls either math_state_restore
or math_emulate. This patch adds an extra indirection to be
able to re-enable interrupts explicitly in traps_32.c

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:17 +02:00
Alexander van Heukelum
12394cf567 i386: convert hardware exception 6 to an interrupt gate
Handle invalid opcode exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:15 +02:00
Alexander van Heukelum
64f644c0b4 i386: convert hardware exception 5 to an interrupt gate
Handle bounds exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:13 +02:00
Alexander van Heukelum
8d6f9d69bd i386: convert hardware exception 4 to an interrupt gate
Handle overflow exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:11 +02:00
Alexander van Heukelum
b94da1e4b7 i386: expand exception 3 DO_TRAP macro
The int3 exception was already takes as an interrupt and
do_int3 does not fit in the new DO_ERROR macro. This patch
just expands the DO_TRAP macro and rearranges the code a
bit.

No functional changes intended.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:08 +02:00
Alexander van Heukelum
976382dcbe i386: convert hardware exception 0 to an interrupt gate
Handle divide error exception with interrupt initially off.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:06 +02:00
Alexander van Heukelum
61aef7d249 i386: prepare to convert exceptions to interrupts
There is some macro magic in traps_32.c to construct standard
exception dispatch functions. This patch renames the DO_ERROR-
like macros to DO_TRAP, and introduces new DO_ERROR ones that
conditionally reenable interrupts explicitly, like x86_64.

No code changes.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:04 +02:00
Alexander van Heukelum
762db43470 i386: remove kprobes' restore_interrupts in favour of conditional_sti
x86_64 uses a helper function conditional_sti in traps_64.c which
is equal to restore_interrupts in kprobes.h. The only user of
restore_interrupts is in traps_32.c. Introduce conditional_sti
for i386 and remove restore_interrupts.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:20:02 +02:00
Yinghai Lu
927604c759 x86: rename discontig_32.c to numa_32.c
name it in line with its purpose.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:19:59 +02:00
Manfred Spraul
0cefa5b9b0 arch/x86/kernel/smpboot.c: Clarify when irq processing begins.
Secondary cpus start with local interrupts disabled.
start_secondary() first initializes the new cpu, then it enables the
local interrupts. (although interrupts are enabled within smp_callin()
as well).

Right now, the local interrupts are enabled as a side effect of calling
ipi_call_lock_irq().

The attached patch clarifies when local interrupts are enabled.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:19:57 +02:00
Jan Beulich
295286a891 x86-64: slightly stream-line 32-bit syscall entry code
Avoid updating registers or memory twice as well as needlessly loading
or copying registers.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-13 10:19:54 +02:00
Ingo Molnar
8daf14cf56 Merge branches 'x86/xen', 'x86/build', 'x86/microcode', 'x86/mm-debug-v2', 'x86/memory-corruption-check', 'x86/early-printk', 'x86/xsave', 'x86/ptrace-v2', 'x86/quirks', 'x86/setup', 'x86/spinlocks' and 'x86/signal' into x86/core-v2 2008-10-12 15:50:02 +02:00
Ingo Molnar
1db5fff9ae x86: make processor type select depend on CONFIG_EMBEDDED
deselecting one of the CPU type CONFIG_CPU_SUP_* config options
can render a kernel unbootable. Make sure this option is only
available if CONFIG_EMBEDDED is enabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:44:07 +02:00
Ingo Molnar
b7b3a42533 x86: extend processor type select help text
extend the help text of the CONFIG_CPU_SUP_* config options to
express what it does and what effects it has.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:36:24 +02:00
Ingo Molnar
8a66712ba0 x86, amd-iommu: propagate PCI device enabling error
propagate an error in enabling the PCI device.

Also eliminates this warning:

 arch/x86/kernel/amd_iommu_init.c: In function ‘init_iommu_one’:
 arch/x86/kernel/amd_iommu_init.c:726: warning: ignoring return value of ‘pci_enable_device’, declared with attribute warn_unused_result

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:24:53 +02:00
Ingo Molnar
d562353a45 warnings: fix arch/x86/kernel/io_apic_64.c
fix:

 arch/x86/kernel/io_apic_64.c: In function ‘print_local_APIC’:
 arch/x86/kernel/io_apic_64.c:1284: warning: format ‘%08x’ expects type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
 arch/x86/kernel/io_apic_64.c:1285: warning: format ‘%08x’ expects type ‘unsigned int’, but argument 2 has type ‘long unsigned int’

We want to print the two halves of 'icr' at 32 bit width.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:22:22 +02:00
Ingo Molnar
45e96f26f2 warnings: fix arch/x86/kernel/early_printk.c
fix warning:

  arch/x86/kernel/early_printk.c:993: warning: ‘enable_debug_console’ defined but not used

Eliminate dead code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:19:36 +02:00
Ingo Molnar
9f482807a6 x86, fpu: check __clear_user() return value
fix warning:

  arch/x86/kernel/xsave.c: In function ‘save_i387_xstate’:
  arch/x86/kernel/xsave.c:98: warning: ignoring return value of ‘__clear_user’, declared with attribute warn_unused_result

check the return value and act on it. We should not be ignoring faults
at this point.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:17:39 +02:00
Ingo Molnar
620f2efcdc Merge branch 'linus' into x86/xsave 2008-10-12 15:17:14 +02:00
Ingo Molnar
46eaa67020 x86: memory corruption check - cleanup
Move the prototypes from the generic kernel.h header to the more
appropriate include/asm-x86/bios_ebda.h header file.

Also, remove the check from the power management code - this is a
pure x86 matter for now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 15:09:23 +02:00
Ingo Molnar
a9b9e81c91 Merge branch 'linus' into x86/memory-corruption-check 2008-10-12 15:05:39 +02:00
Ingo Molnar
eceb138336 Merge branches 'core/signal' and 'x86/spinlocks' into x86/xen
Conflicts:
	include/asm-x86/spinlock.h
2008-10-12 13:20:25 +02:00
Ingo Molnar
84e9c95ad9 Merge branch 'x86/signal' into core/signal 2008-10-12 13:17:07 +02:00
Ingo Molnar
1389ac4b97 Merge branch 'linus' into x86/signal
Conflicts:
	arch/x86/kernel/signal_64.c
2008-10-12 12:49:27 +02:00
Ingo Molnar
acbaa41a78 Merge branch 'linus' into x86/quirks
Conflicts:
	arch/x86/kernel/early-quirks.c
2008-10-12 12:43:21 +02:00
Ingo Molnar
365d46dc9b Merge branch 'linus' into x86/xen
Conflicts:
	arch/x86/kernel/cpu/common.c
	arch/x86/kernel/process_64.c
	arch/x86/xen/enlighten.c
2008-10-12 12:37:32 +02:00
Roland McGrath
325af5fb14 x86: ioperm user_regset
This adds a user_regset type for the x86 io permissions bitmap.
This makes it appear in core dumps (when ioperm has been used).
It will also make it visible to debuggers in the future.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
[conflict resolutions: Signed-off-by: Ingo Molnar <mingo@elte.hu> ]
2008-10-12 12:05:55 +02:00
Ingo Molnar
206855c321 Merge branch 'x86/urgent' into core/signal
Conflicts:
	arch/x86/kernel/signal_64.c
2008-10-12 11:32:17 +02:00
Petr Vandrovec
cb58ffc388 x86: fix early panic on amd64 due to typo in supported CPU section
Do not crash when enumerating supported CPU architectures

SECURITY_INIT somehow ended up in x86_cpu_dev.init section.  That caused printk
in code which prints supported architectures to hit #GP due to non-canonical
address being used.

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Cc: thomas.petazzoni@free-electrons.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 11:19:27 +02:00
Alan Cox
c613ec1a7f x86, early_ioremap: fix fencepost error
The x86 implementation of early_ioremap has an off by one error. If we get
an object which ends on the first byte of a page we undermap by one page and
this causes a crash on boot with the ASUS P5QL whose DMI table happens to fit
this alignment.

The size computation is currently

	last_addr = phys_addr + size - 1;
	npages = (PAGE_ALIGN(last_addr) - phys_addr)

(Consider a request for 1 byte at alignment 0...)

Closes #11693

Debugging work by Ian Campbell/Felix Geyer

Signed-off-by: Alan Cox <alan@rehat.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 11:19:04 +02:00
David Rientjes
e1e23bb051 x86: avoid dereferencing beyond stack + THREAD_SIZE
It's possible for get_wchan() to dereference past task->stack + THREAD_SIZE
while iterating through instruction pointers if fp equals the upper boundary,
causing a kernel panic.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 11:18:59 +02:00
Linus Torvalds
ead9d23d80 Merge phase #4 (X2APIC, APIC unification, CPU identification unification) of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-v28-for-linus-phase4-D' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (186 commits)
  x86, debug: print more information about unknown CPUs
  x86 setup: handle more than 8 CPU flag words
  x86: cpuid, fix typo
  x86: move transmeta cap read to early_init_transmeta()
  x86: identify_cpu_without_cpuid v2
  x86: extended "flags" to show virtualization HW feature in /proc/cpuinfo
  x86: move VMX MSRs to msr-index.h
  x86: centaur_64.c remove duplicated setting of CONSTANT_TSC
  x86: intel.c put workaround for old cpus together
  x86: let intel 64-bit use intel.c
  x86: make intel_64.c the same as intel.c
  x86: make intel.c have 64-bit support code
  x86: little clean up of intel.c/intel_64.c
  x86: make 64 bit to use amd.c
  x86: make amd_64 have 32 bit code
  x86: make amd.c have 64bit support code
  x86: merge header in amd_64.c
  x86: add srat_detect_node for amd64
  x86: remove duplicated force_mwait
  x86: cpu make amd.c more like amd_64.c v2
  ...
2008-10-11 11:51:16 -07:00
Ingo Molnar
0afe2db213 Merge branch 'x86/unify-cpu-detect' into x86-v28-for-linus-phase4-D
Conflicts:
	arch/x86/kernel/cpu/common.c
	arch/x86/kernel/signal_64.c
	include/asm-x86/cpufeature.h
2008-10-11 20:23:20 +02:00
Ingo Molnar
d84705969f Merge branch 'x86/apic' into x86-v28-for-linus-phase4-B
Conflicts:
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/apic_64.c
	arch/x86/kernel/setup.c
	drivers/pci/intel-iommu.c
	include/asm-x86/cpufeature.h
	include/asm-x86/dma-mapping.h
2008-10-11 20:17:36 +02:00
Linus Torvalds
bf6f51e3a4 Merge phase #3 (IOMMU) of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-v28-for-linus-phase3-B' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits)
  AMD IOMMU: use iommu_device_max_index, fix
  AMD IOMMU: use iommu_device_max_index
  x86: add PCI IDs for AMD Barcelona PCI devices
  x86/iommu: use __GFP_ZERO instead of memset for GART
  x86/iommu: convert GART need_flush to bool
  x86/iommu: make GART driver checkpatch clean
  x86 gart: remove unnecessary initialization
  x86: restore old GART alloc_coherent behavior
  revert "x86: make GART to respect device's dma_mask about virtual mappings"
  x86: export pci-nommu's alloc_coherent
  iommu: remove fullflush and nofullflush in IOMMU generic option
  x86: remove set_bit_string()
  iommu: export iommu_area_reserve helper function
  AMD IOMMU: use coherent_dma_mask in alloc_coherent
  add AMD IOMMU tree to MAINTAINERS file
  AMD IOMMU: use cmd_buf_size when freeing the command buffer
  AMD IOMMU: calculate IVHD size with a function
  AMD IOMMU: remove unnecessary cast to u64 in the init code
  AMD IOMMU: free domain bitmap with its allocation order
  AMD IOMMU: simplify dma_mask_to_pages
  ...
2008-10-11 11:03:12 -07:00
Linus Torvalds
ec8deffa33 Merge phase #2 (PAT updates) of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-v28-for-linus-phase2-B' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
  x86, cpa: make the kernel physical mapping initialization a two pass sequence, fix
  x86, pat: cleanups
  x86: fix pagetable init 64-bit breakage
  x86: track memtype for RAM in page struct
  x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa
  x86, cpa: remove cpa pool code
  x86, cpa: no need to check alias for __set_pages_p/__set_pages_np
  x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC
  x86, cpa: make the kernel physical mapping initialization a two pass sequence
  x86, cpa: remove USER permission from the very early identity mapping attribute
  x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot
  x86: make sure the CPA test code's use of _PAGE_UNUSED1 is obvious
  linux-next: fix x86 tree build failure
  x86: have set_memory_array_{uc,wb} coalesce memtypes, fix
  agp: enable optimized agp_alloc_pages methods
  x86: have set_memory_array_{uc,wb} coalesce memtypes.
  x86: {reverve,free}_memtype() take a physical address
  x86: fix pageattr-test
  agp: add agp_generic_destroy_pages()
  agp: generic_alloc_pages()
  ...
2008-10-11 11:02:56 -07:00
Linus Torvalds
098ef215b1 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix BUG: using smp_processor_id() in preemptible code
  [CPUFREQ] Don't export governors for default governor
  [CPUFREQ][6/6] cpufreq: Add idle microaccounting in ondemand governor
  [CPUFREQ][5/6] cpufreq: Changes to get_cpu_idle_time_us(), used by ondemand governor
  [CPUFREQ][4/6] cpufreq_ondemand: Parameterize down differential
  [CPUFREQ][3/6] cpufreq: get_cpu_idle_time() changes in ondemand for idle-microaccounting
  [CPUFREQ][2/6] cpufreq: Change load calculation in ondemand for software coordination
  [CPUFREQ][1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg()
  [CPUFREQ] use deferrable delayed work init in conservative governor
  [CPUFREQ] drivers/cpufreq/cpufreq.c: Adjust error handling code involving cpufreq_cpu_put
  [CPUFREQ] add error handling for cpufreq_register_governor() error
  [CPUFREQ] acpi-cpufreq: add error handling for cpufreq_register_driver() error
  [CPUFREQ] Coding style fixes to arch/x86/kernel/cpu/cpufreq/powernow-k6.c
  [CPUFREQ] Coding style fixes to arch/x86/kernel/cpu/cpufreq/elanfreq.c
2008-10-11 08:49:34 -07:00
Yinghai Lu
ee29753327 ACPI: don't load acpi_cpufreq if acpi=off
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-10-10 23:45:38 -04:00
Matt Mackall
5000cadcf3 x86: trim ACPI sleep stack buffer
x86_64 SMP suspend to RAM uses a 10k temporary stack for saving the
kernel state, but only 4k of it is used. Shrink it to 4k.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-10-10 18:05:52 -04:00
Matt Mackall
d0d0f7432c x86: remove magic number from ACPI sleep stack buffer
x86_64 SMP suspend to RAM uses a 10k temporary stack for saving the
kernel state, but only 4k of it is used. Shrink it to 4k.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-10-10 18:05:51 -04:00
Linus Torvalds
b11ce8a26d Merge branch 'sched-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits)
  sched debug: add name to sched_domain sysctl entries
  sched: sync wakeups vs avg_overlap
  sched: remove redundant code in cpu_cgroup_create()
  sched_rt.c: resch needed in rt_rq_enqueue() for the root rt_rq
  cpusets: scan_for_empty_cpusets(), cpuset doesn't seem to be so const
  sched: minor optimizations in wake_affine and select_task_rq_fair
  sched: maintain only task entities in cfs_rq->tasks list
  sched: fixup buddy selection
  sched: more sanity checks on the bandwidth settings
  sched: add some comments to the bandwidth code
  sched: fixlet for group load balance
  sched: rework wakeup preemption
  CFS scheduler: documentation about scheduling policies
  sched: clarify ifdef tangle
  sched: fix list traversal to use _rcu variant
  sched: turn off WAKEUP_OVERLAP
  sched: wakeup preempt when small overlap
  kernel/cpu.c: create a CPU_STARTING cpu_chain notifier
  kernel/cpu.c: Move the CPU_DYING notifiers
  sched: fix __load_balance_iterator() for cfq with only one task
  ...
2008-10-10 12:42:31 -07:00
Linus Torvalds
f6bccf6954 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: skcipher - Use RNG interface instead of get_random_bytes
  crypto: rng - RNG interface and implementation
  crypto: api - Add fips_enable flag
  crypto: skcipher - Move IV generators into their own modules
  crypto: cryptomgr - Test ciphers using ECB
  crypto: api - Use test infrastructure
  crypto: cryptomgr - Add test infrastructure
  crypto: tcrypt - Add alg_test interface
  crypto: tcrypt - Abort and only log if there is an error
  crypto: crc32c - Use Intel CRC32 instruction
  crypto: tcrypt - Avoid using contiguous pages
  crypto: api - Display larval objects properly
  crypto: api - Export crypto_alg_lookup instead of __crypto_alg_lookup
  crypto: Kconfig - Replace leading spaces with tabs
2008-10-10 11:20:42 -07:00
Ingo Molnar
725c25819e Merge branches 'core/iommu', 'x86/amd-iommu' and 'x86/iommu' into x86-v28-for-linus-phase3-B
Conflicts:
	arch/x86/kernel/pci-gart_64.c
	include/asm-x86/dma-mapping.h
2008-10-10 19:47:12 +02:00
Ingo Molnar
3dd392a407 Merge branch 'linus' into x86/pat2
Conflicts:
	arch/x86/mm/init_64.c
2008-10-10 19:30:08 +02:00
Suresh Siddha
b27a43c1e9 x86, cpa: make the kernel physical mapping initialization a two pass sequence, fix
Jeremy Fitzhardinge wrote:

> I'd noticed that current tip/master hasn't been booting under Xen, and I
> just got around to bisecting it down to this change.
>
> commit 065ae73c5462d42e9761afb76f2b52965ff45bd6
> Author: Suresh Siddha <suresh.b.siddha@intel.com>
>
>    x86, cpa: make the kernel physical mapping initialization a two pass sequence
>
> This patch is causing Xen to fail various pagetable updates because it
> ends up remapping pagetables to RW, which Xen explicitly prohibits (as
> that would allow guests to make arbitrary changes to pagetables, rather
> than have them mediated by the hypervisor).

Instead of making init a two pass sequence, to satisfy the Intel's TLB
Application note (developer.intel.com/design/processor/applnots/317080.pdf
Section 6 page 26), we preserve the original page permissions
when fragmenting the large mappings and don't touch the existing memory
mapping (which satisfies Xen's requirements).

Only open issue is: on a native linux kernel, we will go back to mapping
the first 0-1GB kernel identity mapping as executable (because of the
static mapping setup in head_64.S). We can fix this in a different
patch if needed.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:21 +02:00
Ingo Molnar
ad2cde16a2 x86, pat: cleanups
clean up recently added code to be more consistent with other x86 code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:20 +02:00
Suresh Siddha
28dd033f43 x86: fix pagetable init 64-bit breakage
Fix _end alignment check - can trigger a crash if _end happens to be
on a page boundary.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:20 +02:00
Suresh Siddha
9542ada803 x86: track memtype for RAM in page struct
Track the memtype for RAM pages in page struct instead of using the
memtype list. This avoids the explosion in the number of entries in
memtype list (of the order of 20,000 with AGP) and makes the PAT
tracking simpler.

We are using PG_arch_1 bit in page->flags.

We still use the memtype list for non RAM pages.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:18 +02:00
Suresh Siddha
ad5ca55f6b x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa
Do a global flush tlb after splitting the large page and before we do the
actual change page attribute in the PTE.

With out this, we violate the TLB application note, which says
    "The TLBs may contain both ordinary and large-page translations for
     a 4-KByte range of linear addresses. This may occur if software
     modifies the paging structures so that the page size used for the
     address range changes. If the two translations differ with respect
     to page frame or attributes (e.g., permissions), processor behavior
     is undefined and may be implementation-specific."

And also serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity
mappings) using cpa_lock. So that we don't allow any other cpu, with stale
large tlb entries change the page attribute in parallel to some other cpu
splitting a large page entry along with changing the attribute.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:17 +02:00
Suresh Siddha
8311eb84bf x86, cpa: remove cpa pool code
Interrupt context no longer splits large page in cpa(). So we can do away
with cpa memory pool code.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:16 +02:00
Suresh Siddha
55121b4369 x86, cpa: no need to check alias for __set_pages_p/__set_pages_np
No alias checking needed for setting present/not-present mapping. Otherwise,
we may need to break large pages for 64-bit kernel text mappings (this adds to
complexity if we want to do this from atomic context especially, for ex:
with CONFIG_DEBUG_PAGEALLOC). Let's keep it simple!

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:15 +02:00
Suresh Siddha
0b8fdcbcd2 x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC
Don't use large pages for kernel identity mapping with DEBUG_PAGEALLOC.
This will remove the need to split the large page for the
allocated kernel page in the interrupt context.

This will simplify cpa code(as we don't do the split any more from the
interrupt context). cpa code simplication in the subsequent patches.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:14 +02:00
Suresh Siddha
a2699e477b x86, cpa: make the kernel physical mapping initialization a two pass sequence
In the first pass, kernel physical mapping will be setup using large or
small pages but uses the same PTE attributes as that of the early
PTE attributes setup by early boot code in head_[32|64].S

After flushing TLB's, we go through the second pass, which setups the
direct mapped PTE's with the appropriate attributes (like NX, GLOBAL etc)
which are runtime detectable.

This two pass mechanism conforms to the TLB app note which says:

"Software should not write to a paging-structure entry in a way that would
 change, for any linear address, both the page size and either the page frame
 or attributes."

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:13 +02:00
Suresh Siddha
b2bc273146 x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:11 +02:00
Linus Torvalds
d403a6484f Merge phase #1 of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
This merges phase 1 of the x86 tree, which is a collection of branches:

  x86/alternatives, x86/cleanups, x86/commandline, x86/crashdump,
  x86/debug, x86/defconfig, x86/doc, x86/exports, x86/fpu, x86/gart,
  x86/idle, x86/mm, x86/mtrr, x86/nmi-watchdog, x86/oprofile,
  x86/paravirt, x86/reboot, x86/sparse-fixes, x86/tsc, x86/urgent and
  x86/vmalloc

and as Ingo says: "these are the easiest, purely independent x86 topics
with no conflicts, in one nice Octopus merge".

* 'x86-v28-for-linus-phase1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (147 commits)
  x86: mtrr_cleanup: treat WRPROT as UNCACHEABLE
  x86: mtrr_cleanup: first 1M may be covered in var mtrrs
  x86: mtrr_cleanup: print out correct type v2
  x86: trivial printk fix in efi.c
  x86, debug: mtrr_cleanup print out var mtrr before change it
  x86: mtrr_cleanup try gran_size to less than 1M, v3
  x86: mtrr_cleanup try gran_size to less than 1M, cleanup
  x86: change MTRR_SANITIZER to def_bool y
  x86, debug printouts: IOMMU setup failures should not be KERN_ERR
  x86: export set_memory_ro and set_memory_rw
  x86: mtrr_cleanup try gran_size to less than 1M
  x86: mtrr_cleanup prepare to make gran_size to less 1M
  x86: mtrr_cleanup safe to get more spare regs now
  x86_64: be less annoying on boot, v2
  x86: mtrr_cleanup hole size should be less than half of chunk_size, v2
  x86: add mtrr_cleanup_debug command line
  x86: mtrr_cleanup optimization, v2
  x86: don't need to go to chunksize to 4G
  x86_64: be less annoying on boot
  x86, olpc: fix endian bug in openfirmware workaround
  ...
2008-10-10 08:28:58 -07:00
Hans Schou
43603c8df9 x86, debug: print more information about unknown CPUs
Write the name of the unknown vendor_id to output instead of just
"unknown".

Tag changed to 'vendor_id' as used in /proc/cpuinfo

Signed-off-by: Hans Schou <linux@schou.dk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 17:03:59 +02:00
Ian Campbell
5dc64a3442 xen: do not reserve 2 pages of padding between hypervisor and fixmap.
When reserving space for the hypervisor the Xen paravirt backend adds
an extra two pages (this was carried forward from the 2.6.18-xen tree
which had them "for safety"). Depending on various CONFIG options this
can cause the boot time fixmaps to span multiple PMDs which is not
supported and triggers a WARN in early_ioremap_init().

This was exposed by 2216d199b1 which
moved the dmi table parsing earlier.
    x86: fix CONFIG_X86_RESERVE_LOW_64K=y

    The bad_bios_dmi_table() quirk never triggered because we do DMI setup
    too late. Move it a bit earlier.

There is no real reason to reserve these two extra pages and the
fixmap already incorporates FIX_HOLE which serves the same
purpose. None of the other callers of reserve_top_address do this.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 13:00:15 +02:00
Ingo Molnar
8eb95f28f6 Merge commit 'v2.6.27' into timers/hpet 2008-10-10 09:25:29 +02:00
venkatesh.pallipadi@intel.com
bf0b90e357 [CPUFREQ][1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg()
Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination where policy->cpu may not be same as the CPU on which we
want to getavg frequency.

A follow-on patch will use this parameter to getavg freq from all cpus
in policy->cpus.

Change since last patch. Fix the offline/online and suspend/resume
oops reported by Youquan Song <youquan.song@intel.com>

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-10-09 13:52:43 -04:00
Akinobu Mita
847aef6ffd [CPUFREQ] acpi-cpufreq: add error handling for cpufreq_register_driver() error
add error handling for cpufreq_register_driver() error

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: cpufreq@lists.linux.org.uk
Signed-off-by: Dave Jones <davej@redhat.com>
2008-10-09 13:52:42 -04:00
Paolo Ciarrocchi
8d2d2051e5 [CPUFREQ] Coding style fixes to arch/x86/kernel/cpu/cpufreq/powernow-k6.c
Before:
total: 11 errors, 15 warnings, 255 lines checked

After:
total: 0 errors, 6 warnings, 254 lines checked

paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/powernow-k6.o.*
476932f5e1ffe365db9d1dfb3f860369  /tmp/powernow-k6.o.after
476932f5e1ffe365db9d1dfb3f860369  /tmp/powernow-k6.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-10-09 13:52:42 -04:00
Paolo Ciarrocchi
18c6faa962 [CPUFREQ] Coding style fixes to arch/x86/kernel/cpu/cpufreq/elanfreq.c
Before:
total: 15 errors, 10 warnings, 308 lines checked

After:
total: 0 errors, 4 warnings, 308 lines checked

paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/elafreq.o.*
add1d36c2f077c5aab7682e8642a9f34  /tmp/elafreq.o.after
add1d36c2f077c5aab7682e8642a9f34  /tmp/elafreq.o.before

paolo@paolo-desktop:~/linux.trees.git$ size /tmp/elafreq.o.*
   text    data     bss     dec     hex filename
    934     270       4    1208     4b8 /tmp/elafreq.o.after
    934     270       4    1208     4b8 /tmp/elafreq.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-10-09 13:52:42 -04:00
Németh Márton
8d59225720 [CPUFREQ] correct broken links and email addresses
Replace the no longer working links and email address in the
documentation and in source code.

Signed-off-by: Márton Németh <nm127@freemail.hu>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-10-09 13:52:40 -04:00
Jeremy Fitzhardinge
eefb47f6a1 xen: use spin_lock_nest_lock when pinning a pagetable
When pinning/unpinning a pagetable with split pte locks, we can end up
holding multiple pte locks at once (we need to hold the locks while
there's a pending batched hypercall affecting the pte page).  Because
all the pte locks are in the same lock class, lockdep thinks that
we're potentially taking a lock recursively.

This warning is spurious because we always take the pte locks while
holding mm->page_table_lock.  lockdep now has spin_lock_nest_lock to
express this kind of dominant lock use, so use it here so that lockdep
knows what's going on.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-09 14:25:19 +02:00
Ingo Molnar
990d0f2ced Merge branches 'sched/devel', 'sched/cpu-hotplug', 'sched/cpusets' and 'sched/urgent' into sched/core 2008-10-08 11:31:02 +02:00
Suresh Siddha
04944b793e x86: xsave: set FP, SSE bits in the xsave header in the user sigcontext
If a processor implementation discern that a processor state component is in
its initialized state, it may modify the corresponding bit in the
xsave header.xstate_bv as '0'. State in the memory layout setup by 'xsave'
will be consistent with the bit values in the header.

During signal handling, legacy applications may change the FP/SSE bits
in the sigcontext memory layout without touching the FP/SSE header bits
in the xsave header. So always set FP/SSE bits in the xsave header
while saving the sigcontext state to the user space. During signal return,
this will enable the kernel to capture any changes to the FP/SSE bits by the
legacy applications which don't touch xsave headers.

xsave aware apps can change the xstate_bv in the xsave header aswell
as change any contents in the memory layout. xrestor as part of sigreturn
will capture all the changes.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-07 14:36:08 -07:00
Suresh Siddha
f364eadab5 x86: xsave: fix error condition in save_i387_xstate()
Actually return failure on error.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-07 14:36:01 -07:00
Ingo Molnar
8d89adf44c x86: SB450: deprioritize DMI quirks
This PCI ID based quick should be a full solution for the IRQ0 override
related slowdown problem on SB450 based systems:

  33fb0e4: x86: SB450: skip IRQ0 override if it is not routed to INT2 of IOAPIC

Emit a warning in those cases where the DMI quirk triggers but
the PCI ID based quirk didnt.

If this warning does not trigger then we can phase out the DMI quirks.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-07 06:47:52 +02:00
Andreas Herrmann
33fb0e4eb5 x86: SB450: skip IRQ0 override if it is not routed to INT2 of IOAPIC
On some HP nx6... laptops (e.g. nx6325) BIOS reports an IRQ0 override
but the SB450 chipset is configured such that timer interrupts goe to
INT0 of IOAPIC.

Check IRQ0 routing and if it is routed to INT0 of IOAPIC skip the
timer override.

[ This more generic PCI ID based quirk should alleviate the need for
  dmi_ignore_irq0_timer_override DMI quirks. ]

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: "Maciej W. Rozycki" <macro@linux-mips.org>
Tested-by: Dmitry Torokhov <dtor@mail.ru>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-07 06:42:58 +02:00
Linus Torvalds
afed26d151 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kgdb: call touch_softlockup_watchdog on resume
  kgdb, x86: Avoid invoking kgdb_nmicallback twice per NMI
2008-10-06 14:30:02 -07:00
Linus Torvalds
6106611e15 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: gart iommu have direct mapping when agp is present too
2008-10-06 14:29:16 -07:00
Jan Kiszka
e85ceae910 kgdb, x86: Avoid invoking kgdb_nmicallback twice per NMI
Stress-testing KVM's latest NMI support with kgdbts inside an SMP guest,
I came across spurious unhandled NMIs while running the singlestep test.
Looking closer at the code path each NMI takes when KGDB is enabled, I
noticed that kgdb_nmicallback is called twice per event: One time via
DIE_NMI_IPI notification, the second time on DIE_NMI. Removing the first
invocation cures the unhandled NMIs here.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2008-10-06 13:50:59 -05:00
Rafael J. Wysocki
e84956f92a x86 ACPI: Blacklist two HP machines with buggy BIOSes
There is a bug in the BIOSes of some HP boxes with AMD Turions which
connects IO-APIC pins with ACPI thermal trip points in such a way that
if the state of the IO-APIC is not as expected by the (buggy) BIOS, the
thermal trip points are set to insanely low values (usually all of them
become 16 degrees Celsius).  As a result, thermal throttling kicks in
and knock the system down to its shoes.

Unfortunately some of the recent IO-APIC changes made the bug show up.
To prevent this from happening, blacklist machines that are known to be
affected (nx6115 and 6715b in this particular case).

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=11516 listed as
a regression from 2.6.26.

On my box it was caused by:

commit 691874fa96
Author: Maciej W. Rozycki <macro@linux-mips.org>
Date:   Tue May 27 21:19:51 2008 +0100

    x86: I/O APIC: timer through 8259A second-chance

    Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

and the whole story is described in this (huge) thread:

    http://marc.info/?l=linux-kernel&m=121358440508410&w=4

Matthew Garrett told us about that happening on the nx6125:

    http://marc.info/?l=linux-kernel&m=121396307411930&w=4

and then Maciej analysed the breakage on the basis of a DSDT from the
nx6325:

    http://marc.info/?l=linux-kernel&m=121401068718826&w=4

As far as the Dmitry's and Jason's boxes are concerned, I recognized the
symptoms and asked them to verify that the blacklisting helped.

It appears that the buggy BIOS code has been copy-pasted to the entire
range of machines, for no good reason.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Tested-by: Jason Vas Dias <jason.vas.dias@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-06 10:17:29 -07:00
Ingo Molnar
e496e3d645 Merge branches 'x86/alternatives', 'x86/cleanups', 'x86/commandline', 'x86/crashdump', 'x86/debug', 'x86/defconfig', 'x86/doc', 'x86/exports', 'x86/fpu', 'x86/gart', 'x86/idle', 'x86/mm', 'x86/mtrr', 'x86/nmi-watchdog', 'x86/oprofile', 'x86/paravirt', 'x86/reboot', 'x86/sparse-fixes', 'x86/tsc', 'x86/urgent' and 'x86/vmalloc' into x86-v28-for-linus-phase1 2008-10-06 18:17:07 +02:00
Ingo Molnar
b159d7a989 Merge branch 'x86/tracehook' into x86-v28-for-linus-phase1
Conflicts:
	arch/x86/kernel/signal_64.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-06 18:16:40 +02:00
Ingo Molnar
0962f402af Merge branch 'x86/prototypes' into x86-v28-for-linus-phase1
Conflicts:
	arch/x86/kernel/process_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-06 18:06:53 +02:00
Ingo Molnar
19268ed744 Merge branch 'x86/pebs' into x86-v28-for-linus-phase1
Conflicts:
	include/asm-x86/ds.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-06 16:17:23 +02:00
Ingo Molnar
b8cd9d056b Merge branch 'x86/header-guards' into x86-v28-for-linus-phase1
Conflicts:
	include/asm-x86/dma-mapping.h
	include/asm-x86/gpio.h
	include/asm-x86/idle.h
	include/asm-x86/kvm_host.h
	include/asm-x86/namei.h
	include/asm-x86/uaccess.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-06 16:15:57 +02:00
Michal Januszewski
2407390bd2 x86: replace a magic number with a named constant in the VESA boot code
Replace a magic number with a named constant in the VESA boot code.

Signed-off-by: Michal Januszewski <spock@gentoo.org>
Cc: linux-fbdev-devel@lists.sourceforge.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-05 18:39:29 +02:00