Mark zapped root pagetables as invalid and ignore such pages during lookup.
This is a problem with the cr3-target feature, where a zapped root table fools
the faulting code into creating a read-only mapping. The result is a lockup
if the instruction can't be emulated.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
In two case statements, use the ever popular 'i' instead of index:
arch/x86/kvm/x86.c:1063:7: warning: symbol 'index' shadows an earlier one
arch/x86/kvm/x86.c:1000:9: originally declared here
arch/x86/kvm/x86.c:1079:7: warning: symbol 'index' shadows an earlier one
arch/x86/kvm/x86.c:1000:9: originally declared here
Make it static.
arch/x86/kvm/x86.c:1945:24: warning: symbol 'emulate_ops' was not declared. Should it be static?
Drop the return statements.
arch/x86/kvm/x86.c:2878:2: warning: returning void-valued expression
arch/x86/kvm/x86.c:2944:2: warning: returning void-valued expression
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Fixes sparse warning as well.
arch/x86/kvm/svm.c:69:15: warning: symbol 'iopm_base' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Nesting __emulate_2op_nobyte inside__emulate_2op produces many shadowed
variable warnings on the internal variable _tmp used by both macros.
Change the outer macro to use __tmp.
Avoids a sparse warning like the following at every call site of __emulate_2op
arch/x86/kvm/x86_emulate.c:1091:3: warning: symbol '_tmp' shadows an earlier one
arch/x86/kvm/x86_emulate.c:1091:3: originally declared here
[18 more warnings suppressed]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Replaces open-coded mask calculation in macros.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.
We do use it, however, for fine-grained adjustment.
Other than that, time comes from the host.
[randy dunlap: add missing include]
[randy dunlap: disallow on Voyager or Visual WS]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writing to it periodically.
The area is binary compatible with xen, as we use the same shadow_info
structure.
[marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME]
[avi: save full value of the msr, even if enable bit is clear]
[avi: clear previous value of time_page]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch implements the Last Branch Record Virtualization (LBRV) feature of
the AMD Barcelona and Phenom processors into the kvm-amd module. It will only
be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So
there is no increased world switch overhead when the guest doesn't use these
MSRs.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch changes the kvm-amd module to allocate the SVM MSR permission map
per VCPU instead of a global map for all VCPUs. With this we have more
flexibility allowing specific guests to access virtualized MSRs. This is
required for LBR virtualization.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Change the parameter of the init_vmcb() function in the kvm-amd module from
struct vmcb to struct vcpu_svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Looking at Intel Volume 3b, page 148, table 20-11 and noticed
that the field name is 'Deliver' not 'Deliever'. Attached patch changes
the define name and its user in vmx.c
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch contains the SVM architecture dependent changes for KVM to enable
support for the Nested Paging feature of AMD Barcelona and Phenom processors.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch contains the changes to the KVM MMU necessary for support of the
Nested Paging feature in AMD Barcelona and Phenom Processors.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The load_pdptrs() function is required in the SVM module for NPT support.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The mapping function for the nonpaging case in the softmmu does basically the
same as required for Nested Paging. Make this function generic so it can be
used for both.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The generic x86 code has to know if the specific implementation uses Nested
Paging. In the generic code Nested Paging is called Two Dimensional Paging
(TDP) to avoid confusion with (future) TDP implementations of other vendors.
This patch exports the availability of TDP to the generic x86 code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
To disable the use of the Nested Paging feature even if it is available in
hardware this patch adds a module parameter. Nested Paging can be disabled by
passing npt=0 to the kvm_amd module.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Let SVM detect if the Nested Paging feature is available on the hardware.
Disable it to keep this patch series bisectable.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
By moving the SVM feature detection from the each_cpu code to the hardware
setup code it runs only once. As an additional advance the feature check is now
available earlier in the module setup process.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch makes the EFER register accessible on a 32bit KVM host. This is
necessary to boot 32 bit PAE guests under SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
To allow access to the EFER register in 32bit KVM the EFER specific code has to
be exported to the x86 generic code. This patch does this in a backwards
compatible manner.
[avi: add check for EFER-less hosts]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch aligns the bits the guest can set in the EFER register with the
features in the host processor. Currently it lets EFER.NX disabled if the
processor does not support it and enables EFER.LME and EFER.LMA only for KVM on
64 bit hosts.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch give the SVM and VMX implementations the ability to add some bits
the guest can set in its EFER register.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
To allow TLB entries to be retained across VM entry and VM exit, the VMM
can now identify distinct address spaces through a new virtual-processor ID
(VPID) field of the VMCS.
[avi: drop vpid_sync_all()]
[avi: add "cc" to asm constraints]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently an mmio guest pte is encoded in the shadow pagetable as a
not-present trapping pte, with the SHADOW_IO_MARK bit set. However
nothing is ever done with this information, so maintaining it is a
useless complication.
This patch moves the check for mmio to before shadow ptes are instantiated,
so the shadow code is never invoked for ptes that reference mmio. The code
is simpler, and with future work, can be made to handle mmio concurrently.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Certain x86 instructions use bits 3:5 of the byte following the opcode as an
opcode extension, with the decode sometimes depending on bits 6:7 as well.
Add support for this in the main decoding table rather than an ad-hock
adaptation per opcode.
Signed-off-by: Avi Kivity <avi@qumranet.com>
A guest partial guest pte write will leave shadow_trap_nonpresent_pte
in spte, which generates a vmexit at the next guest access through that pte.
This patch improves this by reading the full guest pte in advance and thus
being able to update the spte and eliminate the vmexit.
This helps pae guests which use two 32-bit writes to set a single 64-bit pte.
[truncation fix by Eric]
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
I must have disabled this due to other bugs which were fixed over
time. And this is needed in order for child devices of "pmu"
to get proper resource values.
Signed-off-by: David S. Miller <davem@davemloft.net>
It's completely superfluous, CONFIG_COMPAT is sufficient.
What this used to be is an umbrella for enabling code shared
by all 32-bit compat binary support types. But with the
removal of SunOS and Solaris support, the only one left is
Linux 32-bit ELF.
Update defconfig.
Signed-off-by: David S. Miller <davem@davemloft.net>
Refer to chip as "SPARC" throughout.
Say 32-bit SPARC and 64-bit SPARC rather than mentioning specific
chips such like UltraSPARC, as appropriate.
Remove non-sense help text referring to things that will never appear
on a SPARC system, such as EISA busses etc.
Use "help" instead of "--help--"
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernel bugzilla 10273
As reported by Jos van der Ende, ever since commit
5a606b72a4 ("[SPARC64]: Do not ACK an
INO if it is disabled or inprogress.") sun4u interrupts
can get stuck.
What this changset did was add the following conditional to
the various IRQ chip ->enable() handlers on sparc64:
if (unlikely(desc->status & (IRQ_DISABLED|IRQ_INPROGRESS)))
return;
which is correct, however it means that special care is needed
in the ->enable() method.
Specifically we must put the interrupt into IDLE state during
an enable, or else it might never be sent out again.
Setting the INO interrupt state to IDLE resets the state machine,
the interrupt input to the INO is retested by the hardware, and
if an interrupt is being signalled by the device, the INO
moves back into TRANSMIT state, and an interrupt vector is sent
to the cpu.
The two sun4v IRQ chip handlers were already doing this properly,
only sun4u got it wrong.
Signed-off-by: David S. Miller <davem@davemloft.net>
OK, so 25-mm1 gave a lockdep error which made me look into this.
The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d1561
The problem is that arch idle routines are somewhat inconsitent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.
So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:
idle routines are entered with IRQs disabled
idle routines will exit with IRQs enabled
Nearly all already did this in one form or another.
Merge the 32 and 64 bit bits so they no longer have different bugs.
As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3:
x86_64/mm: check and print vmemmap allocation continuous
x86_64: fix setup_node_bootmem to support big mem excluding with memmap
x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
mm: allow reserve_bootmem() cross nodes
mm: offset align in alloc_bootmem()
mm: fix alloc_bootmem_core to use fast searching for all nodes
mm: make mem_map allocation continuous
On big systems with lots of memory, don't print out too much during
bootup, and make it easy to find if it is continuous.
on 256G 8 sockets system will get
[ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0
[ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs
[ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0
[ffffe20038700000-ffffe200387fffff] potential offnode page_structs
[ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1
[ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2
[ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2
[ffffe20054700000-ffffe200547fffff] potential offnode page_structs
[ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2
[ffffe20070700000-ffffe200707fffff] potential offnode page_structs
[ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3
[ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4
[ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4
[ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs
[ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4
[ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs
[ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5
[ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6
[ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6
[ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs
[ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6
[ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7
instead of a very long print out...
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
typical case: four sockets system, every node has 4g ram, and we are using:
memmap=10g$4g
to mask out memory on node1 and node2
when numa is enabled, early_node_mem is used to get node_data and node_bootmap.
if it can not get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get buff from previous nodes.
so check it and print out some info about it.
need to move early_res_to_bootmem into every setup_node_bootmem.
and it takes range that node has. otherwise alloc_bootmem could return addr
that reserved early.
depends on "mm: make reserve_bootmem can crossed the nodes".
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
"mm: make reserve_bootmem can crossed the nodes" provides new
reserve_bootmem(), let reserve_bootmem_generic() use that.
reserve_bootmem_generic() is used to reserve initramdisk, so this way
we can make sure even when bootloader or kexec load ranges cross the
node memory boundaries, reserve_bootmem still works.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3:
x86, bitops: select the generic bitmap search functions
x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting only
x86: finalize bitops unification
x86, UML: remove x86-specific implementations of find_first_bit
x86: optimize find_first_bit for small bitmaps
x86: switch 64-bit to generic find_first_bit
x86: generic versions of find_first_(zero_)bit, convert i386
bitops: use __fls for fls64 on 64-bit archs
generic: implement __fls on all 64-bit archs
generic: introduce a generic __fls implementation
x86: merge the simple bitops and move them to bitops.h
x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
x86, uml: fix uml with generic find_next_bit for x86
x86: change x86 to use generic find_next_bit
uml: Kconfig cleanup
uml: fix build error
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootparam:
x86, boot: Document for linked list of struct setup_data
x86, boot: export linked list of struct setup_data via debugfs
x86, boot: add linked list of struct setup_data
x86, boot: add free_early to early reservation machanism
Export linked list of struct setup_data via debugfs.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a field of 64-bit physical pointer to NULL terminated
single linked list of struct setup_data to real-mode kernel
header. This is used as a more extensible boot parameters passing
mechanism.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add free_early to early reservation mechanism - this way early bootup
failure paths can stop wasting memory.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
disable /dev/mem mmap of RAM with PAT. It makes things safer and
eliminates aliasing. A future improvement would be to avoid the
range_is_allowed duplication.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce GENERIC_FIND_FIRST_BIT and GENERIC_FIND_NEXT_BIT in
lib/Kconfig, defaulting to off. An arch that wants to use the
generic implementation now only has to use a select statement
to include them.
I added an always-y option (X86_CPU) to arch/x86/Kconfig.cpu
and used that to select the generic search functions. This
way ARCH=um SUBARCH=i386 automatically picks up the change
too, and arch/um/Kconfig.i386 can therefore be simplified a
bit. ARCH=um SUBARCH=x86_64 does things differently, but
still compiles fine. It seems that a "def_bool y" always
wins over a "def_bool n"?
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86 has been switched to the generic versions of find_first_bit
and find_first_zero_bit, but the original versions were retained.
This patch just removes the now unused x86-specific versions.
also update UML.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Switch x86_64 to generic find_first_bit. The x86_64-specific
implementation is not removed.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Generic versions of __find_first_bit and __find_first_zero_bit
are introduced as simplified versions of __find_next_bit and
__find_next_zero_bit. Their compilation and use are guarded by
a new config variable GENERIC_FIND_FIRST_BIT.
The generic versions of find_first_bit and find_first_zero_bit
are implemented in terms of the newly introduced __find_first_bit
and __find_first_zero_bit.
This patch does not remove the i386-specific implementation,
but it does switch i386 to use the generic functions by setting
GENERIC_FIND_FIRST_BIT=y for X86_32.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some of those can be written in such a way that the same
inline assembly can be used to generate both 32 bit and
64 bit code.
For ffs and fls, x86_64 unconditionally used the cmov
instruction and i386 unconditionally used a conditional
branch over a mov instruction. In the current patch I
chose to select the version based on the availability
of the cmov instruction instead. A small detail here is
that x86_64 did not previously set CONFIG_X86_CMOV=y.
Improved comments for ffs, ffz, fls and variations.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.
Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
1...512, for each possible bitmap with one bit set, for each possible
offset: find the position of the first bit starting at offset. If you
follow ;). Times include setup of the bitmap and checking of the
results.
Athlon Xeon Opteron 32/64bit
x86-specific: 0m3.692s 0m2.820s 0m3.196s / 0m2.480s
generic: 0m2.622s 0m1.662s 0m2.100s / 0m1.572s
If the bitmap size is not a multiple of BITS_PER_LONG, and no set
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
value outside of the range [0, size]. The generic version always returns
exactly size. The generic version also uses unsigned long everywhere,
while the x86 versions use a mishmash of int, unsigned (int), long and
unsigned long.
Using the generic version does give a slightly bigger kernel, though.
defconfig: text data bss dec hex filename
x86-specific: 4738555 481232 626688 5846475 5935cb vmlinux (32 bit)
generic: 4738621 481232 626688 5846541 59360d vmlinux (32 bit)
x86-specific: 5392395 846568 724424 6963387 6a40bb vmlinux (64 bit)
generic: 5392458 846568 724424 6963450 6a40fa vmlinux (64 bit)
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pointed out by Linus: arch/um/Kconfig.x86_64 should
include arch/x86/Kconfig.cpu instead of defining those
symbols itself.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/um/os-Linux/helper.c: In function 'run_helper':
arch/um/os-Linux/helper.c:73: error: 'PATH_MAX' undeclared (first use in this function)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
x86 PAT: decouple from nonpromisc devmem
x86 PAT: tone down debugging messages
Linus pointed it out that PAT should not depend on NONPROMISC_DEVMEM.
Also make PAT non-default.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Rothwell reported that linux-next did not build on powerpc64.
make optimized inlining dependent on architecture opt-in.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
add CONFIG_OPTIMIZE_INLINING=y.
allow gcc to optimize the kernel image's size by uninlining
functions that have been marked 'inline'. Previously gcc was
forced by Linux to always-inline these functions via a gcc
attribute:
#define inline inline __attribute__((always_inline))
Especially when the user has already selected
CONFIG_OPTIMIZE_FOR_SIZE=y this can make a huge difference in
kernel image size (using a standard Fedora .config):
text data bss dec hex filename
5613924 562708 3854336 10030968 990f78 vmlinux.before
5486689 562708 3854336 9903733 971e75 vmlinux.after
that's a 2.3% text size reduction (!).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes section mismatch warnings in unlock_ExtINT_logic().
WARNING: arch/x86/kernel/built-in.o(.text+0x14a92): Section mismatch in reference from the function unlock_ExtINT_logic()
to the function .init.text:find_isa_irq_pin()
The function unlock_ExtINT_logic() references
the function __init find_isa_irq_pin().
This is often because unlock_ExtINT_logic lacks a __init
annotation or the annotation of find_isa_irq_pin is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x12cc9): Section mismatch in reference from the function unlock_ExtINT_logic()
unlock_ExtINT_logic() is only used by __init check_timer(). Annotate unlock_ExtINT_logic() witch __init.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix folowing warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x10799): Section mismatch in reference from the function uniq_ioapic_id()
uniq_ioapic_id() is only used by __init mp_register_ioapic(). Annotate uniq_ioapic_id() with __init.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes section mismatch warnings of __cpuinit
setup_trampoline() on 32-bit host.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
get_bios_ebda() exists in asm/rio.h and asm/bios_ebda.h.
This patch removes the one in asm/rio.h.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the magic number in the third argment of div_sc().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the magic number in the second argument of clocksource_hz2mult()
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
memset and NULL check after alloc_bootmem() are unnecessary.
Because it returns zeroed memory and it never return NULL.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use bitmap library for pin_programmed rather than reinvent
bitmaps.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove duplicate code by using MP_intsrc_info() in mpparse.c
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use BUILD_BUG_ON() instead of compile-time error technique with
extern non-exsistent function.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that there are no more special cases in sys32_ptrace, we
can convert to using the generic compat_sys_ptrace entry point.
The sys32_ptrace function gets simpler and becomes compat_arch_ptrace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This removes the special-case handling for PTRACE_GETSIGINFO
and PTRACE_SETSIGINFO from x86_64's sys32_ptrace. The generic
compat_ptrace_request code handles these.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This lifts the set_fs(USER_DS) call for signal handler setup out of the
three places copying the same code into the one place that calls them
all. There is no change in what it does.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This lifts the code diddling the TF and DF bits for signal handler setup
out of the several places copying the same code into the one place that
calls them all. There is no change in what it does.
I also separated the recently-added DF bit clearing from the TF diddling.
The compiler turns them back into one instruction anyway. The tossing
in of DF to the same line of code with no new comments was a bit more
arcane than seems wise.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is claimed that NexGen CPUs were never shipped:
http://lkml.org/lkml/2008/4/20/179
Also, the kernel support for these chips has been broken for
a long time, the code intended to support NexGen thereby being
essentially dead.
As an outcome of the discussion that can be found using the URL
above, this patch removes the NexGen support altogether.
The changes in this patch survived a defconfig build for i386, a
couple of successful randconfig builds, as well as a runtime test,
which consisted in booting a 32-bit x86 box up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In arch/x86/kernel/setup_64.c, the standard_io_resources array
is needlessly defined as global. This patch makes this variable
static.
This patch was successfully build-tested using the defconfig
for x86_64. Runtime test was performed by booting a 64-bit x86
box up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are no users for the function amd_init_cpu() defined in
arch/x86/kernel/cpu/amd.c. This patch removes this routine.
This patch was build-tested using defconfigs for i386 and x86_64,
and a few randconfig instances. Runtime tests were performed by
booting 32- and 64-bit x86 boxen up to the shell prompt.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
At least on my Barcelona, I see MCE log entries after cold boot caused
by BIOS not properly clearing the respective registers. Therefore, this
patch extends the workaround to families 0x10 and 0x11 (the latter just
for completeness, I have nothing to verify this against).
At the same time, provide a way to make these entries visible via the
'mce=bootlog' command line option even on these machines.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
.. since it uses ILL_BADSTK (which is meaningless in the context of
SIGSEGV).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There apparently was an unnoticed conflict between an earlier patch to
this file and mine (d1e084746b), which
I noticed only now. I suppose a change like the one below (untested) is
needed; I didn't get any response on a confirmation request for this from
the submitter of the first patch.
The issue is the writing of the 'checkbit' member at the end of
setup_intel_arch_watchdog(), which my patch made go to intel_arch_wd_ops
rather than wd_ops.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Two prior changes resulted in the "ecx" clobber being lost.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus reported these excessive debug printouts:
> Overlap at 0xe0300000-0xe0400000
> Overlap at 0xe0300000-0xe0380000
> Overlap at 0xe0300000-0xe0400000
> Overlap at 0xe0300000-0xe0400000
> Overlap at 0xe0300000-0xe0400000
> Overlap at 0xe0300000-0xe0400000
> Overlap at 0xe0300000-0xe0400000
turn that into a pr_debug().
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (49 commits)
[POWERPC] Add zImage.iseries to arch/powerpc/boot/.gitignore
[POWERPC] bootwrapper: fix build error on virtex405-head.S
[POWERPC] 4xx: Fix 460GT support to not enable FPU
[POWERPC] 4xx: Add NOR FLASH entries to Canyonlands and Glacier dts
[POWERPC] Xilinx: of_serial support for Xilinx uart 16550.
[POWERPC] Xilinx: boot support for Xilinx uart 16550.
[POWERPC] celleb: Add support for PCI Express
[POWERPC] celleb: Move miscellaneous files for Beat
[POWERPC] celleb: Move a file for SPU on Beat
[POWERPC] celleb: Move files for Beat mmu and iommu
[POWERPC] celleb: Move files for Beat hvcall interfaces
[POWERPC] celleb: Move the SCC related code for celleb
[POWERPC] celleb: Move the files for celleb base support
[POWERPC] celleb: Consolidate io-workarounds code
[POWERPC] cell: Generalize io-workarounds code
[POWERPC] Add CONFIG_PPC_PSERIES_DEBUG to enable debugging for platforms/pseries
[POWERPC] Convert from DBG() to pr_debug() in platforms/pseries/
[POWERPC] Register udbg console early on pseries LPAR
[POWERPC] Mark udbg console as CON_ANYTIME, ie. callable early in boot
[POWERPC] Set udbg_console index to 0
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat:
generic: add ioremap_wc() interface wrapper
/dev/mem: make promisc the default
pat: cleanups
x86: PAT use reserve free memtype in mmap of /dev/mem
x86: PAT phys_mem_access_prot_allowed for dev/mem mmap
x86: PAT avoid aliasing in /dev/mem read/write
devmem: add range_is_allowed() check to mmap of /dev/mem
x86: introduce /dev/mem restrictions with a config option
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next: (52 commits)
xen: add balloon driver
xen: allow compilation with non-flat memory
xen: fold xen_sysexit into xen_iret
xen: allow set_pte_at on init_mm to be lockless
xen: disable preemption during tlb flush
xen pvfb: Para-virtual framebuffer, keyboard and pointer driver
xen: Add compatibility aliases for frontend drivers
xen: Module autoprobing support for frontend drivers
xen blkfront: Delay wait for block devices until after the disk is added
xen/blkfront: use bdget_disk
xen: Make xen-blkfront write its protocol ABI to xenstore
xen: import arch generic part of xencomm
xen: make grant table arch portable
xen: replace callers of alloc_vm_area()/free_vm_area() with xen_ prefixed one
xen: make include/xen/page.h portable moving those definitions under asm dir
xen: add resend_irq_on_evtchn() definition into events.c
Xen: make events.c portable for ia64/xen support
xen: move events.c to drivers/xen for IA64/Xen support
xen: move features.c from arch/x86/xen/features.c to drivers/xen
xen: add missing definitions in include/xen/interface/vcpu.h which ia64/xen needs
...
Add some autogenerated files to various .gitignore files
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up the codepath, remove alignment restrictions and do sanity
checking of the end result, to make sure we patched the right site.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kernel_text_address returns true even for modules which is not wanted
in text_poke. Use core_kernel_text instead.
This is a regression introduced in e587cadd8f
which caused occasionaly crashes after suspend/resume.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: H. Peter Anvin <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Otherwise all sorts of bad things can happen, including
spurious softlockup reports.
Other platforms have this same bug, in one form or
another, just don't see the issue because they
don't sleep as long as sparc64 can in NOHZ.
Thanks to some brilliant debugging by Peter Zijlstra.
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no real reason we can't support sparsemem/discontigmem, so do so.
This is mostly useful to support hotplug memory.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
xen_sysexit and xen_iret were doing essentially the same thing. Rather
than having a separate implementation for xen_sysexit, we can just strip
the stack back to an iret frame and jump into xen_iret. This removes
a lot of code and complexity - specifically, another critical region.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The usual pagetable locking protocol doesn't seem to apply to updates
to init_mm, so don't rely on preemption being disabled in xen_set_pte_at
on init_mm.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Various places in the kernel flush the tlb even though preemption doens't
guarantee the tlb flush is happening on any particular CPU. In many cases
this doesn't seem to matter, so don't make a fuss about it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
split out x86 specific part from grant-table.c and
allow ia64/xen specific initialization.
ia64/xen grant table is based on pseudo physical address
(guest physical address) unlike x86/xen. On ia64 init_mm
doesn't map identity straight mapped area.
ia64/xen specific grant table initialization is necessary.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
move arch/x86/xen/events.c undedr drivers/xen to share codes
with x86 and ia64. And minor adjustment to compile.
ia64/xen also uses events.c
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ia64/xen also uses it too. Move it into common place so that
ia64/xen can share the code.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use jmp rather than call for the iret fixup, so its consistent with
the sysexit fixup, and it simplifies the stack (which is already
complex).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mask MCE/MCA out of cpu caps. Its harmless to leave them there, but
it does prevent the kernel from starting an unnecessary thread.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If an event comes in while events are currently being processed, then
just increment the counter and have the outer event loop reprocess the
pending events. This prevents unbounded recursion on heavy event
loads (of course massive event storms will cause infinite loops).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
retrigger_dynirq() was incomplete, and didn't properly set the event
to be pending again. It doesn't seem to actually get used.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Xen supports the notion of a debug interrupt which can be triggered
from the console. For now this is implemented to show pending events,
masks and each CPU's pending event set.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
64-bit Xen supports sysenter for 32-bit guests, so support its
use. (sysenter is faster than int $0x80 in 32-on-64.)
sysexit is still not supported, so we fake it up using iret.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
All pagetables need fundamentally the same setup and destruction, so
just use the same code for everything.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Make KERNEL_PGD_PTRS common, as previously it was only being defined
for 32-bit.
There are a couple of follow-on changes from this:
- KERNEL_PGD_PTRS was being defined in terms of USER_PGD_PTRS. The
definition of USER_PGD_PTRS doesn't really make much sense on x86-64,
since it can have two different user address-space configurations.
I renamed USER_PGD_PTRS to KERNEL_PGD_BOUNDARY, which is meaningful
for all of 32/32, 32/64 and 64/64 process configurations.
- USER_PTRS_PER_PGD was also defined and was being used for similar
purposes. Converting its users to KERNEL_PGD_BOUNDARY left it
completely unused, and so I removed it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can fold the essentially common pte functions together now.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
pte_t always contains a "pte" field for the whole pte value, so make
use of it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name
of the appropriate pagetable level structure.
[ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Common definitions for 3-level pagetable functions.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Common definitions for 2-level pagetable functions.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add a common arch/x86/mm/pgtable.c file for common pagetable functions.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Convert asm-x86/pgalloc_64.h from macros into functions (#include hell
prevents __*_free_tlb from being inline, but they're probably a bit
big to inline anyway).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use reserve_memtype and free_memtype wrappers for /dev/mem mmaps. The memtype
is slightly complicated here, given that we have to support existing X mappings.
We fallback on UC_MINUS for that.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce phys_mem_access_prot_allowed(), which checks whether the mapping
is possible, without any conflicts and returns success or failure based on that.
phys_mem_access_prot() by itself does not allow failure case. This ability
to return error is needed for PAT where we may have aliasing conflicts.
x86 setup __HAVE_PHYS_MEM_ACCESS_PROT and move x86 specific code out of
/dev/mem into arch specific area.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add xlate and unxlate around /dev/mem read/write. This sets up the mapping
that can be used for /dev/mem read and write without aliasing worries.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch introduces a restriction on /dev/mem: Only non-memory can be
read or written unless the newly introduced config option is set.
The X server needs access to /dev/mem for the PCI space, but it doesn't need
access to memory; both the file permissions and SELinux permissions of /dev/mem
just make X effectively super-super powerful. With the exception of the
BIOS area, there's just no valid app that uses /dev/mem on actual memory.
Other popular users of /dev/mem are rootkits and the like.
(note: mmap access of memory via /dev/mem was already not allowed since
a really long time)
People who want to use /dev/mem for kernel debugging can enable the config
option.
The restrictions of this patch have been in the Fedora and RHEL kernels for
at least 4 years without any problems.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
virtex405-head.S is an assembler file, not a C file; therefore BOOTAFLAGS
is the correct place to set the needed -mcpu=405 flag.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The AMCC 460GT doesn't have an FPU so let's not enable support for it.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This patch adds default NOR entries to the AMCC Canyonlands (460EX)
and Glacier (460GT) dts files.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The Xilinx 16550 uart core is not a standard 16550 because it uses
word-based addressing rather than byte-based adressing. With
additional properties it is compatible with the open firmware
'ns16550' compatible binding.
This code updates the ns16550 driver to use the reg-offset property
so that the Xilinx UART 16550 can be used with it. The reg-shift
was already being handled.
Signed-off-by: John Linn <john.linn@xilinx.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (85 commits)
Blackfin char driver for Blackfin on-chip OTP memory (v3)
Blackfin Serial Driver: fix bug - use mod_timer to replace only add_timer.
Blackfin Serial Driver: the uart break anomaly has been given its own number, so switch to it
Blackfin Serial Driver: use BFIN_UART_NR_PORTS to help SIR driver in uart port.
Blackfin Serial Driver: Fix bug - kernel hangs when accessing uart 0 on bf537 when booting u-boot and linux on uart 1
Blackfin Serial Driver: punt unused lsr variable
Blackfin Serial Driver: Enable IR function when user application (irattach /dev/ttyBFx -s) call TIOCSETD ioctl with line discipline N_IRDA
[Blackfin] arch: add include/boot .gitignore files
[Blackfin] arch: Functional power management support: Add support for cpu frequency scaling
[Blackfin] arch: Functional power management support: Remove broken cpu frequency scaling drivers
[Blackfin] arch: Equalize include files: Add PLL_DIV Masks
[Blackfin] arch: Add a warning about the value of CLKIN.
[Blackfin] arch: take DDR DEVWD into consideration as well for BF548
[Blackfin] arch: Remove the circular buffering mechanism for exceptions
[Blackfin] arch: lose unnecessary dependency on CONFIG_BFIN_ICACHE for MPU
[Blackfin] arch: fix bug - before assign new channel to the map register, need clear the bits first.
[Blackfin] arch: add Blackfin on-chip SIR IrDA driver support
[Blackfin] arch: BF54x memsizes are in mbits, not mbytes
[Blackfin] arch: try to remove condition that causes double fault, by checking current before it gets dereferenced
[Blackfin] arch: Update anomaly list.
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: (23 commits)
sparc: sunzilog uart order
[SPARC64]: Detect trap frames in stack backtraces.
[SPARC64]: %l6 trap return handling no longer necessary.
[SPARC64]: Use trap type stored in pt_regs to handle syscall restart.
[SPARC64]: Store magic cookie and trap type in pt_regs.
[SPARC64]: PROM debug console can be CON_ANYTIME.
sparc64: cleanup after SunOS/Solaris binary emulation removal
sparc: cleanup after SunOS binary emulation removal
[SPARC64]: Add NUMA support.
[SPARC64]: Allocate TSB node-local.
[SPARC64]: NUMA device infrastructure.
[SPARC64]: Kill pci_iommu_table_init() declaration.
[SPARC64]: Once we have the boot cmdline, call parse_early_param()
[SPARC64]: Remove unused asm-sparc64/numnodes.h
[SPARC64]: Decrease SECTION_SIZE_BITS to 30.
[SPARC64]: Initialize MDESC earlier and use lmb_alloc()
[SPARC64]: Use lmb_alloc() for PROM device tree.
[SPARC64]: Call real_setup_per_cpu_areas() earlier and use lmb_alloc().
[SPARC64]: Fully use LMB information in bootmem_init().
[SPARC64]: Start using LMB information in bootmem_init().
...
OSF/1 brk(2) was broken by following one-liner in sys_brk()
(commit 4cc6028d40):
- if (brk < mm->end_code)
+ if (brk < mm->start_brk)
goto out;
The problem is that osf_set_program_attributes()
does update mm->end_code, but not mm->start_brk,
which still contains inappropriate value left from
binary loader, so brk() always fails.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Legacy IDE resources were never properly allocated on most
alpha platforms, so IDE expectedly stopped working after
commit 10f000a2fd (generic
pci_enable_resources).
Always allocate "fixed" PCI resources before doing anything else;
remove Cypress IDE quirk, as it's a generic problem which is
handled in common PCI probe code.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The recent irq cleanups for arch/arm/mach-integrator/time.c and
drivers/char/mwave/tp3780i.c changed the request_irq() dev_id
parameter, but neglected to change the matching free_irq() parameter,
thus creating a bug upon irq de-registration.
Given that the impetus for the changes is not yet accepted upstream,
it is best to revert the irq cleanups.
Mostly. A comment is added to time.c to reduce future confusion,
of type that led to my time.c cleanup in the first place.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This adds support for PCI Express port on Celleb. I/O space of this
PCI Express port is not mapped in memory space. So we use the
io-workaround mechanism to make accesses indirect.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves miscellaneous files for Beat into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves SPU support code on Beat into platforms/cell/.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves files for mmu and iommu on Beat into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves files for Beat hvcall interfaces into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the SCC (Super Companion Chip) related code for celleb
into platforms/cell/.
All files in this patch are used by celleb-beat and celleb-native
commonly.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the base code for celleb support into platforms/cell/.
All files in this patch are used by celleb-beat and celleb-native
commonly.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now, we can use generic io-workarounds mechanism and the workaround
code for spider-pci. This changes Celleb PCI code to use spider-pci
code.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This splits cell io-workaround code into spider-pci dependent code and
a generic part, and also moves io-workarounds initialization into
cell_setup_phb.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a DEBUG config setting which turns on all (most) of the debugging
under platforms/pseries.
To have this take effect we need to remove all the #undef DEBUG's, in
various files. We leave the #undef DEBUG in platforms/pseries/lpar.c,
as this enables debugging printks from the low-level hash table routines,
and tends to make your system unusable. If you want those enabled you
still have to turn them on by hand.
Also some of the RAS code has a DEBUG block which causes a functional
change, so I've keyed this off a different (non-existant) debug #define.
This is only enabled if you have PPC_EARLY_DEBUG enabled also.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In pseries/lpar.c, fix some printf specifier mismatches, and add
a newline to one printk.
In pseries/rtasd.c add "rtasd" to some messages to make it clear
where they're coming from.
In pseries/scanlog.c remove the hand-rolled runtime debugging support
in there. This file has been largely unchanged for eons, if we need to
debug it in future we can recompile.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On pseries LPAR we can call the udbg routines, and the udbg console very
early. So mark the udbg console as safe to call early in boot, and register
the udbg console as soon as the udbg routines are hooked up.
This allows platforms/pseries code to use printk() and pr_debug() rather
than needing to call udbg_printf() directly for early debugging. This is
nice because a) it's standard, b) it goes via the printk buffer, and c)
you can get printk time stamps.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The udbg console should be safe to call basically at any time after boot.
It does not need any per-cpu resources or for the cpu to be online, as
long as there is a udbg_putc routine hooked up it should work. So mark it
as CON_ANYTIME.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Because the udbg_console has CON_ENABLED set, it's possible that when we
register it with the console code the index won't be set. This leads to
slightly confusing boot messages like:
[ 0.000000] console [udbg-1] enabled
We could remove CON_ENABLED, but we don't want to do that, we always
want the udbg console to be activated, even if the user specified some
other console on the command line.
The simplest fix seems to be just to set the index to 0 by hand. There
is no issue with duplicate udbg consoles, as we guard against registering
multiple times in register_early_udbg_console().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the required functionality to fill in all pacas at runtime.
With NR_CPUS=1024
text data bss dec hex filename
137 1704032 0 1704169 1a00e9 arch/powerpc/kernel/paca.o :Before
121 1179744 524288 1704153 1a00d9 arch/powerpc/kernel/paca.o :After
Also remove unneeded #includes from arch/powerpc/kernel/paca.c
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently all iSeries secondary CPUs spin directly on the cpu_start
field in their paca. Make them spin on the global
__secondary_hold_spinloop until after the pacas have been initialised.
As Stephen Rothwell points out, this works at the moment because
__secondary_hold_spinloop is being set already, but iSeries isn't
looking at it :)
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* Removed get_msr(), get_srr0(), and get_srr1() - not used anywhere
* Use STACK_FRAME_OVERHEAD instead of magic number
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace two open-coded occurences of the of_get_next_parent() logic.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As BenH said the other day, it is an "accident" that prom_init.o is
linked with the rest of the kernel. The truth is a little more
subtle, prom_init isn't truly bootloader, it does access kernel data
in a few places.
What we can do is discourage people from adding new code that accesses
data outside of prom_init. And hence this patch; from the script:
# This script checks prom_init.o to see what external symbols it
# is using, if it finds symbols not in the whitelist it returns
# an error. The point of this is to discourage people from
# intentionally or accidentally adding new code to prom_init.c
# which has side effects on other parts of the kernel.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* Removed TI_EXECDOMAIN define as its not used anywhere
* Use STACK_INT_FRAME_SIZE to allow common define of INT_FRAME_SIZE
* Define TI_CPU on both ppc32 & ppc64 (removes an ifdef).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use (31-THREAD_SHIFT) to get to thread_info from stack pointer. This makes
the code a bit easier to read and more robust if we ever change THREAD_SHIFT.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the inclusion of asm-offsets.h from stacktrace.c. It isn't
supposed to be included in C code and it causes problems with multiple
definitions of things.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The fixmap code from x86 allows us to have compile time virtual addresses
that we change the physical addresses of at run time.
This is useful for applications like kmap_atomic, PCI config that is done
via direct memory map, kexec/kdump.
We got ride of CONFIG_HIGHMEM_START as we can now determine a more optimal
location for PKMAP_BASE based on where the fixmap addresses start and
working back from there.
Additionally, the kmap code in asm-powerpc/highmem.h always had debug
enabled. Moved to using CONFIG_DEBUG_HIGHMEM to determine if we should
have the extra debug checking.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Added support to allow an 85xx kernel to be run from a non-zero physical
address (useful for cooperative asymmetric multiprocessing situations and
kdump). The support can be configured at compile time by setting
CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as
desired.
Alternatively, the kernel build can set CONFIG_RELOCATABLE. Setting this
config option causes the kernel to determine at runtime the physical
addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START. If
CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning.
However, CONFIG_PHYSICAL_START will always be used to set the LOAD program
header physical address field in the resulting ELF image.
Currently we are limited to running at a physical address that is a
multiple of 256M. This is due to how we map TLBs to cover
lowmem. This should be fixed to allow 64M or maybe even 16M alignment
in the future. It is considered an error to try and run a kernel at a
non-aligned physical address.
All the magic for this support is accomplished by proper initialization
of the kernel memory subsystem and use of ARCH_PFN_OFFSET.
The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings.
ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET.
/dev/mem continues to allow access to any physical address in the system
regardless of how CONFIG_PHYSICAL_START is set.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 0119536cd3 added an assembly
version of strncmp to PowerPC. However, it changed a common header
file between arch/ppc and arch/powerpc without adding strncmp to
arch/ppc. This fixes that omission so that arch/ppc links again.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The MPSC driver and prpmc2800.dts have been modified to use property
'cell-index' as the serial port number, but the early serial console
driver for the mv64x60 has not been modified to use this new property.
This fixes it.
Signed-off-by: Remi Machet (rmachet@slac.stanford.edu)
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If one of the devices of the mv64x60 init fails, the remaining
devices are not initialized.
This changes the code to display an error and continue the
initialization.
Signed-off-by: Remi Machet (rmachet@slac.stanford.edu)
Signed-off-by: Paul Mackerras <paulus@samba.org>
I2C parameters freq_m and freq_n are assigned defaults in the code,
but if properties for those parameters are not found in the open
firmware description the init routine returns an error and doesn't
create the platform device.
This changes the code so that it doesn't return an error if the
properties are not found but instead uses the default values.
Signed-off-by: Remi Machet (rmachet@slac.stanford.edu)
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The powerpc kernel stacks need to be naturally aligned, as they
contain the thread info at the bottom, which is obtained by
clearing the low bits of the stack pointer.
However, when using 64K pages, the stack is smaller than a page,
so we use kmalloc to allocate it, but that doesn't provide the
alignment guarantee we need.
It appeared to work so far... until one enables SLUB debugging
which then returns unaligned pointers. Ooops...
This fixes it by using a slab cache with enforced alignment. It
relies on my previous patch that adds a thread_info_cache_init()
callback.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
os-area.c requires routines declared in linux/of.h, so should include it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
numa.c requires routines declared in linux/of.h, so should include it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that we have a magic cookie in the pt_regs, we can
properly detect trap frames in stack bactraces.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we indicate the "restart system call" in the
trap type field of pt_regs->magic, we don't need to
set the %l6 boolean in all of the trap return paths.
And we therefore don't need to pass it to do_notify_resume().
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we can check the trap type directly, we don't need the
funny restart_syscall indication from the trap return paths.
Signed-off-by: David S. Miller <davem@davemloft.net>
The proc-*.S files have the _prefetch_abort pointer placed at the end
of the processor structure but the cpu-multi32.h defines it in the
second position. The patch also fixes the support for XSC3 and the
MMU-less CPUs (740, 7tdmi, 940, 946 and 9tdmi).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This sets us up for several simplifications and facilities:
1) The magic cookie lets us identify trap frames more
accurately in stack backtraces.
2) The trap type lets us simplify all of the "are we in
a syscall" state management and checks.
3) We can now see if a task off the cpu is sleeping in
a system call or not. In fact, we can see what
trap it is sleeping in whatever the type. The utrace
guys will use this.
Based upon some discussions with Roland McGrath.
Signed-off-by: David S. Miller <davem@davemloft.net>
The following cleanups are now possible:
- arch/sparc64/kernel/entry.S:ret_sys_call no longer has to be global
- arch/sparc64/kernel/sparc64_ksyms.c:
remove no longer used prototypes
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following cleanups are now possible:
- arch/sparc/kernel/entry.S:ret_sys_call no longer has to be global
- arch/sparc/kernel/signal.c:sys_sigpause() can be removed
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there is only code to parse NUMA attributes on
sun4v/niagara systems, but later on we will add such parsing
for older systems.
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to do it like this before we can move the PROM and MDESC device
tree code over to using lmb_alloc().
Signed-off-by: David S. Miller <davem@davemloft.net>
Call lmb_add() on available regions, and call lmb_reserve()
on the main kernel image and the ramdisk (if any).
Signed-off-by: David S. Miller <davem@davemloft.net>
And add some comments explaining all of the quirks involved in
the way the bootloader provides this information.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit e4cc58944c, as
requested by Roland McGrath, because compat_ptrace_request (added in
commit e16b278164, "ptrace:
compat_ptrace_request siginfo") now handles this case.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move XPC and XPNET from arch/ia64/sn/kernel to drivers/misc/sgi-xp.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
- remove unused 'irq' argument from pfm_do_interrupt_handler()
- remove pointless cast to void*
- add KERN_xxx prefix to printk()
- remove braces around singleton C statement
- in tioce_provider.c, start tioce_dma_consistent() and
tioce_error_intr_handler() function declarations in column 0
This change's main purpose is to prepare for the patchset in
jgarzik/misc-2.6.git#irq-remove, that explores removal of the
never-used 'irq' argument in each interrupt handler.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are many notify_die() and almost all take same style with
ia64_mca_spin(). This patch defines macros and replace them all,
to reduce lines and to improve readability.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are 3 hooks in MCA handler, but this DIE_MCA_MONARCH_PROCESS
event does not notified other than for the first monarch.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While testing with CONFIG_VIRT_CPU_ACCOUNTING=y, I found that
I occasionally get very huge system time in some threads.
So I dug the issue and finally noticed that it was caused
because of an interrupt which interrupt in the following window:
> [arch/ia64/kernel/entry.S: (!CONFIG_PREEMPT && CONFIG_VIRT_CPU_ACCOUNTING)]
>
> ENTRY(ia64_leave_syscall)
> :
> (pUStk) rsm psr.i
> cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall
> (pUStk) cmp.eq.unc p6,p0=r0,r0 // p6 <- pUStk
> .work_processed_syscall:
> adds r2=PT(LOADRS)+16,r12
> (pUStk) mov.m r22=ar.itc // fetch time at leave
> adds r18=TI_FLAGS+IA64_TASK_SIZE,r13
> ;;
> <<< window: from here >>>
> (p6) ld4 r31=[r18] // load current_thread_info()->flags
> ld8 r19=[r2],PT(B6)-PT(LOADRS)
> adds r3=PT(AR_BSPSTORE)+16,r12
> ;;
> mov r16=ar.bsp
> ld8 r18=[r2],PT(R9)-PT(B6)
> (p6) and r15=TIF_WORK_MASK,r31 // any work other than TIF_SYSCALL_TRACE?
> ;;
> ld8 r23=[r3],PT(R11)-PT(AR_BSPSTORE)
> (p6) cmp4.ne.unc p6,p0=r15, r0 // any special work pending?
> (p6) br.cond.spnt .work_pending_syscall
> ;;
> ld8 r9=[r2],PT(CR_IPSR)-PT(R9)
> ld8 r11=[r3],PT(CR_IIP)-PT(R11)
> (pNonSys) break 0 // bug check: we shouldn't be here if pNonSys is TRUE!
> ;;
> invala
> <<< window: to here >>>
> rsm psr.i | psr.ic // turn off interrupts and interruption collection
If pUStk is true, it means we are going to return user mode, hence we fetch
ar.itc to get time at leave from system.
It seems that it is not possible to interrupt the window if pUStk is true,
because interrupts are disabled early. And also disabling interrupt makes
sense because it is safe for referring current_thread_info()->flags.
However interrupting the window while pUStk is true was possible.
The route was:
ia64_trace_syscall
-> .work_pending_syscall_end
-> .work_processed_syscall
Only in case entering the window from this route, interrupts are enabled
during in the window even if pUStk is true. I suppose interrupts must be
disabled here anyway if pUStk is true.
I'm not sure but afraid that what kind of bad effect were there, other
than crazy system time which I found.
FYI, there was a commit 6f6d75825d that
points out a bug at same point(exit of ia64_trace_syscall) in 2006.
It can be said that there was an another bug.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial: (24 commits)
DOC: A couple corrections and clarifications in USB doc.
Generate a slightly more informative error msg for bad HZ
fix typo "is" -> "if" in Makefile
ext*: spelling fix prefered -> preferred
DOCUMENTATION: Use newer DEFINE_SPINLOCK macro in docs.
KEYS: Fix the comment to match the file name in rxrpc-type.h.
RAID: remove trailing space from printk line
DMA engine: typo fixes
Remove unused MAX_NODES_SHIFT
MAINTAINERS: Clarify access to OCFS2 development mailing list.
V4L: Storage class should be before const qualifier (sn9c102)
V4L: Storage class should be before const qualifier
sonypi: Storage class should be before const qualifier
intel_menlow: Storage class should be before const qualifier
DVB: Storage class should be before const qualifier
arm: Storage class should be before const qualifier
ALSA: Storage class should be before const qualifier
acpi: Storage class should be before const qualifier
firmware_sample_driver.c: fix coding style
MAINTAINERS: Add ati_remote2 driver
...
Fixed up trivial conflicts in firmware_sample_driver.c
This patch removes the no longer used export of kmap_atomic_to_page.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
[HWRNG] omap: Minor updates
[CRYPTO] kconfig: Ordering cleanup
[CRYPTO] all: Clean up init()/fini()
[CRYPTO] padlock-aes: Use generic setkey function
[CRYPTO] aes: Export generic setkey
[CRYPTO] api: Make the crypto subsystem fully modular
[CRYPTO] cts: Add CTS mode required for Kerberos AES support
[CRYPTO] lrw: Replace all adds to big endians variables with be*_add_cpu
[CRYPTO] tcrypt: Change the XTEA test vectors
[CRYPTO] tcrypt: Shrink the tcrypt module
[CRYPTO] tcrypt: Change the usage of the test vectors
[CRYPTO] api: Constify function pointer tables
[CRYPTO] aes-x86-32: Remove unused return code
[CRYPTO] tcrypt: Shrink speed templates
[CRYPTO] tcrypt: Group common speed templates
[CRYPTO] sha512: Rename sha512 to sha512_generic
[CRYPTO] sha384: Hardware acceleration for s390
[CRYPTO] sha512: Hardware acceleration for s390
[CRYPTO] s390: Generic sha_update and sha_final
[CRYPTO] api: Switch to proc_create()
* 'irq-cleanups-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6:
[ISDN] minor irq handler cleanups
drivers/char: minor irq handler cleanups
[PPC] minor irq handler cleanups
[BLACKFIN] minor irq handler cleanups
[SPARC] minor irq handler cleanups
ARM minor irq handler cleanup: avoid passing unused info to irq
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
[POWERPC] Fix compile breakage for 64-bit UP configs
[POWERPC] Define copy_siginfo_from_user32
[POWERPC] Add compat handler for PTRACE_GETSIGINFO
[POWERPC] i2c: Fix build breakage introduced by OF helpers
[POWERPC] Optimize fls64() on 64-bit processors
[POWERPC] irqtrace support for 64-bit powerpc
[POWERPC] Stacktrace support for lockdep
[POWERPC] Move stackframe definitions to common header
[POWERPC] Fix device-tree locking vs. interrupts
[POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
[POWERPC] Remove unused __max_memory variable
[POWERPC] Simplify xics direct/lpar irq_host setup
[POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
[POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
[POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
[POWERPC] Use asm-generic/bitops/find.h in bitops.h
[POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
[POWERPC] 85xx: Fix the size of qe muram for MPC8568E
[POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
[POWERPC] 86xx: mark functions static, other minor cleanups
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
SCSI: convert struct class_device to struct device
DRM: remove unused dev_class
IB: rename "dev" to "srp_dev" in srp_host structure
IB: convert struct class_device to struct device
memstick: convert struct class_device to struct device
driver core: replace remaining __FUNCTION__ occurrences
sysfs: refill attribute buffer when reading from offset 0
PM: Remove destroy_suspended_device()
Firmware: add iSCSI iBFT Support
PM: Remove legacy PM (fix)
Kobject: Replace list_for_each() with list_for_each_entry().
SYSFS: Explicitly include required header file slab.h.
Driver core: make device_is_registered() work for class devices
PM: Convert wakeup flag accessors to inline functions
PM: Make wakeup flags available whenever CONFIG_PM is set
PM: Fix misuse of wakeup flag accessors in serial core
Driver core: Call device_pm_add() after bus_add_device() in device_add()
PM: Handle device registrations during suspend/resume
block: send disk "change" event for rescan_partitions()
sysdev: detect multiple driver registrations
...
Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
sched: build fix
sched: better rt-group documentation
sched: features fix
sched: /debug/sched_features
sched: add SCHED_FEAT_DEADLINE
sched: debug: show a weight tree
sched: fair: weight calculations
sched: fair-group: de-couple load-balancing from the rb-trees
sched: fair-group scheduling vs latency
sched: rt-group: optimize dequeue_rt_stack
sched: debug: add some debug code to handle the full hierarchy
sched: fair-group: SMP-nice for group scheduling
sched, cpuset: customize sched domains, core
sched, cpuset: customize sched domains, docs
sched: prepatory code movement
sched: rt: multi level group constraints
sched: task_group hierarchy
sched: fix the task_group hierarchy for UID grouping
sched: allow the group scheduler to have multiple levels
sched: mix tasks and groups
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (77 commits)
x86: UV startup of slave cpus
x86: integrate pci-dma.c
x86: don't do dma if mask is NULL.
x86: return conditional to mmu
x86: remove kludge from x86_64
x86: unify gfp masks
x86: retry allocation if failed
x86: don't try to allocate from DMA zone at first
x86: use a fallback dev for i386
x86: use numa allocation function in i386
x86: remove virt_to_bus in pci-dma_64.c
x86: adjust dma_free_coherent for i386
x86: move bad_dma_address
x86: isolate coherent mapping functions
x86: move dma_coherent functions to pci-dma.c
x86: merge iommu initialization parameters
x86: merge dma_supported
x86: move pci fixup to pci-dma.c
x86: move x86_64-specific to common code.
x86: move initialization functions to pci-dma.c
...
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (27 commits)
sh: Fix up L2 cache probe.
sh: Fix up SH-4A part probe.
sh: Add support for SH7723 CPU subtype.
sh: Fix up SH7763 build.
sh: Add migor_ts support to MigoR
sh: Add rs5c732b RTC support to MigoR
sh: Add I2C support to MigoR
sh: Add I2C platform data to sh7722
sh: MigoR NAND flash support using gen_flash
sh: MigoR NOR flash support using physmap-flash
sh: Fix up mach-types formatting from merge damage.
sh: r7780rp: Hook up the I2C and SMBus platform devices.
sh: Use phyical addresses for MigoR smc91x resources
sh: Use physical addresses for sh7722 USBF resources
sh: Add MigoR header file
Fix sh_keysc double free
sh: Fix up __access_ok() check for nommu.
sh: Allow optimized clear/copy page routines to be used on SH-2.
sh: Hook up the rest of the SH7770 serial ports.
sh: Add support for Solution Engine SH7721 board
...
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the
beginning of the declaration specifiers in a declaration is an
obsolescent feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
603 CPUs have the same issue that some 750 CPUs have in that they can crash
in funny ways if a store from an FPU register instruction is executed on a
register that has never been initialized since power on. This patch fixes
it by making sure all FP registers have been properly initialized at kernel
boot.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Use the generic pci_enable_resources() instead of the arch-specific code.
Unlike this arch-specific code, the generic version:
- checks PCI_NUM_RESOURCES (11), not DEVICE_COUNT_RESOURCE (12), resources
- skips resources that have neither IORESOURCE_IO nor IORESOURCE_MEM set
- skips ROM resources unless IORESOURCE_ROM_ENABLE is set
- checks for resource collisions with "!r->parent"
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the generic pci_enable_resources() instead of the arch-specific code.
Unlike this arch-specific code, the generic version:
- checks PCI_NUM_RESOURCES (11), not 6, resources
- skips resources that have neither IORESOURCE_IO nor IORESOURCE_MEM set
- skips ROM resources unless IORESOURCE_ROM_ENABLE is set
- checks for resource collisions with "!r->parent", not IORESOURCE_UNSET
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the generic pci_enable_resources() instead of the arch-specific code.
The generic version is functionally equivalent, but uses dev_printk.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the generic pci_enable_resources() instead of the arch-specific code.
Unlike this arch-specific code, the generic version:
- does not check for a NULL dev pointer
- skips resources that have neither IORESOURCE_IO nor IORESOURCE_MEM set
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the generic pci_enable_resources() instead of the arch-specific code.
Unlike this arch-specific code, the generic version:
- skips resources unless requested in "mask"
- skips ROM resources unless IORESOURCE_ROM_ENABLE is set
- checks for resource collisions with "!r->parent"
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use the generic pci_enable_resources() instead of the arch-specific code.
Unlike this arch-specific code, the generic version:
- checks for resource collisions with "!r->parent"
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The "pci=routeirq" option was added in 2004, and I don't get any valid
reports anymore. The option is still mentioned in kernel-parameters.txt.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The PCI bus names included in /proc/iomem and /proc/ioports are
of the form 'PCI Bus #XX' where XX is the bus number. This patch
changes the naming to 'PCI Bus XXXX:YY' where XXXX is the domain
number and YY is the bus number. For example, PCI bus 14 in
domain 0 will show as 'PCI Bus 0000:14' instead of 'PCI Bus #14'.
This change makes the naming consistent with other architectures
such as ia64 where multiple PCI domain support has been around
longer.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>