* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
Revert "proc: revert /proc/uptime to ->read_proc hook"
proc 2/2: remove struct proc_dir_entry::owner
proc 1/2: do PDE usecounting even for ->read_proc, ->write_proc
proc: fix sparse warnings in pagemap_read()
proc: move fs/proc/inode-alloc.txt comment into a source file
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.
We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.
But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.
->read_proc/->write_proc were just fixed to not require ->owner for
protection.
rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.
Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
* 'percpu-cpumask-x86-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (682 commits)
percpu: fix spurious alignment WARN in legacy SMP percpu allocator
percpu: generalize embedding first chunk setup helper
percpu: more flexibility for @dyn_size of pcpu_setup_first_chunk()
percpu: make x86 addr <-> pcpu ptr conversion macros generic
linker script: define __per_cpu_load on all SMP capable archs
x86: UV: remove uv_flush_tlb_others() WARN_ON
percpu: finer grained locking to break deadlock and allow atomic free
percpu: move fully free chunk reclamation into a work
percpu: move chunk area map extension out of area allocation
percpu: replace pcpu_realloc() with pcpu_mem_alloc() and pcpu_mem_free()
x86, percpu: setup reserved percpu area for x86_64
percpu, module: implement reserved allocation and use it for module percpu variables
percpu: add an indirection ptr for chunk page map access
x86: make embedding percpu allocator return excessive free space
percpu: use negative for auto for pcpu_setup_first_chunk() arguments
percpu: improve first chunk initial area map handling
percpu: cosmetic renames in pcpu_setup_first_chunk()
percpu: clean up percpu constants
x86: un-__init fill_pud/pmd/pte
x86: remove vestigial fix_ioremap prototypes
...
Manually merge conflicts in arch/ia64/kernel/irq_ia64.c
Conflicts:
arch/sparc/kernel/time_64.c
drivers/gpu/drm/drm_proc.c
Manual merge to resolve build warning due to phys_addr_t type change
on x86:
drivers/gpu/drm/drm_info.c
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch removes the following warnings and related ones.
Plus some cosmetics.
arch/ia64/kernel/patch.c:112: warning: passing argument 1 of 'paravirt_fc' makes integer from pointer without a cast
arch/ia64/kernel/patch.c:135: warning: passing argument 1 of 'paravirt_fc' makes integer from pointer without a cast
arch/ia64/kernel/patch.c:166: warning: passing argument 1 of 'paravirt_fc' makes integer from pointer without a cast
arch/ia64/kernel/patch.c:202: warning: passing argument 1 of 'paravirt_fc' makes integer from pointer without a cast
arch/ia64/kernel/patch.c:220: warning: passing argument 1 of 'paravirt_fc' makes integer from pointer without a cast
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/kernel/irq_ia64.c: In function 'ia64_handle_irq':
arch/ia64/kernel/irq_ia64.c:498: error: 'struct kernel_stat' has no member named 'irqs'
arch/ia64/kernel/irq_ia64.c:500: error: 'struct kernel_stat' has no member named 'irqs'
arch/ia64/kernel/irq_ia64.c: In function 'ia64_process_pending_intr':
arch/ia64/kernel/irq_ia64.c:556: error: 'struct kernel_stat' has no member named 'irqs'
arch/ia64/kernel/irq_ia64.c:558: error: 'struct kernel_stat' has no member named 'irqs'
Fix build breakage due to recent kstat_this_cpu changes in:
d7e51e6689
sparseirq: make some func to be used with genirq
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
x86: disable __do_IRQ support
sparseirq, powerpc/cell: fix unused variable warning in interrupt.c
genirq: deprecate obsolete typedefs and defines
genirq: deprecate __do_IRQ
genirq: add doc to struct irqaction
genirq: use kzalloc instead of explicit zero initialization
genirq: make irqreturn_t an enum
genirq: remove redundant if condition
genirq: remove unused hw_irq_controller typedef
irq: export remove_irq() and setup_irq() symbols
irq: match remove_irq() args with setup_irq()
irq: add remove_irq() for freeing of setup_irq() irqs
genirq: assert that irq handlers are indeed running in hardirq context
irq: name 'p' variables a bit better
irq: further clean up the free_irq() code flow
irq: refactor and clean up the free_irq() code flow
irq: clean up manage.c
irq: use GFP_KERNEL for action allocation in request_irq()
kernel/irq: fix sparse warning: make symbol static
irq: optimize init_kstat_irqs/init_copy_kstat_irqs
...
define paravirt_dv_serialize_data() and insert it to suppress
false positive warnings.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
implement binary patching optimization for pv_cpu_ops.
With this optimization, indirect call for pv_cpu_ops methods can be
converted into inline execution or direct call.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
add helper functions to support binary patching for paravirt_ops.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Move down __kernel_syscall_via_epc to the end of the page.
We want to paravirtualize only __kernel_syscall_via_epc because
it includes privileged instructions. Its paravirtualization increases
its symbols size.
On the other hand, each paravirtualized gate must have e symbols of
same value and size to native's because the page is mapped to GATE_ADDR
and GATE_ADDR + PERCPU_PAGE_SIZE and vmlinux is linked to those symbols.
Later to have the same symbol size, we pads NOPs at the end of
__kernel_syscall_via_epc. Move it after other functions to keep
symbols of other functions have same values and sizes.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
define xen specific gate page.
At this phase bits in the gate page is same to native.
At the next phase, it will be paravirtualized.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize gate page by allowing each pv_ops instances
to define its own gate page.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
add sched_clock() hook to paravirtualize sched_clock().
ia64 sched_clock() is based on ar.itc which isn't stable
on virtualized environment because vcpu may move around on
pcpus. So it needs paravirtualization.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize ar.itc and ar.itm in order to support save/restore.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add two hooks, paravirt_get_fsyscall_table() and
paravirt_get_fsys_bubble_doen() to paravirtualize fsyscall implementation.
This patch just add the hooks fsyscall and don't paravirtualize it.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
For kvm's MSI support, it needs these macros defined in ia64_msi.c, and
to avoid duplicate them, move them to one header file and share with
kvm.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Impact: use new API
Use the accessors rather than frobbing bits directly. Most of this is
in arch code I haven't even compiled, but is straightforward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: cleanup, futureproof
In fact, all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids. So use that instead of NR_CPUS in various
places.
This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
We're weaning the core code off handing cpumask's around on-stack.
This introduces arch_send_call_function_ipi_mask().
We also take the chance to wean send_IPI_mask off the obsolescent
for_each_cpu_mask(): making it take the pointer seemed the most
natural way.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: __per_cpu_load available on all SMP capable archs
Percpu now requires three symbols to be defined - __per_cpu_load,
__per_cpu_start and __per_cpu_end. There were three archs which
didn't have it. Update them as follows.
* powerpc: can use generic PERCPU() macro. Compile tested for
powerpc32, compile/boot tested for powerpc64.
* ia64: can use generic PERCPU_VADDR() macro. __phys_per_cpu_start is
identical to __per_cpu_load. Compile tested and symbol table looks
identical after the change except for the additional __per_cpu_load.
* arm: added explicit __per_cpu_load definition. Currently uses
unified .init output section so can't use the generic macro. Dunno
whether the unified .init ouput section is required by arch
peculiarity so I left it alone. Please break it up and use PERCPU()
if possible.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Pat Gefre <pfg@sgi.com>
Cc: Russell King <rmk@arm.linux.org.uk>
vi arch/ia64/kernel/iosapic.c +142
static struct iosapic_intr_info {
...
} iosapic_intr_info[NR_IRQS];
But at line 510 we have:
for (i = 0; i <= NR_IRQS; i++) {
s/<=/</
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
static struct {
... :114
unsigned short hash[UNW_HASH_SIZE];
... :2152
for (index = 0; index <= UNW_HASH_SIZE; ++index) {
This is a bug, isn't it?
s/<=/</
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The second call to cpu_clear() is redundant, as we've already removed
the CPU from cpu_online_map before calling migrate_platform_irqs().
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
This reverts commit e7b140365b.
Commit e7b14036 removes the targetted disabled CPU from the
cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.
Paul McKenney states that the reasoning behind the patch was to
prevent irq handlers from running on CPUs marked offline because:
RCU happily ignores CPUs that don't have their bits set in
cpu_online_map, so if there are RCU read-side critical sections
in the irq handlers being run, RCU will ignore them. If the
other CPUs were running, they might sequence through the RCU
state machine, which could result in data structures being
yanked out from under those irq handlers, which in turn could
result in oopses or worse.
Unfortunately, both ia64 functions above look at cpu_online_map to find
a new CPU to migrate interrupts onto. This means we can potentially
migrate an interrupt off ourself back to... ourself. Uh oh.
This causes an oops when we finally try to process pending interrupts on
the CPU we want to disable. The oops results from calling __do_IRQ with
a NULL pt_regs:
Unable to handle kernel NULL pointer dereference (address 0000000000000040)
Call Trace:
[<a000000100016930>] show_stack+0x50/0xa0
sp=e0000009c922fa00 bsp=e0000009c92214d0
[<a0000001000171a0>] show_regs+0x820/0x860
sp=e0000009c922fbd0 bsp=e0000009c9221478
[<a00000010003c700>] die+0x1a0/0x2e0
sp=e0000009c922fbd0 bsp=e0000009c9221438
[<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
sp=e0000009c922fbd0 bsp=e0000009c92213d8
[<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
sp=e0000009c922fc60 bsp=e0000009c92213d8
[<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
sp=e0000009c922fe30 bsp=e0000009c9221398
[<a00000010003bb90>] timer_interrupt+0x170/0x3e0
sp=e0000009c922fe30 bsp=e0000009c9221330
[<a00000010013a800>] handle_IRQ_event+0x80/0x120
sp=e0000009c922fe30 bsp=e0000009c92212f8
[<a00000010013aa00>] __do_IRQ+0x160/0x4a0
sp=e0000009c922fe30 bsp=e0000009c9221290
[<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
sp=e0000009c922fe30 bsp=e0000009c9221208
[<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
sp=e0000009c922fe30 bsp=e0000009c92211a8
[<a00000010005bd80>] __cpu_disable+0x140/0x240
sp=e0000009c922fe30 bsp=e0000009c9221168
[<a0000001006c5870>] take_cpu_down+0x50/0xa0
sp=e0000009c922fe30 bsp=e0000009c9221148
[<a000000100122610>] stop_cpu+0xd0/0x200
sp=e0000009c922fe30 bsp=e0000009c92210f0
[<a0000001000e0440>] kthread+0xc0/0x140
sp=e0000009c922fe30 bsp=e0000009c92210c8
[<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
sp=e0000009c922fe30 bsp=e0000009c92210a0
[<a00000010000a4c0>] start_kernel_thread+0x20/0x40
sp=e0000009c922fe30 bsp=e0000009c92210a0
I don't like this revert because it is fragile. ia64 is getting lucky
because we seem to only ever process timer interrupts in this path, but
if we ever race with an IPI here, we definitely use RCU and have the
potential of hitting an oops that Paul describes above.
Patching ia64's timer_interrupt() to check for NULL pt_regs is
insufficient though, as we still hit the above oops.
As a short term solution, I do think that this revert is the right
answer. The revert hold up under repeated testing (24+ hour test runs)
with this setup:
- 8-way rx6600
- randomly toggling CPU online/offline state every 2 seconds
- running CPU exercisers, memory hog, disk exercisers, and
network stressors
- average system load around ~160
In the long term, we really need to figure out why we set pt_regs = NULL
in ia64_process_pending_intr(). If it turns out that it is unnecessary
to do so, then we could safely re-introduce e7b14036 (along with some
other logic to be smarter about migrating interrupts).
One final note: x86 also removes the disabled CPU from cpu_online_map
and then re-enables interrupts for 1ms, presumably to handle any pending
interrupts:
arch/x86/kernel/irq_32.c (and irq_64.c):
cpu_disable_common:
[remove cpu from cpu_online_map]
fixup_irqs():
for_each_irq:
[break CPU affinities]
local_irq_enable();
mdelay(1);
local_irq_disable();
So they are doing implicitly what ia64 is doing explicitly.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
Impact: fix build error
to fix:
tip/arch/ia64/kernel/acpi.c:203: error: conflicting types for '__acpi_unmap_table'
tip/include/linux/acpi.h:82: error: previous declaration of '__acpi_unmap_table' was here
tip/arch/ia64/kernel/acpi.c:203: error: conflicting types for '__acpi_unmap_table'
tip/include/linux/acpi.h:82: error: previous declaration of '__acpi_unmap_table' was here
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
to prevent wrongly overwriting fixmap that still want to use.
ACPI used to rely on low mappings being all linearly mapped and
grew a habit: it never really unmapped certain kinds of tables
after use.
This can cause problems - for example the hypothetical case
when some spurious access still references it.
v2: remove prev_map and prev_size in __apci_map_table
v3: let acpi_os_unmap_memory() call early_iounmap too, so remove extral calling to
early_acpi_os_unmap_memory
v4: fix typo in one acpi_get_table_with_size calling
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
dma_mapping_error is used to see if dma_map_single and dma_map_page
succeed. IA64 VT-d dma_mapping_error always says that dma_map_single
is successful even though it could fail. Note that X86 VT-d works
properly in this regard.
This patch fixes IA64 VT-d dma_mapping_error by adding VT-d's own
dma_mapping_error() that works for both X86_64 and IA64. VT-d uses
zero as an error dma address so VT-d's dma_mapping_error returns 1 if
a passed dma address is zero (as x86's VT-d dma_mapping_error does
now).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Before the dma ops unification, IA64 always uses GFP_DMA for
dma_alloc_coherent like:
#define dma_alloc_coherent(dev, size, handle, gfp) \
platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
This GFP_DMA enforcement doesn't make sense for IOMMUs since they can
do address translation to give addresses that devices can access
to. The IOMMU drivers ignore the zone flag. However, this is still
necessary for swiotlb since it can't do address translation.
We don't always need to use GFP_DMA for swiotlb. We need GFP_DMA for
devices incapable of 64bit DMA.
This patch is sorta updated version of:
http://marc.info/?l=linux-kernel&m=122638215612705&w=2
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>