When ib_cancel_mad() is called, it puts the canceled send on a list
and schedules a "flushed" callback from process context. However,
this leaves a window where a receive completion could be processed
before the send is fully flushed.
This is fine, except that ib_find_send_mad() will find the MAD and
return it to the receive processing, which results in the sender
getting both a successful receive and a "flushed" send completion for
the same request. Understandably, this confuses the sender, which is
expecting only one of these two callbacks, and leads to grief such as
a use-after-free in IPoIB.
Fix this by changing ib_find_send_mad() to return a send struct only
if the status is still successful (and not "flushed"). The search of
the send_list already had this check, so this patch just adds the same
check to the search of the wait_list.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix the AMSO1100 firmware version computation, which was broken
due to "&&" being used where "&" should have.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Rework some load-time error handling: c2_register_device() leaked when
it failed, and the function that called it didn't check the return code.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change ehca's Kconfig to activates scaling code as default. After
several measurements we saw that this feature prevents dropped packets
(UD) in stress situation. Thus, enabling it helps to improve ehca's
bandwidth through IPoIB.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Define and use a constant EHCA_MAX_MTU instead hardcoded value.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Assure 4K alignment for firmware control blocks in 64K page mode,
because kzalloc()'s result address might not be 4K aligned if 64K
pages are enabled. Thus, we introduce wrappers called
ehca_{alloc,free}_fw_ctrlblock(), which use a slab cache for objects
with 4K length and 4K alignment in order to alloc/free firmware
control blocks in 64K page mode. In 4K page mode those wrappers just
are defines of get_zeroed_page() and free_page().
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
RDMA/addr: Use client registration to fix module unload race
IB/mthca: Fix MAD extended header format for MAD_IFC firmware command
IB/uverbs: Return sq_draining value in query_qp response
IB/amso1100: Fix incorrect pr_debug()
IB/amso1100: Use dma_alloc_coherent() instead of kmalloc/dma_map_single
IB/ehca: Fix eHCA driver compilation for uniprocessor
RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count
IB/iser: Start connection after enabling iSER
Require registration with ib_addr module to prevent caller from
unloading while a callback is in progress.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Do not use -msym32 option for modules.
[MIPS] Don't use R10000 llsc workaround version for all llsc-full processors.
[MIPS] Ocelot G: Fix : "CURRENTLY_UNUSED" is not defined warning.
[MIPS] Fix warning about init_initrd() call if !CONFIG_BLK_DEV_INITRD.
[MIPS] IP27: Allow SMP ;-) Another changeset messed up by patch.
[MIPS] Fix merge screwup by patch(1)
Revert "[MIPS] Make SPARSEMEM selectable on QEMU."
On 64-bit kernel, modules are loaded into XKSEG for now. While XKSEG
address is not a sign-extended 32-bit address, we can not use -msym32
option.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
CC arch/mips/momentum/ocelot_g/gt-irq.o
arch/mips/momentum/ocelot_g/gt-irq.c:30:5: warning: "CURRENTLY_UNUSED" is not defined
arch/mips/momentum/ocelot_g/gt-irq.c:199:5: warning: "CURRENTLY_UNUSED" is not defined
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
When lmo commit 4ef893e0515e8bf336dfbd200884f244869fbb43 was merged to
kernel.org as e73ea273ef patch happily
applied the IP27 segment to IP22. f63f36c18b11e166d0f362ac04dbcd7e6ea23f9e
did fix the effects partially - and with a wrong log message. Now fixed
for real (tm).
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Patch happily applied an Ocelot G patch to Ocelot C when merging
linux-mips.org changeset 91ee9a801e65d2981dfe327d2519c7fc6ab02e6b into
kernel.org as 6ceb6d3ab2.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This reverts commit 31473747bd.
Another amazing example of patch(1) messing up - lmo changeset
66e8560d11d02bcadc261498471831a6375ad046 was merged twice to kernel.org
and ended up doing this rubbish job.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NETFILTER]: silence a warning in ebtables
[IPV6]: File the fingerprints off ah6->spi/esp6->spi
[TCP]: Set default congestion control when no sysctl.
[TIPC] net/tipc/port.c: fix NULL dereference
This reverts commit f833229c96:
According to reviewers and the lspci data provided in commit message
itself, PCI ID 0x7110 should not have been added to ata_piix.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
net/bridge/netfilter/ebtables.c: In function 'ebt_dev_check':
net/bridge/netfilter/ebtables.c:89: warning: initialization discards qualifiers from pointer target type
So make the char* a const char * and the warning is gone.
Signed-off-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
I copied the logic from ll/sc arch implementations, but that
was wrong and makes no sense at all. Just do a straight
compare-exchange instruction, just like x86.
Based upon bug reports from Dennis Gilmore and Fabio Massimo.
Signed-off-by: David S. Miller <davem@davemloft.net>
In theory these are opaque 32bit values. However, we end up
allocating them sequentially in host-endian and stick unchanged
on the wire.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The setting of the default congestion control was buried in
the sysctl code so it would not be done properly if SYSCTL was
not enabled.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The correct order is: NULL check before dereference
Spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turns out that the linker warnings on 64-bit powerpc about "section
blah exceeds stub group size" were being triggered by conditional
branches in head_64.S branching to global symbols, whether in
head_64.S or in other files. This eliminates the warnings by making
some global symbols in head_64.S no longer global, and by rearranging
some branches.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[ Yee-haa. Maybe I'll notice newly introduced real warnings now - Linus ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since the "mask" bit is in the low word, when we write a new entry, we
need to write the high word first, before we potentially unmask it.
The exception is when we actually want to mask the interrupt, in which
case we want to write the low word first to make sure that the high word
doesn't change while the interrupt routing is still active.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is preparation for fixing the ordering of the accesses that
got broken by the commit cf4c6a2f27 when
factoring out the "common" io apic routing entry accesses.
Move the accessor function (that were only used by io_apic.c) out
of a header file, and use proper memory-mapped accesses rather than
making up our own "volatile" pointers.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Make alignment exception always check exception table
[POWERPC] Disallow kprobes on emulate_step and branch_taken
[POWERPC] Make mmiowb's io_sync preempt safe
[POWERPC] Make high hugepage areas preempt safe
[POWERPC] Make current preempt-safe
[POWERPC] qe_lib: qe_issue_cmd writes wrong value to CECDR
[POWERPC] Use 4kB iommu pages even on 64kB-page systems
[POWERPC] Fix oprofile support for e500 in arch/powerpc
[POWERPC] Fix rmb() for e500-based machines it
[POWERPC] Fix various offb issues
ahci_softreset() used to use ahci_tf_read() which reads D2H_REG area
to check for the Status register. However, this area is zeroed on
initialization and not set by initial signature FIS. Replace it with
ahci_check_status().
This bug prevented CLO code from being activated whenever BSY and/or
DRQ is set prior to softreset. This fix makes
AHCI_FLAG_RESET_NEEDS_CLO flag redundant.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[PATCH] ata_piix: allow 01b MAP for both ICH6M and ICH7M
[PATCH] libata: unexport ata_dev_revalidate()
[PATCH] Add 0x7110 piix to ata_piix.c
[PATCH] sata_sis: fix flags handling for the secondary port
The alignment exception used to only check the exception table for
-EFAULT, not for other errors. That opens an oops window if we can
coerce the kernel into getting an alignment exception for other reasons
in what would normally be a user-protected accessor, which can be done
via some of the futex ops. This fixes it by always checking the
exception tables.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On powerpc, probing on emulate_step function will crash 2.6.18.1 when
it is triggered.
When kprobe is triggered, emulate_step() is on its kernel path and
will cause recursive kprobe fault. And branch_taken() is called
in emulate_step(). This disallows kprobes on both of them.
Signed-off-by: Paul Mackerras <paulus@samba.org>
If mmiowb() is always used prior to releasing spinlock as Doc suggests,
then it's safe against preemption; but I'm not convinced that's always
the case. If preemption occurs between sync and get_paca()->io_sync = 0,
I believe there's no problem. But in the unlikely event that gcc does
the store relative to another register than r13 (as it did with current),
then there's a small danger of setting another cpu's io_sync to 0, after
it had just set it to 1. Rewrite ppc64 mmiowb to prevent that.
The remaining io_sync assignments in io.h all get_paca()->io_sync = 1,
which is harmless even if preempted to the wrong cpu (the context switch
itself syncs); and those in spinlock.h are while preemption is disabled.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Checking source for other get_paca()->field preemption dangers found that
open_high_hpage_areas does a structure copy into its paca while preemption
is enabled: unsafe however gcc accomplishes it. Just remove that copy:
it's done safely afterwards by on_each_cpu, as in open_low_hpage_areas.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Repeated -j20 kernel builds on a G5 Quad running an SMP PREEMPT kernel
would often collapse within a day, some exec failing with "Bad address".
In each case examined, load_elf_binary was doing a kernel_read, but
generic_file_aio_read's access_ok saw current->thread.fs.seg as USER_DS
instead of KERNEL_DS.
objdump of filemap.o shows gcc 4.1.0 emitting "mr r5,r13 ... ld r9,416(r5)"
here for get_paca()->__current, instead of the expected and much more usual
"ld r9,416(r13)"; I've seen other gcc4s do the same, but perhaps not gcc3s.
So, if the task is preempted and rescheduled on a different cpu in between
the mr and the ld, r5 will be looking at a different paca_struct from the
one it's now on, pick up the wrong __current, and perhaps the wrong seg.
Presumably much worse could happen elsewhere, though that split is rare.
Other architectures appear to be safe (x86_64's read_pda is more limiting
than get_paca), but ppc64 needs to force "current" into one instruction.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Changed qe_issue_cmd() to write cmd_input to the CECDR unmodified. It
was treating cmd_input as a virtual address and tried to convert it to
a physical address.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 10Gigabit ethernet device drivers appear to be able to chew
up all 256MB of TCE mappings on pSeries systems, as evidenced by
numerous error messages:
iommu_alloc failed, tbl c0000000010d5c48 vaddr c0000000d875eff0 npages 1
Some experimentation indicates that this is essentially because
one 1500 byte ethernet MTU gets mapped as a 64K DMA region when
the large 64K pages are enabled. Thus, it doesn't take much to
exhaust all of the available DMA mappings for a high-speed card.
This patch changes the iommu allocator to work with its own
unique, distinct page size. Although the patch is long, its
actually quite simple: it just #defines a distinct IOMMU_PAGE_SIZE
and then uses this in all the places that matter.
As a side effect, it also dramatically improves network performance
on platforms with H-calls on iommu translation inserts/removes (since
we no longer call it 16 times for a 1500 bytes packet when the iommu HW
is still 4k).
In the future, we might want to make the IOMMU_PAGE_SIZE a variable
in the iommu_table instance, thus allowing support for different HW
page sizes in the iommu itself.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fixed a compile error in building the 85xx support with oprofile, and in
the process cleaned up some issues with the fsl_booke performance monitor
code.
* Reorganized FSL Book-E performance monitoring code so that the 7450
wouldn't be built if the e500 was, and cleaned it up so it was more
self-contained.
* Added a cpu_setup function for FSL Book-E. The original
cpu_setup function prototype had no arguments, assuming that
the reg_setup function would copy the required information into
variables which represented the registers. This was silly for
e500, since it has 1 register per counter (rather than 3 for
all counters), so the code has been restructured to have
cpu_setup take the current counter config array as an argument,
with op_powerpc_setup() invoking op_powerpc_cpu_setup() through
on_each_cpu(), and op_powerpc_cpu_setup() invoking the
model-specific cpu_setup function with an argument. The
argument is ignored on all other platforms at present.
* Fixed a confusing line where a trinary operator only had two
arguments
Signed-off-by: Andrew Fleming <afleming@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The e500 core generates an illegal instruction exception when it tries
to execute the lwsync instruction, which we currently use for rmb().
This fixes it by using the LWSYNC macro, which turns into a plain sync
on 32-bit machines.
Signed-off-by: Andrew Fleming <afleming@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch fixes a few issues in offb:
- A test was inverted causing the palette hack to never work
(no device node was passed down to the init function)
- Some cards seem to have their assigned-addresses property in a random
order, thus we need to try using of_get_pci_address() first, which will
fail if it's not a PCI device, and fallback to of_get_address() in that
case. of_get_pci_address() properly parsees assigned-addresses to test
the BAR number and thus will get it right whatever the order is.
- Some cards (like GXT4500) provide a linebytes of 0xffffffff in the
device-tree which does no good. This patch handles that by using the
screen width when that happens. (Also fixes btext.c while at it).
- Add detection of the GXT4500 in addition to the GXT2000 for the
palette hacks (we use the same hack, palette is linear in register space
at offset 0x6000).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
ICH7M was separated from ICH6M to allow undocumented MAP value 01b
which was spotted on an ASUS notebook. However, there is also
notebooks with MAP value 01b on ICH6M. This patch re-merges ICH6M and
ICH7M entries and allows MAP value 01b for both.
This problem has been reported and initial patch provided by Jonathan
Dieter.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Jonathan Dieter <jdieter@gmail.com>
Cc: Tom Deblauwe <tom.deblauwe@telenet.be>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_dev_revalidate() isn't used outside of libata core. Unexport it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>