2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-28 23:23:55 +08:00
Commit Graph

3095 Commits

Author SHA1 Message Date
Kumar Sanghvi
01b225e18f RDMA/cxgb4: Fix retry with MPAv1 logic for MPAv2
Fix logic so that we don't retry with MPAv1 once we have done that
already.  Otherwise, we end up retrying with MPAv1 even when its not
needed on getting peer aborts - and this could lead to kernel panic.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-28 11:58:07 -08:00
Jonathan Lallinger
c34c97ad8c RDMA/cxgb4: Fix iw_cxgb4 count_rcqes() logic
Fix another place in the code where logic dealing with the t4_cqe was
using the wrong QID.  This fixes the counting logic so that it tests
against the SQ QID instead of the RQ QID when counting RCQES.

Signed-off by: Jonathan Lallinger <jonathan@ogc.us>
Signed-off by: Steve Wise <swise@ogc.us>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-28 11:53:05 -08:00
Alexey Dobriyan
4e3fd7a06d net: remove ipv6_addr_copy()
C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-22 16:43:32 -05:00
David S. Miller
9ca36f7db2 infiniband: Update net drivers for netdev_features_t changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-16 18:05:50 -05:00
Mike Marciniszyn
042f36e156 IB/qib: Don't use schedule_work()
It was mistakenly introduced by dde05cbdf8 ("IB/qib: Hold links
until tuning data is available").

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-08 10:37:53 -08:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Roland Dreier
b8108d6886 Merge branches 'iser', 'mthca' and 'qib' into for-next 2011-11-04 09:36:04 -07:00
Mike Marciniszyn
30ab7e230b IB/qib: Fix panic in RC error flushing logic
The following panic can occur when flushing a QP:

    RIP: 0010:[<ffffffffa0168e8b>]  [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib]
    RSP: 0018:ffff8803cdc6fc90  EFLAGS: 00010046
    RAX: 0000000000000000 RBX: ffff8803d84ba000 RCX: 0000000000000000
    RDX: 0000000000000005 RSI: ffffc90015a53430 RDI: ffff8803d84ba000
    RBP: ffff8803cdc6fce0 R08: ffff8803cdc6fc90 R09: 0000000000000001
    R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8803d84ba0c0
    R13: ffff8803d84ba5cc R14: 0000000000000800 R15: 0000000000000246
    FS:  0000000000000000(0000) GS:ffff880036600000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 0000000000000034 CR3: 00000003e44f9000 CR4: 00000000000406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process qib/0 (pid: 1350, threadinfo ffff8803cdc6e000, task ffff88042728a100)
    Stack:
     53544c5553455201 0000000100000005 0000000000000000 ffff8803d84ba000
     0000000000000000 0000000000000000 0000000000000000 0000000000000000
     0000000000000000 0000000000000001 ffff8803cdc6fd30 ffffffffa0165d7a
    Call Trace:
     [<ffffffffa0165d7a>] qib_make_rc_req+0x36a/0xe80 [ib_qib]
     [<ffffffffa0165a10>] ?  qib_make_rc_req+0x0/0xe80 [ib_qib]
     [<ffffffffa01698b3>] qib_do_send+0xf3/0xb60 [ib_qib]
     [<ffffffff814db757>] ? thread_return+0x4e/0x777
     [<ffffffffa01697c0>] ? qib_do_send+0x0/0xb60 [ib_qib]
     [<ffffffff81088bf0>] worker_thread+0x170/0x2a0
     [<ffffffff8108e530>] ?  autoremove_wake_function+0x0/0x40
     [<ffffffff81088a80>] ? worker_thread+0x0/0x2a0
     [<ffffffff8108e1c6>] kthread+0x96/0xa0
     [<ffffffff8100c1ca>] child_rip+0xa/0x20
     [<ffffffff8108e130>] ? kthread+0x0/0xa0
     [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
    RIP  [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib]

The RC error state flush logic in qib_make_rc_req() could return all
of the acked wqes and potentially have emptied the queue.  It would
then unconditionally try return a flush completion via
qib_send_complete() for an invalid wqe, or worse a valid one that is
not queued. The panic results when the completion code tries to
maintain an MR reference count for a NULL MR.

This fix modifies logic to only send one completion per
qib_make_rc_req() call and changing the completion status from
IB_WC_SUCCESS to IB_WC_WR_FLUSH_ERR as the completions progress.

The outer loop will call as many times as necessary to flush the queue.

Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-04 09:35:44 -07:00
Or Gerlitz
52439540ea IB/iser: DMA unmap TX bufs used for iSCSI/iSER headers
The current driver never does DMA unmapping on these buffers.  Fix that
by adding DMA unmapping to the task cleanup callback, and DMA mapping to
the task init function (drop the headers_initialized micro-optimization).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-04 09:32:44 -07:00
Or Gerlitz
2c4ce60934 IB/iser: Use separate buffers for the login request/response
The driver counted on the transactional nature of iSCSI login/text
flows and used the same buffer for both the request and the response.
We also went further and did DMA mapping only once, with
DMA_FROM_DEVICE, which violates the DMA mapping API.  Fix that by
using different buffers, one for requests and one for responses, and
use the correct DMA mapping direction for each.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-04 09:30:52 -07:00
Roland Dreier
e4221314a5 IB/mthca: Fix buddy->num_free allocation size
The num_free field of mthca_buddy has a type of array of unsigned int
while it was allocated as an array of pointers.  On 64-bit platforms
this allocates twice more than required.  Fix this by allocating the
correct size for the type.

This is the same bug just fixed in mlx4 by Eli Cohen <eli@mellanox.co.il>.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-03 17:48:25 -07:00
Linus Torvalds
f470f8d4e7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
  mlx4_core: Deprecate log_num_vlan module param
  IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
  IB/mlx4: Enable 4K mtu for IBoE
  RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
  RDMA/cxgb4: Serialize calls to CQ's comp_handler
  RDMA/cxgb3: Serialize calls to CQ's comp_handler
  IB/qib: Fix issue with link states and QSFP cables
  IB/mlx4: Configure extended active speeds
  mlx4_core: Add extended port capabilities support
  IB/qib: Hold links until tuning data is available
  IB/qib: Clean up checkpatch issue
  IB/qib: Remove s_lock around header validation
  IB/qib: Precompute timeout jiffies to optimize latency
  IB/qib: Use RCU for qpn lookup
  IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
  IB/qib: Decode path MTU optimization
  IB/qib: Optimize RC/UC code by IB operation
  IPoIB: Use the right function to do DMA unmap pages
  RDMA/cxgb4: Use correct QID in insert_recv_cqe()
  RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
  ...
2011-11-01 10:51:38 -07:00
Roland Dreier
504255f8d0 Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next 2011-11-01 09:37:08 -07:00
Christoph Lameter
bc3e53f682 mm: distinguish between mlocked and pinned pages
Some kernel components pin user space memory (infiniband and perf) (by
increasing the page count) and account that memory as "mlocked".

The difference between mlocking and pinning is:

A. mlocked pages are marked with PG_mlocked and are exempt from
   swapping. Page migration may move them around though.
   They are kept on a special LRU list.

B. Pinned pages cannot be moved because something needs to
   directly access physical memory. They may not be on any
   LRU list.

I recently saw an mlockalled process where mm->locked_vm became
bigger than the virtual size of the process (!) because some
memory was accounted for twice:

Once when the page was mlocked and once when the Infiniband
layer increased the refcount because it needt to pin the RDMA
memory.

This patch introduces a separate counter for pinned pages and
accounts them seperately.

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Mike Marciniszyn <infinipath@qlogic.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:46 -07:00
Paul Gortmaker
fec14d2fce infiniband: add moduleparam.h to drivers/infiniband as required
These files were getting the moduleparam infrastructure from the
implicit presence of module.h being everywhere, but that is going
away soon.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:36 -04:00
Paul Gortmaker
b108d9764c infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE
These were getting it implicitly via device.h --> module.h but
we are going to stop that when we clean up the headers.

Fix these in advance so the tree remains biscect-clean.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:35 -04:00
Paul Gortmaker
e4dd23d753 infiniband: Fix up module files that need to include module.h
They had been getting it implicitly via device.h but we can't
rely on that for the future, due to a pending cleanup so fix
it now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:35 -04:00
Paul Gortmaker
fc87af74af infiniband: Fix up users implicitly relying on getting stat.h
They get it via module.h (via device.h) but we want to clean that up.
When we do, we'll get things like:

  CC [M]  drivers/infiniband/core/sysfs.o
  sysfs.c:361: error: 'S_IRUGO' undeclared here (not in a function)
  sysfs.c:654: error: 'S_IWUSR' undeclared here (not in a function)

so add in the stat header it is using explicitly in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:34 -04:00
Or Gerlitz
80a2dcd8d0 IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
There's no need to set the vlan-related fields in an IBoE send WQE
control segment:

 - the vlan to be used by a UD QP is set in the datagram segment.
 - for GSI (CM) QP, all the headers down to 8021q and MAC are built by
   the software anyway.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-31 11:57:51 -07:00
Or Gerlitz
bcacb89756 IB/mlx4: Enable 4K mtu for IBoE
The IBoE port MTU is derived from the corresponding Ethernet netdevice
MTU, which can support jumbo frames of 9K, and hence surely supports
the max IB mtu of 4K.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-31 11:55:15 -07:00
Tom Tucker
d32ae393db RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
QPs need to be moved to error before telling the firwmare to shutdown
the queue.  Otherwise, the application can submit WRs that will never
get fetched by the hardware and never flushed by the driver.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Acked-by: Steve Wise <swsie@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-31 11:36:08 -07:00
Kumar Sanghvi
581bbe2cd0 RDMA/cxgb4: Serialize calls to CQ's comp_handler
Commit 01e7da6ba5 ("RDMA/cxgb4: Make sure flush CQ entries are
collected on connection close") introduced a potential problem where a
CQ's comp_handler can get called simultaneously from different places
in the iw_cxgb4 driver.  This does not comply with
Documentation/infiniband/core_locking.txt, which states that at a
given point of time, there should be only one callback per CQ should
be active.

This problem was reported by Parav Pandit <Parav.Pandit@Emulex.Com>.
Based on discussion between Parav Pandit and Steve Wise, this patch
fixes the above problem by serializing the calls to a CQ's
comp_handler using a spin_lock.

Reported-by: Parav Pandit <Parav.Pandit@Emulex.Com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-31 11:34:53 -07:00
Kumar Sanghvi
f7cc25d018 RDMA/cxgb3: Serialize calls to CQ's comp_handler
iw_cxgb3 has a potential problem where a CQ's comp_handler can get
called simultaneously from different places in iw_cxgb3 driver.  This
does not comply with Documentation/infiniband/core_locking.txt, which
states that at a given point of time, there should be only one
callback per CQ should be active.

Such problem was reported by Parav Pandit <Parav.Pandit@Emulex.Com>
for iw_cxgb4 driver.  Based on discussion between Parav Pandit and
Steve Wise, this patch fixes the above problem by serializing the
calls to a CQ's comp_handler using a spin_lock.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-31 11:33:17 -07:00
Mitko Haralanov
16d99812d5 IB/qib: Fix issue with link states and QSFP cables
Fix an issue where the link would come up after replugging a cable
even if it has been DISABLED manually.

Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-31 10:57:59 -07:00
Linus Torvalds
ec7ae51753 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits)
  [SCSI] qla4xxx: export address/port of connection (fix udev disk names)
  [SCSI] ipr: Fix BUG on adapter dump timeout
  [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer
  [SCSI] hpsa: change confusing message to be more clear
  [SCSI] iscsi class: fix vlan configuration
  [SCSI] qla4xxx: fix data alignment and use nl helpers
  [SCSI] iscsi class: fix link local mispelling
  [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA
  [SCSI] aacraid: use lower snprintf() limit
  [SCSI] lpfc 8.3.27: Change driver version to 8.3.27
  [SCSI] lpfc 8.3.27: T10 additions for SLI4
  [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery
  [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name
  [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout
  [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes
  [SCSI] megaraid_sas: Changelog and version update
  [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic
  [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support
  [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers
  [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts
  ...
2011-10-28 16:44:18 -07:00
Marcel Apfelbaum
a5e12dff75 IB/mlx4: Configure extended active speeds
Set the extended active speeds based on the hardware configuration.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>

[ Move FDR-10 handling into ib_link_query_port().  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-28 11:36:16 -07:00
Mitko Haralanov
dde05cbdf8 IB/qib: Hold links until tuning data is available
Hold the link state machine until the tuning data is read from the
QSFP EEPROM so correct tuning settings are applied before the state
machine attempts to bring the link up.  Link is also held on cable
unplug in case a different cable is used.

Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-21 15:08:20 -07:00
Mike Marciniszyn
44d75d3d92 IB/qib: Clean up checkpatch issue
This was probably present from initial submission.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-21 15:08:18 -07:00
Mike Marciniszyn
9fd5473deb IB/qib: Remove s_lock around header validation
Review of qib_ruc_check_hdr() shows that the s_lock is not required in
the normal case.  The r_lock is held in all cases, and protects the qp
fields that are read.

The s_lock will be needed to around the call to qib_migrate_qp() to
insure that the send engine sees a consistent set of fields.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-21 09:38:57 -07:00
Mike Marciniszyn
d0f2faf72d IB/qib: Precompute timeout jiffies to optimize latency
A new field is added to qib_qp called timeout_jiffies. It is
initialized upon create and modify.

The field is now used instead of a computation based on qp->timeout.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-21 09:38:56 -07:00
Mike Marciniszyn
af061a644a IB/qib: Use RCU for qpn lookup
The heavy weight spinlock in qib_lookup_qpn() is replaced with RCU.
The hash list itself is now accessed via jhash functions instead of mod.

The changes should benefit multiple receive contexts in different
processors by not contending for the lock just to read the hash
structures.

The patch also adds a lookaside_qp (pointer) and a lookaside_qpn in
the context.  The interrupt handler will test the current packet's qpn
against lookaside_qpn if the lookaside_qp pointer is non-NULL.  The
pointer is NULL'ed when the interrupt handler exits.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-21 09:38:54 -07:00
Mike Marciniszyn
9e1c0e4325 IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
The context init now saves a shift from rcvegrbufs_perchunk
rcvegrbufs_perchunk_shift using ilog2.   A BUG_ON() protects the
power of 2 assumption.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-21 09:38:52 -07:00
Mike Marciniszyn
cc6ea1385b IB/qib: Decode path MTU optimization
Store both the encoded and decoded MTU in the QP structure as a minor
optimization for UC/RC receive routines.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-21 09:38:50 -07:00
Mike Marciniszyn
2fc109c890 IB/qib: Optimize RC/UC code by IB operation
The memset for zeroing work completions had been unconditional.

This patch removes the memset and moves the zeroing into the work
completion with a more explicit field by field set.  With this patch,
non-ONLY/non-LAST packets will avoid the overhead since they will not
generate a completion.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-21 09:38:49 -07:00
Eric Dumazet
9e903e0852 net: add skb frag size accessors
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:10:46 -04:00
Dotan Barak
787adb9d6a IPoIB: Use the right function to do DMA unmap pages
Pages that were mapped using ib_dma_map_page() should be unmapped
using ib_dma_unmap_page().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-18 10:08:31 -07:00
Jonathan Lallinger
e14d62c05c RDMA/cxgb4: Use correct QID in insert_recv_cqe()
When creating flushed receive CQEs, set the QPID field in the t4_cqe
to the SQ QID and not the RQ QID.  Otherwise the poll code will not
find the correct QP context.

Signed-off by: Jonathan Lallinger <jonathan@ogc.us>
Signed-off by: Steve Wise <swise@ogc.us>

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-14 14:23:40 -07:00
Kumar Sanghvi
01e7da6ba5 RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
At the time when a peer closes the connection, iw_cxgb4 will not send
a cq event if ibqp.uobject exists.  In that case, its possible for a
user application to get blocked in ibv_get_cq_event().

To resolve this, call the cq's comp_handler to unblock any read from
ibv_get_cq_event().  This will trigger userspace to poll the cq and
collect flush status completions for any pending work requests.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-14 14:23:04 -07:00
Sean Hefty
42849b2697 RDMA/uverbs: Export ib_open_qp() capability to user space
Allow processes that share the same XRC domain to open an existing
shareable QP.  This permits those processes to receive events on the
shared QP and transfer ownership, so that any process may modify the
QP.  The latter allows the creating process to exit, while a remaining
process can still transition it for path migration purposes.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:50:56 -07:00
Sean Hefty
0e0ec7e063 RDMA/core: Export ib_open_qp() to share XRC TGT QPs
XRC TGT QPs are shared resources among multiple processes.  Since the
creating process may exit, allow other processes which share the same
XRC domain to open an existing QP.  This allows us to transfer
ownership of an XRC TGT QP to another process.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:49:51 -07:00
Sean Hefty
0a1405da99 IB/mlx4: Add support for XRC QPs
Support the creation of XRC INI and TGT QPs.  To handle the case where
a CQ or PD is not provided, we allocate them internally with the xrcd.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:44:18 -07:00
Sean Hefty
18abd5ea57 IB/mlx4: Add support for XRC SRQs
Allow the user to create XRC SRQs.  This patch is based on a patch
from Jack Morgenstrein <jackm@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:43:46 -07:00
Sean Hefty
012a8ff577 IB/mlx4: Add support for XRC domains
Support creating and destroying XRC domains.  Any sharing of the XRCD
is managed above the low-level driver.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:43:03 -07:00
Sean Hefty
2622e18ef4 IB/cm: Do not automatically disconnect XRC TGT QPs
Because an XRC TGT QP can end up being shared among multiple
processes, don't have the ib_cm automatically send a DREQ when the
userspace process that owns the ib_cm_id exits.  Disconnect can be
initiated by the user directly; otherwise, the owner of the XRC INI QP
controls the connection.

Note that as a result of the process exiting, the ib_cm will stop
tracking the XRC connection on the target side.  For the purposes of
disconnecting, this isn't a big deal.  The ib_cm will respond to the
DREQ appropriately.  For other messages, mainly LAP, the CM will
reject the request, since there's no one available to route the
request to.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:42:06 -07:00
Sean Hefty
18c441a6c3 RDMA/cma: Support XRC QPs
Allow users to connect XRC QPs through the rdma_cm.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:41:36 -07:00
Sean Hefty
638ef7a6c6 RDMA/ucm: Allow user to specify QP type when creating id
Allow the user to indicate the QP type separately from the port space
when allocating an rdma_cm_id.  With RDMA_PS_IB, there is no longer a
1:1 relationship between the QP type and port space, so we need to
switch on the QP type to select between UD and connected QPs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:40:36 -07:00
Sean Hefty
2d2e941529 RDMA/cm: Define new RDMA port space specific to IB
Add RDMA_PS_IB.  XRC QP types will use the IB port space when operating
over the RDMA CM.  For the 'IP protocol' field value, we select 0x3F,
which is listed as being for 'any local network'.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:39:52 -07:00
Sean Hefty
ef70044647 IB/cm: Update XRC support based on XRC annex errata
The XRC annex was updated to have XRC behave more like RD. Specifically,
the XRC TGT QPN moves from the local QPN to local EECN field.  Lookup of
SRQN is done using the REQ/REP protocol.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:38:35 -07:00
Sean Hefty
d26a360b77 IB/cm: Update protocol to support XRC
Update the REQ and REP messages to support XRC connection setup
according to the XRC Annex.  Several existing fields must be set to 0 or
1 when connecting XRC QPs, and a reserved field is changed to an
extended transport type.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:37:55 -07:00
Sean Hefty
b93f3c1872 RDMA/uverbs: Export XRC TGT QPs to user space
Allow user space to operate on XRC TGT QPs the same way as other types
of QPs, with one notable exception: since XRC TGT QPs may be shared
among multiple processes, the XRC TGT QP is allowed to exist beyond the
lifetime of the creating process.

The process that creates the QP is allowed to destroy it, but if the
process exits without destroying the QP, then the QP will be left bound
to the lifetime of the XRCD.

TGT QPs are not associated with CQs or a PD.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:37:07 -07:00
Sean Hefty
9977f4f64b RDMA/uverbs: Export XRC INI QPs to userspace
XRC INI QPs are similar to send only RC QPs.  Allow user space to create
INI QPs.  Note that INI QPs do not require receive CQs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:32:27 -07:00
Sean Hefty
8541f8de05 RDMA/uverbs: Export XRC SRQs to user space
We require additional information to create XRC SRQs than we can
exchange using the existing create SRQ ABI.  Provide an enhanced create
ABI for extended SRQ types.

Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>
and Roland Dreier <roland@purestorage.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:29:18 -07:00
Sean Hefty
53d0bd1e7f RDMA/uverbs: Export XRC domains to user space
Allow user space to create XRC domains.  Because XRCDs are expected to
be shared among multiple processes, we use inodes to identify an XRCD.

Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:21:24 -07:00
Sean Hefty
d3d72d909e RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCD
XRC TGT QPs are intended to be shared among multiple users and
processes.  Allow the destruction of an XRC TGT QP to be done explicitly
through ib_destroy_qp() or when the XRCD is destroyed.

To support destroying an XRC TGT QP, we need to track TGT QPs with the
XRCD.  When the XRCD is destroyed, all tracked XRC TGT QPs are also
cleaned up.

To avoid stale reference issues, if a user is holding a reference on a
TGT QP, we increment a reference count on the QP.  The user releases the
reference by calling ib_release_qp.  This releases any access to the QP
from a user above verbs, but allows the QP to continue to exist until
destroyed by the XRCD.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:20:27 -07:00
Sean Hefty
b42b63cf0d RDMA/core: Add XRC QPs
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC communication is between an initiator (INI) QP and a target (TGT)
QP.  Target QPs are associated with SRQs through an XRCD.  An XRC TGT QP
behaves like a receive-only RD QP.  XRC INI QPs behave similarly to RC
QPs, except that work requests posted to an XRC INI QP must specify the
remote SRQ that is the target of the work request.

We define two new QP types for XRC, to distinguish between INI and TGT
QPs, and update the core layer to support XRC QPs.

This patch is derived from work by Jack Morgenstein
<jackm@dev.mellanox.co.il>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:16:19 -07:00
Sean Hefty
418d51307d RDMA/core: Add XRC SRQ type
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC defines SRQs that are specifically used by XRC connections.  Expand
the SRQ code to support XRC SRQs.  An XRC SRQ is currently restricted to
only XRC use according to the IB XRC Annex.

Portions of this patch were derived from work by
Jack Morgenstein <jackm@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:14:31 -07:00
Sean Hefty
96104eda01 RDMA/core: Add SRQ type field
Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second.  Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:13:26 -07:00
Sean Hefty
59991f94eb RDMA/core: Add XRC domain support
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

A few new concepts are introduced to support this.  This patch adds:

 - A new device capability flag, IB_DEVICE_XRC, which low-level
   drivers set to indicate that a device supports XRC.
 - A new object type, XRC domains (struct ib_xrcd), and new verbs
   ib_alloc_xrcd()/ib_dealloc_xrcd().  XRCDs are used to limit which
   XRC SRQs an incoming message can target.

This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-12 10:32:26 -07:00
Marcel Apfelbaum
e36fb88a9a IPoIB: Handle extended rates in debugfs
Use new function ib_rate_to_mbps() to handle printing rate in debugfs,
so that we handle extended rates.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-11 11:57:08 -07:00
Marcel Apfelbaum
71eeba161d IB: Add new InfiniBand link speeds
Introduce support for the following extended speeds:

FDR-10: a Mellanox proprietary link speed which is 10.3125 Gbps with
        64b/66b encoding rather than 8b/10b encoding.
FDR:    IBA extended speed 14.0625 Gbps.
EDR:    IBA extended speed 25.78125 Gbps.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-11 11:53:47 -07:00
Randy Dunlap
3e60a77ea2 IB/ipath: Add missing <linux/stat.h> in ipath_chip_init.c
Fix build errors:

    drivers/infiniband/hw/ipath/ipath_init_chip.c:54:1: error: 'S_IRUGO' undeclared here (not in a function)
    drivers/infiniband/hw/ipath/ipath_init_chip.c:54:1: error: bit-field '<anonymous>' width not an integer constant
    drivers/infiniband/hw/ipath/ipath_init_chip.c:67:1: error: 'S_IWUSR' undeclared here (not in a function)
    drivers/infiniband/hw/ipath/ipath_init_chip.c:67:1: error: bit-field '<anonymous>' width not an integer constant

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-10 12:01:22 -07:00
Faisal Latif
0f0bee8bbc RDMA/nes: Support for Packed And Unaligned fpdus
Support for Packed and Unaligned (PAU) FPDUs is needed for
interoperability between NES and non-NES nodes. When the NES hardware
detects a PAU frame, it will pass it to the driver to process the
frame.  NES driver creates a new frame for each FPDU and forwards it
to the hardware to be sent to its associated qp.

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Faisal Latif <Faisal.Latif@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-10 10:54:47 -07:00
Faisal Latif
6224c7eeff RDMA/nes: Print IP address for critcal errors
Print the IP address of the remote host when a critical asynchronous event is
received.

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Faisal Latif <Faisal.Latif@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-10 10:51:21 -07:00
Faisal Latif
bab3a9f43f RDMA/nes: Fix terminate connection
Fixes a crash that occurs during close when error async event is received.
Terminate message is not sent to the remote node if already processing close.

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Faisal Latif <Faisal.Latif@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-10 10:47:44 -07:00
David S. Miller
88c5100c28 Merge branch 'master' of github.com:davem330/net
Conflicts:
	net/batman-adv/soft-interface.c
2011-10-07 13:38:43 -04:00
Ian Campbell
5d6bcdfe38 net: use DMA_x_DEVICE and dma_mapping_error with skb_frag_dma_map
When I converted some drivers from pci_map_page to skb_frag_dma_map I
neglected to convert PCI_DMA_xDEVICE into DMA_x_DEVICE and
pci_dma_mapping_error into dma_mapping_error.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-06 16:17:20 -04:00
Tatyana Nikolova
615eb715ae RDMA/nes: Add support for MPAv2 Enhanced RDMA Negotiation
This patch adds support for Enhanced RDMA Connection Establishment
(draft-ietf-storm-mpa-peer-connect-06), aka MPAv2.  Details of draft
can be obtained from:
<http://www.ietf.org/id/draft-ietf-storm-mpa-peer-connect-06.txt>

For backwards compatibility, the MPAv2 enabled driver reverts to MPAv1
if the remote node doesn't support MPAv2.

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Faisal Latif <Faisal.Latif@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:39:45 -07:00
Kumar Sanghvi
d2fe99e86b RDMA/cxgb4: Add support for MPAv2 Enhanced RDMA Negotiation
This patch adds support for Enhanced RDMA Connection Establishment
(draft-ietf-storm-mpa-peer-connect-06), aka MPAv2.  Details of draft
can be obtained from:
<http://www.ietf.org/id/draft-ietf-storm-mpa-peer-connect-06.txt>

The patch updates the following functions for initiator perspective:
 - send_mpa_request
 - process_mpa_reply
 - post_terminate for TERM error codes
 - destroy_qp for TERM related change
 - adds layer/etype/ecode to c4iw_qp_attrs for sending with TERM
 - peer_abort for retrying connection attempt with MPA_v1 message
 - added c4iw_reconnect function

The patch updates the following functions for responder perspective:
 - process_mpa_request
 - send_mpa_reply
 - c4iw_accept_cr
 - passes ird/ord to upper layers

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:39:24 -07:00
Kumar Sanghvi
56da00fc92 RDMA/{amso1100,cxgb3}: Minimal MPAv2 support
As part of MPAv2 Enhanced RDMA Negotiation, pass max supported ird/ord
values upwards for the time being in iw_cxgb3 and amso1100.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:39:01 -07:00
Kumar Sanghvi
3ebeebc38b RDMA/iwcm: Propagate ird/ord values upwards
Update struct iw_cm_event to support propagating the ird/ord values
upwards to the application.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:37:52 -07:00
Mike Marciniszyn
53ab1c6498 IB/qib: Correct nfreectxts for multiple HCAs
The code that was recently introduced to report the number
of free contexts is flawed for multiple HCAs:

       /* Return the number of free user ports (contexts) available. */
       return scnprintf(buf, PAGE_SIZE, "%u\n", dd->cfgctxts -
                dd->first_user_ctxt - (u32)qib_stats.sps_ctxts);

The qib_stats is global to the module, not per HCA, so the code is broken
for multiple HCAs.

This patch adds a qib_devdata field, freectxts, that reflects the free
contexts for this HCA.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:35 -07:00
Julia Lawall
e2e435f290 RDMA/nes: Add missing calls to ib_umem_release()
Add calls to ib_umem_release(), as in the other error-handling code in
nes_reg_user_mr().

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:24 -07:00
Hefty, Sean
caf6e3f221 RDMA/ucm: Removed checks for unsigned value < 0
cmd is unsigned, no need to check for < 0.  Found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:05 -07:00
Hefty, Sean
b7ab0b19a8 IB/mad: Verify mgmt class in received MADs
If a received MAD contains an invalid or reserved mgmt class, we will
attempt to access method_table outside of its range.  Add a check to
ensure that mgmt class falls within the handled range.

Found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:05 -07:00
Hefty, Sean
f45ee80eb0 RDMA/cma: Check for NULL conn_param in rdma_accept
Check that conn_param is not null before dereferencing it when
processing rdma_accept().  This is necessary to prevent a possible
system crash, which can be caused by user space.

Problem found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:04 -07:00
Yong Zhang
10889a3643 IB/ehca: Remove IRQF_DISABLED, since it's a no-op
Since commit e58aa3d2d0 ("genirq: Run irq handlers with interrupts
disabled"), we run all interrupt handlers with interrupts disabled and
we even check and yell when an interrupt handler returns with
interrupts enabled -- cf commit b738a50a20 ("genirq: Warn when
handler enables interrupts").

So now this flag is a no-op and can be removed.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:04 -07:00
Steve Wise
9efe10a1e1 RDMA/cxgb4: Fail RDMA initialization for unsupported cards
The iw_cxgb4 module crashes at init time if the T4 card does not
support RDMA.  So clean up the init logic to correctly deal with
non-RDMA cards.

 - If any RDMA resources are not available, then fail the initialization
   logging an info message.
 - Clean up properly on initialization failures.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:32:44 -07:00
Hefty, Sean
9595480c5d RDMA/cma: Fix crash in cma_req_handler
The RDMA CM uses the local qp_type to determine how to process an
incoming request.  This can result in an incoming REQ being treated as
a SIDR REQ and vice versa.  Fix this by switching off the event type
instead, and for good measure verify that the listener supports the
incoming connection request.

This problem showed up when a user space application mismatched the QP
types between a client and server app.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:32:33 -07:00
Andy Shevchenko
2be6053318 RDMA/amso1100: Use '%pM' format option to print MAC
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:32:24 -07:00
Neil Horman
e48f129c2f [SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference
This oops was reported recently:
d:mon> e
cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
    pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
    lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
    sp: c0000000fd4c73a0
   msr: 8000000000009032
   dar: 0
 dsisr: 40000000
  current = 0xc0000000fd640d40
  paca    = 0xc00000000054ff80
    pid   = 5085, comm = iscsid
d:mon> t
[c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i]
[c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi]
[c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18
[scsi_transport_iscsi2]
[c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
[c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
[c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
[c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
[c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
[c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
[c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 00000080da560cfc

The root cause was an EEH error, which sent us down the offload_close path in
the cxgb3 driver, which in turn sets cdev->l2opt to NULL, without regard for
upper layer driver (like the cxgbi drivers) which might have execution contexts
in the middle of its use. The result is the oops above, when t3_l2t_get attempts
to dereference L2DATA(cdev)->nentries in arp_hash right after the EEH error handler sets it to NULL.

The fix is to prevent the setting of the NULL pointer until after there are no
further users of it.  The t3cdev->l2opt pointer is now converted to be an rcu
pointer and the L2DATA macro is now called under the protection of the
rcu_read_lock().  When the EEH error path:
t3_adapter_error->offload_close->cxgb3_offload_deactivate
Is exectured, setting of that l2opt pointer to NULL, is now gated on an rcu
quiescence point, preventing, allowing L2DATA callers to safely check for a NULL
pointer without concern that the underlying data will be freeded before the
pointer is dereferenced.

This has been tested by the reporter and shown to fix the reproted oops

[nhorman: fix up unitinialised variable reported by Dan Carpenter]
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Karen Xie <kxie@chelsio.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-09-26 09:28:01 -05:00
David S. Miller
8decf86879 Merge branch 'master' of github.com:davem330/net
Conflicts:
	MAINTAINERS
	drivers/net/Kconfig
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
	drivers/net/ethernet/broadcom/tg3.c
	drivers/net/wireless/iwlwifi/iwl-pci.c
	drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
	drivers/net/wireless/rt2x00/rt2800usb.c
	drivers/net/wireless/wl12xx/main.c
2011-09-22 03:23:13 -04:00
Mike Christie
f27fb2ef7b [SCSI] iscsi class: sysfs group is_visible callout for iscsi host attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's host attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:14 -06:00
Mike Christie
1d063c1729 [SCSI] iscsi class: sysfs group is_visible callout for session attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's session attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:06 -06:00
Mike Christie
3128c6c73c [SCSI] iscsi cls: sysfs group is_visible callout for conn attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and drivers to use the attribute container
sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:03 -06:00
Ian Campbell
5581be3b48 IPoIB: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-26 12:38:42 -04:00
Ian Campbell
cf383ebb13 IB: nes: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-26 12:38:42 -04:00
Ian Campbell
70b1052ad2 IB: amso1100: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tom Tucker <tom@opengridcomputing.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-26 12:38:42 -04:00
Jiri Pirko
afc4b13df1 net: remove use of ndo_set_multicast_list in drivers
replace it by ndo_set_rx_mode

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:22:03 -07:00
Roland Dreier
80b43de837 Merge branches 'ipoib' and 'iser' into for-next 2011-08-17 10:57:43 -07:00
Or Gerlitz
200ae1a08b IB/iser: Support iSCSI PDU padding
RFC3270 mandates that iSCSI PDUs are padded to the closest integer
number of four byte words.  Fix the iser code to support that on both
the TX/RX flows.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-08-17 09:45:07 -07:00
Or Gerlitz
0ace64b85e IBiser: Fix wrong mask when sizeof (dma_addr_t) > sizeof (unsigned long)
The code that prepares the SG associated with SCSI command for FMR was
buggy for systems with DMA addresses that don't fit in unsigned long,
e.g under the 32-bit based XenServer dom0 sizeof(dma_addr_t) is 8.

Fix that by casting to unsigned long long a masking constant used by
the code. This resolves a crash in iser_sg_to_page_vec on this system.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-08-17 09:40:55 -07:00
Bernd Schubert
22cfb0bf67 IPoIB: Fix possible NULL dereference in ipoib_start_xmit()
Fix a bug introduced in 69cce1d140 ("net: Abstract dst->neighbour
accesses behind helpers.") where we might dereference skb_dst(skb)
even if it is NULL, which causes:

    [  240.944030] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
    [  240.948007] IP: [<ffffffffa0366ce9>] ipoib_start_xmit+0x39/0x280 [ib_ipoib]
    [...]
    [  240.948007] Call Trace:
    [  240.948007]  <IRQ>
    [  240.948007]  [<ffffffff812cd5e0>] dev_hard_start_xmit+0x2a0/0x590
    [  240.948007]  [<ffffffff8131f680>] ? arp_create+0x70/0x200
    [  240.948007]  [<ffffffff812e8e1f>] sch_direct_xmit+0xef/0x1c0

Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=41212
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-08-16 10:19:20 -07:00
David S. Miller
af3dcd2f44 mlx4: Fix infiniband Kconfig dependencies.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-11 23:05:05 -07:00
Jeff Kirsher
f7917c009c chelsio: Move the Chelsio drivers
Moves the drivers for the Chelsio chipsets into
drivers/net/ethernet/chelsio/ and the necessary Kconfig and Makefile
changes.

CC: Divy Le Ray <divy@chelsio.com>
CC: Dimitris Michailidis <dm@chelsio.com>
CC: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-08-10 19:54:52 -07:00
Linus Torvalds
91d41fdf31 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Convert to DIV_ROUND_UP_SECTOR_T usage for sectors / dev_max_sectors
  kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage
  iscsi-target: Add iSCSI fabric support for target v4.1
  iscsi: Add Serial Number Arithmetic LT and GT into iscsi_proto.h
  iscsi: Use struct scsi_lun in iscsi structs instead of u8[8]
  iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi
2011-07-27 13:21:40 -07:00
Arun Sharma
60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Nicholas Bellinger
123521830c iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi
This patch renames the following iscsi_proto.h structures to avoid
namespace issues with drivers/target/iscsi/iscsi_target_core.h:

*) struct iscsi_cmd -> struct iscsi_scsi_req
*) struct iscsi_cmd_rsp -> struct iscsi_scsi_rsp
*) struct iscsi_login -> struct iscsi_login_req

This patch includes useful ISCSI_FLAG_LOGIN_[CURRENT,NEXT]_STAGE*,
and ISCSI_FLAG_SNACK_TYPE_* definitions used by iscsi_target_mod, and
fixes the incorrect definition of struct iscsi_snack to following
RFC-3720 Section 10.16. SNACK Request.

Also, this patch updates libiscsi, iSER, be2iscsi, and bn2xi to
use the updated structure definitions in a handful of locations.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
2011-07-25 07:18:45 +00:00
Linus Torvalds
ece236ce2f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
  IB/qib: Defer HCA error events to tasklet
  mlx4_core: Bump the driver version to 1.0
  RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
  IB/mlx4: Support PMA counters for IBoE
  IB/mlx4: Use flow counters on IBoE ports
  IB/pma: Add include file for IBA performance counters definitions
  mlx4_core: Add network flow counters
  mlx4_core: Fix location of counter index in QP context struct
  mlx4_core: Read extended capabilities into the flags field
  mlx4_core: Extend capability flags to 64 bits
  IB/mlx4: Generate GID change events in IBoE code
  IB/core: Add GID change event
  RDMA/cma: Don't allow IPoIB port space for IBoE
  RDMA: Allow for NULL .modify_device() and .modify_port() methods
  IB/qib: Update active link width
  IB/qib: Fix potential deadlock with link down interrupt
  IB/qib: Add sysfs interface to read free contexts
  IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
  IB/qib: Remove double define
  IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
  ...
2011-07-22 14:50:12 -07:00
Roland Dreier
4460207561 Merge branches 'cma', 'cxgb4', 'ipath', 'misc', 'mlx4', 'mthca', 'qib' and 'srp' into for-next 2011-07-22 11:56:11 -07:00
Mike Marciniszyn
e67306a380 IB/qib: Defer HCA error events to tasklet
With ib_qib options:

    options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4

a run of ib_write_bw -a yields the following:

    ------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
     1048576   5000           2910.64            229.80
    ------------------------------------------------------------------

The top cpu use in a profile is:

    CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
    of 0x00 (No unit mask) count 1002300
    Counted LLC_MISSES events (Last level cache demand requests from this core that
    missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
    samples  %        samples  %        app name                 symbol name
    15237    29.2642  964      17.1195  ib_qib.ko                qib_7322intr
    12320    23.6618  1040     18.4692  ib_qib.ko                handle_7322_errors
    4106      7.8860  0              0  vmlinux                  vsnprintf


Analysis of the stats, profile, the code, and the annotated profile indicate:
 - All of the overflow interrupts (one per packet overflow) are
   serviced on CPU0 with no mitigation on the frequency.
 - All of the receive interrupts are being serviced by CPU0.  (That is
   the way truescale.cmds statically allocates the kctx IRQs to CPU)
 - The code is spending all of its time servicing QIB_I_C_ERROR
   RcvEgrFullErr interrupts on CPU0, starving the packet receive
   processing.
 - The decode_err routine is very inefficient, using a printf variant
   to format a "%s" and continues to loop when the errs mask has been
   cleared.
 - Both qib_7322intr and handle_7322_errors read pci registers, which
   is very inefficient.

The fix does the following:
 - Adds a tasklet to service QIB_I_C_ERROR
 - Replaces the very inefficient scnprintf() with a memcpy().  A field
   is added to qib_hwerror_msgs to save the sizeof("string") at
   compile time so that a strlen is not needed during err_decode().
 - The most frequent errors (Overflows) are serviced first to exit the
   loop as early as possible.
 - The loop now exits as soon as the errs mask is clear rather than
   fruitlessly looping through the msp array.

With this fix the performance changes to:

    ------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
     1048576   5000           2990.64            2941.35
    ------------------------------------------------------------------

During testing of the error handling overflow patch, it was determined
that some CPU's were slower when servicing both overflow and receive
interrupts on CPU0 with different MSI interrupt vectors.

This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI
interrupt for kctx's < 2 and to service them on the default interrupt.
For some CPUs, the cost of the interrupt enter/exit is more costly
than then the additional PCI read in the default handler.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-22 11:56:05 -07:00
Jiri Pirko
7033c4ad87 nes: do vlan cleanup
- unify vlan and nonvlan rx path
- kill nesvnic->vlan_grp and nes_netdev_vlan_rx_register
- allow to turn on/off rx/tx vlan accel via ethtool (set_features)

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 13:47:53 -07:00
Manuel Zerpies
3cbe182a5b RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited().

Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:18:39 -07:00
Or Gerlitz
c37791349c IB/mlx4: Support PMA counters for IBoE
Use the per port counter attached to all QPs created on that port to
implement port level packets/bytes performance counters a la IB.
Derived from a patch by Eli Cohen <eli@mellanox.co.il>

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:04:36 -07:00
Or Gerlitz
cfcde11c3d IB/mlx4: Use flow counters on IBoE ports
Allocate flow counter per Ethernet/IBoE port, and attach this counter
to all the QPs created on that port.  Based on patch by Eli Cohen
<eli@mellanox.co.il>.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:04:36 -07:00
Or Gerlitz
6aea213a62 IB/pma: Add include file for IBA performance counters definitions
Move the various definitions and mad structures needed for software
implementation of IBA PM agent from the ipath and qib drivers into a
single include file, which in turn could be used by more consumers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:04:35 -07:00
Or Gerlitz
6451c712fe IB/mlx4: Generate GID change events in IBoE code
IBoE doesn't use LIDs.  Use the GID change event to update the IB core
cache for addition/deletion of GIDs.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:04:31 -07:00
Or Gerlitz
761d90ed4c IB/core: Add GID change event
Add IB GID change event type.  This is needed for IBoE when the HW
driver updates the GID (e.g when new VLANs are added/deleted) table
and the change should be reflected to the IB core cache.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:04:30 -07:00
Moni Shoua
2efdd6a038 RDMA/cma: Don't allow IPoIB port space for IBoE
This patch fixes a kernel crash in cma_set_qkey().

When the link layer is Ethernet, it is wrong to use IPoIB port space
since no IPoIB interface is available.  Specifically, setting the
Q_Key when port space is RDMA_PS_IPOIB requires MGID calculation and
an SA query, which doesn't make sense over Ethernet.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 16:50:25 -07:00
Bart Van Assche
10e1b54bbb RDMA: Allow for NULL .modify_device() and .modify_port() methods
These methods don't make sense for iWARP devices, so rather than
forcing them to implement stubs, just return -ENOSYS in the core if
the hardware driver doesn't set .modify_device and/or .modify_port.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 16:44:30 -07:00
Mitko Haralanov
e800bd032c IB/qib: Update active link width
Update the active link width on QLE7220 chips when link goes down if
chip width does not match shadowed width.

Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 12:09:26 -07:00
Ram Vepa
4356d0b64b IB/qib: Fix potential deadlock with link down interrupt
There is a possibility of a deadlock due to the way locks are
acquired and released in qib_set_uevent_bits(). The function
qib_set_uevent_bits() is called in process context and it uses
spin_lock() and spin_unlock().  This same lock is acquired/released
in interrupt context which can lead to a deadlock when running on
the same cpu.

The fix is to replace spin_lock() and spin_unlock() with
spin_lock_irqsave() and spin_unlock_irqrestore() respectively in
qib_set_uevent_bits().

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 12:09:23 -07:00
Ram Vepa
2df4f7579d IB/qib: Add sysfs interface to read free contexts
Indicate the number of free user contexts via the sysfs file
/sys/class/infiniband/qib0/nfreectxts as required for PSM.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 12:09:19 -07:00
Jon Mason
9b89925c0d IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
The PCIE capability offset is saved during PCI bus walking.  It will
remove an unnecessary search in the PCI configuration space if this
value is referenced instead of reacquiring it.  Also, pci_is_pcie is a
better way of determining if the device is PCIE or not (as it uses the
same saved PCIE capability offset).

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 12:01:22 -07:00
Edwin van Vliet
ac0cae4495 IB/qib: Remove double define
Signed-off-by: Edwin van Vliet <edwin@cheatah.nl>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 11:59:23 -07:00
Jon Mason
7f27cda037 IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
The PCIE capability offset is saved during PCI bus walking.  It will
remove an unnecessary search in the PCI configuration space if this
value is referenced instead of reacquiring it.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 11:57:52 -07:00
Motohiro KOSAKI
5763181172 IB/ipath: Convert old cpumask api into new one
Adapt to new api.  We plan to remove old one later.  Almost all
changes are trivial, but there is one real fix: the following code is
unsafe:

	int ncpus = num_online_cpus()
	for (i = 0; i < ncpus; i++) {
		..
	}

because 1) we don't guarantee last bit of online cpus is equal to
num_online_cpus(). some arch assign sparse cpu number.  2) cpu
hotplugging may change cpu_online_mask at same time.  we need to pin
it by get_online_cpus().

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 11:56:18 -07:00
Motohiro KOSAKI
0cd85e6738 IB/qib: Convert old cpumask api into new one
Adapt to use new APIs.  We plan to remove old one later and plan to
change current->cpus_allowed implementation.

No functional change.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 11:56:03 -07:00
Jack Morgenstein
0c9361fcde RDMA/cma: Avoid assigning an IS_ERR value to cm_id pointer in CMA id object
Avoid assigning an IS_ERR value to the cm_id pointer.  This fixes a
few anomalies in the error flow due to confusion about checking for
NULL vs IS_ERR, and eliminates the need to test for the IS_ERR value
every time we wish to determine if the cma_id object has a cm device
associated with it.

Also, eliminate the now-unnecessary procedure cma_has_cm_dev (we can
check directly for the existence of the device pointer -- for a
non-NULL check, makes no difference if it is the iwarp or the ib
pointer).

Finally, make a few code changes here to improve coding consistency.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 09:44:55 -07:00
David S. Miller
69cce1d140 net: Abstract dst->neighbour accesses behind helpers.
dst_{get,set}_neighbour()

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17 23:11:35 -07:00
Goldwyn Rodrigues
cdb73db0b6 IB/mthca: Stop returning separate error and status from FW commands
Instead of having firmware command functions return an error and also
a status, leading to code like:

	err = mthca_FW_COMMAND(..., &status);
	if (err)
		goto out;
        if (status) {
		err = -E...;
		goto out;
	}

all over the place, just handle the FW status inside the FW command
handling code (the way mlx4 does it), so we can simply write:

	err = mthca_FW_COMMAND(...);
	if (err)
		goto out;

In addition to simplifying the source code, this also saves a healthy
chunk of text:

    add/remove: 0/0 grow/shrink: 10/88 up/down: 510/-3357 (-2847)
    function                                     old     new   delta
    static.trans_table                           324     584    +260
    mthca_cmd_poll                               352     477    +125
    mthca_cmd_wait                               511     567     +56
    mthca_table_put                              213     240     +27
    mthca_cleanup_db_tab                         372     387     +15
    __mthca_remove_one                           314     323      +9
    mthca_cleanup_user_db_tab                    275     283      +8
    __mthca_init_one                            1738    1746      +8
    mthca_cleanup                                 20      21      +1
    mthca_MAD_IFC                               1081    1082      +1
    mthca_MGID_HASH                               43      40      -3
    mthca_MAP_ICM_AUX                             23      20      -3
    mthca_MAP_ICM                                 19      16      -3
    mthca_MAP_FA                                  23      20      -3
    mthca_READ_MGM                                43      38      -5
    mthca_QUERY_SRQ                               43      38      -5
    mthca_QUERY_QP                                59      54      -5
    mthca_HW2SW_SRQ                               43      38      -5
    mthca_HW2SW_MPT                               60      55      -5
    mthca_HW2SW_EQ                                43      38      -5
    mthca_HW2SW_CQ                                43      38      -5
    mthca_free_icm_table                         120     114      -6
    mthca_query_srq                              214     206      -8
    mthca_free_qp                                662     654      -8
    mthca_cmd                                     38      28     -10
    mthca_alloc_db                              1321    1311     -10
    mthca_setup_hca                             1067    1055     -12
    mthca_WRITE_MTT                               35      22     -13
    mthca_WRITE_MGM                               40      27     -13
    mthca_UNMAP_ICM_AUX                           36      23     -13
    mthca_UNMAP_FA                                36      23     -13
    mthca_SYS_DIS                                 36      23     -13
    mthca_SYNC_TPT                                36      23     -13
    mthca_SW2HW_SRQ                               35      22     -13
    mthca_SW2HW_MPT                               35      22     -13
    mthca_SW2HW_EQ                                35      22     -13
    mthca_SW2HW_CQ                                35      22     -13
    mthca_RUN_FW                                  36      23     -13
    mthca_DISABLE_LAM                             36      23     -13
    mthca_CLOSE_IB                                36      23     -13
    mthca_CLOSE_HCA                               38      25     -13
    mthca_ARM_SRQ                                 39      26     -13
    mthca_free_icms                              178     164     -14
    mthca_QUERY_DDR                              389     375     -14
    mthca_resize_cq                             1063    1048     -15
    mthca_unmap_eq_icm                           123     107     -16
    mthca_map_eq_icm                             396     380     -16
    mthca_cmd_box                                 90      74     -16
    mthca_SET_IB                                 433     417     -16
    mthca_RESIZE_CQ                              369     353     -16
    mthca_MAP_ICM_page                           240     224     -16
    mthca_MAP_EQ                                 183     167     -16
    mthca_INIT_IB                                473     457     -16
    mthca_INIT_HCA                               745     729     -16
    mthca_map_user_db                            816     798     -18
    mthca_SYS_EN                                 157     139     -18
    mthca_cleanup_qp_table                        78      59     -19
    mthca_cleanup_eq_table                       168     149     -19
    mthca_UNMAP_ICM                              143     121     -22
    mthca_modify_srq                             172     149     -23
    mthca_unmap_fmr                              198     174     -24
    mthca_query_qp                               814     790     -24
    mthca_query_pkey                             343     319     -24
    mthca_SET_ICM_SIZE                            34      10     -24
    mthca_QUERY_DEV_LIM                         1870    1846     -24
    mthca_map_cmd                               1130    1105     -25
    mthca_ENABLE_LAM                             401     375     -26
    mthca_modify_port                            247     220     -27
    mthca_query_device                           884     850     -34
    mthca_NOP                                     75      41     -34
    mthca_table_get                              287     249     -38
    mthca_init_qp_table                          333     293     -40
    mthca_MODIFY_QP                              348     308     -40
    mthca_close_hca                              131      89     -42
    mthca_free_eq                                435     390     -45
    mthca_query_port                             755     705     -50
    mthca_free_cq                                581     528     -53
    mthca_alloc_icm_table                        578     524     -54
    mthca_multicast_attach                      1041     986     -55
    mthca_init_hca                               326     271     -55
    mthca_query_gid                              487     431     -56
    mthca_free_srq                               524     468     -56
    mthca_free_mr                                168     111     -57
    mthca_create_eq                             1560    1501     -59
    mthca_multicast_detach                       790     728     -62
    mthca_write_mtt                              918     854     -64
    mthca_register_device                       1406    1342     -64
    mthca_fmr_alloc                              947     883     -64
    mthca_mr_alloc                               652     582     -70
    mthca_process_mad                           1242    1164     -78
    mthca_dev_lim                                910     830     -80
    find_mgm                                     482     400     -82
    mthca_modify_qp                             3852    3753     -99
    mthca_init_cq                               1281    1181    -100
    mthca_alloc_srq                             1719    1610    -109
    mthca_init_eq_table                         1807    1679    -128
    mthca_init_tavor                             761     491    -270
    mthca_init_arbel                            2617    2098    -519

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
2011-07-15 13:33:20 -07:00
David S. Miller
6a7ebdf2fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/bluetooth/l2cap_core.c
2011-07-14 07:56:40 -07:00
Bart Van Assche
fd1b6c4a69 IB/srp: Avoid duplicate devices from LUN scan
SCSI scanning of a channel🆔lun triplet in Linux works as follows
(function scsi_scan_target() in drivers/scsi/scsi_scan.c):

- If lun == SCAN_WILD_CARD, send a REPORT LUNS command to the target
  and process the result.

- If lun != SCAN_WILD_CARD, send an INQUIRY command to the LUN
  corresponding to the specified channel🆔lun triplet to verify
  whether the LUN exists.

So a SCSI driver must either take the channel and target id values in
account in its quecommand() function or it should declare that it only
supports one channel and one target id.

Currently the ib_srp driver does neither.  As a result scanning the
SCSI bus via e.g. rescan-scsi-bus.sh causes many duplicate SCSI
devices to be created. For each 0:0:L device, several duplicates are
created with the same LUN number and with (C:I) != (0:0). Fix this by
declaring that the ib_srp driver only supports one channel and one
target id.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@kernel.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-13 09:19:16 -07:00
David S. Miller
e12fe68ce3 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-07-05 23:23:37 -07:00
Goldwyn Rodrigues
b2bc478219 RDMA: Check for NULL mode in .devnode methods
Commits 71c29bd5c2 ("IB/uverbs: Add devnode method to set path/mode")
and c3af0980ce ("IB: Add devnode methods to cm_class and umad_class")
added devnode methods that set the mode.

However, these methods don't check for a NULL mode, and so we get a
crash when unloading modules because devtmpfs_delete_node() calls
device_get_devnode() with mode == NULL.

Add the missing checks.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
[ Also fix cm.c.  - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-04 15:53:28 -07:00
Roland Dreier
c7d74b0909 Merge branches 'cxgb4' and 'qib' into for-next 2011-06-17 11:57:55 -07:00
Mitko Haralanov
3126448451 IB/qib: Ensure that LOS and DFE are being turned off
Due to timing, it is possible for the LOS and DFE to remain on. This
is due to the link progressing to LinkUP prior to the driver getting
the first Status Changed interrupt.  By expanding the conditions under
which LOS is turned off and DFE timeout is being set, timing is no
longer an issue.

Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-06-17 11:56:59 -07:00
Steve Wise
8da7e7a552 RDMA/cxgb4: Couple of abort fixes
- fix a race where the driver could end up sending a close_con_req
  after an abort_rpl.  In c4iw_ep_disconnect(), send abort or close
  request with the ep mutex held.

- fix a hang where driver fails to wake up when a connection is reset
  during a normal close.  Wake up any waiters in the interrupt path,
  and correctly cleanup after rdma_fini() failures.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-06-17 11:54:56 -07:00
Steve Wise
301c2c3f03 RDMA/cxgb4: Don't truncate MR lengths
Remove left-over code from T3 that limited MR sizes to 32b.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-06-17 11:54:50 -07:00
Steve Wise
2ff7d09a1b RDMA/cxgb4: Don't exceed hw IQ depth limit for user CQs
Memory allocated for user CQs gets rounded up to the next page
boundary.  And after rounding, we recalculate the resulting IQ depth
and we need to make sure we don't exceed the HW limits.

This bug can result a much smaller CQ allocated than was expected if
the HW size field is exceeded, resulting in CQ overflow failures.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-06-17 11:52:45 -07:00
Greg Rose
c7ac8679be rtnetlink: Compute and store minimum ifinfo dump size
The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-06-09 20:38:07 -07:00
Alexey Dobriyan
a6b7a40786 net: remove interrupt.h inclusion from netdevice.h
* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-06 22:55:11 -07:00
Linus Torvalds
4c171acc20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cma: Save PID of ID's owner
  RDMA/cma: Add support for netlink statistics export
  RDMA/cma: Pass QP type into rdma_create_id()
  RDMA: Update exported headers list
  RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
  RDMA/nes: Add a check for strict_strtoul()
  RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
  RDMA/cxgb4: Use completion objects for event blocking
  IB/srp: Fix integer -> pointer cast warnings
  IB: Add devnode methods to cm_class and umad_class
  IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
  IB/uverbs: Add devnode method to set path/mode
  RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
  RDMA: Add netlink infrastructure
  RDMA: Add error handling to ib_core_init()
2011-05-26 12:13:57 -07:00
Roland Dreier
8dc4abdf4c Merge branches 'cma', 'cxgb3', 'cxgb4', 'misc', 'nes', 'netlink', 'srp' and 'uverbs' into for-next 2011-05-25 13:47:20 -07:00
Nir Muchtar
83e9502d8d RDMA/cma: Save PID of ID's owner
Save the PID associated with an RDMA CM ID for reporting via netlink.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Nir Muchtar
753f618ae0 RDMA/cma: Add support for netlink statistics export
Add callbacks and data types for statistics export of all current
devices/ids.  The schema for RDMA CM is a series of netlink messages.
Each one contains an rdma_cm_stat struct.  Additionally, two netlink
attributes are created for the addresses for each message (if
applicable).

Their types used are:
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID)
RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID)
sockaddr_* structs are encapsulated within these attributes.

In other words, every transaction contains a series of messages like:

-------message 1-------
struct rdma_cm_id_stats {
       __u32 qp_num;
       __u32 bound_dev_if;
       __u32 port_space;
       __s32 pid;
       __u8 cm_state;
       __u8 node_type;
       __u8 port_num;
       __u8 reserved;
}
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address
-------end 1-------
-------message 2-------
struct rdma_cm_id_stats
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute
-------end 2-------

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Sean Hefty
b26f9b9949 RDMA/cma: Pass QP type into rdma_create_id()
The RDMA CM currently infers the QP type from the port space selected
by the user.  In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type.  For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.

Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Nir Muchtar
550e5ca77e RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
Move cma.c's internal definition of enum cma_state to enum rdma_cm_state
in an exported header so that it can be exported via RDMA netlink.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:22 -07:00
Liu Yuan
52f81dbaf1 RDMA/nes: Add a check for strict_strtoul()
It should check if strict_strtoul() succeeds before using
'wqm_quanta_value'.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>

[ Convert to kstrtoul() directly while we're here.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-24 10:06:25 -07:00
Steve Wise
807838686e RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
tx_ack() wasn't checking the endpoint state and consequently would
attempt to post the p2p 0B read on an endpoint/QP that is closing or
aborting.  This causes a NULL pointer dereference crash.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-24 10:01:04 -07:00
Steve Wise
c337374bf2 RDMA/cxgb4: Use completion objects for event blocking
There exists a race condition when using wait_queue_head_t objects
that are declared on the stack.  This was being done in a few places
where we are sending work requests to the FW and awaiting replies, but
we don't have an endpoint structure with an embedded c4iw_wr_wait
struct.  So the code was allocating it locally on the stack.  Bad
design.  The race is:

  1) thread on cpuX declares the wait_queue_head_t on the stack, then
     posts a firmware WR with that wait object ptr as the cookie to be
     returned in the WR reply.  This thread will proceed to block in
     wait_event_timeout() but before it does:

  2) An interrupt runs on cpuY with the WR reply.  fw6_msg() handles
     this and calls c4iw_wake_up().  c4iw_wake_up() sets the condition
     variable in the c4iw_wr_wait object to TRUE and will call
     wake_up(), but before it calls wake_up():

  3) The thread on cpuX calls c4iw_wait_for_reply(), which calls
     wait_event_timeout().  The wait_event_timeout() macro checks the
     condition variable and returns immediately since it is TRUE.  So
     this thread never blocks/sleeps. The function then returns
     effectively deallocating the c4iw_wr_wait object that was on the
     stack.

  4) So at this point cpuY has a pointer to the c4iw_wr_wait object
     that is no longer valid.  Further its pointing to a stack frame
     that might now be in use by some other context/thread.  So cpuY
     continues execution and calls wake_up() on a ptr to a wait object
     that as been effectively deallocated.

This race, when it hits, can cause a crash in wake_up(), which I've
seen under heavy stress. It can also corrupt the referenced stack
which can cause any number of failures.

The fix:

Use struct completion, which supports on-stack declarations.
Completions use a spinlock around setting the condition to true and
the wake up so that steps 2 and 4 above are atomic and step 3 can
never happen in-between.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
2011-05-24 09:47:38 -07:00
Roland Dreier
737b94eb41 IB/srp: Fix integer -> pointer cast warnings
Fix

    drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_handle_recv':
    drivers/infiniband/ulp/srp/ib_srp.c:1150: warning: cast to pointer from integer of different size
    drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_send_completion':
    drivers/infiniband/ulp/srp/ib_srp.c🔢 warning: cast to pointer from integer of different size

by adding an intermediate cast to uintptr_t.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: David Dillow <dillowda@ornl.gov>
2011-05-23 11:30:04 -07:00
Roland Dreier
c3af0980ce IB: Add devnode methods to cm_class and umad_class
We want the ucmX, umadX and issmX device nodes to show up under
/dev/infiniband, and additionally ucmX should have mode 0666.  Add
appropriate devnode methods to their class structs for this.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-23 11:24:28 -07:00
Ira Weiny
c8367c4cd9 IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
We had a script which was looping through the devices returned from
ibstat and attempted to register a SMI agent on an ethernet device.
This caused a kernel panic for IBoE devices that don't have QP0.

Fix this by checking if the QP exists before using it.

Signed-off-by: Ira Weiny <weiny2@llnl.gov>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-23 11:15:48 -07:00
Roland Dreier
71c29bd5c2 IB/uverbs: Add devnode method to set path/mode
We want udev to create a device node under /dev/infiniband with
permission 0666 for uverbsX devices, so add a devnode method to set the
appropriate info.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-23 11:10:05 -07:00
Roland Dreier
04ea2f8197 RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
We want udev to create a device node under /dev/infiniband with
permission 0666 for rdma_cm, so add that info to our struct miscdevice.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
2011-05-23 10:48:43 -07:00
Paul Gortmaker
70c7160619 Add appropriate <linux/prefetch.h> include for prefetch users
After discovering that wide use of prefetch on modern CPUs
could be a net loss instead of a win, net drivers which were
relying on the implicit inclusion of prefetch.h via the list
headers showed up in the resulting cleanup fallout.  Give
them an explicit include via the following $0.02 script.

 =========================================
 #!/bin/bash
 MANUAL=""
 for i in `git grep -l 'prefetch(.*)' .` ; do
 	grep -q '<linux/prefetch.h>' $i
 	if [ $? = 0 ] ; then
 		continue
 	fi

 	(	echo '?^#include <linux/?a'
 		echo '#include <linux/prefetch.h>'
 		echo .
 		echo w
 		echo q
 	) | ed -s $i > /dev/null 2>&1
 	if [ $? != 0 ]; then
 		echo $i needs manual fixup
 		MANUAL="$i $MANUAL"
 	fi
 done
 echo ------------------- 8\<----------------------
 echo vi $MANUAL
 =========================================

Signed-off-by: Paul <paul.gortmaker@windriver.com>
[ Fixed up some incorrect #include placements, and added some
  non-network drivers and the fib_trie.c case    - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-22 21:41:57 -07:00
Linus Torvalds
06f4e926d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
  macvlan: fix panic if lowerdev in a bond
  tg3: Add braces around 5906 workaround.
  tg3: Fix NETIF_F_LOOPBACK error
  macvlan: remove one synchronize_rcu() call
  networking: NET_CLS_ROUTE4 depends on INET
  irda: Fix error propagation in ircomm_lmp_connect_response()
  irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
  irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
  rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
  be2net: Kill set but unused variable 'req' in lancer_fw_download()
  irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
  atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
  rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
  rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
  rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
  rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
  pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
  isdn: capi: Use pr_debug() instead of ifdefs.
  tg3: Update version to 3.119
  tg3: Apply rx_discards fix to 5719/5720
  ...

Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
as per Davem.
2011-05-20 13:43:21 -07:00
Roland Dreier
b2cbae2c24 RDMA: Add netlink infrastructure
Add basic RDMA netlink infrastructure that allows for registration of
RDMA clients for which data is to be exported and supplies message
construction callbacks.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>

[ Reorganize a few things, add CONFIG_NET dependency.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-20 11:46:11 -07:00
Nir Muchtar
fd75c789ab RDMA: Add error handling to ib_core_init()
Fail RDMA midlayer initialization if sysfs setup fails.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-20 11:46:10 -07:00
Benjamin Herrenschmidt
880102e785 Merge remote branch 'origin/master' into merge
Manual merge of arch/powerpc/kernel/smp.c and add missing scheduler_ipi()
call to arch/powerpc/platforms/cell/interrupt.c

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-20 15:36:52 +10:00
Benjamin Herrenschmidt
4c8440666b Merge branch 'merge' into next 2011-05-19 17:00:06 +10:00
Roland Dreier
1df9fad122 Merge branches 'cma', 'cxgb4' and 'qib' into for-next 2011-05-12 08:57:20 -07:00
Sergei Shtylyov
1c65335714 IB/qib: Use pci_dev->revision
The driver reads PCI revision ID from the PCI configuration register
while it's already stored by PCI subsystem in the revision field of
struct pci_dev.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-12 08:57:12 -07:00
David S. Miller
5fc3590c81 infiniband: Remove rt->rt_src usage in addr4_resolve()
Use an explicit flow key and fetch it from there.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10 13:32:48 -07:00
Roland Dreier
d0c49bf391 RDMA/iwcm: Get rid of enum iw_cm_event_status
The IW_CM_EVENT_STATUS_xxx values were used in only a couple of places;
cma.c uses -Exxx values instead, and so do the amso1100, cxgb3 and cxgb4
drivers -- only nes was using the enum values (with the mild consequence
that all nes connection failures were treated as generic errors rather
than reported as timeouts or rejections).

We can fix this confusion by getting rid of enum iw_cm_event_status and
using a plain int for struct iw_cm_event.status, and converting nes to
use -Exxx as the other iWARP drivers do.

This also gets rid of the warning

    drivers/infiniband/core/cma.c: In function 'cma_iw_handler':
    drivers/infiniband/core/cma.c:1333:3: warning: case value '4294967185' not in enumerated type 'enum iw_cm_event_status'
    drivers/infiniband/core/cma.c:1336:3: warning: case value '4294967186' not in enumerated type 'enum iw_cm_event_status'
    drivers/infiniband/core/cma.c:1332:3: warning: case value '4294967192' not in enumerated type 'enum iw_cm_event_status'

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Faisal Latif <faisal.latif@intel.com>
2011-05-09 22:23:57 -07:00
Sergei Shtylyov
ec03d6777a IB/ipath: Use pci_dev->revision, again
Commit 44c10138fd ("PCI: Change all drivers to use pci_device->revision")
already converted this driver to using the revision field of struct
pci_dev but commit bb9171448d ("IB/ipath: Misc changes to prepare
for IB7220 introduction") later reverted that change for some strange
reason.  Restore the change.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09 22:07:31 -07:00
Mitko Haralanov
9f5754e34b IB/qib: Prevent driver hang with unprogrammed boards
The time limit test now correctly checks against current jiffies to
avoid the hang.

Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09 22:07:31 -07:00
Steve Wise
2f25e9a540 RDMA/cxgb4: EEH errors can hang the driver
A few more EEH fixes:

c4iw_wait_for_reply(): detect fatal EEH condition on timeout and
return an error.

The iw_cxgb4 driver was only calling ib_deregister_device() on an EEH
event followed by a ib_register_device() when the device was
reinitialized.  However, the RDMA core doesn't allow multiple
iterations of register/deregister by the provider. See
drivers/infiniband/core/sysfs.c: ib_device_unregister_sysfs() where
the kobject ref is held until the device is deallocated in
ib_deallocate_device().  Calling deregister adds this kobj reference,
and then a subsequent register call will generate a WARN_ON() from the
kobject subsystem because the kobject is being initialized but is
already initialized with the ref held.

So the provider must deregister and dealloc when resetting for an EEH
event, then alloc/register to re-initialize.  To do this, we cannot
use the device ptr as our ULD handle since it will change with each
reallocation.  This commit adds a ULD context struct which is used as
the ULD handle, and then contains the device pointer and other state
needed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09 22:06:23 -07:00
Steve Wise
d9594d990a RDMA/cxgb4: Reset wait condition atomically
The driver was never really waiting for RDMA_WR/FINI completions
because the condition variable used to determine if the completion
happened was never reset, and this condition variable is reused for
both connection setup and teardown.  This causes various driver
crashes under heavy loads due to releasing resources too early.

The fix is to use atomic bits to correctly reset the condition
immediately after the completion is detected.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09 22:06:22 -07:00
Roel Kluin
85d215b0f3 RDMA/cxgb4: Fix missing parentheses
Parens are missing: '|' has a higher presedence than '?'.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09 22:06:22 -07:00
Steve Wise
bbe9a0a2bc RDMA/cxgb4: Initialization errors can cause crash
c4iw_uld_add() must return ERR_PTR() values instead of NULL on failure.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09 22:06:22 -07:00
Steve Wise
30c95c2d49 RDMA/cxgb4: Don't change QP state outside EP lock
Concurrent ingress CLOSE and ULP ABORT operations causes a crash due
to a race condition where the close path releases the EP lock and then
tries to move the QP state to CLOSED.  This must be done inside the EP
lock to avoid the race.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09 22:06:22 -07:00
Hefty, Sean
a9bb79128a RDMA/cma: Add an ID_REUSEADDR option
Lustre requires that clients bind to a privileged port number before
connecting to a remote server.  On larger clusters (typically more
than about 1000 nodes), the number of privileged ports is exhausted,
resulting in lustre being unusable.

To handle this, we add support for reusable addresses to the rdma_cm.
This mimics the behavior of the socket option SO_REUSEADDR.  A user
may set an rdma_cm_id to reuse an address before calling
rdma_bind_addr() (explicitly or implicitly).  If set, other
rdma_cm_id's may be bound to the same address, provided that they all
have reuse enabled, and there are no active listens.

If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it
will only succeed if there are no other id's bound to that same
address.  The reuse option is exported to user space.  The behavior of
the kernel reuse implementation was verified against that given by
sockets.

This patch is derived from a path by Ira Weiny <weiny2@llnl.gov>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09 22:06:10 -07:00
Hefty, Sean
43b752daae RDMA/cma: Fix handling of IPv6 addressing in cma_use_port
cma_use_port() assumes that the sockaddr is an IPv4 address.  Since
IPv6 addressing is supported (and also to support other address
families) make the code more generic in its address handling.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09 22:06:10 -07:00
David S. Miller
31e4543db2 ipv4: Make caller provide on-stack flow key to ip_route_output_ports().
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03 20:25:42 -07:00
David Decotigny
7073949720 ethtool: cosmetic: Use ethtool ethtool_cmd_speed API
This updates the network drivers so that they don't access the
ethtool_cmd::speed field directly, but use ethtool_cmd_speed()
instead.

For most of the drivers, these changes are purely cosmetic and don't
fix any problem, such as for those 1GbE/10GbE drivers that indirectly
call their own ethtool get_settings()/mii_ethtool_gset(). The changes
are meant to enforce code consistency and provide robustness with
future larger throughputs, at the expense of a few CPU cycles for each
ethtool operation.

All drivers compiled with make allyesconfig ion x86_64 have been
updated.

Tested: make allyesconfig on x86_64 + e1000e/bnx2x work
Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-29 14:03:01 -07:00
Lucas De Marchi
e9c549998d Revert wrong fixes for common misspellings
These changes were incorrectly fixed by codespell. They were now
manually corrected.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-04-26 23:31:11 -07:00
Nishanth Aravamudan
e297d9dd5c cxgb4: use pgprot_writecombine() on powerpc
Commit fe3cc0d99d ("powerpc: Add
pgprot_writecombine") in benh's tree exposes the pgprot_writecombine()
API to drivers on powerpc. cxgb4 has an open-coded version of the same,
so use the common API now that it's available.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-27 14:18:25 +10:00
Michał Mirosław
3d96c74d89 net: infiniband/ulp/ipoib: convert to hw_features
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-20 01:30:42 -07:00
Michał Mirosław
dd6f6d0249 net: infiniband/hw/nes: convert to hw_features
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-20 01:30:41 -07:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Linus Torvalds
dc50eddb2f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/nes: Fix test of uninitialized netdev
2011-03-25 21:06:37 -07:00
Linus Torvalds
00a2470546 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
  route: Take the right src and dst addresses in ip_route_newports
  ipv4: Fix nexthop caching wrt. scoping.
  ipv4: Invalidate nexthop cache nh_saddr more correctly.
  net: fix pch_gbe section mismatch warning
  ipv4: fix fib metrics
  mlx4_en: Removing HW info from ethtool -i report.
  net_sched: fix THROTTLED/RUNNING race
  drivers/net/a2065.c: Convert release_resource to release_region/release_mem_region
  drivers/net/ariadne.c: Convert release_resource to release_region/release_mem_region
  bonding: fix rx_handler locking
  myri10ge: fix rmmod crash
  mlx4_en: updated driver version to 1.5.4.1
  mlx4_en: Using blue flame support
  mlx4_core: reserve UARs for userspace consumers
  mlx4_core: maintain available field in bitmap allocator
  mlx4: Add blue flame support for kernel consumers
  mlx4_en: Enabling new steering
  mlx4: Add support for promiscuous mode in the new steering model.
  mlx4: generalization of multicast steering.
  mlx4_en: Reporting HW revision in ethtool -i
  ...
2011-03-25 21:02:22 -07:00
Roland Dreier
cf55bb2439 RDMA/nes: Fix test of uninitialized netdev
Commit 1765a57533 ("net: make dev->master general") introduced a
test of an uninitialized netdev.  Fix the code so the intended netdev
is tested.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-24 17:18:30 -07:00
Linus Torvalds
0625bef606 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB: Increase DMA max_segment_size on Mellanox hardware
  IB/mad: Improve an error message so error code is included
  RDMA/nes: Don't print success message at level KERN_ERR
  RDMA/addr: Fix return of uninitialized ret value
  IB/srp: try to use larger FMR sizes to cover our mappings
  IB/srp: add support for indirect tables that don't fit in SRP_CMD
  IB/srp: rework mapping engine to use multiple FMR entries
  IB/srp: allow sg_tablesize to be set for each target
  IB/srp: move IB CM setup completion into its own function
  IB/srp: always avoid non-zero offsets into an FMR
2011-03-24 07:59:46 -07:00
Yevgeny Petrilin
0345584e0b mlx4: generalization of multicast steering.
The same packet steering mechanism would be used both for IB and Ethernet,
Both multicasts and unicasts.
This commit prepares the general infrastructure for this.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:21 -07:00
Yevgeny Petrilin
725c89997e mlx4_en: Reporting HW revision in ethtool -i
HW revision is derived from device ID and rev id.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:24:20 -07:00
Roland Dreier
ba82638247 Merge branches 'misc', 'nes' and 'srp' into for-next 2011-03-23 11:13:37 -07:00
David Dillow
7f9e5c48c1 IB: Increase DMA max_segment_size on Mellanox hardware
By default, each device is assumed to be able only handle 64 KB chunks
during DMA. By giving the segment size a larger value, the block layer
will coalesce more S/G entries together for SRP, allowing larger
requests with the same sg_tablesize setting.  The block layer is the
only direct user of it, though a few IOMMU drivers reference it as
well for their *_map_sg coalescing code. pci-gart_64 on x86, and a
smattering on on sparc, powerpc, and ia64.

Since other IB protocols could potentially see larger segments with
this, let's check those:

 - iSER is fine, because you limit your maximum request size to 512
   KB, so we'll never overrun the page vector in struct iser_page_vec
   (128 entries currently). It is independent of the DMA segment size,
   and handles multi-page segments already.

 - IPoIB is fine, as it maps each page individually, and doesn't use
   ib_dma_map_sg().

 - RDS appears to do the right thing and has no dependencies on DMA
   segment size, but I don't claim to have done a complete audit.

 - NFSoRDMA and 9p are OK -- they do not use ib_dma_map_sg(), so they
   doesn't care about the coalescing.

 - Lustre's ko2iblnd does not care about coalescing -- it properly
   walks the returned sg list.

This patch ups the value on Mellanox hardware to 1 GB, which matches
reported firmware limits on mlx4.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-22 09:39:18 -07:00
Roland Dreier
071c778347 Merge branch 'external-indirect' of git://git.kernel.org/pub/scm/linux/kernel/git/dad/srp-initiator into srp 2011-03-21 11:52:00 -07:00
Michael Heinz
1eba843dd7 IB/mad: Improve an error message so error code is included
Signed-off-by: Michael Heinz <michael.heinz@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-18 09:42:20 -07:00
Roland Dreier
748bfd9c1d RDMA/nes: Don't print success message at level KERN_ERR
There's no reason to print "NetEffect RNIC driver successfully loaded" 
at level KERN_ERR (where it will uglify the console on a quiet boot).
Change it to KERN_INFO.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-18 08:52:30 -07:00
Linus Torvalds
ec0afc9311 Merge branch 'kvm-updates/2.6.39' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.39' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (55 commits)
  KVM: unbreak userspace that does not sets tss address
  KVM: MMU: cleanup pte write path
  KVM: MMU: introduce a common function to get no-dirty-logged slot
  KVM: fix rcu usage in init_rmode_* functions
  KVM: fix kvmclock regression due to missing clock update
  KVM: emulator: Fix permission checking in io permission bitmap
  KVM: emulator: Fix io permission checking for 64bit guest
  KVM: SVM: Load %gs earlier if CONFIG_X86_32_LAZY_GS=n
  KVM: x86: Remove useless regs_page pointer from kvm_lapic
  KVM: improve comment on rcu use in irqfd_deassign
  KVM: MMU: remove unused macros
  KVM: MMU: cleanup page alloc and free
  KVM: MMU: do not record gfn in kvm_mmu_pte_write
  KVM: MMU: move mmu pages calculated out of mmu lock
  KVM: MMU: set spte accessed bit properly
  KVM: MMU: fix kvm_mmu_slot_remove_write_access dropping intermediate W bits
  KVM: Start lock documentation
  KVM: better readability of efer_reserved_bits
  KVM: Clear async page fault hash after switching to real mode
  KVM: VMX: Initialize vm86 TSS only once.
  ...
2011-03-17 18:40:35 -07:00
Linus Torvalds
c55d267de2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits)
  [SCSI] scsi_dh_rdac: Add MD36xxf into device list
  [SCSI] scsi_debug: add consecutive medium errors
  [SCSI] libsas: fix ata list corruption issue
  [SCSI] hpsa: export resettable host attribute
  [SCSI] hpsa: move device attributes to avoid forward declarations
  [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26)
  [SCSI] sd: Logical Block Provisioning update
  [SCSI] Include protection operation in SCSI command trace
  [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try)
  [SCSI] target: Fix volume size misreporting for volumes > 2TB
  [SCSI] bnx2fc: Broadcom FCoE offload driver
  [SCSI] fcoe: fix broken fcoe interface reset
  [SCSI] fcoe: precedence bug in fcoe_filter_frames()
  [SCSI] libfcoe: Remove stale fcoe-netdev entries
  [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h
  [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument
  [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs
  [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out"
  [SCSI] libfc: Fixing a memory leak when destroying an interface
  [SCSI] megaraid_sas: Version and Changelog update
  ...

Fix up trivial conflicts due to whitespace differences in
drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}
2011-03-17 17:54:40 -07:00
Sean Hefty
1bdd6384c2 RDMA/addr: Fix return of uninitialized ret value
Commit b23dd4fe42 ("ipv4: Make output route lookup return rtable
directly") resulted in leaving ret uninitialized, where it may later
be returned.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-17 17:00:19 -07:00
Huang Ying
0014bd990e mm: export __get_user_pages
In most cases, get_user_pages and get_user_pages_fast should be used
to pin user pages in memory.  But sometimes, some special flags except
FOLL_GET, FOLL_WRITE and FOLL_FORCE are needed, for example in
following patch, KVM needs FOLL_HWPOISON.  To support these users,
__get_user_pages is exported directly.

There are some symbol name conflicts in infiniband driver, fixed them too.

Signed-off-by: Huang Ying <ying.huang@intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Michel Lespinasse <walken@google.com>
CC: Roland Dreier <roland@kernel.org>
CC: Ralph Campbell <infinipath@qlogic.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-17 13:08:27 -03:00
Linus Torvalds
7a6362800c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits)
  bonding: enable netpoll without checking link status
  xfrm: Refcount destination entry on xfrm_lookup
  net: introduce rx_handler results and logic around that
  bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag
  bonding: wrap slave state work
  net: get rid of multiple bond-related netdevice->priv_flags
  bonding: register slave pointer for rx_handler
  be2net: Bump up the version number
  be2net: Copyright notice change. Update to Emulex instead of ServerEngines
  e1000e: fix kconfig for crc32 dependency
  netfilter ebtables: fix xt_AUDIT to work with ebtables
  xen network backend driver
  bonding: Improve syslog message at device creation time
  bonding: Call netif_carrier_off after register_netdevice
  bonding: Incorrect TX queue offset
  net_sched: fix ip_tos2prio
  xfrm: fix __xfrm_route_forward()
  be2net: Fix UDP packet detected status in RX compl
  Phonet: fix aligned-mode pipe socket buffer header reserve
  netxen: support for GbE port settings
  ...

Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c
with the staging updates.
2011-03-16 16:29:25 -07:00
David Dillow
be8b981453 IB/srp: try to use larger FMR sizes to cover our mappings
Now that we can get larger SG lists, we can take advantage of HCAs that
allow us to use larger FMR sizes. In many cases, we can use up to 512
entries, so start there and work our way down.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:41:30 -04:00
David Dillow
c07d424d61 IB/srp: add support for indirect tables that don't fit in SRP_CMD
This allows us to guarantee the ability to submit up to 8 MB requests
based on the current value of SCSI_MAX_SG_CHAIN_SEGMENTS. While FMR will
usually condense the requests into 8 SG entries, it is imperative that
the target support external tables in case the FMR mapping fails or is
not supported.

We add a safety valve to allow targets without the needed support to
reap the benefits of the large tables, but fail in a manner that lets
the user know that the data didn't make it to the device. The user must
add "allow_ext_sg=1" to the target parameters to indicate that the
target has the needed support.

If indirect_sg_entries is not specified in the modules options, then
the sg_tablesize for the target will default to cmd_sg_entries unless
overridden by the target options.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:37:23 -04:00
David Dillow
8f26c9ff9c IB/srp: rework mapping engine to use multiple FMR entries
Instead of forcing all of the S/G entries to fit in one FMR, and falling
back to indirect descriptors if that fails, allow the use of as many
FMRs as needed to map the request. This lays the groundwork for allowing
indirect descriptor tables that are larger than can fit in the command
IU, but should marginally improve performance now by reducing the number
of indirect descriptors needed.

We increase the minimum page size for the FMR pool to 4K, as larger
pages help increase the coverage of each FMR, and it is rare that the
kernel would send down a request with scattered 512 byte fragments.

This patch also move some of the target initialization code afte the
parsing of options, to keep it together with the new code that needs to
allocate memory based on the options given.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:35:16 -04:00
David Dillow
4924864404 IB/srp: allow sg_tablesize to be set for each target
Different configurations of target software allow differing max sizes of
the command IU. Allowing this to be changed per-target allows all
targets on an initiator to get an optimal setting.

We deprecate srp_sg_tablesize and replace it with cmd_sg_entries in
preparation for allowing more indirect descriptors than can fit in the
IU.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:35:05 -04:00
David Dillow
961e0be89a IB/srp: move IB CM setup completion into its own function
This is to clean up prior to further changes.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:34:48 -04:00
David Dillow
8c4037b501 IB/srp: always avoid non-zero offsets into an FMR
It is unclear exactly how this code works around Mellanox SRP targets,
or if the problem is on the target side or in the HCA itself. In an
abundance of caution, we should always enable the workaround.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:34:28 -04:00
Roland Dreier
043332cf28 Merge branches 'cma', 'cxgb4', 'ipath' and 'qib' into for-next 2011-03-15 10:58:04 -07:00
Sean Hefty
a396d43a35 RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one
rdma_destroy_id currently uses the global rdma cm 'lock' to test if an
rdma_cm_id has been bound to a device.  This prevents an active
address resolution callback handler from assigning a device to the
rdma_cm_id after rdma_destroy_id checks for one.

Instead, we can replace the use of the global lock around the check to
the rdma_cm_id device pointer by setting the id state to destroying,
then flushing all active callbacks.  The latter is accomplished by
acquiring and releasing the handler_mutex.  Any active handler will
complete first, and any newly scheduled handlers will find the
rdma_cm_id in an invalid state.

In addition to optimizing the current locking scheme, the use of the
rdma_cm_id mutex is a more intuitive synchronization mechanism than
that of the global lock.  These changes are based on feedback from
Doug Ledford <dledford@redhat.com> while he was trying to debug a
crash in the rdma cm destroy path.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-15 10:57:34 -07:00
Sean Hefty
8d8ac86564 IB/cm: Cancel pending LAP message when exiting IB_CM_ESTABLISH state
This problem was reported by Moni Shoua <monis@mellanox.com> and Amir
Vadai <amirv@mellanox.com>:

	When destroying a cm_id from a context of a work queue and if
	the lap_state of this cm_id is IB_CM_LAP_SENT, we need to
	release the reference of this id that was taken upon the send
	of the LAP message.  Otherwise, if the expected APR message
	gets lost, it is only after a long time that the reference
	will be released, while during that the work handler thread is
	not available to process other things.

It turns out that we need to cancel any pending LAP messages whenever
we transition out of the IB_CM_ESTABLISH state.  This occurs when
disconnecting - either sending or receiving a DREQ.  It can also
happen in a corner case where we receive a REJ message after sending
an RTU, followed by a LAP.  Add checks and cancel any outstanding LAP
messages in these three cases.

Canceling the LAP when sending a DREQ fixes the destroy problem
reported by Moni.  When a cm_id is destroyed in the IB_CM_ESTABLISHED
state, it sends a DREQ to the remote side to notify the peer that the
connection is going away.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-15 10:56:12 -07:00
Sean Hefty
29963437a4 IB/cm: Bump reference count on cm_id before invoking callback
When processing a SIDR REQ, the ib_cm allocates a new cm_id.  The
refcount of the cm_id is initialized to 1.  However, cm_process_work
will decrement the refcount after invoking all callbacks.  The result
is that the cm_id will end up with refcount set to 0 by the end of the
sidr req handler.

If a user tries to destroy the cm_id, the destruction will proceed,
under the incorrect assumption that no other threads are referencing
the cm_id.  This can lead to a crash when the cm callback thread tries
to access the cm_id.

This problem was noticed as part of a larger investigation with kernel
crashes in the rdma_cm when running on a real time OS.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-15 10:56:12 -07:00
Sean Hefty
25ae21a101 RDMA/cma: Fix crash in request handlers
Doug Ledford and Red Hat reported a crash when running the rdma_cm on
a real-time OS.  The crash has the following call trace:

    cm_process_work
       cma_req_handler
          cma_disable_callback
          rdma_create_id
             kzalloc
             init_completion
          cma_get_net_info
          cma_save_net_info
          cma_any_addr
             cma_zero_addr
          rdma_translate_ip
             rdma_copy_addr
          cma_acquire_dev
             rdma_addr_get_sgid
             ib_find_cached_gid
             cma_attach_to_dev
          ucma_event_handler
             kzalloc
             ib_copy_ah_attr_to_user
          cma_comp

[ preempted ]

    cma_write
        copy_from_user
        ucma_destroy_id
           copy_from_user
           _ucma_find_context
           ucma_put_ctx
           ucma_free_ctx
              rdma_destroy_id
                 cma_exch
                 cma_cancel_operation
                 rdma_node_get_transport

        rt_mutex_slowunlock
        bad_area_nosemaphore
        oops_enter

They were able to reproduce the crash multiple times with the
following details:

    Crash seems to always happen on the:
            mutex_unlock(&conn_id->handler_mutex);
    as conn_id looks to have been freed during this code path.

An examination of the code shows that a race exists in the request
handlers.  When a new connection request is received, the rdma_cm
allocates a new connection identifier.  This identifier has a single
reference count on it.  If a user calls rdma_destroy_id() from another
thread after receiving a callback, rdma_destroy_id will proceed to
destroy the id and free the associated memory.  However, the request
handlers may still be in the process of running.  When control returns
to the request handlers, they can attempt to access the newly created
identifiers.

Fix this by holding a reference on the newly created rdma_cm_id until
the request handler is through accessing it.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-15 10:00:28 -07:00
Nicolas Kaiser
2a543904dd IB/ipath: Don't reset disabled devices
The comment some lines above states that disabled devices must not reset.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
2011-03-14 14:25:59 -07:00
Mitko Haralanov
36b87b419c IB/qib: Fix M_Key field in SubnGet and SubnGetResp MADs
Set the M_Key field in SubnGet and SugnGetResp MADs based on correctly
interpreting the protection level specified in the M_KeyProtBits field.

Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-14 12:11:51 -07:00
Mitko Haralanov
4634b7945c IB/qib: Set default LE2 value for active cables to 0
For active and far-EQ cables use an LE2 value of 0 for improved SI.

Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-14 12:10:34 -07:00
Steve Wise
db5d040d7b RDMA/cxgb4: Debugfs dump_qp() updates
- Show whether the SQ is in onchip memory or not.
- Dump both SQ and RQ QIDs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-14 12:09:14 -07:00
Steve Wise
767fbe8151 RDMA/cxgb4: Dispatch FATAL event on EEH errors
This at least kicks the user mode applications that are watching for
device events.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-14 12:09:13 -07:00
Steve Wise
b48f3b9c10 RDMA/cxgb4: Use ULP_MODE_TCPDDP
Set the ULP mode for initial RDMA connection setup to the proper DDP
mode.  This avoids wasting some HW resources while in streaming mode.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-14 12:09:12 -07:00
Steve Wise
a9c7719800 RDMA/cxgb4: Enable on-chip SQ support by default
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-14 12:09:12 -07:00
Steve Wise
ffc3f7487f RDMA/cxgb4: Do CIDX_INC updates every 1/16 CQ depth CQE reaps
This avoids the CIDX_INC overflow issue with T4A2 when running
kernel RDMA applications.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-14 12:09:11 -07:00
Steve Wise
2942813739 RDMA/cxgb4: Remove db_drop_task
Unloading iw_cxgb4 can crash due to the unload code trying to use
db_drop_task, which is uninitialized.  So remove this dead code.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-14 12:09:10 -07:00
Steve Wise
b52fe09e33 RDMA/cxgb4: Turn on delayed ACK
Set the default to on.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-14 12:09:09 -07:00
David S. Miller
4c9483b2fb ipv6: Convert to use flowi6 where applicable.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:54 -08:00
David S. Miller
1d28f42c1b net: Put flowi_* prefix on AF independent members of struct flowi
I intend to turn struct flowi into a union of AF specific flowi
structs.  There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:44 -08:00
David S. Miller
78fbfd8a65 ipv4: Create and use route lookup helpers.
The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:42 -08:00
David S. Miller
b23dd4fe42 ipv4: Make output route lookup return rtable directly.
Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 14:31:35 -08:00
David S. Miller
273447b352 ipv4: Kill can_sleep arg to ip_route_output_flow()
This boolean state is now available in the flow flags.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 14:27:04 -08:00
David S. Miller
420d44daa7 ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep"
Since that is what the current vague "flags" argument means.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01 14:19:23 -08:00
Mike Christie
7c53c6f89d [SCSI] iser: export addr and port
This pactch has iser export the address and port
of the endpoint.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-02-24 12:41:23 -05:00
Mitko Haralanov
cc7fb05946 IB/qib: Return correct MAD when setting link width to 255
Fix a bug which causes the driver to return incorrect MADs as a
response to Set(PortInfo) which sets the link width to 0xFF or link
speed to 0xF.

Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-02-22 16:56:37 -08:00
David S. Miller
da935c66ba Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/e1000e/netdev.c
	net/xfrm/xfrm_policy.c
2011-02-19 19:17:35 -08:00
Roland Dreier
814b0a6120 Merge branches 'nes' and 'qib' into for-next 2011-02-17 14:04:59 -08:00
Mike Marciniszyn
c0af2c057d IB/qib: Prevent double completions after a timeout or RNR error
There is a double completion associated with error handling for RC QPs.

The sequence is:

 - The do_rc_ack() routine fields an RNR nack and there are 0
   rnr_retries configured on the QP.
 - qib_error_qp() stops the pending timer
 - qib_rc_send_complete() is called from sdma_complete()
 - qib_rc_send_complete() starts the timer because the msb of the psn
   just completed says an ack is needed.
 - a bunch of flushes occur as ipoib posts WQEs to an error'ed QP
 - rc_timeout() calls qib_restart_rc()
 - qib_restart_rc() calls qib_send_complete() with a
   IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in the
   past

The fix avoids starting the timer since another packet will never
arrive.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-02-17 14:04:50 -08:00
Jiri Pirko
1765a57533 net: make dev->master general
dev->master is now tightly connected to bonding driver. This patch makes
this pointer more general and ready to be used by others.

 - netdev_set_master() - bond specifics moved to new function
   netdev_set_bond_master()
 - introduced netif_is_bond_slave() to check if device is a bonding slave

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-13 10:42:07 -08:00
Mike Marciniszyn
414ed90cee IB/qib: Fix double add_timer()
The following panic BUG_ON occurs during qib testing:

    Kernel BUG at include/linux/timer.h:82

    RIP  [<ffffffff881f7109>] :ib_qib:start_timer+0x73/0x89
     RSP <ffffffff80425bd0>
     <0>Kernel panic - not syncing: Fatal exception
     <0>Dumping qib trace buffer from panic
    qib_set_lid INFO: IB0:1 got a lid: 0xf8
    Done dumping qib trace buffer
    BUG: warning at kernel/panic.c:137/panic() (Tainted: G

The flaw is due to a missing state test when processing responses that
results in an add_timer() call when the same timer is already queued.
This code was executing in parallel with a QP destroy on another CPU
that had changed the state to reset, but the missing test caused to
response handling code to run on into the panic.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-02-10 11:24:08 -08:00
Maciej Sosnowski
25a54a6bb8 RDMA/nes: Don't generate async events for unregistered devices
nes_port_ibevent() should not be called when the nes RDMA device is not
registered with the RDMA core.  Add missing checks of of_device_registered flag.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-02-03 15:55:26 -08:00
Linus Torvalds
9118626a30 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA: Update missed conversion of flush_scheduled_work()
  RDMA/ucma: Copy iWARP route information on queries
  RDMA/amso1100: Fix compile warnings
  RDMA/cxgb4: Set the correct device physical function for iWARP connections
  RDMA/cxgb4: Limit MAXBURST EQ context field to 256B
  IB/qib: Hold link for TX SERDES settings
  mlx4_core: Add ConnectX-3 device IDs
2011-02-03 11:19:26 -08:00
Roland Dreier
e51c7b1ab0 Merge branches 'amso1100', 'cma', 'cxgb4', 'misc', 'mlx4' and 'qib' into for-next 2011-01-29 20:45:04 -08:00
Tejun Heo
96e61fa55e RDMA: Update missed conversion of flush_scheduled_work()
Commit f06267104d ("RDMA: Update workqueue usage") introduced ib_wq
and removed the use of flush_scheduled_work(); however, during the merge
process one chunk was lost in ib_sa_remove_one().  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-01-28 16:39:08 -08:00
Steve Wise
e86f8b06f5 RDMA/ucma: Copy iWARP route information on queries
For iWARP rdma_cm ids, the "route" information is the L2 src and
next hop addresses.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-01-28 16:34:05 -08:00
Ralf Thielow
f9a4f6dcdd RDMA/amso1100: Fix compile warnings
Fix compile warnings on 32-bit by using "0" instead of "(u64) NULL" to
assign to "c2_vq_req->reply_msg".

Signed-off-by: Ralf Thielow <ralf.thielow@googlemail.com>

[ Change from "(unsigned long) NULL" to plain old "0" as suggested by
  Bart Van Assche <bvanassche@acm.org>.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-01-28 15:40:25 -08:00
Steve Wise
94788657c9 RDMA/cxgb4: Set the correct device physical function for iWARP connections
The PF passed to FW was 0, causing PCI failures in an SR-IOV environment.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-01-28 15:34:28 -08:00
Steve Wise
6a09a9d694 RDMA/cxgb4: Limit MAXBURST EQ context field to 256B
MAXBURST cannot exceed 256B for on-chip queues.  With a 512B MAXBURST,
we can lock up the chip.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-01-28 15:34:24 -08:00
Mitko Haralanov
d70585f7de IB/qib: Hold link for TX SERDES settings
Hold the IB link at DISABLED until we get the correct TX settings
on mezz boards.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-01-28 15:30:02 -08:00
David Rientjes
6a108a14fa kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.

This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel.  A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).

Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Linus Torvalds
6845a44a31 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA: Update workqueue usage
  RDMA/nes: Fix incorrect SFP+ link status detection on driver init
  RDMA/nes: Fix SFP+ link down detection issue with switch port disable
  RDMA/nes: Generate IB_EVENT_PORT_ERR/PORT_ACTIVE events
  RDMA/nes: Fix bonding on iw_nes
  IB/srp: Test only once whether iu allocation succeeded
  IB/mlx4: Handle protocol field in multicast table
  RDMA: Use vzalloc() to replace vmalloc()+memset(0)
  mlx4_{core, ib, en}: Fix driver when sizeof (phys_addr_t) > sizeof (long)
  IB/mthca: Fix driver when sizeof (phys_addr_t) > sizeof (long)
2011-01-17 14:45:48 -08:00
Roland Dreier
4790f4dc5f Merge branches 'misc', 'mlx4', 'mthca', 'nes' and 'srp' into for-next 2011-01-16 21:22:41 -08:00
Tejun Heo
f06267104d RDMA: Update workqueue usage
* ib_wq is added, which is used as the common workqueue for infiniband
  instead of the system workqueue.  All system workqueue usages
  including flush_scheduled_work() callers are converted to use and
  flush ib_wq.

* cancel_delayed_work() + flush_scheduled_work() converted to
  cancel_delayed_work_sync().

* qib_wq is removed and ib_wq is used instead.

This is to prepare for deprecation of flush_scheduled_work().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-16 21:16:31 -08:00
Maciej Sosnowski
843276ad98 RDMA/nes: Fix incorrect SFP+ link status detection on driver init
During iw_nes initialization the link status for SFP+ PHY is always
detected as "up" regardless of real state (cable either connected or
disconnected).  Add SFP+ PHY specific link status detection to the
iw_nes initialization procedure.  Use link status recheck for
netdev_open to detect delayed state updates.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-16 13:23:35 -08:00
Maciej Sosnowski
5f61b2c693 RDMA/nes: Fix SFP+ link down detection issue with switch port disable
In case of SFP+ PHY, link status check at interrupt processing can
give false results.  For proper link status change detection a delayed
recheck is needed to give nes registers time to settle.  Add a
periodic link status recheck scheduled at interrupt to detect
potential delayed registers state changes.

Addresses: http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2117
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-16 13:23:34 -08:00
Maciej Sosnowski
ea623455b7 RDMA/nes: Generate IB_EVENT_PORT_ERR/PORT_ACTIVE events
Depending on link state change, IB_EVENT_PORT_ERR or
IB_EVENT_PORT_ACTIVE should be generated when handling MAC interrupts.

Plugging in a cable happens to result in series of interrupts changing
driver's link state a number of times before finally staying at link
up (e.g. link up, link down, link up, link down, ..., link up).  To
prevent sending series of redundant IB_EVENT_PORT_ACTIVE and
IB_EVENT_PORT_ERR events, we use a timer to debounce them in
nes_port_ibevent().

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-16 13:23:34 -08:00
Maciej Sosnowski
2a4c97ead4 RDMA/nes: Fix bonding on iw_nes
Enable configuring bonds on nes devices by adding missing support for
master net_device to the driver.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-16 13:23:33 -08:00
Bart Van Assche
695b83495e IB/srp: Test only once whether iu allocation succeeded
Merge the two tests in srp_queuecommand() of whether information unit
allocation succeeded into one.  An intended side effect of this change
is that we fix the warning:

    drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_queuecommand':
    drivers/infiniband/ulp/srp/ib_srp.c:1116: warning: 'req' may be used uninitialized in this function

(seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y at least with gcc 4.4.4)

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-13 14:00:43 -08:00
Linus Torvalds
008d23e485 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  Documentation/trace/events.txt: Remove obsolete sched_signal_send.
  writeback: fix global_dirty_limits comment runtime -> real-time
  ppc: fix comment typo singal -> signal
  drivers: fix comment typo diable -> disable.
  m68k: fix comment typo diable -> disable.
  wireless: comment typo fix diable -> disable.
  media: comment typo fix diable -> disable.
  remove doc for obsolete dynamic-printk kernel-parameter
  remove extraneous 'is' from Documentation/iostats.txt
  Fix spelling milisec -> ms in snd_ps3 module parameter description
  Fix spelling mistakes in comments
  Revert conflicting V4L changes
  i7core_edac: fix typos in comments
  mm/rmap.c: fix comment
  sound, ca0106: Fix assignment to 'channel'.
  hrtimer: fix a typo in comment
  init/Kconfig: fix typo
  anon_inodes: fix wrong function name in comment
  fix comment typos concerning "consistent"
  poll: fix a typo in comment
  ...

Fix up trivial conflicts in:
 - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
 - fs/ext4/ext4.h

Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13 10:05:56 -08:00
Aleksey Senin
da995a8aee IB/mlx4: Handle protocol field in multicast table
The newest device firmware stores IB vs. Ethernet protocol in two bits
in members_count field of multicast group table (0: Infiniband, 1:
Ethernet).  When changing the QP members count for a multicast group,
it important not to reset this information.  When calling multicast
attach first time, the protocol type should be specified.  In this
patch we always set it IB, but in the future we will handle Ethernet
too.  When looking for a QP, the protocol type shoud be checked too.

Signed-off-by: Aleksey Senin <alekseys@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-12 14:49:17 -08:00
Joe Perches
948579cd8c RDMA: Use vzalloc() to replace vmalloc()+memset(0)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-12 11:11:58 -08:00
Roland Dreier
4979d18fe1 mlx4_{core, ib, en}: Fix driver when sizeof (phys_addr_t) > sizeof (long)
Some systems have PCI addresses that don't fit in unsigned long (eg some
32-bit PowerPC 440 systems have 36-bit bus addresses).  Fix up mlx4 drivers
by using phys_addr_t where appropriate, so we don't truncate any PCI
resource addresses before ioremapping them.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-12 09:50:36 -08:00
John L. Burr
eb4a7cbf27 IB/mthca: Fix driver when sizeof (phys_addr_t) > sizeof (long)
Some systems have PCI addresses that don't fit in unsigned long (eg some
32-bit PowerPC 440 systems have 36-bit bus addresses).  Fix up the driver
by using phys_addr_t where appropriate, so we don't truncate any PCI
resource addresses before ioremapping them.

Signed-off-by: John L. Burr <jlburr@cadence.com>

[ Update to apply to current driver source.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-11 20:39:47 -08:00
Linus Torvalds
f1d6d6cd90 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (42 commits)
  IB/qib: Fix refcount leak in lkey/rkey validation
  IB/qib: Improve SERDES tunning on QMH boards
  IB/qib: Unnecessary delayed completions on RC connection
  IB/qib: Issue pre-emptive NAKs on eager buffer overflow
  IB/qib: RDMA lkey/rkey validation is inefficient for large MRs
  IB/qib: Change QPN increment
  IB/qib: Add fix missing from earlier patch
  IB/qib: Change receive queue/QPN selection
  IB/qib: Fix interrupt mitigation
  IB/qib: Avoid duplicate writes to the rcv head register
  IB/qib: Add a few new SERDES tunings
  IB/qib: Reset packet list after freeing
  IB/qib: New SERDES init routine and improvements to SI quality
  IB/qib: Clear WAIT_SEND flags when setting QP to error state
  IB/qib: Fix context allocation with multiple HCAs
  IB/qib: Fix multi-Florida HCA host panic on reboot
  IB/qib: Handle transitions from ACTIVE_DEFERRED to ACTIVE better
  IB/qib: UD send with immediate receive completion has wrong size
  IB/qib: Set port physical state even if other fields are invalid
  IB/qib: Generate completion callback on errors
  ...
2011-01-11 16:30:08 -08:00
Roland Dreier
2b76c05794 Merge branches 'cxgb4', 'ipath', 'ipoib', 'mlx4', 'mthca', 'nes', 'qib' and 'srp' into for-next 2011-01-10 17:43:30 -08:00
Mike Marciniszyn
4db62d4786 IB/qib: Fix refcount leak in lkey/rkey validation
The mr optimization introduced a reference count leak on an exception
test.  The lock/refcount manipulation is moved down and the problematic
exception test now calls bail to insure that the lock is released.

Additional fixes as suggested by Ralph Campbell <ralph.campbell@qlogic.org>:
- reduce lock scope of dma regions
- use explicit values on returns vs. automatic ret value

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:23 -08:00
Mike Marciniszyn
f2d255a078 IB/qib: Improve SERDES tunning on QMH boards
Improve the QMH SERDES tunning on initial driver load by having the
driver go through a link state change.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:22 -08:00
Mike Marciniszyn
dd04e43d46 IB/qib: Unnecessary delayed completions on RC connection
Currently on receipt of a response message (ACKs, RDMA Response,
Atomic Responses etc.) if the SDMA completion counter is not advanced
the driver delays the completion of the WQE.  In most cases this is
overly pessimistic as the response (ACK) to a previously transmitted
send implies that the send is complete.  Ensure that SDMA queue is
progressed appropriately before determining if a send has delayed
completions.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:22 -08:00
Mike Marciniszyn
994bcd28a3 IB/qib: Issue pre-emptive NAKs on eager buffer overflow
Under congestion resulting in eager buffer overflow attempt to send
pre-emptive NAKs if header queue entries with TID errors are generated
and a valid header is present.  This prevents long timeouts and flow
restarts if a trailing set of packets are dropped due to eager
overflows.  Pre-emptive NAKs are currently only supported for RDMA
writes.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:22 -08:00
Mike Marciniszyn
2a600f14d2 IB/qib: RDMA lkey/rkey validation is inefficient for large MRs
The current code loops during rkey/lkey validiation to isolate the MR
for the RDMA, which is expensive when the current operation is inside
a very large memory region.

This fix optimizes rkey/lkey validation routines for user memory
regions and fast memory regions.  The MR entry can be isolated by
shifts/mods instead of looping.  The existing loop is preserved for
phys memory regions for now.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:22 -08:00
Mike Marciniszyn
7c3edd3ff3 IB/qib: Change QPN increment
Changing from +1 to +2 allows for better QP distribution across
receive contexts.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:22 -08:00
Mike Marciniszyn
057ae62fac IB/qib: Add fix missing from earlier patch
The upstream code was missing part of a receive/error race fix from
the internal tree.  Add the missing part, which makes future merges
possible.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:21 -08:00
Mike Marciniszyn
2528ea60f9 IB/qib: Change receive queue/QPN selection
The basic idea is that on SusieQ, the difficult part of mapping QPN to
context is handled by the mapping registers so the generic QPN
allocation doesn't need to worry about chip specifics.  For Monty and
Linda, there is no mapping table so the qpt->mask (same as
dd->qpn_mask), is used to see if the QPN to context falls within
[zero..dd->n_krcv_queues).

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:21 -08:00
Mike Marciniszyn
19ede2e422 IB/qib: Fix interrupt mitigation
For SusieQ we need to write to the interrupt timer register before
updating the header queue head with interrupt count.  This is to
ensure that the timer is enabled properly and a receive available
interrupt is delivered.  Otherwise this interrupt can be lost if the
receiver header/eager queues are full before the timer is enabled.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:21 -08:00
Mike Marciniszyn
aa7374ac19 IB/qib: Avoid duplicate writes to the rcv head register
Avoid duplicate writes to the head register as this can lead to lost
interrupts if the context goes full before the second write is done.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:21 -08:00
Mike Marciniszyn
e706203c7c IB/qib: Add a few new SERDES tunings
Add new SERDES tuning to aid manufacturing.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:21 -08:00
Mike Marciniszyn
f73df408b2 IB/qib: Reset packet list after freeing
Reset the list pointers after freeing the SDMA packet list.  This is
done to any potential double-free cases.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:21 -08:00
Mike Marciniszyn
a0a234d47d IB/qib: New SERDES init routine and improvements to SI quality
Implement new SERDES initialization routine and improvements to signal
integrity -- disable LE1 adaptation, disable LOS after link-up, set
better SERDES parameters.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:20 -08:00
Mike Marciniszyn
16028f2777 IB/qib: Clear WAIT_SEND flags when setting QP to error state
If these flags are set when the QP is transitioned to the error state,
it will wait until the flags are cleared, which may never happen if
the error transition is due to a link going down.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:20 -08:00
Mike Marciniszyn
6676b3f746 IB/qib: Fix context allocation with multiple HCAs
The driver was incorrectly choosing HCAs on which to allocate new user
contexts based on overall count of usable ports regardless whether the
usable port was on the currently selected HCA.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:20 -08:00
Mike Marciniszyn
5dbbcb97cc IB/qib: Fix multi-Florida HCA host panic on reboot
Add check when setting configured contexts that the value does not
exceed the number of contexts allocated for the card.  If the value
exceeds the already allocated count, set it to what is already
allocated.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:20 -08:00
Mike Marciniszyn
b3d5cb2f20 IB/qib: Handle transitions from ACTIVE_DEFERRED to ACTIVE better
When the link transitions from ACTIVE_DEFERRED to ACTIVE, the driver
only sees the ACTIVE state. With this change, it will check whether
the state was already ACTIVE and if so, it will not generated IB
events and will not clear symbol error counts.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:20 -08:00
Mike Marciniszyn
c7665e5a69 IB/qib: UD send with immediate receive completion has wrong size
The code to generate receive completion entries for UD send with
immediate contains the wrong payload length.  This is because when the
code to compute the payload size was moved, the value of hdrsize
didn't get moved too.  The fix is to update tlen directly.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:20 -08:00
Mike Marciniszyn
3c9e5f4d65 IB/qib: Set port physical state even if other fields are invalid
The IBTA vol. 1 release 1.2.1 spec. says:
C14-24.2.1: If PortInfo:Portstate=Down, then a SubnSet(PortInfo) shall
make any changes it specifies to PortInfo:PortPhysicalState; any other
result is vendor-dependent.

The patch changes the error handling so that the reply says there are
invalid fields but still attempts to set fields that are in range
including PortInfo:PortPhysicalState.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:19 -08:00
Mike Marciniszyn
a377acd151 IB/qib: Generate completion callback on errors
According to IBTA vol. 1, C11-30.1.1, a notification callback is
invoked if the CQ is armed for the next solicited completion event or
an error completion.  The error case wasn't being generated correctly.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:19 -08:00
Mike Marciniszyn
f509f9c14d IB/qib: Add support for the new QME7362 card
Add support to recognize another board variation named QME7362.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:19 -08:00
Mike Marciniszyn
0a43e11722 IB/qib: Add receive header queue size module parameters
The receive header queue sizes need to modified for performance
tuning.  Three module parameters are added to support this.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:19 -08:00
Mike Marciniszyn
9d5b243f24 IB/qib: Remove IB latency turnoff
This is required for hardware testing.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:19 -08:00
Joe Perches
601d87b079 RDMA/nes: Fix string continuation line
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:14 -08:00
Dan Carpenter
d0444f1527 IB/mthca: Handle -ENOMEM in forward_trap()
ib_create_send_mad() can return ERR_PTR(-ENOMEM) here.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:10 -08:00
Dan Carpenter
1397490938 IB/mlx4: Handle -ENOMEM in forward_trap()
ib_create_send_mad() can return ERR_PTR(-ENOMEM) here.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:06 -08:00
Vladimir Sokolovsky
3afa9f19e5 IB/mlx4: Don't call dma_free_coherent() with irqs disabled
mlx4_ib_free_cq_buf() should not be called under spin_lock_irq() since
it calls dma_free_coherent(), which needs irqs enabled.  Fix this by
deferring the free to outside the locked region.

This was found due to the

	WARN_ON(irqs_disabled());

in swiotlb_free_coherent().

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:42:06 -08:00
Or Gerlitz
8ae31e5b1f IPoIB: Add GRO support
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:41:55 -08:00
Or Gerlitz
19e364f680 IPoIB: Remove LRO support
As a first step in moving from LRO to GRO, revert commit af40da894e
("IPoIB: add LRO support").  Also eliminate the ethtool set_flags
callback which isn't needed anymore.  Finally, we need to include
<linux/sched.h> directly to get the declaration of restart_syscall()
(which used to be included implicitly through <linux/inet_lro.h>).

Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:41:54 -08:00
Joe Perches
1eba27e87a IB/ipath: Use printf extension %pR for struct resource
Using %pR standardizes the struct resource output.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:41:50 -08:00
Steve Wise
db8b101671 RDMA/cxgb4: Don't re-init wait object in init/fini paths
Re-initializing the wait object in rdma_init()/rdma_fini() causes a
timing window which can lead to a deadlock during close.  Once this
deadlock hits, all RDMA activity over the T4 device will be stuck.

There's no need to re-init the wait object, so remove it.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:41:43 -08:00
Stephen Hemminger
c943109163 RDMA/cxgb3,cxgb4: Remove dead code
This removes unused code found by running 'make namespacecheck';
compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:41:43 -08:00
David Dillow
9af762719e IB/srp: consolidate hot-path variables into cache lines
Put the variables accessed together in the hot-path into common
cachelines, and separate them by RW vs RO to avoid false dirtying.
We keep a local copy of the lkey and rkey in the target to avoid
traversing pointers (and associated cache lines) to find them.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:51 -05:00
Bart Van Assche
e968467822 IB/srp: stop sharing the host lock with SCSI
We don't need protection against the SCSI stack, so use our own lock to
allow parallel progress on separate CPUs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:50 -05:00
Bart Van Assche
94a9174c63 IB/srp: reduce lock coverage of command completion
We only need the lock to cover list and credit manipulations, so push
those into srp_remove_req() and update the call chains.

We reorder the request removal and command completion in
srp_process_rsp() to avoid the SCSI mid-layer sending another command
before we've released our request and added any credits returned by the
target. This prevents us from returning HOST_BUSY unneccesarily.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out, small cleanups, and modified to avoid potential extraneous
  HOST_BUSY returns by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:50 -05:00
Bart Van Assche
76c75b258f IB/srp: reduce local coverage for command submission and EH
We only need locks to protect our lists and number of credits available.
By pre-consuming the credit for the request, we can reduce our lock
coverage to just those areas. If we don't actually send the request,
we'll need to put the credit back into the pool.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:49 -05:00
Bart Van Assche
536ae14e75 IB/srp: don't move active requests to their own list
We use req->scmnd != NULL to indicate an active request, so there's no
need to keep a separate list for them. We can afford the array iteration
during error handling, and dropping it gives us one less item that needs
lock protection.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:42 -05:00
Linus Torvalds
b4a45f5fe8 Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
  fs: scale mntget/mntput
  fs: rename vfsmount counter helpers
  fs: implement faster dentry memcmp
  fs: prefetch inode data in dcache lookup
  fs: improve scalability of pseudo filesystems
  fs: dcache per-inode inode alias locking
  fs: dcache per-bucket dcache hash locking
  bit_spinlock: add required includes
  kernel: add bl_list
  xfs: provide simple rcu-walk ACL implementation
  btrfs: provide simple rcu-walk ACL implementation
  ext2,3,4: provide simple rcu-walk ACL implementation
  fs: provide simple rcu-walk generic_check_acl implementation
  fs: provide rcu-walk aware permission i_ops
  fs: rcu-walk aware d_revalidate method
  fs: cache optimise dentry and inode for rcu-walk
  fs: dcache reduce branches in lookup path
  fs: dcache remove d_mounted
  fs: fs_struct use seqlock
  fs: rcu-walk for path lookup
  ...
2011-01-07 08:56:33 -08:00
Nick Piggin
dc0474be3e fs: dcache rationalise dget variants
dget_locked was a shortcut to avoid the lazy lru manipulation when we already
held dcache_lock (lru manipulation was relatively cheap at that point).
However, how that the lru lock is an innermost one, we never hold it at any
caller, so the lock cost can now be avoided. We already have well working lazy
dcache LRU, so it should be fine to defer LRU manipulations to scan time.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:24 +11:00
Nick Piggin
b5c84bf6f6 fs: dcache remove dcache_lock
dcache_lock no longer protects anything. remove it.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:23 +11:00
Nick Piggin
b7ab39f631 fs: dcache scale dentry refcount
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:21 +11:00
Bart Van Assche
dcb4cb85f4 IB/srp: allow lockless work posting
Only one CPU at a time will own an RX IU, so using the address of the IU
as the work request cookie allows us to avoid taking a lock. We can
similarly prepare the TX path for lockless posting by moving the free TX
IUs to a list. This also removes the requirement that the queue sizes be
a power of 2.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out, small cleanups, and modified to avoid needing an extra field
  in the IU by David Dillow]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-05 15:24:25 -05:00
Bart Van Assche
9709f0e05b IB/srp: consolidate state change code
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-05 15:24:25 -05:00
David Dillow
f8b6e31e4e IB/srp: allow task management without a previous request
We can only have one task management comment outstanding, so move the
completion and status to the target port. This allows us to handle
resets of a LUN without a corresponding request having been sent.
Meanwhile, we don't need to play games with host_scribble, just use it
as the pointer it is.

This fixes a crash when we issue a bus reset using sg_reset.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=13893
Reported-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-05 15:24:25 -05:00
David S. Miller
17f7f4d9fc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/ipv4/fib_frontend.c
2010-12-26 22:37:05 -08:00
Jiri Kosina
4b7bd36470 Merge branch 'master' into for-next
Conflicts:
	MAINTAINERS
	arch/arm/mach-omap2/pm24xx.c
	drivers/scsi/bfa/bfa_fcpim.c

Needed to update to apply fixes for which the old branch was too
outdated.
2010-12-22 18:57:02 +01:00
Dan Carpenter
7182afea8d IB/uverbs: Handle large number of entries in poll CQ
In ib_uverbs_poll_cq() code there is a potential integer overflow if
userspace passes in a large cmd.ne.  The calls to kmalloc() would
allocate smaller buffers than intended, leading to memory corruption.
There iss also an information leak if resp wasn't all used.
Unprivileged userspace may call this function, although only if an
RDMA device that uses this function is present.

Fix this by copying CQ entries one at a time, which avoids the
allocation entirely, and also by moving this copying into a function
that makes sure to initialize all memory copied to userspace.

Special thanks to Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
for his help and advice.

Cc: <stable@kernel.org>
Signed-off-by: Dan Carpenter <error27@gmail.com>

[ Monkey around with things a bit to avoid bad code generation by gcc
  when designated initializers are used.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-08 15:23:49 -08:00
Linus Torvalds
75318ec327 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB: Fix information leak in marshalling code
  IB/pack: Remove some unused code added by the IBoE patches
  IB/mlx4: Fix IBoE link state
  IB/mlx4: Fix IBoE reported link rate
  mlx4_core: Workaround firmware bug in query dev cap
  IB/mlx4: Fix memory ordering of VLAN insertion control bits
  MAINTAINERS: Update NetEffect entry
2010-12-02 12:10:56 -08:00
Roland Dreier
7adce751ce Merge branches 'misc', 'mlx4' and 'nes' into for-next 2010-12-01 16:33:47 -08:00
Vasiliy Kulikov
91a4d157d0 IB: Fix information leak in marshalling code
ib_ucm_init_qp_attr() and ucma_init_qp_attr() pass struct ib_uverbs_qp_attr
with reserved, qp_state, {ah_attr,alt_ah_attr}{reserved,->grh.reserved}
fields uninitialized to copy_to_user().  This leads to leaking of
contents of kernel stack memory to userspace.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-01 16:33:18 -08:00
Or Gerlitz
f55864a4f4 IB/pack: Remove some unused code added by the IBoE patches
Remove unused functions added by commit ff7f5aab35 ("IB/pack: IBoE UD
packet packing support").

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
2010-12-01 16:30:18 -08:00
Eli Cohen
21d606090e IB/mlx4: Fix IBoE link state
Use netif_running() and netif_carrier_ok() to report link state,
exactly as is done to report Ethernet link state in sysfs.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-01 16:11:29 -08:00
Eli Cohen
328266c561 IB/mlx4: Fix IBoE reported link rate
The link rate is the product of the link speed in the link width. For
Etherent ports the rate is 10G, so we use 1 for the width and 4 for
speed to get the correct rate.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-01 16:10:35 -08:00
Eli Cohen
e27535b9c6 IB/mlx4: Fix memory ordering of VLAN insertion control bits
We must fully update the control segment before marking it as valid,
so that hardware doesn't start executing it before we're ready.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>

[ Move VLAN control bit setting to before wmb().  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-01 11:08:54 -08:00