linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-28 23:23:55 +08:00

Author	SHA1	Message	Date
Kumar Sanghvi	01b225e18f	RDMA/cxgb4: Fix retry with MPAv1 logic for MPAv2 Fix logic so that we don't retry with MPAv1 once we have done that already. Otherwise, we end up retrying with MPAv1 even when its not needed on getting peer aborts - and this could lead to kernel panic. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-28 11:58:07 -08:00
Jonathan Lallinger	c34c97ad8c	RDMA/cxgb4: Fix iw_cxgb4 count_rcqes() logic Fix another place in the code where logic dealing with the t4_cqe was using the wrong QID. This fixes the counting logic so that it tests against the SQ QID instead of the RQ QID when counting RCQES. Signed-off by: Jonathan Lallinger <jonathan@ogc.us> Signed-off by: Steve Wise <swise@ogc.us> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-28 11:53:05 -08:00
Alexey Dobriyan	4e3fd7a06d	net: remove ipv6_addr_copy() C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-22 16:43:32 -05:00
David S. Miller	9ca36f7db2	infiniband: Update net drivers for netdev_features_t changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 18:05:50 -05:00
Mike Marciniszyn	042f36e156	IB/qib: Don't use schedule_work() It was mistakenly introduced by `dde05cbdf8` ("IB/qib: Hold links until tuning data is available"). Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-08 10:37:53 -08:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Roland Dreier	b8108d6886	Merge branches 'iser', 'mthca' and 'qib' into for-next	2011-11-04 09:36:04 -07:00
Mike Marciniszyn	30ab7e230b	IB/qib: Fix panic in RC error flushing logic The following panic can occur when flushing a QP: RIP: 0010:[<ffffffffa0168e8b>] [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib] RSP: 0018:ffff8803cdc6fc90 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff8803d84ba000 RCX: 0000000000000000 RDX: 0000000000000005 RSI: ffffc90015a53430 RDI: ffff8803d84ba000 RBP: ffff8803cdc6fce0 R08: ffff8803cdc6fc90 R09: 0000000000000001 R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8803d84ba0c0 R13: ffff8803d84ba5cc R14: 0000000000000800 R15: 0000000000000246 FS: 0000000000000000(0000) GS:ffff880036600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000034 CR3: 00000003e44f9000 CR4: 00000000000406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process qib/0 (pid: 1350, threadinfo ffff8803cdc6e000, task ffff88042728a100) Stack: 53544c5553455201 0000000100000005 0000000000000000 ffff8803d84ba000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000001 ffff8803cdc6fd30 ffffffffa0165d7a Call Trace: [<ffffffffa0165d7a>] qib_make_rc_req+0x36a/0xe80 [ib_qib] [<ffffffffa0165a10>] ? qib_make_rc_req+0x0/0xe80 [ib_qib] [<ffffffffa01698b3>] qib_do_send+0xf3/0xb60 [ib_qib] [<ffffffff814db757>] ? thread_return+0x4e/0x777 [<ffffffffa01697c0>] ? qib_do_send+0x0/0xb60 [ib_qib] [<ffffffff81088bf0>] worker_thread+0x170/0x2a0 [<ffffffff8108e530>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81088a80>] ? worker_thread+0x0/0x2a0 [<ffffffff8108e1c6>] kthread+0x96/0xa0 [<ffffffff8100c1ca>] child_rip+0xa/0x20 [<ffffffff8108e130>] ? kthread+0x0/0xa0 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 RIP [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib] The RC error state flush logic in qib_make_rc_req() could return all of the acked wqes and potentially have emptied the queue. It would then unconditionally try return a flush completion via qib_send_complete() for an invalid wqe, or worse a valid one that is not queued. The panic results when the completion code tries to maintain an MR reference count for a NULL MR. This fix modifies logic to only send one completion per qib_make_rc_req() call and changing the completion status from IB_WC_SUCCESS to IB_WC_WR_FLUSH_ERR as the completions progress. The outer loop will call as many times as necessary to flush the queue. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-04 09:35:44 -07:00
Or Gerlitz	52439540ea	IB/iser: DMA unmap TX bufs used for iSCSI/iSER headers The current driver never does DMA unmapping on these buffers. Fix that by adding DMA unmapping to the task cleanup callback, and DMA mapping to the task init function (drop the headers_initialized micro-optimization). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-04 09:32:44 -07:00
Or Gerlitz	2c4ce60934	IB/iser: Use separate buffers for the login request/response The driver counted on the transactional nature of iSCSI login/text flows and used the same buffer for both the request and the response. We also went further and did DMA mapping only once, with DMA_FROM_DEVICE, which violates the DMA mapping API. Fix that by using different buffers, one for requests and one for responses, and use the correct DMA mapping direction for each. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-04 09:30:52 -07:00
Roland Dreier	e4221314a5	IB/mthca: Fix buddy->num_free allocation size The num_free field of mthca_buddy has a type of array of unsigned int while it was allocated as an array of pointers. On 64-bit platforms this allocates twice more than required. Fix this by allocating the correct size for the type. This is the same bug just fixed in mlx4 by Eli Cohen <eli@mellanox.co.il>. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-11-03 17:48:25 -07:00
Linus Torvalds	f470f8d4e7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits) mlx4_core: Deprecate log_num_vlan module param IB/mlx4: Don't set VLAN in IBoE WQEs' control segment IB/mlx4: Enable 4K mtu for IBoE RDMA/cxgb4: Mark QP in error before disabling the queue in firmware RDMA/cxgb4: Serialize calls to CQ's comp_handler RDMA/cxgb3: Serialize calls to CQ's comp_handler IB/qib: Fix issue with link states and QSFP cables IB/mlx4: Configure extended active speeds mlx4_core: Add extended port capabilities support IB/qib: Hold links until tuning data is available IB/qib: Clean up checkpatch issue IB/qib: Remove s_lock around header validation IB/qib: Precompute timeout jiffies to optimize latency IB/qib: Use RCU for qpn lookup IB/qib: Eliminate divide/mod in converting idx to egr buf pointer IB/qib: Decode path MTU optimization IB/qib: Optimize RC/UC code by IB operation IPoIB: Use the right function to do DMA unmap pages RDMA/cxgb4: Use correct QID in insert_recv_cqe() RDMA/cxgb4: Make sure flush CQ entries are collected on connection close ...	2011-11-01 10:51:38 -07:00
Roland Dreier	504255f8d0	Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next	2011-11-01 09:37:08 -07:00
Christoph Lameter	bc3e53f682	mm: distinguish between mlocked and pinned pages Some kernel components pin user space memory (infiniband and perf) (by increasing the page count) and account that memory as "mlocked". The difference between mlocking and pinning is: A. mlocked pages are marked with PG_mlocked and are exempt from swapping. Page migration may move them around though. They are kept on a special LRU list. B. Pinned pages cannot be moved because something needs to directly access physical memory. They may not be on any LRU list. I recently saw an mlockalled process where mm->locked_vm became bigger than the virtual size of the process (!) because some memory was accounted for twice: Once when the page was mlocked and once when the Infiniband layer increased the refcount because it needt to pin the RDMA memory. This patch introduces a separate counter for pinned pages and accounts them seperately. Signed-off-by: Christoph Lameter <cl@linux.com> Cc: Mike Marciniszyn <infinipath@qlogic.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-10-31 17:30:46 -07:00
Paul Gortmaker	fec14d2fce	infiniband: add moduleparam.h to drivers/infiniband as required These files were getting the moduleparam infrastructure from the implicit presence of module.h being everywhere, but that is going away soon. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:36 -04:00
Paul Gortmaker	b108d9764c	infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE These were getting it implicitly via device.h --> module.h but we are going to stop that when we clean up the headers. Fix these in advance so the tree remains biscect-clean. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:35 -04:00
Paul Gortmaker	e4dd23d753	infiniband: Fix up module files that need to include module.h They had been getting it implicitly via device.h but we can't rely on that for the future, due to a pending cleanup so fix it now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:35 -04:00
Paul Gortmaker	fc87af74af	infiniband: Fix up users implicitly relying on getting stat.h They get it via module.h (via device.h) but we want to clean that up. When we do, we'll get things like: CC [M] drivers/infiniband/core/sysfs.o sysfs.c:361: error: 'S_IRUGO' undeclared here (not in a function) sysfs.c:654: error: 'S_IWUSR' undeclared here (not in a function) so add in the stat header it is using explicitly in advance. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:34 -04:00
Or Gerlitz	80a2dcd8d0	IB/mlx4: Don't set VLAN in IBoE WQEs' control segment There's no need to set the vlan-related fields in an IBoE send WQE control segment: - the vlan to be used by a UD QP is set in the datagram segment. - for GSI (CM) QP, all the headers down to 8021q and MAC are built by the software anyway. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 11:57:51 -07:00
Or Gerlitz	bcacb89756	IB/mlx4: Enable 4K mtu for IBoE The IBoE port MTU is derived from the corresponding Ethernet netdevice MTU, which can support jumbo frames of 9K, and hence surely supports the max IB mtu of 4K. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 11:55:15 -07:00
Tom Tucker	d32ae393db	RDMA/cxgb4: Mark QP in error before disabling the queue in firmware QPs need to be moved to error before telling the firwmare to shutdown the queue. Otherwise, the application can submit WRs that will never get fetched by the hardware and never flushed by the driver. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swsie@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 11:36:08 -07:00
Kumar Sanghvi	581bbe2cd0	RDMA/cxgb4: Serialize calls to CQ's comp_handler Commit `01e7da6ba5` ("RDMA/cxgb4: Make sure flush CQ entries are collected on connection close") introduced a potential problem where a CQ's comp_handler can get called simultaneously from different places in the iw_cxgb4 driver. This does not comply with Documentation/infiniband/core_locking.txt, which states that at a given point of time, there should be only one callback per CQ should be active. This problem was reported by Parav Pandit <Parav.Pandit@Emulex.Com>. Based on discussion between Parav Pandit and Steve Wise, this patch fixes the above problem by serializing the calls to a CQ's comp_handler using a spin_lock. Reported-by: Parav Pandit <Parav.Pandit@Emulex.Com> Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 11:34:53 -07:00
Kumar Sanghvi	f7cc25d018	RDMA/cxgb3: Serialize calls to CQ's comp_handler iw_cxgb3 has a potential problem where a CQ's comp_handler can get called simultaneously from different places in iw_cxgb3 driver. This does not comply with Documentation/infiniband/core_locking.txt, which states that at a given point of time, there should be only one callback per CQ should be active. Such problem was reported by Parav Pandit <Parav.Pandit@Emulex.Com> for iw_cxgb4 driver. Based on discussion between Parav Pandit and Steve Wise, this patch fixes the above problem by serializing the calls to a CQ's comp_handler using a spin_lock. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 11:33:17 -07:00
Mitko Haralanov	16d99812d5	IB/qib: Fix issue with link states and QSFP cables Fix an issue where the link would come up after replugging a cable even if it has been DISABLED manually. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-31 10:57:59 -07:00
Linus Torvalds	ec7ae51753	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits) [SCSI] qla4xxx: export address/port of connection (fix udev disk names) [SCSI] ipr: Fix BUG on adapter dump timeout [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer [SCSI] hpsa: change confusing message to be more clear [SCSI] iscsi class: fix vlan configuration [SCSI] qla4xxx: fix data alignment and use nl helpers [SCSI] iscsi class: fix link local mispelling [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA [SCSI] aacraid: use lower snprintf() limit [SCSI] lpfc 8.3.27: Change driver version to 8.3.27 [SCSI] lpfc 8.3.27: T10 additions for SLI4 [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes [SCSI] megaraid_sas: Changelog and version update [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts ...	2011-10-28 16:44:18 -07:00
Marcel Apfelbaum	a5e12dff75	IB/mlx4: Configure extended active speeds Set the extended active speeds based on the hardware configuration. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Reviewed-by: Hal Rosenstock <hal@mellanox.com> [ Move FDR-10 handling into ib_link_query_port(). - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-28 11:36:16 -07:00
Mitko Haralanov	dde05cbdf8	IB/qib: Hold links until tuning data is available Hold the link state machine until the tuning data is read from the QSFP EEPROM so correct tuning settings are applied before the state machine attempts to bring the link up. Link is also held on cable unplug in case a different cable is used. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 15:08:20 -07:00
Mike Marciniszyn	44d75d3d92	IB/qib: Clean up checkpatch issue This was probably present from initial submission. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 15:08:18 -07:00
Mike Marciniszyn	9fd5473deb	IB/qib: Remove s_lock around header validation Review of qib_ruc_check_hdr() shows that the s_lock is not required in the normal case. The r_lock is held in all cases, and protects the qp fields that are read. The s_lock will be needed to around the call to qib_migrate_qp() to insure that the send engine sees a consistent set of fields. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:57 -07:00
Mike Marciniszyn	d0f2faf72d	IB/qib: Precompute timeout jiffies to optimize latency A new field is added to qib_qp called timeout_jiffies. It is initialized upon create and modify. The field is now used instead of a computation based on qp->timeout. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:56 -07:00
Mike Marciniszyn	af061a644a	IB/qib: Use RCU for qpn lookup The heavy weight spinlock in qib_lookup_qpn() is replaced with RCU. The hash list itself is now accessed via jhash functions instead of mod. The changes should benefit multiple receive contexts in different processors by not contending for the lock just to read the hash structures. The patch also adds a lookaside_qp (pointer) and a lookaside_qpn in the context. The interrupt handler will test the current packet's qpn against lookaside_qpn if the lookaside_qp pointer is non-NULL. The pointer is NULL'ed when the interrupt handler exits. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:54 -07:00
Mike Marciniszyn	9e1c0e4325	IB/qib: Eliminate divide/mod in converting idx to egr buf pointer The context init now saves a shift from rcvegrbufs_perchunk rcvegrbufs_perchunk_shift using ilog2. A BUG_ON() protects the power of 2 assumption. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:52 -07:00
Mike Marciniszyn	cc6ea1385b	IB/qib: Decode path MTU optimization Store both the encoded and decoded MTU in the QP structure as a minor optimization for UC/RC receive routines. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:50 -07:00
Mike Marciniszyn	2fc109c890	IB/qib: Optimize RC/UC code by IB operation The memset for zeroing work completions had been unconditional. This patch removes the memset and moves the zeroing into the work completion with a more explicit field by field set. With this patch, non-ONLY/non-LAST packets will avoid the overhead since they will not generate a completion. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-21 09:38:49 -07:00
Eric Dumazet	9e903e0852	net: add skb frag size accessors To ease skb->truesize sanitization, its better to be able to localize all references to skb frags size. Define accessors : skb_frag_size() to fetch frag size, and skb_frag_size_{set\|add\|sub}() to manipulate it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-19 03:10:46 -04:00
Dotan Barak	787adb9d6a	IPoIB: Use the right function to do DMA unmap pages Pages that were mapped using ib_dma_map_page() should be unmapped using ib_dma_unmap_page(). Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-18 10:08:31 -07:00
Jonathan Lallinger	e14d62c05c	RDMA/cxgb4: Use correct QID in insert_recv_cqe() When creating flushed receive CQEs, set the QPID field in the t4_cqe to the SQ QID and not the RQ QID. Otherwise the poll code will not find the correct QP context. Signed-off by: Jonathan Lallinger <jonathan@ogc.us> Signed-off by: Steve Wise <swise@ogc.us> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-14 14:23:40 -07:00
Kumar Sanghvi	01e7da6ba5	RDMA/cxgb4: Make sure flush CQ entries are collected on connection close At the time when a peer closes the connection, iw_cxgb4 will not send a cq event if ibqp.uobject exists. In that case, its possible for a user application to get blocked in ibv_get_cq_event(). To resolve this, call the cq's comp_handler to unblock any read from ibv_get_cq_event(). This will trigger userspace to poll the cq and collect flush status completions for any pending work requests. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-14 14:23:04 -07:00
Sean Hefty	42849b2697	RDMA/uverbs: Export ib_open_qp() capability to user space Allow processes that share the same XRC domain to open an existing shareable QP. This permits those processes to receive events on the shared QP and transfer ownership, so that any process may modify the QP. The latter allows the creating process to exit, while a remaining process can still transition it for path migration purposes. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:50:56 -07:00
Sean Hefty	0e0ec7e063	RDMA/core: Export ib_open_qp() to share XRC TGT QPs XRC TGT QPs are shared resources among multiple processes. Since the creating process may exit, allow other processes which share the same XRC domain to open an existing QP. This allows us to transfer ownership of an XRC TGT QP to another process. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:49:51 -07:00
Sean Hefty	0a1405da99	IB/mlx4: Add support for XRC QPs Support the creation of XRC INI and TGT QPs. To handle the case where a CQ or PD is not provided, we allocate them internally with the xrcd. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:44:18 -07:00
Sean Hefty	18abd5ea57	IB/mlx4: Add support for XRC SRQs Allow the user to create XRC SRQs. This patch is based on a patch from Jack Morgenstrein <jackm@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:43:46 -07:00
Sean Hefty	012a8ff577	IB/mlx4: Add support for XRC domains Support creating and destroying XRC domains. Any sharing of the XRCD is managed above the low-level driver. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:43:03 -07:00
Sean Hefty	2622e18ef4	IB/cm: Do not automatically disconnect XRC TGT QPs Because an XRC TGT QP can end up being shared among multiple processes, don't have the ib_cm automatically send a DREQ when the userspace process that owns the ib_cm_id exits. Disconnect can be initiated by the user directly; otherwise, the owner of the XRC INI QP controls the connection. Note that as a result of the process exiting, the ib_cm will stop tracking the XRC connection on the target side. For the purposes of disconnecting, this isn't a big deal. The ib_cm will respond to the DREQ appropriately. For other messages, mainly LAP, the CM will reject the request, since there's no one available to route the request to. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:42:06 -07:00
Sean Hefty	18c441a6c3	RDMA/cma: Support XRC QPs Allow users to connect XRC QPs through the rdma_cm. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:41:36 -07:00
Sean Hefty	638ef7a6c6	RDMA/ucm: Allow user to specify QP type when creating id Allow the user to indicate the QP type separately from the port space when allocating an rdma_cm_id. With RDMA_PS_IB, there is no longer a 1:1 relationship between the QP type and port space, so we need to switch on the QP type to select between UD and connected QPs. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:40:36 -07:00
Sean Hefty	2d2e941529	RDMA/cm: Define new RDMA port space specific to IB Add RDMA_PS_IB. XRC QP types will use the IB port space when operating over the RDMA CM. For the 'IP protocol' field value, we select 0x3F, which is listed as being for 'any local network'. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:39:52 -07:00
Sean Hefty	ef70044647	IB/cm: Update XRC support based on XRC annex errata The XRC annex was updated to have XRC behave more like RD. Specifically, the XRC TGT QPN moves from the local QPN to local EECN field. Lookup of SRQN is done using the REQ/REP protocol. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:38:35 -07:00
Sean Hefty	d26a360b77	IB/cm: Update protocol to support XRC Update the REQ and REP messages to support XRC connection setup according to the XRC Annex. Several existing fields must be set to 0 or 1 when connecting XRC QPs, and a reserved field is changed to an extended transport type. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:37:55 -07:00
Sean Hefty	b93f3c1872	RDMA/uverbs: Export XRC TGT QPs to user space Allow user space to operate on XRC TGT QPs the same way as other types of QPs, with one notable exception: since XRC TGT QPs may be shared among multiple processes, the XRC TGT QP is allowed to exist beyond the lifetime of the creating process. The process that creates the QP is allowed to destroy it, but if the process exits without destroying the QP, then the QP will be left bound to the lifetime of the XRCD. TGT QPs are not associated with CQs or a PD. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:37:07 -07:00
Sean Hefty	9977f4f64b	RDMA/uverbs: Export XRC INI QPs to userspace XRC INI QPs are similar to send only RC QPs. Allow user space to create INI QPs. Note that INI QPs do not require receive CQs. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:32:27 -07:00
Sean Hefty	8541f8de05	RDMA/uverbs: Export XRC SRQs to user space We require additional information to create XRC SRQs than we can exchange using the existing create SRQ ABI. Provide an enhanced create ABI for extended SRQ types. Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il> and Roland Dreier <roland@purestorage.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:29:18 -07:00
Sean Hefty	53d0bd1e7f	RDMA/uverbs: Export XRC domains to user space Allow user space to create XRC domains. Because XRCDs are expected to be shared among multiple processes, we use inodes to identify an XRCD. Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:21:24 -07:00
Sean Hefty	d3d72d909e	RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCD XRC TGT QPs are intended to be shared among multiple users and processes. Allow the destruction of an XRC TGT QP to be done explicitly through ib_destroy_qp() or when the XRCD is destroyed. To support destroying an XRC TGT QP, we need to track TGT QPs with the XRCD. When the XRCD is destroyed, all tracked XRC TGT QPs are also cleaned up. To avoid stale reference issues, if a user is holding a reference on a TGT QP, we increment a reference count on the QP. The user releases the reference by calling ib_release_qp. This releases any access to the QP from a user above verbs, but allows the QP to continue to exist until destroyed by the XRCD. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:20:27 -07:00
Sean Hefty	b42b63cf0d	RDMA/core: Add XRC QPs XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). XRC communication is between an initiator (INI) QP and a target (TGT) QP. Target QPs are associated with SRQs through an XRCD. An XRC TGT QP behaves like a receive-only RD QP. XRC INI QPs behave similarly to RC QPs, except that work requests posted to an XRC INI QP must specify the remote SRQ that is the target of the work request. We define two new QP types for XRC, to distinguish between INI and TGT QPs, and update the core layer to support XRC QPs. This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:16:19 -07:00
Sean Hefty	418d51307d	RDMA/core: Add XRC SRQ type XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). XRC defines SRQs that are specifically used by XRC connections. Expand the SRQ code to support XRC SRQs. An XRC SRQ is currently restricted to only XRC use according to the IB XRC Annex. Portions of this patch were derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:14:31 -07:00
Sean Hefty	96104eda01	RDMA/core: Add SRQ type field Currently, there is only a single ("basic") type of SRQ, but with XRC support we will add a second. Prepare for this by defining an SRQ type and setting all current users to IB_SRQT_BASIC. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:13:26 -07:00
Sean Hefty	59991f94eb	RDMA/core: Add XRC domain support XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). A few new concepts are introduced to support this. This patch adds: - A new device capability flag, IB_DEVICE_XRC, which low-level drivers set to indicate that a device supports XRC. - A new object type, XRC domains (struct ib_xrcd), and new verbs ib_alloc_xrcd()/ib_dealloc_xrcd(). XRCDs are used to limit which XRC SRQs an incoming message can target. This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-12 10:32:26 -07:00
Marcel Apfelbaum	e36fb88a9a	IPoIB: Handle extended rates in debugfs Use new function ib_rate_to_mbps() to handle printing rate in debugfs, so that we handle extended rates. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-11 11:57:08 -07:00
Marcel Apfelbaum	71eeba161d	IB: Add new InfiniBand link speeds Introduce support for the following extended speeds: FDR-10: a Mellanox proprietary link speed which is 10.3125 Gbps with 64b/66b encoding rather than 8b/10b encoding. FDR: IBA extended speed 14.0625 Gbps. EDR: IBA extended speed 25.78125 Gbps. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-11 11:53:47 -07:00
Randy Dunlap	3e60a77ea2	IB/ipath: Add missing <linux/stat.h> in ipath_chip_init.c Fix build errors: drivers/infiniband/hw/ipath/ipath_init_chip.c:54:1: error: 'S_IRUGO' undeclared here (not in a function) drivers/infiniband/hw/ipath/ipath_init_chip.c:54:1: error: bit-field '<anonymous>' width not an integer constant drivers/infiniband/hw/ipath/ipath_init_chip.c:67:1: error: 'S_IWUSR' undeclared here (not in a function) drivers/infiniband/hw/ipath/ipath_init_chip.c:67:1: error: bit-field '<anonymous>' width not an integer constant Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-10 12:01:22 -07:00
Faisal Latif	0f0bee8bbc	RDMA/nes: Support for Packed And Unaligned fpdus Support for Packed and Unaligned (PAU) FPDUs is needed for interoperability between NES and non-NES nodes. When the NES hardware detects a PAU frame, it will pass it to the driver to process the frame. NES driver creates a new frame for each FPDU and forwards it to the hardware to be sent to its associated qp. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-10 10:54:47 -07:00
Faisal Latif	6224c7eeff	RDMA/nes: Print IP address for critcal errors Print the IP address of the remote host when a critical asynchronous event is received. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-10 10:51:21 -07:00
Faisal Latif	bab3a9f43f	RDMA/nes: Fix terminate connection Fixes a crash that occurs during close when error async event is received. Terminate message is not sent to the remote node if already processing close. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-10 10:47:44 -07:00
David S. Miller	88c5100c28	Merge branch 'master' of github.com:davem330/net Conflicts: net/batman-adv/soft-interface.c	2011-10-07 13:38:43 -04:00
Ian Campbell	5d6bcdfe38	net: use DMA_x_DEVICE and dma_mapping_error with skb_frag_dma_map When I converted some drivers from pci_map_page to skb_frag_dma_map I neglected to convert PCI_DMA_xDEVICE into DMA_x_DEVICE and pci_dma_mapping_error into dma_mapping_error. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-06 16:17:20 -04:00
Tatyana Nikolova	615eb715ae	RDMA/nes: Add support for MPAv2 Enhanced RDMA Negotiation This patch adds support for Enhanced RDMA Connection Establishment (draft-ietf-storm-mpa-peer-connect-06), aka MPAv2. Details of draft can be obtained from: <http://www.ietf.org/id/draft-ietf-storm-mpa-peer-connect-06.txt> For backwards compatibility, the MPAv2 enabled driver reverts to MPAv1 if the remote node doesn't support MPAv2. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:39:45 -07:00
Kumar Sanghvi	d2fe99e86b	RDMA/cxgb4: Add support for MPAv2 Enhanced RDMA Negotiation This patch adds support for Enhanced RDMA Connection Establishment (draft-ietf-storm-mpa-peer-connect-06), aka MPAv2. Details of draft can be obtained from: <http://www.ietf.org/id/draft-ietf-storm-mpa-peer-connect-06.txt> The patch updates the following functions for initiator perspective: - send_mpa_request - process_mpa_reply - post_terminate for TERM error codes - destroy_qp for TERM related change - adds layer/etype/ecode to c4iw_qp_attrs for sending with TERM - peer_abort for retrying connection attempt with MPA_v1 message - added c4iw_reconnect function The patch updates the following functions for responder perspective: - process_mpa_request - send_mpa_reply - c4iw_accept_cr - passes ird/ord to upper layers Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:39:24 -07:00
Kumar Sanghvi	56da00fc92	RDMA/{amso1100,cxgb3}: Minimal MPAv2 support As part of MPAv2 Enhanced RDMA Negotiation, pass max supported ird/ord values upwards for the time being in iw_cxgb3 and amso1100. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:39:01 -07:00
Kumar Sanghvi	3ebeebc38b	RDMA/iwcm: Propagate ird/ord values upwards Update struct iw_cm_event to support propagating the ird/ord values upwards to the application. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:37:52 -07:00
Mike Marciniszyn	53ab1c6498	IB/qib: Correct nfreectxts for multiple HCAs The code that was recently introduced to report the number of free contexts is flawed for multiple HCAs: /* Return the number of free user ports (contexts) available. */ return scnprintf(buf, PAGE_SIZE, "%u\n", dd->cfgctxts - dd->first_user_ctxt - (u32)qib_stats.sps_ctxts); The qib_stats is global to the module, not per HCA, so the code is broken for multiple HCAs. This patch adds a qib_devdata field, freectxts, that reflects the free contexts for this HCA. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:35 -07:00
Julia Lawall	e2e435f290	RDMA/nes: Add missing calls to ib_umem_release() Add calls to ib_umem_release(), as in the other error-handling code in nes_reg_user_mr(). Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:24 -07:00
Hefty, Sean	caf6e3f221	RDMA/ucm: Removed checks for unsigned value < 0 cmd is unsigned, no need to check for < 0. Found by code inspection. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:05 -07:00
Hefty, Sean	b7ab0b19a8	IB/mad: Verify mgmt class in received MADs If a received MAD contains an invalid or reserved mgmt class, we will attempt to access method_table outside of its range. Add a check to ensure that mgmt class falls within the handled range. Found by code inspection. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:05 -07:00
Hefty, Sean	f45ee80eb0	RDMA/cma: Check for NULL conn_param in rdma_accept Check that conn_param is not null before dereferencing it when processing rdma_accept(). This is necessary to prevent a possible system crash, which can be caused by user space. Problem found by code inspection. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:04 -07:00
Yong Zhang	10889a3643	IB/ehca: Remove IRQF_DISABLED, since it's a no-op Since commit `e58aa3d2d0` ("genirq: Run irq handlers with interrupts disabled"), we run all interrupt handlers with interrupts disabled and we even check and yell when an interrupt handler returns with interrupts enabled -- cf commit `b738a50a20` ("genirq: Warn when handler enables interrupts"). So now this flag is a no-op and can be removed. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Acked-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:33:04 -07:00
Steve Wise	9efe10a1e1	RDMA/cxgb4: Fail RDMA initialization for unsupported cards The iw_cxgb4 module crashes at init time if the T4 card does not support RDMA. So clean up the init logic to correctly deal with non-RDMA cards. - If any RDMA resources are not available, then fail the initialization logging an info message. - Clean up properly on initialization failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:32:44 -07:00
Hefty, Sean	9595480c5d	RDMA/cma: Fix crash in cma_req_handler The RDMA CM uses the local qp_type to determine how to process an incoming request. This can result in an incoming REQ being treated as a SIDR REQ and vice versa. Fix this by switching off the event type instead, and for good measure verify that the listener supports the incoming connection request. This problem showed up when a user space application mismatched the QP types between a client and server app. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:32:33 -07:00
Andy Shevchenko	2be6053318	RDMA/amso1100: Use '%pM' format option to print MAC Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-06 09:32:24 -07:00
Neil Horman	e48f129c2f	[SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference This oops was reported recently: d:mon> e cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120] pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3] lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i] sp: c0000000fd4c73a0 msr: 8000000000009032 dar: 0 dsisr: 40000000 current = 0xc0000000fd640d40 paca = 0xc00000000054ff80 pid = 5085, comm = iscsid d:mon> t [c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i] [c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi] [c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18 [scsi_transport_iscsi2] [c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4 [c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c [c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c [c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8 [c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac [c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c [c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40 --- Exception: c01 (System Call) at 00000080da560cfc The root cause was an EEH error, which sent us down the offload_close path in the cxgb3 driver, which in turn sets cdev->l2opt to NULL, without regard for upper layer driver (like the cxgbi drivers) which might have execution contexts in the middle of its use. The result is the oops above, when t3_l2t_get attempts to dereference L2DATA(cdev)->nentries in arp_hash right after the EEH error handler sets it to NULL. The fix is to prevent the setting of the NULL pointer until after there are no further users of it. The t3cdev->l2opt pointer is now converted to be an rcu pointer and the L2DATA macro is now called under the protection of the rcu_read_lock(). When the EEH error path: t3_adapter_error->offload_close->cxgb3_offload_deactivate Is exectured, setting of that l2opt pointer to NULL, is now gated on an rcu quiescence point, preventing, allowing L2DATA callers to safely check for a NULL pointer without concern that the underlying data will be freeded before the pointer is dereferenced. This has been tested by the reporter and shown to fix the reproted oops [nhorman: fix up unitinialised variable reported by Dan Carpenter] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Karen Xie <kxie@chelsio.com> Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-09-26 09:28:01 -05:00
David S. Miller	8decf86879	Merge branch 'master' of github.com:davem330/net Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c	2011-09-22 03:23:13 -04:00
Mike Christie	f27fb2ef7b	[SCSI] iscsi class: sysfs group is_visible callout for iscsi host attrs The iscsi class currently does not support writable sysfs attrs for LLD sysfs settings. This patch converts the iscsi class and driver's host attrs to use the attribute container sysfs group and the sysfs group's is_visible callout to be able to support readable or writable sysfs attrs. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-08-27 08:36:14 -06:00
Mike Christie	1d063c1729	[SCSI] iscsi class: sysfs group is_visible callout for session attrs The iscsi class currently does not support writable sysfs attrs for LLD sysfs settings. This patch converts the iscsi class and driver's session attrs to use the attribute container sysfs group and the sysfs group's is_visible callout to be able to support readable or writable sysfs attrs. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-08-27 08:36:06 -06:00
Mike Christie	3128c6c73c	[SCSI] iscsi cls: sysfs group is_visible callout for conn attrs The iscsi class currently does not support writable sysfs attrs for LLD sysfs settings. This patch converts the iscsi class and drivers to use the attribute container sysfs group and the sysfs group's is_visible callout to be able to support readable or writable sysfs attrs. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-08-27 08:36:03 -06:00
Ian Campbell	5581be3b48	IPoIB: convert to SKB paged frag API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-26 12:38:42 -04:00
Ian Campbell	cf383ebb13	IB: nes: convert to SKB paged frag API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Faisal Latif <faisal.latif@intel.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-26 12:38:42 -04:00
Ian Campbell	70b1052ad2	IB: amso1100: convert to SKB paged frag API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tom Tucker <tom@opengridcomputing.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-26 12:38:42 -04:00
Jiri Pirko	afc4b13df1	net: remove use of ndo_set_multicast_list in drivers replace it by ndo_set_rx_mode Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:22:03 -07:00
Roland Dreier	80b43de837	Merge branches 'ipoib' and 'iser' into for-next	2011-08-17 10:57:43 -07:00
Or Gerlitz	200ae1a08b	IB/iser: Support iSCSI PDU padding RFC3270 mandates that iSCSI PDUs are padded to the closest integer number of four byte words. Fix the iser code to support that on both the TX/RX flows. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-08-17 09:45:07 -07:00
Or Gerlitz	0ace64b85e	IBiser: Fix wrong mask when sizeof (dma_addr_t) > sizeof (unsigned long) The code that prepares the SG associated with SCSI command for FMR was buggy for systems with DMA addresses that don't fit in unsigned long, e.g under the 32-bit based XenServer dom0 sizeof(dma_addr_t) is 8. Fix that by casting to unsigned long long a masking constant used by the code. This resolves a crash in iser_sg_to_page_vec on this system. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-08-17 09:40:55 -07:00
Bernd Schubert	22cfb0bf67	IPoIB: Fix possible NULL dereference in ipoib_start_xmit() Fix a bug introduced in `69cce1d140` ("net: Abstract dst->neighbour accesses behind helpers.") where we might dereference skb_dst(skb) even if it is NULL, which causes: [ 240.944030] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 [ 240.948007] IP: [<ffffffffa0366ce9>] ipoib_start_xmit+0x39/0x280 [ib_ipoib] [...] [ 240.948007] Call Trace: [ 240.948007] <IRQ> [ 240.948007] [<ffffffff812cd5e0>] dev_hard_start_xmit+0x2a0/0x590 [ 240.948007] [<ffffffff8131f680>] ? arp_create+0x70/0x200 [ 240.948007] [<ffffffff812e8e1f>] sch_direct_xmit+0xef/0x1c0 Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=41212 Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-08-16 10:19:20 -07:00
David S. Miller	af3dcd2f44	mlx4: Fix infiniband Kconfig dependencies. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-11 23:05:05 -07:00
Jeff Kirsher	f7917c009c	chelsio: Move the Chelsio drivers Moves the drivers for the Chelsio chipsets into drivers/net/ethernet/chelsio/ and the necessary Kconfig and Makefile changes. CC: Divy Le Ray <divy@chelsio.com> CC: Dimitris Michailidis <dm@chelsio.com> CC: Casey Leedom <leedom@chelsio.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2011-08-10 19:54:52 -07:00
Linus Torvalds	91d41fdf31	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: Convert to DIV_ROUND_UP_SECTOR_T usage for sectors / dev_max_sectors kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage iscsi-target: Add iSCSI fabric support for target v4.1 iscsi: Add Serial Number Arithmetic LT and GT into iscsi_proto.h iscsi: Use struct scsi_lun in iscsi structs instead of u8[8] iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi	2011-07-27 13:21:40 -07:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Nicholas Bellinger	123521830c	iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi This patch renames the following iscsi_proto.h structures to avoid namespace issues with drivers/target/iscsi/iscsi_target_core.h: ) struct iscsi_cmd -> struct iscsi_scsi_req ) struct iscsi_cmd_rsp -> struct iscsi_scsi_rsp ) struct iscsi_login -> struct iscsi_login_req This patch includes useful ISCSI_FLAG_LOGIN_[CURRENT,NEXT]_STAGE, and ISCSI_FLAG_SNACK_TYPE_* definitions used by iscsi_target_mod, and fixes the incorrect definition of struct iscsi_snack to following RFC-3720 Section 10.16. SNACK Request. Also, this patch updates libiscsi, iSER, be2iscsi, and bn2xi to use the updated structure definitions in a handful of locations. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>	2011-07-25 07:18:45 +00:00
Linus Torvalds	ece236ce2f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits) IB/qib: Defer HCA error events to tasklet mlx4_core: Bump the driver version to 1.0 RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit() IB/mlx4: Support PMA counters for IBoE IB/mlx4: Use flow counters on IBoE ports IB/pma: Add include file for IBA performance counters definitions mlx4_core: Add network flow counters mlx4_core: Fix location of counter index in QP context struct mlx4_core: Read extended capabilities into the flags field mlx4_core: Extend capability flags to 64 bits IB/mlx4: Generate GID change events in IBoE code IB/core: Add GID change event RDMA/cma: Don't allow IPoIB port space for IBoE RDMA: Allow for NULL .modify_device() and .modify_port() methods IB/qib: Update active link width IB/qib: Fix potential deadlock with link down interrupt IB/qib: Add sysfs interface to read free contexts IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP IB/qib: Remove double define IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP ...	2011-07-22 14:50:12 -07:00
Roland Dreier	4460207561	Merge branches 'cma', 'cxgb4', 'ipath', 'misc', 'mlx4', 'mthca', 'qib' and 'srp' into for-next	2011-07-22 11:56:11 -07:00
Mike Marciniszyn	e67306a380	IB/qib: Defer HCA error events to tasklet With ib_qib options: options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4 a run of ib_write_bw -a yields the following: ------------------------------------------------------------------ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] 1048576 5000 2910.64 229.80 ------------------------------------------------------------------ The top cpu use in a profile is: CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 1002300 Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000 samples % samples % app name symbol name 15237 29.2642 964 17.1195 ib_qib.ko qib_7322intr 12320 23.6618 1040 18.4692 ib_qib.ko handle_7322_errors 4106 7.8860 0 0 vmlinux vsnprintf Analysis of the stats, profile, the code, and the annotated profile indicate: - All of the overflow interrupts (one per packet overflow) are serviced on CPU0 with no mitigation on the frequency. - All of the receive interrupts are being serviced by CPU0. (That is the way truescale.cmds statically allocates the kctx IRQs to CPU) - The code is spending all of its time servicing QIB_I_C_ERROR RcvEgrFullErr interrupts on CPU0, starving the packet receive processing. - The decode_err routine is very inefficient, using a printf variant to format a "%s" and continues to loop when the errs mask has been cleared. - Both qib_7322intr and handle_7322_errors read pci registers, which is very inefficient. The fix does the following: - Adds a tasklet to service QIB_I_C_ERROR - Replaces the very inefficient scnprintf() with a memcpy(). A field is added to qib_hwerror_msgs to save the sizeof("string") at compile time so that a strlen is not needed during err_decode(). - The most frequent errors (Overflows) are serviced first to exit the loop as early as possible. - The loop now exits as soon as the errs mask is clear rather than fruitlessly looping through the msp array. With this fix the performance changes to: ------------------------------------------------------------------ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] 1048576 5000 2990.64 2941.35 ------------------------------------------------------------------ During testing of the error handling overflow patch, it was determined that some CPU's were slower when servicing both overflow and receive interrupts on CPU0 with different MSI interrupt vectors. This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI interrupt for kctx's < 2 and to service them on the default interrupt. For some CPUs, the cost of the interrupt enter/exit is more costly than then the additional PCI read in the default handler. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-22 11:56:05 -07:00
Jiri Pirko	7033c4ad87	nes: do vlan cleanup - unify vlan and nonvlan rx path - kill nesvnic->vlan_grp and nes_netdev_vlan_rx_register - allow to turn on/off rx/tx vlan accel via ethtool (set_features) Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-21 13:47:53 -07:00
Manuel Zerpies	3cbe182a5b	RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit() Since printk_ratelimit() shouldn't be used anymore (see comment in include/linux/printk.h), replace it with printk_ratelimited(). Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:18:39 -07:00
Or Gerlitz	c37791349c	IB/mlx4: Support PMA counters for IBoE Use the per port counter attached to all QPs created on that port to implement port level packets/bytes performance counters a la IB. Derived from a patch by Eli Cohen <eli@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:04:36 -07:00
Or Gerlitz	cfcde11c3d	IB/mlx4: Use flow counters on IBoE ports Allocate flow counter per Ethernet/IBoE port, and attach this counter to all the QPs created on that port. Based on patch by Eli Cohen <eli@mellanox.co.il>. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:04:36 -07:00
Or Gerlitz	6aea213a62	IB/pma: Add include file for IBA performance counters definitions Move the various definitions and mad structures needed for software implementation of IBA PM agent from the ipath and qib drivers into a single include file, which in turn could be used by more consumers. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:04:35 -07:00
Or Gerlitz	6451c712fe	IB/mlx4: Generate GID change events in IBoE code IBoE doesn't use LIDs. Use the GID change event to update the IB core cache for addition/deletion of GIDs. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:04:31 -07:00
Or Gerlitz	761d90ed4c	IB/core: Add GID change event Add IB GID change event type. This is needed for IBoE when the HW driver updates the GID (e.g when new VLANs are added/deleted) table and the change should be reflected to the IB core cache. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:04:30 -07:00
Moni Shoua	2efdd6a038	RDMA/cma: Don't allow IPoIB port space for IBoE This patch fixes a kernel crash in cma_set_qkey(). When the link layer is Ethernet, it is wrong to use IPoIB port space since no IPoIB interface is available. Specifically, setting the Q_Key when port space is RDMA_PS_IPOIB requires MGID calculation and an SA query, which doesn't make sense over Ethernet. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 16:50:25 -07:00
Bart Van Assche	10e1b54bbb	RDMA: Allow for NULL .modify_device() and .modify_port() methods These methods don't make sense for iWARP devices, so rather than forcing them to implement stubs, just return -ENOSYS in the core if the hardware driver doesn't set .modify_device and/or .modify_port. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 16:44:30 -07:00
Mitko Haralanov	e800bd032c	IB/qib: Update active link width Update the active link width on QLE7220 chips when link goes down if chip width does not match shadowed width. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 12:09:26 -07:00
Ram Vepa	4356d0b64b	IB/qib: Fix potential deadlock with link down interrupt There is a possibility of a deadlock due to the way locks are acquired and released in qib_set_uevent_bits(). The function qib_set_uevent_bits() is called in process context and it uses spin_lock() and spin_unlock(). This same lock is acquired/released in interrupt context which can lead to a deadlock when running on the same cpu. The fix is to replace spin_lock() and spin_unlock() with spin_lock_irqsave() and spin_unlock_irqrestore() respectively in qib_set_uevent_bits(). Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 12:09:23 -07:00
Ram Vepa	2df4f7579d	IB/qib: Add sysfs interface to read free contexts Indicate the number of free user contexts via the sysfs file /sys/class/infiniband/qib0/nfreectxts as required for PSM. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 12:09:19 -07:00
Jon Mason	9b89925c0d	IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP The PCIE capability offset is saved during PCI bus walking. It will remove an unnecessary search in the PCI configuration space if this value is referenced instead of reacquiring it. Also, pci_is_pcie is a better way of determining if the device is PCIE or not (as it uses the same saved PCIE capability offset). Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 12:01:22 -07:00
Edwin van Vliet	ac0cae4495	IB/qib: Remove double define Signed-off-by: Edwin van Vliet <edwin@cheatah.nl> Reviewed-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 11:59:23 -07:00
Jon Mason	7f27cda037	IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP The PCIE capability offset is saved during PCI bus walking. It will remove an unnecessary search in the PCI configuration space if this value is referenced instead of reacquiring it. Signed-off-by: Jon Mason <jdmason@kudzu.us> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 11:57:52 -07:00
Motohiro KOSAKI	5763181172	IB/ipath: Convert old cpumask api into new one Adapt to new api. We plan to remove old one later. Almost all changes are trivial, but there is one real fix: the following code is unsafe: int ncpus = num_online_cpus() for (i = 0; i < ncpus; i++) { .. } because 1) we don't guarantee last bit of online cpus is equal to num_online_cpus(). some arch assign sparse cpu number. 2) cpu hotplugging may change cpu_online_mask at same time. we need to pin it by get_online_cpus(). Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 11:56:18 -07:00
Motohiro KOSAKI	0cd85e6738	IB/qib: Convert old cpumask api into new one Adapt to use new APIs. We plan to remove old one later and plan to change current->cpus_allowed implementation. No functional change. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 11:56:03 -07:00
Jack Morgenstein	0c9361fcde	RDMA/cma: Avoid assigning an IS_ERR value to cm_id pointer in CMA id object Avoid assigning an IS_ERR value to the cm_id pointer. This fixes a few anomalies in the error flow due to confusion about checking for NULL vs IS_ERR, and eliminates the need to test for the IS_ERR value every time we wish to determine if the cma_id object has a cm device associated with it. Also, eliminate the now-unnecessary procedure cma_has_cm_dev (we can check directly for the existence of the device pointer -- for a non-NULL check, makes no difference if it is the iwarp or the ib pointer). Finally, make a few code changes here to improve coding consistency. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 09:44:55 -07:00
David S. Miller	69cce1d140	net: Abstract dst->neighbour accesses behind helpers. dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-17 23:11:35 -07:00
Goldwyn Rodrigues	cdb73db0b6	IB/mthca: Stop returning separate error and status from FW commands Instead of having firmware command functions return an error and also a status, leading to code like: err = mthca_FW_COMMAND(..., &status); if (err) goto out; if (status) { err = -E...; goto out; } all over the place, just handle the FW status inside the FW command handling code (the way mlx4 does it), so we can simply write: err = mthca_FW_COMMAND(...); if (err) goto out; In addition to simplifying the source code, this also saves a healthy chunk of text: add/remove: 0/0 grow/shrink: 10/88 up/down: 510/-3357 (-2847) function old new delta static.trans_table 324 584 +260 mthca_cmd_poll 352 477 +125 mthca_cmd_wait 511 567 +56 mthca_table_put 213 240 +27 mthca_cleanup_db_tab 372 387 +15 __mthca_remove_one 314 323 +9 mthca_cleanup_user_db_tab 275 283 +8 __mthca_init_one 1738 1746 +8 mthca_cleanup 20 21 +1 mthca_MAD_IFC 1081 1082 +1 mthca_MGID_HASH 43 40 -3 mthca_MAP_ICM_AUX 23 20 -3 mthca_MAP_ICM 19 16 -3 mthca_MAP_FA 23 20 -3 mthca_READ_MGM 43 38 -5 mthca_QUERY_SRQ 43 38 -5 mthca_QUERY_QP 59 54 -5 mthca_HW2SW_SRQ 43 38 -5 mthca_HW2SW_MPT 60 55 -5 mthca_HW2SW_EQ 43 38 -5 mthca_HW2SW_CQ 43 38 -5 mthca_free_icm_table 120 114 -6 mthca_query_srq 214 206 -8 mthca_free_qp 662 654 -8 mthca_cmd 38 28 -10 mthca_alloc_db 1321 1311 -10 mthca_setup_hca 1067 1055 -12 mthca_WRITE_MTT 35 22 -13 mthca_WRITE_MGM 40 27 -13 mthca_UNMAP_ICM_AUX 36 23 -13 mthca_UNMAP_FA 36 23 -13 mthca_SYS_DIS 36 23 -13 mthca_SYNC_TPT 36 23 -13 mthca_SW2HW_SRQ 35 22 -13 mthca_SW2HW_MPT 35 22 -13 mthca_SW2HW_EQ 35 22 -13 mthca_SW2HW_CQ 35 22 -13 mthca_RUN_FW 36 23 -13 mthca_DISABLE_LAM 36 23 -13 mthca_CLOSE_IB 36 23 -13 mthca_CLOSE_HCA 38 25 -13 mthca_ARM_SRQ 39 26 -13 mthca_free_icms 178 164 -14 mthca_QUERY_DDR 389 375 -14 mthca_resize_cq 1063 1048 -15 mthca_unmap_eq_icm 123 107 -16 mthca_map_eq_icm 396 380 -16 mthca_cmd_box 90 74 -16 mthca_SET_IB 433 417 -16 mthca_RESIZE_CQ 369 353 -16 mthca_MAP_ICM_page 240 224 -16 mthca_MAP_EQ 183 167 -16 mthca_INIT_IB 473 457 -16 mthca_INIT_HCA 745 729 -16 mthca_map_user_db 816 798 -18 mthca_SYS_EN 157 139 -18 mthca_cleanup_qp_table 78 59 -19 mthca_cleanup_eq_table 168 149 -19 mthca_UNMAP_ICM 143 121 -22 mthca_modify_srq 172 149 -23 mthca_unmap_fmr 198 174 -24 mthca_query_qp 814 790 -24 mthca_query_pkey 343 319 -24 mthca_SET_ICM_SIZE 34 10 -24 mthca_QUERY_DEV_LIM 1870 1846 -24 mthca_map_cmd 1130 1105 -25 mthca_ENABLE_LAM 401 375 -26 mthca_modify_port 247 220 -27 mthca_query_device 884 850 -34 mthca_NOP 75 41 -34 mthca_table_get 287 249 -38 mthca_init_qp_table 333 293 -40 mthca_MODIFY_QP 348 308 -40 mthca_close_hca 131 89 -42 mthca_free_eq 435 390 -45 mthca_query_port 755 705 -50 mthca_free_cq 581 528 -53 mthca_alloc_icm_table 578 524 -54 mthca_multicast_attach 1041 986 -55 mthca_init_hca 326 271 -55 mthca_query_gid 487 431 -56 mthca_free_srq 524 468 -56 mthca_free_mr 168 111 -57 mthca_create_eq 1560 1501 -59 mthca_multicast_detach 790 728 -62 mthca_write_mtt 918 854 -64 mthca_register_device 1406 1342 -64 mthca_fmr_alloc 947 883 -64 mthca_mr_alloc 652 582 -70 mthca_process_mad 1242 1164 -78 mthca_dev_lim 910 830 -80 find_mgm 482 400 -82 mthca_modify_qp 3852 3753 -99 mthca_init_cq 1281 1181 -100 mthca_alloc_srq 1719 1610 -109 mthca_init_eq_table 1807 1679 -128 mthca_init_tavor 761 491 -270 mthca_init_arbel 2617 2098 -519 Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>	2011-07-15 13:33:20 -07:00
David S. Miller	6a7ebdf2fd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c	2011-07-14 07:56:40 -07:00
Bart Van Assche	fd1b6c4a69	IB/srp: Avoid duplicate devices from LUN scan SCSI scanning of a channel🆔lun triplet in Linux works as follows (function scsi_scan_target() in drivers/scsi/scsi_scan.c): - If lun == SCAN_WILD_CARD, send a REPORT LUNS command to the target and process the result. - If lun != SCAN_WILD_CARD, send an INQUIRY command to the LUN corresponding to the specified channel🆔lun triplet to verify whether the LUN exists. So a SCSI driver must either take the channel and target id values in account in its quecommand() function or it should declare that it only supports one channel and one target id. Currently the ib_srp driver does neither. As a result scanning the SCSI bus via e.g. rescan-scsi-bus.sh causes many duplicate SCSI devices to be created. For each 0:0:L device, several duplicates are created with the same LUN number and with (C:I) != (0:0). Fix this by declaring that the ib_srp driver only supports one channel and one target id. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@kernel.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-13 09:19:16 -07:00
David S. Miller	e12fe68ce3	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2011-07-05 23:23:37 -07:00
Goldwyn Rodrigues	b2bc478219	RDMA: Check for NULL mode in .devnode methods Commits `71c29bd5c2` ("IB/uverbs: Add devnode method to set path/mode") and `c3af0980ce` ("IB: Add devnode methods to cm_class and umad_class") added devnode methods that set the mode. However, these methods don't check for a NULL mode, and so we get a crash when unloading modules because devtmpfs_delete_node() calls device_get_devnode() with mode == NULL. Add the missing checks. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> [ Also fix cm.c. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-04 15:53:28 -07:00
Roland Dreier	c7d74b0909	Merge branches 'cxgb4' and 'qib' into for-next	2011-06-17 11:57:55 -07:00
Mitko Haralanov	3126448451	IB/qib: Ensure that LOS and DFE are being turned off Due to timing, it is possible for the LOS and DFE to remain on. This is due to the link progressing to LinkUP prior to the driver getting the first Status Changed interrupt. By expanding the conditions under which LOS is turned off and DFE timeout is being set, timing is no longer an issue. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-06-17 11:56:59 -07:00
Steve Wise	8da7e7a552	RDMA/cxgb4: Couple of abort fixes - fix a race where the driver could end up sending a close_con_req after an abort_rpl. In c4iw_ep_disconnect(), send abort or close request with the ep mutex held. - fix a hang where driver fails to wake up when a connection is reset during a normal close. Wake up any waiters in the interrupt path, and correctly cleanup after rdma_fini() failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-06-17 11:54:56 -07:00
Steve Wise	301c2c3f03	RDMA/cxgb4: Don't truncate MR lengths Remove left-over code from T3 that limited MR sizes to 32b. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-06-17 11:54:50 -07:00
Steve Wise	2ff7d09a1b	RDMA/cxgb4: Don't exceed hw IQ depth limit for user CQs Memory allocated for user CQs gets rounded up to the next page boundary. And after rounding, we recalculate the resulting IQ depth and we need to make sure we don't exceed the HW limits. This bug can result a much smaller CQ allocated than was expected if the HW size field is exceeded, resulting in CQ overflow failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-06-17 11:52:45 -07:00
Greg Rose	c7ac8679be	rtnetlink: Compute and store minimum ifinfo dump size The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2011-06-09 20:38:07 -07:00
Alexey Dobriyan	a6b7a40786	net: remove interrupt.h inclusion from netdevice.h * remove interrupt.g inclusion from netdevice.h -- not needed * fixup fallout, add interrupt.h and hardirq.h back where needed. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-06 22:55:11 -07:00
Linus Torvalds	4c171acc20	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/cma: Save PID of ID's owner RDMA/cma: Add support for netlink statistics export RDMA/cma: Pass QP type into rdma_create_id() RDMA: Update exported headers list RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h> RDMA/nes: Add a check for strict_strtoul() RDMA/cxgb3: Don't post zero-byte read if endpoint is going away RDMA/cxgb4: Use completion objects for event blocking IB/srp: Fix integer -> pointer cast warnings IB: Add devnode methods to cm_class and umad_class IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required IB/uverbs: Add devnode method to set path/mode RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node RDMA: Add netlink infrastructure RDMA: Add error handling to ib_core_init()	2011-05-26 12:13:57 -07:00
Roland Dreier	8dc4abdf4c	Merge branches 'cma', 'cxgb3', 'cxgb4', 'misc', 'nes', 'netlink', 'srp' and 'uverbs' into for-next	2011-05-25 13:47:20 -07:00
Nir Muchtar	83e9502d8d	RDMA/cma: Save PID of ID's owner Save the PID associated with an RDMA CM ID for reporting via netlink. Signed-off-by: Nir Muchtar <nirm@voltaire.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:23 -07:00
Nir Muchtar	753f618ae0	RDMA/cma: Add support for netlink statistics export Add callbacks and data types for statistics export of all current devices/ids. The schema for RDMA CM is a series of netlink messages. Each one contains an rdma_cm_stat struct. Additionally, two netlink attributes are created for the addresses for each message (if applicable). Their types used are: RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID) RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID) sockaddr_* structs are encapsulated within these attributes. In other words, every transaction contains a series of messages like: -------message 1------- struct rdma_cm_id_stats { __u32 qp_num; __u32 bound_dev_if; __u32 port_space; __s32 pid; __u8 cm_state; __u8 node_type; __u8 port_num; __u8 reserved; } RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address -------end 1------- -------message 2------- struct rdma_cm_id_stats RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute -------end 2------- Signed-off-by: Nir Muchtar <nirm@voltaire.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:23 -07:00
Sean Hefty	b26f9b9949	RDMA/cma: Pass QP type into rdma_create_id() The RDMA CM currently infers the QP type from the port space selected by the user. In the future (eg with RDMA_PS_IB or XRC), there may not be a 1-1 correspondence between port space and QP type. For netlink export of RDMA CM state, we want to export the QP type to userspace, so it is cleaner to explicitly associate a QP type to an ID. Modify rdma_create_id() to allow the user to specify the QP type, and use it to make our selections of datagram versus connected mode. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:23 -07:00
Nir Muchtar	550e5ca77e	RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h> Move cma.c's internal definition of enum cma_state to enum rdma_cm_state in an exported header so that it can be exported via RDMA netlink. Signed-off-by: Nir Muchtar <nirm@voltaire.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:22 -07:00
Liu Yuan	52f81dbaf1	RDMA/nes: Add a check for strict_strtoul() It should check if strict_strtoul() succeeds before using 'wqm_quanta_value'. Signed-off-by: Liu Yuan <tailai.ly@taobao.com> [ Convert to kstrtoul() directly while we're here. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-24 10:06:25 -07:00
Steve Wise	807838686e	RDMA/cxgb3: Don't post zero-byte read if endpoint is going away tx_ack() wasn't checking the endpoint state and consequently would attempt to post the p2p 0B read on an endpoint/QP that is closing or aborting. This causes a NULL pointer dereference crash. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-24 10:01:04 -07:00
Steve Wise	c337374bf2	RDMA/cxgb4: Use completion objects for event blocking There exists a race condition when using wait_queue_head_t objects that are declared on the stack. This was being done in a few places where we are sending work requests to the FW and awaiting replies, but we don't have an endpoint structure with an embedded c4iw_wr_wait struct. So the code was allocating it locally on the stack. Bad design. The race is: 1) thread on cpuX declares the wait_queue_head_t on the stack, then posts a firmware WR with that wait object ptr as the cookie to be returned in the WR reply. This thread will proceed to block in wait_event_timeout() but before it does: 2) An interrupt runs on cpuY with the WR reply. fw6_msg() handles this and calls c4iw_wake_up(). c4iw_wake_up() sets the condition variable in the c4iw_wr_wait object to TRUE and will call wake_up(), but before it calls wake_up(): 3) The thread on cpuX calls c4iw_wait_for_reply(), which calls wait_event_timeout(). The wait_event_timeout() macro checks the condition variable and returns immediately since it is TRUE. So this thread never blocks/sleeps. The function then returns effectively deallocating the c4iw_wr_wait object that was on the stack. 4) So at this point cpuY has a pointer to the c4iw_wr_wait object that is no longer valid. Further its pointing to a stack frame that might now be in use by some other context/thread. So cpuY continues execution and calls wake_up() on a ptr to a wait object that as been effectively deallocated. This race, when it hits, can cause a crash in wake_up(), which I've seen under heavy stress. It can also corrupt the referenced stack which can cause any number of failures. The fix: Use struct completion, which supports on-stack declarations. Completions use a spinlock around setting the condition to true and the wake up so that steps 2 and 4 above are atomic and step 3 can never happen in-between. Signed-off-by: Steve Wise <swise@opengridcomputing.com>	2011-05-24 09:47:38 -07:00
Roland Dreier	737b94eb41	IB/srp: Fix integer -> pointer cast warnings Fix drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_handle_recv': drivers/infiniband/ulp/srp/ib_srp.c:1150: warning: cast to pointer from integer of different size drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_send_completion': drivers/infiniband/ulp/srp/ib_srp.c🔢 warning: cast to pointer from integer of different size by adding an intermediate cast to uintptr_t. Signed-off-by: Roland Dreier <roland@purestorage.com> Acked-by: David Dillow <dillowda@ornl.gov>	2011-05-23 11:30:04 -07:00
Roland Dreier	c3af0980ce	IB: Add devnode methods to cm_class and umad_class We want the ucmX, umadX and issmX device nodes to show up under /dev/infiniband, and additionally ucmX should have mode 0666. Add appropriate devnode methods to their class structs for this. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-23 11:24:28 -07:00
Ira Weiny	c8367c4cd9	IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required We had a script which was looping through the devices returned from ibstat and attempted to register a SMI agent on an ethernet device. This caused a kernel panic for IBoE devices that don't have QP0. Fix this by checking if the QP exists before using it. Signed-off-by: Ira Weiny <weiny2@llnl.gov> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-23 11:15:48 -07:00
Roland Dreier	71c29bd5c2	IB/uverbs: Add devnode method to set path/mode We want udev to create a device node under /dev/infiniband with permission 0666 for uverbsX devices, so add a devnode method to set the appropriate info. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-23 11:10:05 -07:00
Roland Dreier	04ea2f8197	RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node We want udev to create a device node under /dev/infiniband with permission 0666 for rdma_cm, so add that info to our struct miscdevice. Signed-off-by: Roland Dreier <roland@purestorage.com> Acked-by: Sean Hefty <sean.hefty@intel.com>	2011-05-23 10:48:43 -07:00
Paul Gortmaker	70c7160619	Add appropriate <linux/prefetch.h> include for prefetch users After discovering that wide use of prefetch on modern CPUs could be a net loss instead of a win, net drivers which were relying on the implicit inclusion of prefetch.h via the list headers showed up in the resulting cleanup fallout. Give them an explicit include via the following $0.02 script. ========================================= #!/bin/bash MANUAL="" for i in `git grep -l 'prefetch(.*)' .` ; do grep -q '<linux/prefetch.h>' $i if [ $? = 0 ] ; then continue fi ( echo '?^#include <linux/?a' echo '#include <linux/prefetch.h>' echo . echo w echo q ) \| ed -s $i > /dev/null 2>&1 if [ $? != 0 ]; then echo $i needs manual fixup MANUAL="$i $MANUAL" fi done echo ------------------- 8\<---------------------- echo vi $MANUAL ========================================= Signed-off-by: Paul <paul.gortmaker@windriver.com> [ Fixed up some incorrect #include placements, and added some non-network drivers and the fib_trie.c case - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-22 21:41:57 -07:00
Linus Torvalds	06f4e926d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits) macvlan: fix panic if lowerdev in a bond tg3: Add braces around 5906 workaround. tg3: Fix NETIF_F_LOOPBACK error macvlan: remove one synchronize_rcu() call networking: NET_CLS_ROUTE4 depends on INET irda: Fix error propagation in ircomm_lmp_connect_response() irda: Kill set but unused variable 'bytes' in irlan_check_command_param() irda: Kill set but unused variable 'clen' in ircomm_connect_indication() rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport() be2net: Kill set but unused variable 'req' in lancer_fw_download() irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication() atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined. rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer(). rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler() rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection() rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window() pkt_sched: Kill set but unused variable 'protocol' in tc_classify() isdn: capi: Use pr_debug() instead of ifdefs. tg3: Update version to 3.119 tg3: Apply rx_discards fix to 5719/5720 ... Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c as per Davem.	2011-05-20 13:43:21 -07:00
Roland Dreier	b2cbae2c24	RDMA: Add netlink infrastructure Add basic RDMA netlink infrastructure that allows for registration of RDMA clients for which data is to be exported and supplies message construction callbacks. Signed-off-by: Nir Muchtar <nirm@voltaire.com> [ Reorganize a few things, add CONFIG_NET dependency. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-20 11:46:11 -07:00
Nir Muchtar	fd75c789ab	RDMA: Add error handling to ib_core_init() Fail RDMA midlayer initialization if sysfs setup fails. Signed-off-by: Nir Muchtar <nirm@voltaire.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-20 11:46:10 -07:00
Benjamin Herrenschmidt	880102e785	Merge remote branch 'origin/master' into merge Manual merge of arch/powerpc/kernel/smp.c and add missing scheduler_ipi() call to arch/powerpc/platforms/cell/interrupt.c Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-20 15:36:52 +10:00
Benjamin Herrenschmidt	4c8440666b	Merge branch 'merge' into next	2011-05-19 17:00:06 +10:00
Roland Dreier	1df9fad122	Merge branches 'cma', 'cxgb4' and 'qib' into for-next	2011-05-12 08:57:20 -07:00
Sergei Shtylyov	1c65335714	IB/qib: Use pci_dev->revision The driver reads PCI revision ID from the PCI configuration register while it's already stored by PCI subsystem in the revision field of struct pci_dev. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-12 08:57:12 -07:00
David S. Miller	5fc3590c81	infiniband: Remove rt->rt_src usage in addr4_resolve() Use an explicit flow key and fetch it from there. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-10 13:32:48 -07:00
Roland Dreier	d0c49bf391	RDMA/iwcm: Get rid of enum iw_cm_event_status The IW_CM_EVENT_STATUS_xxx values were used in only a couple of places; cma.c uses -Exxx values instead, and so do the amso1100, cxgb3 and cxgb4 drivers -- only nes was using the enum values (with the mild consequence that all nes connection failures were treated as generic errors rather than reported as timeouts or rejections). We can fix this confusion by getting rid of enum iw_cm_event_status and using a plain int for struct iw_cm_event.status, and converting nes to use -Exxx as the other iWARP drivers do. This also gets rid of the warning drivers/infiniband/core/cma.c: In function 'cma_iw_handler': drivers/infiniband/core/cma.c:1333:3: warning: case value '4294967185' not in enumerated type 'enum iw_cm_event_status' drivers/infiniband/core/cma.c:1336:3: warning: case value '4294967186' not in enumerated type 'enum iw_cm_event_status' drivers/infiniband/core/cma.c:1332:3: warning: case value '4294967192' not in enumerated type 'enum iw_cm_event_status' Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Faisal Latif <faisal.latif@intel.com>	2011-05-09 22:23:57 -07:00
Sergei Shtylyov	ec03d6777a	IB/ipath: Use pci_dev->revision, again Commit `44c10138fd` ("PCI: Change all drivers to use pci_device->revision") already converted this driver to using the revision field of struct pci_dev but commit `bb9171448d` ("IB/ipath: Misc changes to prepare for IB7220 introduction") later reverted that change for some strange reason. Restore the change. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:07:31 -07:00
Mitko Haralanov	9f5754e34b	IB/qib: Prevent driver hang with unprogrammed boards The time limit test now correctly checks against current jiffies to avoid the hang. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:07:31 -07:00
Steve Wise	2f25e9a540	RDMA/cxgb4: EEH errors can hang the driver A few more EEH fixes: c4iw_wait_for_reply(): detect fatal EEH condition on timeout and return an error. The iw_cxgb4 driver was only calling ib_deregister_device() on an EEH event followed by a ib_register_device() when the device was reinitialized. However, the RDMA core doesn't allow multiple iterations of register/deregister by the provider. See drivers/infiniband/core/sysfs.c: ib_device_unregister_sysfs() where the kobject ref is held until the device is deallocated in ib_deallocate_device(). Calling deregister adds this kobj reference, and then a subsequent register call will generate a WARN_ON() from the kobject subsystem because the kobject is being initialized but is already initialized with the ref held. So the provider must deregister and dealloc when resetting for an EEH event, then alloc/register to re-initialize. To do this, we cannot use the device ptr as our ULD handle since it will change with each reallocation. This commit adds a ULD context struct which is used as the ULD handle, and then contains the device pointer and other state needed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:23 -07:00
Steve Wise	d9594d990a	RDMA/cxgb4: Reset wait condition atomically The driver was never really waiting for RDMA_WR/FINI completions because the condition variable used to determine if the completion happened was never reset, and this condition variable is reused for both connection setup and teardown. This causes various driver crashes under heavy loads due to releasing resources too early. The fix is to use atomic bits to correctly reset the condition immediately after the completion is detected. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:22 -07:00
Roel Kluin	85d215b0f3	RDMA/cxgb4: Fix missing parentheses Parens are missing: '\|' has a higher presedence than '?'. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:22 -07:00
Steve Wise	bbe9a0a2bc	RDMA/cxgb4: Initialization errors can cause crash c4iw_uld_add() must return ERR_PTR() values instead of NULL on failure. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:22 -07:00
Steve Wise	30c95c2d49	RDMA/cxgb4: Don't change QP state outside EP lock Concurrent ingress CLOSE and ULP ABORT operations causes a crash due to a race condition where the close path releases the EP lock and then tries to move the QP state to CLOSED. This must be done inside the EP lock to avoid the race. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:22 -07:00
Hefty, Sean	a9bb79128a	RDMA/cma: Add an ID_REUSEADDR option Lustre requires that clients bind to a privileged port number before connecting to a remote server. On larger clusters (typically more than about 1000 nodes), the number of privileged ports is exhausted, resulting in lustre being unusable. To handle this, we add support for reusable addresses to the rdma_cm. This mimics the behavior of the socket option SO_REUSEADDR. A user may set an rdma_cm_id to reuse an address before calling rdma_bind_addr() (explicitly or implicitly). If set, other rdma_cm_id's may be bound to the same address, provided that they all have reuse enabled, and there are no active listens. If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it will only succeed if there are no other id's bound to that same address. The reuse option is exported to user space. The behavior of the kernel reuse implementation was verified against that given by sockets. This patch is derived from a path by Ira Weiny <weiny2@llnl.gov> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:10 -07:00
Hefty, Sean	43b752daae	RDMA/cma: Fix handling of IPv6 addressing in cma_use_port cma_use_port() assumes that the sockaddr is an IPv4 address. Since IPv6 addressing is supported (and also to support other address families) make the code more generic in its address handling. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:10 -07:00
David S. Miller	31e4543db2	ipv4: Make caller provide on-stack flow key to ip_route_output_ports(). Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-03 20:25:42 -07:00
David Decotigny	7073949720	ethtool: cosmetic: Use ethtool ethtool_cmd_speed API This updates the network drivers so that they don't access the ethtool_cmd::speed field directly, but use ethtool_cmd_speed() instead. For most of the drivers, these changes are purely cosmetic and don't fix any problem, such as for those 1GbE/10GbE drivers that indirectly call their own ethtool get_settings()/mii_ethtool_gset(). The changes are meant to enforce code consistency and provide robustness with future larger throughputs, at the expense of a few CPU cycles for each ethtool operation. All drivers compiled with make allyesconfig ion x86_64 have been updated. Tested: make allyesconfig on x86_64 + e1000e/bnx2x work Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-29 14:03:01 -07:00
Lucas De Marchi	e9c549998d	Revert wrong fixes for common misspellings These changes were incorrectly fixed by codespell. They were now manually corrected. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-04-26 23:31:11 -07:00
Nishanth Aravamudan	e297d9dd5c	cxgb4: use pgprot_writecombine() on powerpc Commit `fe3cc0d99d` ("powerpc: Add pgprot_writecombine") in benh's tree exposes the pgprot_writecombine() API to drivers on powerpc. cxgb4 has an open-coded version of the same, so use the common API now that it's available. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Anton Blanchard <anton@samba.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 14:18:25 +10:00
Michał Mirosław	3d96c74d89	net: infiniband/ulp/ipoib: convert to hw_features Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:30:42 -07:00
Michał Mirosław	dd6f6d0249	net: infiniband/hw/nes: convert to hw_features Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:30:41 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Linus Torvalds	dc50eddb2f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/nes: Fix test of uninitialized netdev	2011-03-25 21:06:37 -07:00
Linus Torvalds	00a2470546	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits) route: Take the right src and dst addresses in ip_route_newports ipv4: Fix nexthop caching wrt. scoping. ipv4: Invalidate nexthop cache nh_saddr more correctly. net: fix pch_gbe section mismatch warning ipv4: fix fib metrics mlx4_en: Removing HW info from ethtool -i report. net_sched: fix THROTTLED/RUNNING race drivers/net/a2065.c: Convert release_resource to release_region/release_mem_region drivers/net/ariadne.c: Convert release_resource to release_region/release_mem_region bonding: fix rx_handler locking myri10ge: fix rmmod crash mlx4_en: updated driver version to 1.5.4.1 mlx4_en: Using blue flame support mlx4_core: reserve UARs for userspace consumers mlx4_core: maintain available field in bitmap allocator mlx4: Add blue flame support for kernel consumers mlx4_en: Enabling new steering mlx4: Add support for promiscuous mode in the new steering model. mlx4: generalization of multicast steering. mlx4_en: Reporting HW revision in ethtool -i ...	2011-03-25 21:02:22 -07:00
Roland Dreier	cf55bb2439	RDMA/nes: Fix test of uninitialized netdev Commit `1765a57533` ("net: make dev->master general") introduced a test of an uninitialized netdev. Fix the code so the intended netdev is tested. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-24 17:18:30 -07:00
Linus Torvalds	0625bef606	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB: Increase DMA max_segment_size on Mellanox hardware IB/mad: Improve an error message so error code is included RDMA/nes: Don't print success message at level KERN_ERR RDMA/addr: Fix return of uninitialized ret value IB/srp: try to use larger FMR sizes to cover our mappings IB/srp: add support for indirect tables that don't fit in SRP_CMD IB/srp: rework mapping engine to use multiple FMR entries IB/srp: allow sg_tablesize to be set for each target IB/srp: move IB CM setup completion into its own function IB/srp: always avoid non-zero offsets into an FMR	2011-03-24 07:59:46 -07:00
Yevgeny Petrilin	0345584e0b	mlx4: generalization of multicast steering. The same packet steering mechanism would be used both for IB and Ethernet, Both multicasts and unicasts. This commit prepares the general infrastructure for this. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-23 12:24:21 -07:00
Yevgeny Petrilin	725c89997e	mlx4_en: Reporting HW revision in ethtool -i HW revision is derived from device ID and rev id. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-23 12:24:20 -07:00
Roland Dreier	ba82638247	Merge branches 'misc', 'nes' and 'srp' into for-next	2011-03-23 11:13:37 -07:00
David Dillow	7f9e5c48c1	IB: Increase DMA max_segment_size on Mellanox hardware By default, each device is assumed to be able only handle 64 KB chunks during DMA. By giving the segment size a larger value, the block layer will coalesce more S/G entries together for SRP, allowing larger requests with the same sg_tablesize setting. The block layer is the only direct user of it, though a few IOMMU drivers reference it as well for their *_map_sg coalescing code. pci-gart_64 on x86, and a smattering on on sparc, powerpc, and ia64. Since other IB protocols could potentially see larger segments with this, let's check those: - iSER is fine, because you limit your maximum request size to 512 KB, so we'll never overrun the page vector in struct iser_page_vec (128 entries currently). It is independent of the DMA segment size, and handles multi-page segments already. - IPoIB is fine, as it maps each page individually, and doesn't use ib_dma_map_sg(). - RDS appears to do the right thing and has no dependencies on DMA segment size, but I don't claim to have done a complete audit. - NFSoRDMA and 9p are OK -- they do not use ib_dma_map_sg(), so they doesn't care about the coalescing. - Lustre's ko2iblnd does not care about coalescing -- it properly walks the returned sg list. This patch ups the value on Mellanox hardware to 1 GB, which matches reported firmware limits on mlx4. Signed-off-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-22 09:39:18 -07:00
Roland Dreier	071c778347	Merge branch 'external-indirect' of git://git.kernel.org/pub/scm/linux/kernel/git/dad/srp-initiator into srp	2011-03-21 11:52:00 -07:00
Michael Heinz	1eba843dd7	IB/mad: Improve an error message so error code is included Signed-off-by: Michael Heinz <michael.heinz@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-18 09:42:20 -07:00
Roland Dreier	748bfd9c1d	RDMA/nes: Don't print success message at level KERN_ERR There's no reason to print "NetEffect RNIC driver successfully loaded" at level KERN_ERR (where it will uglify the console on a quiet boot). Change it to KERN_INFO. Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-18 08:52:30 -07:00
Linus Torvalds	ec0afc9311	Merge branch 'kvm-updates/2.6.39' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.39' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (55 commits) KVM: unbreak userspace that does not sets tss address KVM: MMU: cleanup pte write path KVM: MMU: introduce a common function to get no-dirty-logged slot KVM: fix rcu usage in init_rmode_* functions KVM: fix kvmclock regression due to missing clock update KVM: emulator: Fix permission checking in io permission bitmap KVM: emulator: Fix io permission checking for 64bit guest KVM: SVM: Load %gs earlier if CONFIG_X86_32_LAZY_GS=n KVM: x86: Remove useless regs_page pointer from kvm_lapic KVM: improve comment on rcu use in irqfd_deassign KVM: MMU: remove unused macros KVM: MMU: cleanup page alloc and free KVM: MMU: do not record gfn in kvm_mmu_pte_write KVM: MMU: move mmu pages calculated out of mmu lock KVM: MMU: set spte accessed bit properly KVM: MMU: fix kvm_mmu_slot_remove_write_access dropping intermediate W bits KVM: Start lock documentation KVM: better readability of efer_reserved_bits KVM: Clear async page fault hash after switching to real mode KVM: VMX: Initialize vm86 TSS only once. ...	2011-03-17 18:40:35 -07:00
Linus Torvalds	c55d267de2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits) [SCSI] scsi_dh_rdac: Add MD36xxf into device list [SCSI] scsi_debug: add consecutive medium errors [SCSI] libsas: fix ata list corruption issue [SCSI] hpsa: export resettable host attribute [SCSI] hpsa: move device attributes to avoid forward declarations [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26) [SCSI] sd: Logical Block Provisioning update [SCSI] Include protection operation in SCSI command trace [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try) [SCSI] target: Fix volume size misreporting for volumes > 2TB [SCSI] bnx2fc: Broadcom FCoE offload driver [SCSI] fcoe: fix broken fcoe interface reset [SCSI] fcoe: precedence bug in fcoe_filter_frames() [SCSI] libfcoe: Remove stale fcoe-netdev entries [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out" [SCSI] libfc: Fixing a memory leak when destroying an interface [SCSI] megaraid_sas: Version and Changelog update ... Fix up trivial conflicts due to whitespace differences in drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}	2011-03-17 17:54:40 -07:00
Sean Hefty	1bdd6384c2	RDMA/addr: Fix return of uninitialized ret value Commit `b23dd4fe42` ("ipv4: Make output route lookup return rtable directly") resulted in leaving ret uninitialized, where it may later be returned. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-17 17:00:19 -07:00
Huang Ying	0014bd990e	mm: export __get_user_pages In most cases, get_user_pages and get_user_pages_fast should be used to pin user pages in memory. But sometimes, some special flags except FOLL_GET, FOLL_WRITE and FOLL_FORCE are needed, for example in following patch, KVM needs FOLL_HWPOISON. To support these users, __get_user_pages is exported directly. There are some symbol name conflicts in infiniband driver, fixed them too. Signed-off-by: Huang Ying <ying.huang@intel.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Michel Lespinasse <walken@google.com> CC: Roland Dreier <roland@kernel.org> CC: Ralph Campbell <infinipath@qlogic.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:27 -03:00
Linus Torvalds	7a6362800c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits) bonding: enable netpoll without checking link status xfrm: Refcount destination entry on xfrm_lookup net: introduce rx_handler results and logic around that bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag bonding: wrap slave state work net: get rid of multiple bond-related netdevice->priv_flags bonding: register slave pointer for rx_handler be2net: Bump up the version number be2net: Copyright notice change. Update to Emulex instead of ServerEngines e1000e: fix kconfig for crc32 dependency netfilter ebtables: fix xt_AUDIT to work with ebtables xen network backend driver bonding: Improve syslog message at device creation time bonding: Call netif_carrier_off after register_netdevice bonding: Incorrect TX queue offset net_sched: fix ip_tos2prio xfrm: fix __xfrm_route_forward() be2net: Fix UDP packet detected status in RX compl Phonet: fix aligned-mode pipe socket buffer header reserve netxen: support for GbE port settings ... Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c with the staging updates.	2011-03-16 16:29:25 -07:00
David Dillow	be8b981453	IB/srp: try to use larger FMR sizes to cover our mappings Now that we can get larger SG lists, we can take advantage of HCAs that allow us to use larger FMR sizes. In many cases, we can use up to 512 entries, so start there and work our way down. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:41:30 -04:00
David Dillow	c07d424d61	IB/srp: add support for indirect tables that don't fit in SRP_CMD This allows us to guarantee the ability to submit up to 8 MB requests based on the current value of SCSI_MAX_SG_CHAIN_SEGMENTS. While FMR will usually condense the requests into 8 SG entries, it is imperative that the target support external tables in case the FMR mapping fails or is not supported. We add a safety valve to allow targets without the needed support to reap the benefits of the large tables, but fail in a manner that lets the user know that the data didn't make it to the device. The user must add "allow_ext_sg=1" to the target parameters to indicate that the target has the needed support. If indirect_sg_entries is not specified in the modules options, then the sg_tablesize for the target will default to cmd_sg_entries unless overridden by the target options. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:37:23 -04:00
David Dillow	8f26c9ff9c	IB/srp: rework mapping engine to use multiple FMR entries Instead of forcing all of the S/G entries to fit in one FMR, and falling back to indirect descriptors if that fails, allow the use of as many FMRs as needed to map the request. This lays the groundwork for allowing indirect descriptor tables that are larger than can fit in the command IU, but should marginally improve performance now by reducing the number of indirect descriptors needed. We increase the minimum page size for the FMR pool to 4K, as larger pages help increase the coverage of each FMR, and it is rare that the kernel would send down a request with scattered 512 byte fragments. This patch also move some of the target initialization code afte the parsing of options, to keep it together with the new code that needs to allocate memory based on the options given. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:35:16 -04:00
David Dillow	4924864404	IB/srp: allow sg_tablesize to be set for each target Different configurations of target software allow differing max sizes of the command IU. Allowing this to be changed per-target allows all targets on an initiator to get an optimal setting. We deprecate srp_sg_tablesize and replace it with cmd_sg_entries in preparation for allowing more indirect descriptors than can fit in the IU. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:35:05 -04:00
David Dillow	961e0be89a	IB/srp: move IB CM setup completion into its own function This is to clean up prior to further changes. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:34:48 -04:00
David Dillow	8c4037b501	IB/srp: always avoid non-zero offsets into an FMR It is unclear exactly how this code works around Mellanox SRP targets, or if the problem is on the target side or in the HCA itself. In an abundance of caution, we should always enable the workaround. Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-03-15 19:34:28 -04:00
Roland Dreier	043332cf28	Merge branches 'cma', 'cxgb4', 'ipath' and 'qib' into for-next	2011-03-15 10:58:04 -07:00
Sean Hefty	a396d43a35	RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one rdma_destroy_id currently uses the global rdma cm 'lock' to test if an rdma_cm_id has been bound to a device. This prevents an active address resolution callback handler from assigning a device to the rdma_cm_id after rdma_destroy_id checks for one. Instead, we can replace the use of the global lock around the check to the rdma_cm_id device pointer by setting the id state to destroying, then flushing all active callbacks. The latter is accomplished by acquiring and releasing the handler_mutex. Any active handler will complete first, and any newly scheduled handlers will find the rdma_cm_id in an invalid state. In addition to optimizing the current locking scheme, the use of the rdma_cm_id mutex is a more intuitive synchronization mechanism than that of the global lock. These changes are based on feedback from Doug Ledford <dledford@redhat.com> while he was trying to debug a crash in the rdma cm destroy path. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-15 10:57:34 -07:00
Sean Hefty	8d8ac86564	IB/cm: Cancel pending LAP message when exiting IB_CM_ESTABLISH state This problem was reported by Moni Shoua <monis@mellanox.com> and Amir Vadai <amirv@mellanox.com>: When destroying a cm_id from a context of a work queue and if the lap_state of this cm_id is IB_CM_LAP_SENT, we need to release the reference of this id that was taken upon the send of the LAP message. Otherwise, if the expected APR message gets lost, it is only after a long time that the reference will be released, while during that the work handler thread is not available to process other things. It turns out that we need to cancel any pending LAP messages whenever we transition out of the IB_CM_ESTABLISH state. This occurs when disconnecting - either sending or receiving a DREQ. It can also happen in a corner case where we receive a REJ message after sending an RTU, followed by a LAP. Add checks and cancel any outstanding LAP messages in these three cases. Canceling the LAP when sending a DREQ fixes the destroy problem reported by Moni. When a cm_id is destroyed in the IB_CM_ESTABLISHED state, it sends a DREQ to the remote side to notify the peer that the connection is going away. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-15 10:56:12 -07:00
Sean Hefty	29963437a4	IB/cm: Bump reference count on cm_id before invoking callback When processing a SIDR REQ, the ib_cm allocates a new cm_id. The refcount of the cm_id is initialized to 1. However, cm_process_work will decrement the refcount after invoking all callbacks. The result is that the cm_id will end up with refcount set to 0 by the end of the sidr req handler. If a user tries to destroy the cm_id, the destruction will proceed, under the incorrect assumption that no other threads are referencing the cm_id. This can lead to a crash when the cm callback thread tries to access the cm_id. This problem was noticed as part of a larger investigation with kernel crashes in the rdma_cm when running on a real time OS. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Acked-by: Doug Ledford <dledford@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-15 10:56:12 -07:00
Sean Hefty	25ae21a101	RDMA/cma: Fix crash in request handlers Doug Ledford and Red Hat reported a crash when running the rdma_cm on a real-time OS. The crash has the following call trace: cm_process_work cma_req_handler cma_disable_callback rdma_create_id kzalloc init_completion cma_get_net_info cma_save_net_info cma_any_addr cma_zero_addr rdma_translate_ip rdma_copy_addr cma_acquire_dev rdma_addr_get_sgid ib_find_cached_gid cma_attach_to_dev ucma_event_handler kzalloc ib_copy_ah_attr_to_user cma_comp [ preempted ] cma_write copy_from_user ucma_destroy_id copy_from_user _ucma_find_context ucma_put_ctx ucma_free_ctx rdma_destroy_id cma_exch cma_cancel_operation rdma_node_get_transport rt_mutex_slowunlock bad_area_nosemaphore oops_enter They were able to reproduce the crash multiple times with the following details: Crash seems to always happen on the: mutex_unlock(&conn_id->handler_mutex); as conn_id looks to have been freed during this code path. An examination of the code shows that a race exists in the request handlers. When a new connection request is received, the rdma_cm allocates a new connection identifier. This identifier has a single reference count on it. If a user calls rdma_destroy_id() from another thread after receiving a callback, rdma_destroy_id will proceed to destroy the id and free the associated memory. However, the request handlers may still be in the process of running. When control returns to the request handlers, they can attempt to access the newly created identifiers. Fix this by holding a reference on the newly created rdma_cm_id until the request handler is through accessing it. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Acked-by: Doug Ledford <dledford@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-15 10:00:28 -07:00
Nicolas Kaiser	2a543904dd	IB/ipath: Don't reset disabled devices The comment some lines above states that disabled devices must not reset. Signed-off-by: Nicolas Kaiser <nikai@nikai.net>	2011-03-14 14:25:59 -07:00
Mitko Haralanov	36b87b419c	IB/qib: Fix M_Key field in SubnGet and SubnGetResp MADs Set the M_Key field in SubnGet and SugnGetResp MADs based on correctly interpreting the protection level specified in the M_KeyProtBits field. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:11:51 -07:00
Mitko Haralanov	4634b7945c	IB/qib: Set default LE2 value for active cables to 0 For active and far-EQ cables use an LE2 value of 0 for improved SI. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:10:34 -07:00
Steve Wise	db5d040d7b	RDMA/cxgb4: Debugfs dump_qp() updates - Show whether the SQ is in onchip memory or not. - Dump both SQ and RQ QIDs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:09:14 -07:00
Steve Wise	767fbe8151	RDMA/cxgb4: Dispatch FATAL event on EEH errors This at least kicks the user mode applications that are watching for device events. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:09:13 -07:00
Steve Wise	b48f3b9c10	RDMA/cxgb4: Use ULP_MODE_TCPDDP Set the ULP mode for initial RDMA connection setup to the proper DDP mode. This avoids wasting some HW resources while in streaming mode. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:09:12 -07:00
Steve Wise	a9c7719800	RDMA/cxgb4: Enable on-chip SQ support by default Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:09:12 -07:00
Steve Wise	ffc3f7487f	RDMA/cxgb4: Do CIDX_INC updates every 1/16 CQ depth CQE reaps This avoids the CIDX_INC overflow issue with T4A2 when running kernel RDMA applications. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:09:11 -07:00
Steve Wise	2942813739	RDMA/cxgb4: Remove db_drop_task Unloading iw_cxgb4 can crash due to the unload code trying to use db_drop_task, which is uninitialized. So remove this dead code. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:09:10 -07:00
Steve Wise	b52fe09e33	RDMA/cxgb4: Turn on delayed ACK Set the default to on. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-14 12:09:09 -07:00
David S. Miller	4c9483b2fb	ipv6: Convert to use flowi6 where applicable. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:54 -08:00
David S. Miller	1d28f42c1b	net: Put flowi_* prefix on AF independent members of struct flowi I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common. This is the first step to move in that direction. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:44 -08:00
David S. Miller	78fbfd8a65	ipv4: Create and use route lookup helpers. The idea here is this minimizes the number of places one has to edit in order to make changes to how flows are defined and used. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:42 -08:00
David S. Miller	b23dd4fe42	ipv4: Make output route lookup return rtable directly. Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-02 14:31:35 -08:00
David S. Miller	273447b352	ipv4: Kill can_sleep arg to ip_route_output_flow() This boolean state is now available in the flow flags. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-01 14:27:04 -08:00
David S. Miller	420d44daa7	ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep" Since that is what the current vague "flags" argument means. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-01 14:19:23 -08:00
Mike Christie	7c53c6f89d	[SCSI] iser: export addr and port This pactch has iser export the address and port of the endpoint. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2011-02-24 12:41:23 -05:00
Mitko Haralanov	cc7fb05946	IB/qib: Return correct MAD when setting link width to 255 Fix a bug which causes the driver to return incorrect MADs as a response to Set(PortInfo) which sets the link width to 0xFF or link speed to 0xF. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-02-22 16:56:37 -08:00
David S. Miller	da935c66ba	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/e1000e/netdev.c net/xfrm/xfrm_policy.c	2011-02-19 19:17:35 -08:00
Roland Dreier	814b0a6120	Merge branches 'nes' and 'qib' into for-next	2011-02-17 14:04:59 -08:00
Mike Marciniszyn	c0af2c057d	IB/qib: Prevent double completions after a timeout or RNR error There is a double completion associated with error handling for RC QPs. The sequence is: - The do_rc_ack() routine fields an RNR nack and there are 0 rnr_retries configured on the QP. - qib_error_qp() stops the pending timer - qib_rc_send_complete() is called from sdma_complete() - qib_rc_send_complete() starts the timer because the msb of the psn just completed says an ack is needed. - a bunch of flushes occur as ipoib posts WQEs to an error'ed QP - rc_timeout() calls qib_restart_rc() - qib_restart_rc() calls qib_send_complete() with a IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in the past The fix avoids starting the timer since another packet will never arrive. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-02-17 14:04:50 -08:00
Jiri Pirko	1765a57533	net: make dev->master general dev->master is now tightly connected to bonding driver. This patch makes this pointer more general and ready to be used by others. - netdev_set_master() - bond specifics moved to new function netdev_set_bond_master() - introduced netif_is_bond_slave() to check if device is a bonding slave Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-13 10:42:07 -08:00
Mike Marciniszyn	414ed90cee	IB/qib: Fix double add_timer() The following panic BUG_ON occurs during qib testing: Kernel BUG at include/linux/timer.h:82 RIP [<ffffffff881f7109>] :ib_qib:start_timer+0x73/0x89 RSP <ffffffff80425bd0> <0>Kernel panic - not syncing: Fatal exception <0>Dumping qib trace buffer from panic qib_set_lid INFO: IB0:1 got a lid: 0xf8 Done dumping qib trace buffer BUG: warning at kernel/panic.c:137/panic() (Tainted: G The flaw is due to a missing state test when processing responses that results in an add_timer() call when the same timer is already queued. This code was executing in parallel with a QP destroy on another CPU that had changed the state to reset, but the missing test caused to response handling code to run on into the panic. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-02-10 11:24:08 -08:00
Maciej Sosnowski	25a54a6bb8	RDMA/nes: Don't generate async events for unregistered devices nes_port_ibevent() should not be called when the nes RDMA device is not registered with the RDMA core. Add missing checks of of_device_registered flag. Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-02-03 15:55:26 -08:00
Linus Torvalds	9118626a30	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA: Update missed conversion of flush_scheduled_work() RDMA/ucma: Copy iWARP route information on queries RDMA/amso1100: Fix compile warnings RDMA/cxgb4: Set the correct device physical function for iWARP connections RDMA/cxgb4: Limit MAXBURST EQ context field to 256B IB/qib: Hold link for TX SERDES settings mlx4_core: Add ConnectX-3 device IDs	2011-02-03 11:19:26 -08:00
Roland Dreier	e51c7b1ab0	Merge branches 'amso1100', 'cma', 'cxgb4', 'misc', 'mlx4' and 'qib' into for-next	2011-01-29 20:45:04 -08:00
Tejun Heo	96e61fa55e	RDMA: Update missed conversion of flush_scheduled_work() Commit `f06267104d` ("RDMA: Update workqueue usage") introduced ib_wq and removed the use of flush_scheduled_work(); however, during the merge process one chunk was lost in ib_sa_remove_one(). Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-01-28 16:39:08 -08:00
Steve Wise	e86f8b06f5	RDMA/ucma: Copy iWARP route information on queries For iWARP rdma_cm ids, the "route" information is the L2 src and next hop addresses. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-01-28 16:34:05 -08:00
Ralf Thielow	f9a4f6dcdd	RDMA/amso1100: Fix compile warnings Fix compile warnings on 32-bit by using "0" instead of "(u64) NULL" to assign to "c2_vq_req->reply_msg". Signed-off-by: Ralf Thielow <ralf.thielow@googlemail.com> [ Change from "(unsigned long) NULL" to plain old "0" as suggested by Bart Van Assche <bvanassche@acm.org>. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-01-28 15:40:25 -08:00
Steve Wise	94788657c9	RDMA/cxgb4: Set the correct device physical function for iWARP connections The PF passed to FW was 0, causing PCI failures in an SR-IOV environment. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-01-28 15:34:28 -08:00
Steve Wise	6a09a9d694	RDMA/cxgb4: Limit MAXBURST EQ context field to 256B MAXBURST cannot exceed 256B for on-chip queues. With a 512B MAXBURST, we can lock up the chip. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-01-28 15:34:24 -08:00
Mitko Haralanov	d70585f7de	IB/qib: Hold link for TX SERDES settings Hold the IB link at DISABLED until we get the correct TX settings on mezz boards. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-01-28 15:30:02 -08:00
David Rientjes	6a108a14fa	kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option is used to configure any non-standard kernel with a much larger scope than only small devices. This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes references to the option throughout the kernel. A new CONFIG_EMBEDDED option is added that automatically selects CONFIG_EXPERT when enabled and can be used in the future to isolate options that should only be considered for embedded systems (RISC architectures, SLOB, etc). Calling the option "EXPERT" more accurately represents its intention: only expert users who understand the impact of the configuration changes they are making should enable it. Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: David Woodhouse <david.woodhouse@intel.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Greg KH <gregkh@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <axboe@kernel.dk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Robin Holt <holt@sgi.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-20 17:02:05 -08:00
Linus Torvalds	6845a44a31	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA: Update workqueue usage RDMA/nes: Fix incorrect SFP+ link status detection on driver init RDMA/nes: Fix SFP+ link down detection issue with switch port disable RDMA/nes: Generate IB_EVENT_PORT_ERR/PORT_ACTIVE events RDMA/nes: Fix bonding on iw_nes IB/srp: Test only once whether iu allocation succeeded IB/mlx4: Handle protocol field in multicast table RDMA: Use vzalloc() to replace vmalloc()+memset(0) mlx4_{core, ib, en}: Fix driver when sizeof (phys_addr_t) > sizeof (long) IB/mthca: Fix driver when sizeof (phys_addr_t) > sizeof (long)	2011-01-17 14:45:48 -08:00
Roland Dreier	4790f4dc5f	Merge branches 'misc', 'mlx4', 'mthca', 'nes' and 'srp' into for-next	2011-01-16 21:22:41 -08:00
Tejun Heo	f06267104d	RDMA: Update workqueue usage * ib_wq is added, which is used as the common workqueue for infiniband instead of the system workqueue. All system workqueue usages including flush_scheduled_work() callers are converted to use and flush ib_wq. * cancel_delayed_work() + flush_scheduled_work() converted to cancel_delayed_work_sync(). * qib_wq is removed and ib_wq is used instead. This is to prepare for deprecation of flush_scheduled_work(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-16 21:16:31 -08:00
Maciej Sosnowski	843276ad98	RDMA/nes: Fix incorrect SFP+ link status detection on driver init During iw_nes initialization the link status for SFP+ PHY is always detected as "up" regardless of real state (cable either connected or disconnected). Add SFP+ PHY specific link status detection to the iw_nes initialization procedure. Use link status recheck for netdev_open to detect delayed state updates. Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-16 13:23:35 -08:00
Maciej Sosnowski	5f61b2c693	RDMA/nes: Fix SFP+ link down detection issue with switch port disable In case of SFP+ PHY, link status check at interrupt processing can give false results. For proper link status change detection a delayed recheck is needed to give nes registers time to settle. Add a periodic link status recheck scheduled at interrupt to detect potential delayed registers state changes. Addresses: http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2117 Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-16 13:23:34 -08:00
Maciej Sosnowski	ea623455b7	RDMA/nes: Generate IB_EVENT_PORT_ERR/PORT_ACTIVE events Depending on link state change, IB_EVENT_PORT_ERR or IB_EVENT_PORT_ACTIVE should be generated when handling MAC interrupts. Plugging in a cable happens to result in series of interrupts changing driver's link state a number of times before finally staying at link up (e.g. link up, link down, link up, link down, ..., link up). To prevent sending series of redundant IB_EVENT_PORT_ACTIVE and IB_EVENT_PORT_ERR events, we use a timer to debounce them in nes_port_ibevent(). Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-16 13:23:34 -08:00
Maciej Sosnowski	2a4c97ead4	RDMA/nes: Fix bonding on iw_nes Enable configuring bonds on nes devices by adding missing support for master net_device to the driver. Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-16 13:23:33 -08:00
Bart Van Assche	695b83495e	IB/srp: Test only once whether iu allocation succeeded Merge the two tests in srp_queuecommand() of whether information unit allocation succeeded into one. An intended side effect of this change is that we fix the warning: drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_queuecommand': drivers/infiniband/ulp/srp/ib_srp.c:1116: warning: 'req' may be used uninitialized in this function (seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y at least with gcc 4.4.4) Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-13 14:00:43 -08:00
Linus Torvalds	008d23e485	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.	2011-01-13 10:05:56 -08:00
Aleksey Senin	da995a8aee	IB/mlx4: Handle protocol field in multicast table The newest device firmware stores IB vs. Ethernet protocol in two bits in members_count field of multicast group table (0: Infiniband, 1: Ethernet). When changing the QP members count for a multicast group, it important not to reset this information. When calling multicast attach first time, the protocol type should be specified. In this patch we always set it IB, but in the future we will handle Ethernet too. When looking for a QP, the protocol type shoud be checked too. Signed-off-by: Aleksey Senin <alekseys@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-12 14:49:17 -08:00
Joe Perches	948579cd8c	RDMA: Use vzalloc() to replace vmalloc()+memset(0) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-12 11:11:58 -08:00
Roland Dreier	4979d18fe1	mlx4_{core, ib, en}: Fix driver when sizeof (phys_addr_t) > sizeof (long) Some systems have PCI addresses that don't fit in unsigned long (eg some 32-bit PowerPC 440 systems have 36-bit bus addresses). Fix up mlx4 drivers by using phys_addr_t where appropriate, so we don't truncate any PCI resource addresses before ioremapping them. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-12 09:50:36 -08:00
John L. Burr	eb4a7cbf27	IB/mthca: Fix driver when sizeof (phys_addr_t) > sizeof (long) Some systems have PCI addresses that don't fit in unsigned long (eg some 32-bit PowerPC 440 systems have 36-bit bus addresses). Fix up the driver by using phys_addr_t where appropriate, so we don't truncate any PCI resource addresses before ioremapping them. Signed-off-by: John L. Burr <jlburr@cadence.com> [ Update to apply to current driver source. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-11 20:39:47 -08:00
Linus Torvalds	f1d6d6cd90	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (42 commits) IB/qib: Fix refcount leak in lkey/rkey validation IB/qib: Improve SERDES tunning on QMH boards IB/qib: Unnecessary delayed completions on RC connection IB/qib: Issue pre-emptive NAKs on eager buffer overflow IB/qib: RDMA lkey/rkey validation is inefficient for large MRs IB/qib: Change QPN increment IB/qib: Add fix missing from earlier patch IB/qib: Change receive queue/QPN selection IB/qib: Fix interrupt mitigation IB/qib: Avoid duplicate writes to the rcv head register IB/qib: Add a few new SERDES tunings IB/qib: Reset packet list after freeing IB/qib: New SERDES init routine and improvements to SI quality IB/qib: Clear WAIT_SEND flags when setting QP to error state IB/qib: Fix context allocation with multiple HCAs IB/qib: Fix multi-Florida HCA host panic on reboot IB/qib: Handle transitions from ACTIVE_DEFERRED to ACTIVE better IB/qib: UD send with immediate receive completion has wrong size IB/qib: Set port physical state even if other fields are invalid IB/qib: Generate completion callback on errors ...	2011-01-11 16:30:08 -08:00
Roland Dreier	2b76c05794	Merge branches 'cxgb4', 'ipath', 'ipoib', 'mlx4', 'mthca', 'nes', 'qib' and 'srp' into for-next	2011-01-10 17:43:30 -08:00
Mike Marciniszyn	4db62d4786	IB/qib: Fix refcount leak in lkey/rkey validation The mr optimization introduced a reference count leak on an exception test. The lock/refcount manipulation is moved down and the problematic exception test now calls bail to insure that the lock is released. Additional fixes as suggested by Ralph Campbell <ralph.campbell@qlogic.org>: - reduce lock scope of dma regions - use explicit values on returns vs. automatic ret value Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:23 -08:00
Mike Marciniszyn	f2d255a078	IB/qib: Improve SERDES tunning on QMH boards Improve the QMH SERDES tunning on initial driver load by having the driver go through a link state change. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:22 -08:00
Mike Marciniszyn	dd04e43d46	IB/qib: Unnecessary delayed completions on RC connection Currently on receipt of a response message (ACKs, RDMA Response, Atomic Responses etc.) if the SDMA completion counter is not advanced the driver delays the completion of the WQE. In most cases this is overly pessimistic as the response (ACK) to a previously transmitted send implies that the send is complete. Ensure that SDMA queue is progressed appropriately before determining if a send has delayed completions. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:22 -08:00
Mike Marciniszyn	994bcd28a3	IB/qib: Issue pre-emptive NAKs on eager buffer overflow Under congestion resulting in eager buffer overflow attempt to send pre-emptive NAKs if header queue entries with TID errors are generated and a valid header is present. This prevents long timeouts and flow restarts if a trailing set of packets are dropped due to eager overflows. Pre-emptive NAKs are currently only supported for RDMA writes. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:22 -08:00
Mike Marciniszyn	2a600f14d2	IB/qib: RDMA lkey/rkey validation is inefficient for large MRs The current code loops during rkey/lkey validiation to isolate the MR for the RDMA, which is expensive when the current operation is inside a very large memory region. This fix optimizes rkey/lkey validation routines for user memory regions and fast memory regions. The MR entry can be isolated by shifts/mods instead of looping. The existing loop is preserved for phys memory regions for now. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:22 -08:00
Mike Marciniszyn	7c3edd3ff3	IB/qib: Change QPN increment Changing from +1 to +2 allows for better QP distribution across receive contexts. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:22 -08:00
Mike Marciniszyn	057ae62fac	IB/qib: Add fix missing from earlier patch The upstream code was missing part of a receive/error race fix from the internal tree. Add the missing part, which makes future merges possible. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:21 -08:00
Mike Marciniszyn	2528ea60f9	IB/qib: Change receive queue/QPN selection The basic idea is that on SusieQ, the difficult part of mapping QPN to context is handled by the mapping registers so the generic QPN allocation doesn't need to worry about chip specifics. For Monty and Linda, there is no mapping table so the qpt->mask (same as dd->qpn_mask), is used to see if the QPN to context falls within [zero..dd->n_krcv_queues). Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:21 -08:00
Mike Marciniszyn	19ede2e422	IB/qib: Fix interrupt mitigation For SusieQ we need to write to the interrupt timer register before updating the header queue head with interrupt count. This is to ensure that the timer is enabled properly and a receive available interrupt is delivered. Otherwise this interrupt can be lost if the receiver header/eager queues are full before the timer is enabled. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:21 -08:00
Mike Marciniszyn	aa7374ac19	IB/qib: Avoid duplicate writes to the rcv head register Avoid duplicate writes to the head register as this can lead to lost interrupts if the context goes full before the second write is done. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:21 -08:00
Mike Marciniszyn	e706203c7c	IB/qib: Add a few new SERDES tunings Add new SERDES tuning to aid manufacturing. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:21 -08:00
Mike Marciniszyn	f73df408b2	IB/qib: Reset packet list after freeing Reset the list pointers after freeing the SDMA packet list. This is done to any potential double-free cases. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:21 -08:00
Mike Marciniszyn	a0a234d47d	IB/qib: New SERDES init routine and improvements to SI quality Implement new SERDES initialization routine and improvements to signal integrity -- disable LE1 adaptation, disable LOS after link-up, set better SERDES parameters. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:20 -08:00
Mike Marciniszyn	16028f2777	IB/qib: Clear WAIT_SEND flags when setting QP to error state If these flags are set when the QP is transitioned to the error state, it will wait until the flags are cleared, which may never happen if the error transition is due to a link going down. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:20 -08:00
Mike Marciniszyn	6676b3f746	IB/qib: Fix context allocation with multiple HCAs The driver was incorrectly choosing HCAs on which to allocate new user contexts based on overall count of usable ports regardless whether the usable port was on the currently selected HCA. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:20 -08:00
Mike Marciniszyn	5dbbcb97cc	IB/qib: Fix multi-Florida HCA host panic on reboot Add check when setting configured contexts that the value does not exceed the number of contexts allocated for the card. If the value exceeds the already allocated count, set it to what is already allocated. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:20 -08:00
Mike Marciniszyn	b3d5cb2f20	IB/qib: Handle transitions from ACTIVE_DEFERRED to ACTIVE better When the link transitions from ACTIVE_DEFERRED to ACTIVE, the driver only sees the ACTIVE state. With this change, it will check whether the state was already ACTIVE and if so, it will not generated IB events and will not clear symbol error counts. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:20 -08:00
Mike Marciniszyn	c7665e5a69	IB/qib: UD send with immediate receive completion has wrong size The code to generate receive completion entries for UD send with immediate contains the wrong payload length. This is because when the code to compute the payload size was moved, the value of hdrsize didn't get moved too. The fix is to update tlen directly. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:20 -08:00
Mike Marciniszyn	3c9e5f4d65	IB/qib: Set port physical state even if other fields are invalid The IBTA vol. 1 release 1.2.1 spec. says: C14-24.2.1: If PortInfo:Portstate=Down, then a SubnSet(PortInfo) shall make any changes it specifies to PortInfo:PortPhysicalState; any other result is vendor-dependent. The patch changes the error handling so that the reply says there are invalid fields but still attempts to set fields that are in range including PortInfo:PortPhysicalState. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:19 -08:00
Mike Marciniszyn	a377acd151	IB/qib: Generate completion callback on errors According to IBTA vol. 1, C11-30.1.1, a notification callback is invoked if the CQ is armed for the next solicited completion event or an error completion. The error case wasn't being generated correctly. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:19 -08:00
Mike Marciniszyn	f509f9c14d	IB/qib: Add support for the new QME7362 card Add support to recognize another board variation named QME7362. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:19 -08:00
Mike Marciniszyn	0a43e11722	IB/qib: Add receive header queue size module parameters The receive header queue sizes need to modified for performance tuning. Three module parameters are added to support this. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:19 -08:00
Mike Marciniszyn	9d5b243f24	IB/qib: Remove IB latency turnoff This is required for hardware testing. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:19 -08:00
Joe Perches	601d87b079	RDMA/nes: Fix string continuation line Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:14 -08:00
Dan Carpenter	d0444f1527	IB/mthca: Handle -ENOMEM in forward_trap() ib_create_send_mad() can return ERR_PTR(-ENOMEM) here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:10 -08:00
Dan Carpenter	1397490938	IB/mlx4: Handle -ENOMEM in forward_trap() ib_create_send_mad() can return ERR_PTR(-ENOMEM) here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:06 -08:00
Vladimir Sokolovsky	3afa9f19e5	IB/mlx4: Don't call dma_free_coherent() with irqs disabled mlx4_ib_free_cq_buf() should not be called under spin_lock_irq() since it calls dma_free_coherent(), which needs irqs enabled. Fix this by deferring the free to outside the locked region. This was found due to the WARN_ON(irqs_disabled()); in swiotlb_free_coherent(). Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:42:06 -08:00
Or Gerlitz	8ae31e5b1f	IPoIB: Add GRO support Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:41:55 -08:00
Or Gerlitz	19e364f680	IPoIB: Remove LRO support As a first step in moving from LRO to GRO, revert commit `af40da894e` ("IPoIB: add LRO support"). Also eliminate the ethtool set_flags callback which isn't needed anymore. Finally, we need to include <linux/sched.h> directly to get the declaration of restart_syscall() (which used to be included implicitly through <linux/inet_lro.h>). Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:41:54 -08:00
Joe Perches	1eba27e87a	IB/ipath: Use printf extension %pR for struct resource Using %pR standardizes the struct resource output. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:41:50 -08:00
Steve Wise	db8b101671	RDMA/cxgb4: Don't re-init wait object in init/fini paths Re-initializing the wait object in rdma_init()/rdma_fini() causes a timing window which can lead to a deadlock during close. Once this deadlock hits, all RDMA activity over the T4 device will be stuck. There's no need to re-init the wait object, so remove it. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:41:43 -08:00
Stephen Hemminger	c943109163	RDMA/cxgb3,cxgb4: Remove dead code This removes unused code found by running 'make namespacecheck'; compile tested only. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-10 17:41:43 -08:00
David Dillow	9af762719e	IB/srp: consolidate hot-path variables into cache lines Put the variables accessed together in the hot-path into common cachelines, and separate them by RW vs RO to avoid false dirtying. We keep a local copy of the lkey and rkey in the target to avoid traversing pointers (and associated cache lines) to find them. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-01-10 15:44:51 -05:00
Bart Van Assche	e968467822	IB/srp: stop sharing the host lock with SCSI We don't need protection against the SCSI stack, so use our own lock to allow parallel progress on separate CPUs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> [ broken out and small cleanups by David Dillow ] Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-01-10 15:44:50 -05:00
Bart Van Assche	94a9174c63	IB/srp: reduce lock coverage of command completion We only need the lock to cover list and credit manipulations, so push those into srp_remove_req() and update the call chains. We reorder the request removal and command completion in srp_process_rsp() to avoid the SCSI mid-layer sending another command before we've released our request and added any credits returned by the target. This prevents us from returning HOST_BUSY unneccesarily. Signed-off-by: Bart Van Assche <bvanassche@acm.org> [ broken out, small cleanups, and modified to avoid potential extraneous HOST_BUSY returns by David Dillow ] Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-01-10 15:44:50 -05:00
Bart Van Assche	76c75b258f	IB/srp: reduce local coverage for command submission and EH We only need locks to protect our lists and number of credits available. By pre-consuming the credit for the request, we can reduce our lock coverage to just those areas. If we don't actually send the request, we'll need to put the credit back into the pool. Signed-off-by: Bart Van Assche <bvanassche@acm.org> [ broken out and small cleanups by David Dillow ] Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-01-10 15:44:49 -05:00
Bart Van Assche	536ae14e75	IB/srp: don't move active requests to their own list We use req->scmnd != NULL to indicate an active request, so there's no need to keep a separate list for them. We can afford the array iteration during error handling, and dropping it gives us one less item that needs lock protection. Signed-off-by: Bart Van Assche <bvanassche@acm.org> [ broken out and small cleanups by David Dillow ] Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-01-10 15:44:42 -05:00
Linus Torvalds	b4a45f5fe8	Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits) fs: scale mntget/mntput fs: rename vfsmount counter helpers fs: implement faster dentry memcmp fs: prefetch inode data in dcache lookup fs: improve scalability of pseudo filesystems fs: dcache per-inode inode alias locking fs: dcache per-bucket dcache hash locking bit_spinlock: add required includes kernel: add bl_list xfs: provide simple rcu-walk ACL implementation btrfs: provide simple rcu-walk ACL implementation ext2,3,4: provide simple rcu-walk ACL implementation fs: provide simple rcu-walk generic_check_acl implementation fs: provide rcu-walk aware permission i_ops fs: rcu-walk aware d_revalidate method fs: cache optimise dentry and inode for rcu-walk fs: dcache reduce branches in lookup path fs: dcache remove d_mounted fs: fs_struct use seqlock fs: rcu-walk for path lookup ...	2011-01-07 08:56:33 -08:00
Nick Piggin	dc0474be3e	fs: dcache rationalise dget variants dget_locked was a shortcut to avoid the lazy lru manipulation when we already held dcache_lock (lru manipulation was relatively cheap at that point). However, how that the lru lock is an innermost one, we never hold it at any caller, so the lock cost can now be avoided. We already have well working lazy dcache LRU, so it should be fine to defer LRU manipulations to scan time. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:24 +11:00
Nick Piggin	b5c84bf6f6	fs: dcache remove dcache_lock dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:23 +11:00
Nick Piggin	b7ab39f631	fs: dcache scale dentry refcount Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:21 +11:00
Bart Van Assche	dcb4cb85f4	IB/srp: allow lockless work posting Only one CPU at a time will own an RX IU, so using the address of the IU as the work request cookie allows us to avoid taking a lock. We can similarly prepare the TX path for lockless posting by moving the free TX IUs to a list. This also removes the requirement that the queue sizes be a power of 2. Signed-off-by: Bart Van Assche <bvanassche@acm.org> [ broken out, small cleanups, and modified to avoid needing an extra field in the IU by David Dillow] Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-01-05 15:24:25 -05:00
Bart Van Assche	9709f0e05b	IB/srp: consolidate state change code Signed-off-by: Bart Van Assche <bvanassche@acm.org> [ broken out and small cleanups by David Dillow ] Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-01-05 15:24:25 -05:00
David Dillow	f8b6e31e4e	IB/srp: allow task management without a previous request We can only have one task management comment outstanding, so move the completion and status to the target port. This allows us to handle resets of a LUN without a corresponding request having been sent. Meanwhile, we don't need to play games with host_scribble, just use it as the pointer it is. This fixes a crash when we issue a bus reset using sg_reset. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=13893 Reported-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Dillow <dillowda@ornl.gov>	2011-01-05 15:24:25 -05:00
David S. Miller	17f7f4d9fc	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/fib_frontend.c	2010-12-26 22:37:05 -08:00
Jiri Kosina	4b7bd36470	Merge branch 'master' into for-next Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.	2010-12-22 18:57:02 +01:00
Dan Carpenter	7182afea8d	IB/uverbs: Handle large number of entries in poll CQ In ib_uverbs_poll_cq() code there is a potential integer overflow if userspace passes in a large cmd.ne. The calls to kmalloc() would allocate smaller buffers than intended, leading to memory corruption. There iss also an information leak if resp wasn't all used. Unprivileged userspace may call this function, although only if an RDMA device that uses this function is present. Fix this by copying CQ entries one at a time, which avoids the allocation entirely, and also by moving this copying into a function that makes sure to initialize all memory copied to userspace. Special thanks to Jason Gunthorpe <jgunthorpe@obsidianresearch.com> for his help and advice. Cc: <stable@kernel.org> Signed-off-by: Dan Carpenter <error27@gmail.com> [ Monkey around with things a bit to avoid bad code generation by gcc when designated initializers are used. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-12-08 15:23:49 -08:00
Linus Torvalds	75318ec327	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB: Fix information leak in marshalling code IB/pack: Remove some unused code added by the IBoE patches IB/mlx4: Fix IBoE link state IB/mlx4: Fix IBoE reported link rate mlx4_core: Workaround firmware bug in query dev cap IB/mlx4: Fix memory ordering of VLAN insertion control bits MAINTAINERS: Update NetEffect entry	2010-12-02 12:10:56 -08:00
Roland Dreier	7adce751ce	Merge branches 'misc', 'mlx4' and 'nes' into for-next	2010-12-01 16:33:47 -08:00
Vasiliy Kulikov	91a4d157d0	IB: Fix information leak in marshalling code ib_ucm_init_qp_attr() and ucma_init_qp_attr() pass struct ib_uverbs_qp_attr with reserved, qp_state, {ah_attr,alt_ah_attr}{reserved,->grh.reserved} fields uninitialized to copy_to_user(). This leads to leaking of contents of kernel stack memory to userspace. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-12-01 16:33:18 -08:00
Or Gerlitz	f55864a4f4	IB/pack: Remove some unused code added by the IBoE patches Remove unused functions added by commit `ff7f5aab35` ("IB/pack: IBoE UD packet packing support"). Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>	2010-12-01 16:30:18 -08:00
Eli Cohen	21d606090e	IB/mlx4: Fix IBoE link state Use netif_running() and netif_carrier_ok() to report link state, exactly as is done to report Ethernet link state in sysfs. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-12-01 16:11:29 -08:00
Eli Cohen	328266c561	IB/mlx4: Fix IBoE reported link rate The link rate is the product of the link speed in the link width. For Etherent ports the rate is 10G, so we use 1 for the width and 4 for speed to get the correct rate. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-12-01 16:10:35 -08:00
Eli Cohen	e27535b9c6	IB/mlx4: Fix memory ordering of VLAN insertion control bits We must fully update the control segment before marking it as valid, so that hardware doesn't start executing it before we're ready. Signed-off-by: Eli Cohen <eli@mellanox.co.il> [ Move VLAN control bit setting to before wmb(). - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-12-01 11:08:54 -08:00

... 4 5 6 7 8 ...

3095 Commits