linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-27 14:43:58 +08:00

Author	SHA1	Message	Date
Tatyana Nikolova	ec04847c0c	RDMA/core: Fix for parsing netlink string attribute The string iwpm_ulib_name is recorded in a nlmsg as a netlink attribute. Without this fix parsing of the nlmsg by the userspace port mapper service fails because of unknown attribute length, causing the port mapper service not to register the client, which has sent the nlmsg. Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Cc: <stable@vger.kernel.org> #v3.16 Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-12 13:03:04 -04:00
David Ahern	0d0f738f6a	IB/core: Fix unaligned accesses Addresses the following kernel logs seen during boot of sparc systems: Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Signed-off-by: David Ahern <david.ahern@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 13:21:27 -04:00
Honggang LI	471e705832	IB/core: change rdma_gid2ip into void function as it always return zero Signed-off-by: Honggang Li <honli@redhat.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 13:21:27 -04:00
Guy Shapiro	325ad0617a	IB/core: dma unmap optimizations While unmapping an ODP writable page, the dirty bit of the page is set. In order to do so, the head of the compound page is found. Currently, the compound head is found even on non-writable pages, where it is never used, leading to unnecessary cpu barrier that impacts performance. This patch moves the search for the compound head to be done only when needed. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Acked-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:02 -04:00
Guy Shapiro	c1d383b578	IB/core: dma map/unmap locking optimizations Currently, while mapping or unmapping pages for ODP, the umem mutex is locked and unlocked once for each page. Such lock/unlock operation take few tens to hundreds of nsecs. This makes a significant impact when mapping or unmapping few MBs of memory. To avoid this, the mutex should be locked only once per operation, and not per page. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Acked-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:02 -04:00
Tatyana Nikolova	6eec177461	RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients Add functionality to enable the port mapper on the passive side to provide to its clients the actual (non-mapped) ip/tcp address information of the connecting peer 1) Adding remote_info_cb() to process the address info of the connecting peer The address info is provided by the user space port mapper service when the connection is initiated by the peer 2) Adding a hash list to store the remote address info 3) Adding functionality to add/remove the remote address info After the info has been provided to the port mapper client, it is removed from the hash list Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:01 -04:00
Jason Gunthorpe	285214409a	RDMA/CMA: Canonize IPv4 on IPV6 sockets properly When accepting a new IPv4 connect to an IPv6 socket, the CMA tries to canonize the address family to IPv4, but does not properly process the listening sockaddr to get the listening port, and does not properly set the address family of the canonized sockaddr. Fixes: `e51060f08a` ("IB: IP address based RDMA connection manager") Cc: <stable@vger.kernel.org> Reported-By: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:01 -04:00
Doug Ledford	c1c2fef6cf	Merge branches 'cve-fixup', 'ipoib', 'iser', 'misc-4.1', 'or-mlx4' and 'srp' into for-4.1	2015-04-15 16:24:49 -04:00
Sébastien Dugué	a233c4b54c	ib_uverbs: Fix pages leak when using XRC SRQs Hello, When an application using XRCs abruptly terminates, the mmaped pages of the CQ buffers are leaked. This comes from the fact that when resources are released in ib_uverbs_cleanup_ucontext(), we fail to release the CQs because their refcount is not 0. When creating an XRC SRQ, we increment the associated CQ refcount. This refcount is only decremented when the SRQ is released. Therefore we need to release the SRQs prior to the CQs to make sure that all references to the CQs are gone before trying to release these. Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:39 -04:00
Yann Droneaud	66578b0b2f	IB/core: don't disallow registering region starting at 0x0 In a call to ib_umem_get(), if address is 0x0 and size is already page aligned, check added in commit `8494057ab5` ("IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic") will refuse to register a memory region that could otherwise be valid (provided vm.mmap_min_addr sysctl and mmap_low_allowed SELinux knobs allow userspace to map something at address 0x0). This patch allows back such registration: ib_umem_get() should probably don't care of the base address provided it can be pinned with get_user_pages(). There's two possible overflows, in (addr + size) and in PAGE_ALIGN(addr + size), this patch keep ensuring none of them happen while allowing to pin memory at address 0x0. Anyway, the case of size equal 0 is no more (partially) handled as 0-length memory region are disallowed by an earlier check. Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com Cc: <stable@vger.kernel.org> # `8494057ab5` ("IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic") Cc: Shachar Raindel <raindel@mellanox.com> Cc: Jack Morgenstein <jackm@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:05:02 -04:00
Yann Droneaud	8abaae62f3	IB/core: disallow registering 0-sized memory region If ib_umem_get() is called with a size equal to 0 and a non-page-aligned address, one page will be pinned and a 0-sized umem will be returned to the caller. This should not be allowed: it's not expected for a memory region to have a size equal to 0. This patch adds a check to explicitly refuse to register a 0-sized region. Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com Cc: <stable@vger.kernel.org> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Jack Morgenstein <jackm@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:05:02 -04:00
Shachar Raindel	8494057ab5	IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic Properly verify that the resulting page aligned end address is larger than both the start address and the length of the memory area requested. Both the start and length arguments for ib_umem_get are controlled by the user. A misbehaving user can provide values which will cause an integer overflow when calculating the page aligned end address. This overflow can cause also miscalculation of the number of pages mapped, and additional logic issues. Addresses: CVE-2014-8159 Cc: <stable@vger.kernel.org> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-04-02 09:53:59 -07:00
Roland Dreier	147d1da951	Merge branches 'core', 'cxgb4', 'iser', 'mlx4', 'mlx5', 'ocrdma', 'odp', 'qib' and 'srp' into for-next	2015-02-20 09:04:40 -08:00
Haggai Eran	f4056bfd8c	IB/core: Add on demand paging caps to ib_uverbs_ex_query_device Add on-demand paging capabilities reporting to the extended query device verb. Yann Droneaud writes: Note: as offsetof() is used to retrieve the size of the lower chunk of the response, beware that it only works if the upper chunk is right after, without any implicit padding. And, as the size of the latter chunk is added to the base size, implicit padding at the end of the structure is not taken in account. Both point must be taken in account when extending the uverbs functionalities. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:36:26 -08:00
Eli Cohen	02d1aa7af1	IB/core: Add support for extended query device caps Add extensible query device capabilities verb to allow adding new features. ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy capability fields to be used by both ib_uverbs_query_device and ib_uverbs_ex_query_device. Following the discussion about this patch [1], the code now validates the command's comp_mask is zero, returning -EINVAL for unknown values, in order to allow extending the verb in the future. The verb also checks the user-space provided response buffer size and only fills in capabilities that will fit in the buffer. In attempt to follow the spirit of presentation [2] by Tzahi Oved that was presented during OpenFabrics Alliance International Developer Workshop 2013, the comp_mask bits will only describe which fields are valid. Furthermore, fields that can simply be cleared when they are not supported, do not require a comp_mask bit at all. The verb returns a response_length field containing the actual number of bytes written by the kernel, so that a newer version running on an older kernel can tell which fields were actually returned. [1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19 http://thread.gmane.org/gmane.linux.kernel.api/7889/ [2] https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:36:26 -08:00
Haggai Eran	4fc701ead7	IB/core: Properly handle registration of on-demand paging MRs after dereg When the last on-demand paging MR is released the notifier count is left non-zero so that concurrent page faults will have to abort. If a new MR is then registered, the counter is reset. However, the decision is made to put the new MR in the list waiting for the notifier count to reach zero, before the counter is reset. An invalidation or another MR registration can release the MR to handle page faults, but without such an event the MR can wait forever. The patch fixes this issue by adding a check whether the MR is the first on-demand paging MR when deciding whether it is ready to handle page faults. If it is the first MR, we know that there are no mmu notifiers running in parallel to the registration. Fixes: `882214e2b1` ("IB/core: Implement support for MMU notifiers regarding on demand paging regions") Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-17 22:14:56 -08:00
Moshe Lazer	0fb8bcf022	IB/core: Fix deadlock on uverbs modify_qp error flow The deadlock occurs in __uverbs_modify_qp: we take a lock (idr_read_qp) and in case of failure in ib_resolve_eth_l2_attrs we don't release it (put_qp_read). Fix that. Fixes: `ed4c54e5b4` ("IB/core: Resolve Ethernet L2 addresses when modifying QP") Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-17 22:06:42 -08:00
Ilya Nelkenbaum	c2be9dc0e0	IB/core: When marshaling ucma path from user-space, clear unused fields When marshaling a user path to the kernel struct ib_sa_path, we need to zero smac and dmac and set the vlan id to the "no vlan" value. This is to ensure that Ethernet attributes are not used with InfiniBand QPs. Fixes: `dd5f03beb4` ("IB/core: Ethernet L2 attributes in verbs/cm structures") Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-17 12:34:52 -08:00
Yann Droneaud	43c6116573	Revert "IB/core: Add support for extended query device caps" While commit `7e36ef8205` ("IB/core: Temporarily disable ex_query_device uverb") is correct as it makes the extended QUERY_DEVICE uverb (which came as part of commit `5a77abf9a9` ("IB/core: Add support for extended query device caps") and commit `860f10a799` ("IB/core: Add flags for on demand paging support")) not available to userspace, it doesn't address the initial issue regarding ib_copy_to_udata() [1][2]. Additionally, further discussions around this new uverb seems to conclude it would require a different data structure than the one currently described in <rdma/ib_user_verbs.h> [3]. Both of these issues require a revert of the changes, so this patch partially reverts commit `8cdd312cfe` ("IB/mlx5: Implement the ODP capability query verb") and commit `860f10a799` ("IB/core: Add flags for on demand paging support") and fully reverts commit `5a77abf9a9` ("IB/core: Add support for extended query device caps"). [1] "Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps" http://mid.gmane.org/1418733236.2779.26.camel@opteya.com [2] "Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb" http://mid.gmane.org/1423067503.3030.83.camel@opteya.com [3] "RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask" http://mid.gmane.org/2807E5FD2F6FDA4886F6618EAC48510E0CC12C30@CRSMSX101.amr.corp.intel.com Cc: Eli Cohen <eli@mellanox.com> Cc: Haggai Eran <haggaie@mellanox.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-06 00:54:33 -08:00
Haggai Eran	7e36ef8205	IB/core: Temporarily disable ex_query_device uverb Commit `5a77abf9a9` ("IB/core: Add support for extended query device caps") added a new extended verb to query the capabilities of RDMA devices, but the semantics of this verb are still under debate [1]. Don't expose this verb to userspace until the ABI is nailed down. [1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19 http://www.spinics.net/lists/linux-rdma/msg22904.html Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-03 09:29:11 -08:00
Roland Dreier	a7cfef21e3	Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'mlx4', 'ocrdma', 'odp' and 'srp' into for-next	2014-12-15 18:19:20 -08:00
Haggai Eran	882214e2b1	IB/core: Implement support for MMU notifiers regarding on demand paging regions * Add an interval tree implementation for ODP umems. Create an interval tree for each ucontext (including a count of the number of ODP MRs in this context, semaphore, etc.), and register ODP umems in the interval tree. * Add MMU notifiers handling functions, using the interval tree to notify only the relevant umems and underlying MRs. * Register to receive MMU notifier events from the MM subsystem upon ODP MR registration (and unregister accordingly). * Add a completion object to synchronize the destruction of ODP umems. * Add mechanism to abort page faults when there's a concurrent invalidation. The way we synchronize between concurrent invalidations and page faults is by keeping a counter of currently running invalidations, and a sequence number that is incremented whenever an invalidation is caught. The page fault code checks the counter and also verifies that the sequence number hasn't progressed before it updates the umem's page tables. This is similar to what the kvm module does. In order to prevent the case where we register a umem in the middle of an ongoing notifier, we also keep a per ucontext counter of the total number of active mmu notifiers. We only enable new umems when all the running notifiers complete. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yuval Dagan <yuvalda@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:36 -08:00
Shachar Raindel	8ada2c1c0c	IB/core: Add support for on demand paging regions * Extend the umem struct to keep the ODP related data. * Allocate and initialize the ODP related information in the umem (page_list, dma_list) and freeing as needed in the end of the run. * Store a reference to the process PID struct in the ucontext. Used to safely obtain the task_struct and the mm during fault handling, without preventing the task destruction if needed. * Add 2 helper functions: ib_umem_odp_map_dma_pages and ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses of specific pages of the umem (and, currently, pin them). * Support for page faults only - IB core will keep the reference on the pages used and call put_page when freeing an ODP umem area. Invalidations support will be added in a later patch. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:36 -08:00
Sagi Grimberg	860f10a799	IB/core: Add flags for on demand paging support * Add a configuration option for enable on-demand paging support in the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a later patch, this configuration option will select the MMU_NOTIFIER configuration option to enable mmu notifiers. * Add a flag for on demand paging (ODP) support in the IB device capabilities. * Add a flag to request ODP MR in the access flags to reg_mr. * Fail registrations done with the ODP flag when the low-level driver doesn't support this. * Change the conditions in which an MR will be writable to explicitly specify the access flags. This is to avoid making an MR writable just because it is an ODP MR. * Add a ODP capabilities to the extended query device verb. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Eli Cohen	5a77abf9a9	IB/core: Add support for extended query device caps Add extensible query device capabilities verb to allow adding new features. ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy capability fields to be used by both ib_uverbs_query_device and ib_uverbs_ex_query_device. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Haggai Eran	c5d76f130b	IB/core: Add umem function to read data from user-space In some drivers there's a need to read data from a user space area that was pinned using ib_umem when running from a different process context. The ib_umem_copy_from function allows reading data from the physical pages pinned in the ib_umem struct. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Haggai Eran	406f9e5fa9	IB/core: Replace ib_umem's offset field with a full address In order to allow umems that do not pin memory, we need the umem to keep track of its region's address. This makes the offset field redundant, and so this patch removes it. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Or Kehati	346f98b41b	IB/addr: Improve address resolution callback scheduling Address resolution always does a context switch to a work-queue to deliver the address resolution event. When the IP address is already cached in the system ARP table, we're going through the following: chain: rdma_resolve_ip --> addr_resolve (cache hit) --> which ends up with: queue_req --> set_timeout (now) --> mod_delayed_work(,, delay=1) We actually do realize that the timeout should be zero, but the code forces it to a minimum of one jiffie. Using one jiffie as the minimum delay value results in sub-optimal scheduling of executing this work item by the workqueue, which on the below testbed costs about 3-4ms out of 12ms total time. To fix that, we let the minimum delay to be zero. Note that the connect step times change too, as there are address resolution calls from that flow. The results were taken from running both client and server on the same node, over mlx4 RoCE port. before --> step total ms max ms min us us / conn create id : 0.01 0.01 6.00 6.00 resolve addr : 4.02 4.01 4013.00 4016.00 resolve route: 0.18 0.18 182.00 183.00 create qp : 1.15 1.15 1150.00 1150.00 connect : 6.73 6.73 6730.00 6731.00 disconnect : 0.55 0.55 549.00 550.00 destroy : 0.01 0.01 9.00 9.00 after --> step total ms max ms min us us / conn create id : 0.01 0.01 6.00 6.00 resolve addr : 0.05 0.05 49.00 52.00 resolve route: 0.21 0.21 207.00 208.00 create qp : 1.10 1.10 1104.00 1104.00 connect : 1.22 1.22 1220.00 1221.00 disconnect : 0.71 0.71 713.00 713.00 destroy : 0.01 0.01 9.00 9.00 Signed-off-by: Or Kehati <ork@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:10:13 -08:00
Jack Morgenstein	514f3ddffe	IB/core: Fix mgid key handling in SA agent multicast data-base Applications can request that the SM assign an MGID by passing a mcast member request containing MGID = 0. When the SM responds by sending the allocated MGID, this MGID replaces the 0-MGID in the multicast group. However, the MGID field in the group is also the key field in the IB core multicast code rbtree containing the multicast groups for the port. Since this is a key field, correct handling requires that the group entry be deleted from the rbtree and then re-inserted with the new key, so that the table structure is properly maintained. The current code does not do this correctly. Correct operation requires that if the key-field gid has changed at all, it should be deleted and re-inserted. Note that when inserting, if the new MGID is zero (not the case here but the code should handle this correctly), we allow duplicate entries for 0-MGIDs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:10:13 -08:00
Moni Shoua	c1bd6cde8e	IB/core: Do not resolve VLAN if already resolved For RoCE, resolution of layer 2 address attributes forces no VLAN if link-local GIDs are used. This patch allows applications to choose the VLAN ID for link-local based RoCE GIDs by setting IB_QP_VID in their QP attribute mask, and prevents the core from overriding this choice. Cc: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:10:12 -08:00
Roland Dreier	7b909bb49a	Merge branches 'core', 'cxgb4', 'iser', 'mlx5' and 'ocrdma' into for-next	2014-10-14 14:09:12 -07:00
Jack Morgenstein	a040f95dc8	IB/core: Fix XRC race condition in ib_uverbs_open_qp In ib_uverbs_open_qp, the sharable xrc target qp is created as a "pseudo" qp and added to a list of qp's sharing the same physical QP. This is done before the "pseudo" qp is assigned a uobject. There is a race condition here if an async event arrives at the physical qp. If the event is handled after the pseudo qp is added to the list, but before it is assigned a uobject, the kernel crashes in ib_uverbs_qp_event_handler, due to trying to dereference a NULL uobject pointer. Note that simply checking for non-NULL is not enough, due to error flows in ib_uverbs_open_qp. If the failure is after assigning the uobject, but before the qp has fully been created, we still have a problem. Thus, in ib_uverbs_qp_event_handler, we test that the uobject is present, and also that it is live. Reported-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-10-14 00:30:56 -07:00
Devesh Sharma	8b0f93d949	IB/core: Clear AH attr variable to prevent garbage data During create-ah from userspace, uverbs is sending garbage data in attr.dmac and attr.vlan_id. This patch sets attr.dmac and attr.vlan_id to zero. Fixes: `dd5f03beb4` ("IB/core: Ethernet L2 attributes in verbs/cm structures") Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-10-14 00:29:06 -07:00
Eli Cohen	377b513485	IB/core: Avoid leakage from kernel to user space Clear the reserved field of struct ib_uverbs_async_event_desc which is copied to user space. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-10-09 00:08:40 -07:00
Roland Dreier	3bdad2d13f	Merge branches 'core', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next	2014-09-22 10:05:40 -07:00
Matan Barak	a59c5850f0	IB/core: When marshaling uverbs path, clear unused fields When marshaling a user path to the kernel struct ib_sa_path, we need to zero smac and dmac and set the vlan id to the "no vlan" value. Fixes: `dd5f03beb4` ("IB/core: Ethernet L2 attributes in verbs/cm structures") Reported-by: Aleksey Senin <alekseys@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-09-22 09:46:52 -07:00
Shawn Bohrer	87773dd56d	IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get In debugging an application that receives -ENOMEM from ib_reg_mr(), I found that ib_umem_get() can fail because the pinned_vm count has wrapped causing it to always be larger than the lock limit even with RLIMIT_MEMLOCK set to RLIM_INFINITY. The wrapping of pinned_vm occurs because the process that calls ib_reg_mr() will have its mm->pinned_vm count incremented. Later a different process with a different mm_struct than the one that allocated the ib_umem struct ends up releasing it which results in decrementing the new processes mm->pinned_vm count past zero and wrapping. I'm not entirely sure what circumstances cause a different process to release the ib_umem than the one that allocated it but the kernel stack trace of the freeing process from my situation looks like the following: Call Trace: [<ffffffff814d64b1>] dump_stack+0x19/0x1b [<ffffffffa0b522a5>] ib_umem_release+0x1f5/0x200 [ib_core] [<ffffffffa0b90681>] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib] [<ffffffffa0b4d93c>] ib_destroy_qp+0x12c/0x170 [ib_core] [<ffffffffa0cc7129>] ib_uverbs_close+0x259/0x4e0 [ib_uverbs] [<ffffffff81141cba>] __fput+0xba/0x240 [<ffffffff81141e4e>] ____fput+0xe/0x10 [<ffffffff81060894>] task_work_run+0xc4/0xe0 [<ffffffff810029e5>] do_notify_resume+0x95/0xa0 [<ffffffff814e3dd0>] int_signal+0x12/0x17 The following patch fixes the issue by storing the pid struct of the process that calls ib_umem_get() so that ib_umem_release and/or ib_umem_account() can properly decrement the pinned_vm count of the correct mm_struct. Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Reviewed-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-09-19 09:55:42 -07:00
Roland Dreier	d087f6ad72	Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'iwcm', 'mad', 'misc', 'mlx4', 'mlx5', 'ocrdma' and 'srp' into for-next	2014-08-14 08:58:04 -07:00
Ira Weiny	1471cb6ca6	IB/mad: Add user space RMPP support Using the new registration mechanism, define a flag that indicates the user wishes to process RMPP messages in user space rather than have the kernel process them. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-08-10 20:36:00 -07:00
Ira Weiny	0f29b46d49	IB/mad: add new ioctl to ABI to support new registration options Registrations options are specified through flags. Definitions of flags will be in subsequent patches. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-08-10 20:36:00 -07:00
Ira Weiny	9ad13a4234	IB/mad: Add dev_notice messages for various umad/mad registration failures Registration failures can be difficult to debug from userspace. This gives more visibility. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-08-10 20:36:00 -07:00
Ira Weiny	7ef5d4b046	IB/mad: Update module to [pr|dev]_* style print messages Use dev_* style print when struct device is available. Also combine previously line broken user-visible strings as per Documentation/CodingStyle: "However, never break user-visible strings such as printk messages, because that breaks the ability to grep for them." Signed-off-by: Ira Weiny <ira.weiny@intel.com> [ Remove PFX so the patch actually builds. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-08-10 20:35:39 -07:00
Ira Weiny	f426a40eb6	IB/umad: Update module to [pr|dev]_* style print messages Use dev_* style print when struct device is available. Also combine previously line broken user-visible strings as per Documentation/CodingStyle: "However, never break user-visible strings such as printk messages, because that breaks the ability to grep for them." Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-08-10 19:50:07 -07:00
Steve Wise	2f0304d218	RDMA/iwcm: Use a default listen backlog if needed If the user creates a listening cm_id with backlog of 0 the IWCM ends up not allowing any connection requests at all. The correct behavior is for the IWCM to pick a default value if the user backlog parameter is zero. Lustre from version 1.8.8 onward uses a backlog of 0, which breaks iwarp support without this fix. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-08-05 07:33:24 -07:00
Matan Barak	7e6edb9b2e	IB/core: Add user MR re-registration support Memory re-registration is a feature that enables changing the attributes of a memory region registered by user-space, including PD, translation (address and length) and access flags. Add the required support in uverbs and the kernel verbs API. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-08-01 15:11:13 -07:00
Roland Dreier	eeaddf3670	Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', 'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next	2014-06-10 10:12:14 -07:00
Tatyana Nikolova	30dc5e63d6	RDMA/core: Add support for iWARP Port Mapper user space service This patch adds iWARP Port Mapper (IWPM) Version 2 support. The iWARP Port Mapper implementation is based on the port mapper specification section in the Sockets Direct Protocol paper - http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf Existing iWARP RDMA providers use the same IP address as the native TCP/IP stack when creating RDMA connections. They need a mechanism to claim the TCP ports used for RDMA connections to prevent TCP port collisions when other host applications use TCP ports. The iWARP Port Mapper provides a standard mechanism to accomplish this. Without this service it is possible for RDMA application to bind/listen on the same port which is already being used by native TCP host application. If that happens the incoming TCP connection data can be passed to the RDMA stack with error. The iWARP Port Mapper solution doesn't contain any changes to the existing network stack in the kernel space. All the changes are contained with the infiniband tree and also in user space. The iWARP Port Mapper service is implemented as a user space daemon process. Source for the IWPM service is located at http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary The iWARP driver (port mapper client) sends to the IWPM service the local IP address and TCP port it has received from the RDMA application, when starting a connection. The IWPM service performs a socket bind from user space to get an available TCP port, called a mapped port, and communicates it back to the client. In that sense, the IWPM service is used to map the TCP port, which the RDMA application uses to any port available from the host TCP port space. The mapped ports are used in iWARP RDMA connections to avoid collisions with native TCP stack which is aware that these ports are taken. 
When an RDMA connection using a mapped port is terminated, the client notifies the IWPM service, which then releases the TCP port. The message exchange between the IWPM service and the iWARP drivers (between user space and kernel space) is implemented using netlink sockets. 1) Netlink interface functions are added: ibnl_unicast() and ibnl_multicast() for sending netlink messages to user space 2) The signature of the existing ibnl_put_msg() is changed to be more generic 3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW corresponding to the two iWARP drivers - nes and cxgb4 which use the IWPM service 4) Enums are added to enumerate the attributes in the netlink messages, which are exchanged between the user space IWPM service and the iWARP drivers Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: PJ Waskiewicz <pj.waskiewicz@solidfire.com> [ Fold in range checking fixes and nlh_next removal as suggested by Dan Carpenter and Steve Wise. Fix sparse endianness in hash. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-06-10 10:11:45 -07:00
Bart Van Assche	60e1751cb5	IB/umad: Fix use-after-free on close Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n> triggers a use-after-free. __fput() invokes f_op->release() before it invokes cdev_put(). Make sure that the ib_umad_device structure is freed by the cdev_put() call instead of f_op->release(). This avoids that changing the port mode from IB into Ethernet and back to IB followed by restarting opensmd triggers the following kernel oops: general protection fault: 0000 [#1] PREEMPT SMP RIP: 0010:[<ffffffff810cc65c>] [<ffffffff810cc65c>] module_put+0x2c/0x170 Call Trace: [<ffffffff81190f20>] cdev_put+0x20/0x30 [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0 [<ffffffff8118e35e>] ____fput+0xe/0x10 [<ffffffff810723bc>] task_work_run+0xac/0xe0 [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0 [<ffffffff814b8398>] int_signal+0x12/0x17 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Cc: <stable@vger.kernel.org> # 3.x: `8ec0a0e6b5`: IB/umad: Fix error handling Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-06-06 11:38:31 -07:00
Haggai Eran	584482ac80	IB/core: Fix kobject leak on device register error flow The ports kobject isn't being released during error flow in device registration. This patch refactors the ports kobject cleanup into a single function called from both the error flow in device registration and from the unregistration function. A couple of attributes aren't being deleted (iw_stats_group, and ib_class_attributes). While this may be handled implicitly by the destruction of their kobjects, it seems better to handle all the attributes the same way. Signed-off-by: Haggai Eran <haggaie@mellanox.com> [ Make free_port_list_attributes() static. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-06-05 09:37:10 -07:00
Haggai Eran	cad6d02acc	IB/core: Fix port kobject deletion during error flow When encountering an error during the add_port function, adding a port to sysfs, the port kobject is freed without being deleted from sysfs. Instead of freeing it directly, the patch uses kobject_put to release the kobject and delete it. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-06-04 10:03:49 -07:00
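
The iWARP Port Mapper entry above (30dc5e63d6) says the user space service "performs a socket bind from user space to get an available TCP port, called a mapped port", and releases it when the connection ends. A minimal sketch of that idea follows. It is not the libiwpm code: the function names `claim_mapped_port` and `release_mapped_port` are hypothetical, and it assumes that binding to port 0 (letting the kernel choose a free port) is an acceptable way to obtain "any port available from the host TCP port space".

```python
import socket


def claim_mapped_port(local_ip: str = "127.0.0.1"):
    """Bind a TCP socket to port 0 so the kernel picks a free port.

    Hypothetical helper for illustration. Keeping the socket open is
    what keeps the native TCP stack aware that the port is taken.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((local_ip, 0))            # port 0: let the kernel choose
    mapped_port = sock.getsockname()[1]  # the port the kernel assigned
    return sock, mapped_port


def release_mapped_port(sock: socket.socket) -> None:
    """Close the socket, returning the port to the host TCP port space."""
    sock.close()


if __name__ == "__main__":
    holder, port = claim_mapped_port()
    print("mapped port:", port)
    release_mapped_port(holder)
```

While the holder socket stays open, another plain bind to the same address and port fails with "address already in use", which is the collision avoidance the commit message describes.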

876 Commits