ib_dma_map_sgtable_attrs() should be mapping the sgls and setting nents
but the ib_uses_virt_dma() path falls back to ib_dma_virt_map_sg() which
will not set the nents in the sgtable.
Check the return value (per the map_sg calling convention) and set
sgt->nents appropriately on success.
Fixes: 79fbd3e124 ("RDMA: Use the sg_table directly and remove the opencoded version from umem")
Link: https://lore.kernel.org/r/20211013165942.89806-1-logang@deltatee.com
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add code to rxe_get_av in rxe_av.c to use the AH index in UD send WQEs to
lookup the kernel AH. For old user providers continue to use the AV passed
in WQEs. Move setting pkt->rxe to before the call to rxe_get_av() to get
access to the AH pool.
Link: https://lore.kernel.org/r/20211007204051.10086-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The pd field in struct rxe_ah is redundant with the pd field in the
rdma-core's ib_ah. Eliminate the pd field in rxe_ah and add an inline to
extract the pd from the ibah field.
Link: https://lore.kernel.org/r/20211007204051.10086-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Make changes to rdma_user_rxe.h to allow indexing AH objects, passing the
index in UD send WRs to the driver and returning the index to the rxe
provider.
Modify rxe_create_ah() to add an index to AH when created and if called
from a new user provider return it to user space. If called from an old
provider mark the AH as not having a useful index. Modify rxe_destroy_ah
to drop the index before deleting the object.
Link: https://lore.kernel.org/r/20211007204051.10086-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Make changes to rxe_param.h and rxe_pool.c to allow indexing of AH
objects. Valid indices are non-zero so older providers can be detected.
Link: https://lore.kernel.org/r/20211007204051.10086-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Move the struct rxe_av av from struct rxe_send_wqe to struct rxe_send_wr
placing it in wr.ud at the same offset as it was previously. This has the
effect of increasing the size of struct rxe_send_wr while keeping the size
of struct rxe_send_wqe the same. This better reflects the use of this
field which is only used for UD sends. This change has no effect on ABI
compatibility so the modified rxe driver will operate with older versions
of rdma-core.
Link: https://lore.kernel.org/r/20211007204051.10086-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The error flow fixed in this patch is not possible because all kernel
users of create QP interface check that device supports steering before
set IB_QP_CREATE_NETIF_QP flag.
Fixes: c1c9850112 ("IB/mlx4: Add support for steerable IB UD QPs")
Link: https://lore.kernel.org/r/91c61f6e60eb0240f8bbc321fda7a1d2986dd03c.1634023677.git.leonro@nvidia.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The function irdma_cqp_up_map_cmd() is not used. So remove it.
Link: https://lore.kernel.org/r/20211011110128.4057-5-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The function irdma_get_hw_addr() is not used. So remove it.
Link: https://lore.kernel.org/r/20211011110128.4057-4-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The function irdma_sc_send_lsmm_nostag is not used. So remove it.
Link: https://lore.kernel.org/r/20211011110128.4057-3-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The function irdma_uk_mw_bind is not used. So remove it.
Link: https://lore.kernel.org/r/20211011110128.4057-2-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.
Remove the redundant 'flush_workqueue()' calls.
This was generated with coccinelle:
@@
expression E;
@@
- flush_workqueue(E);
destroy_workqueue(E);
Link: https://lore.kernel.org/r/ca7bac6e6c9c5cc8d04eec3944edb13de0e381a3.1633874776.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The pointer err_str is being initialized with a value that is never read,
it is being updated later on. The assignment is redundant and can be
removed.
Link: https://lore.kernel.org/r/20211007173942.21933-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Replacing kmalloc/kfree/dma_map_single/dma_unmap_single() with
dma_alloc_coherent/dma_free_coherent() helps to reduce code size, and
simplify the code, and coherent DMA will not clear the cache every time.
The SOC that this driver supports does not have incoherent DMA, so this
makes the code follow the DMA API properly with no performance
impact. Currently there are missing dma sync calls around the DMA
transfers.
Link: https://lore.kernel.org/r/20210926061116.282-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Wenpeng Liang <liangwenpeng@huawei.com>
Tested-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When get_hw_stats is called, query and return the optional counter
statistic as well.
Link: https://lore.kernel.org/r/20211008122439.166063-14-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add support for ib callback modify_op_stat() to add or remove an optional
counter. When adding, a steering flow table is created with a rule that
catches and counts all the matching packets. When removing, the table and
flow counter are destroyed.
Link: https://lore.kernel.org/r/20211008122439.166063-13-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Adding steering infrastructure for adding and removing optional counter.
This allows to add and remove the counters dynamically in order not to
hurt performance.
Link: https://lore.kernel.org/r/20211008122439.166063-12-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add optional counter support when allocate and initialize hw_stats
structure. Optional counters have IB_STAT_FLAG_OPTIONAL flag set and are
disabled by default.
Link: https://lore.kernel.org/r/20211008122439.166063-11-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Provide an option to allow users to enable/disable optional counters
through RDMA netlink. Limiting it to users with ADMIN capability only.
Examples:
1. Enable optional counters cc_rx_ce_pkts and cc_rx_cnp_pkts (and
disable all others):
$ sudo rdma statistic set link rocep8s0f0/1 optional-counters \
cc_rx_ce_pkts,cc_rx_cnp_pkts
2. Remove all optional counters:
$ sudo rdma statistic unset link rocep8s0f0/1 optional-counters
Link: https://lore.kernel.org/r/20211008122439.166063-10-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In order to allow expansion of the set command with more set options, take
the set mode out of the main set function.
Link: https://lore.kernel.org/r/20211008122439.166063-9-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This patch adds the ability to get the name, index and status of all
counters for each link through RDMA netlink. This can be used for
user-space to get the current optional-counter mode.
Examples:
$ rdma statistic mode
link rocep8s0f0/1 optional-counters cc_rx_ce_pkts
$ rdma statistic mode supported
link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts
link rocep8s0f1/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts
Link: https://lore.kernel.org/r/20211008122439.166063-8-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
An optional counter is a driver-specific counter that may be dynamically
enabled/disabled. This enhancement allows drivers to expose counters
which are, for example, mutually exclusive and cannot be enabled at the
same time, counters that might degrades performance, optional debug
counters, etc.
Optional counters are marked with IB_STAT_FLAG_OPTIONAL flag. They are not
exported in sysfs, and must be at the end of all stats, otherwise the
attr->show() in sysfs would get wrong indexes for hwcounters that are
behind optional counters.
Link: https://lore.kernel.org/r/20211008122439.166063-7-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add a bitmap in rdma_hw_stat structure, with each bit indicates whether
the corresponding counter is currently disabled or not. By default
hwcounters are enabled.
Link: https://lore.kernel.org/r/20211008122439.166063-6-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add a new API rdma_free_hw_stats_struct to pair with
rdma_alloc_hw_stats_struct (which is also de-inlined).
This will be useful when there are more alloc/free works in following
patches.
Link: https://lore.kernel.org/r/20211008122439.166063-5-markzhang@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add a counter statistic descriptor structure in rdma_hw_stats. In addition
to the counter name, more meta-information will be added. This code
extension is needed for optional-counter support in the following patches.
Link: https://lore.kernel.org/r/20211008122439.166063-4-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For dependencies in the following patches.
* mellanox/mlx5-next:
net/mlx5: Add priorities for counters in RDMA namespaces
net/mlx5: Add ifc bits to support optional counters
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add additional flow steering priorities in the RDMA namespace.
This allows adding flow counters to count filtered RDMA traffic and then
continue processing in the regular RDMA steering flow.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Adding bth_opcode field and the relevant bits. This field will be used
to capture and count congestion notification packets (CNP).
Adding source_vhca_port support bit.
This field will be used to check the capability to use the
source_vhca_port as a match criteria in cases of dual port.
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
This patch adds support for CQ notifications through the standard verbs
api.
In order to achieve that, a new event queue (EQ) object is introduced,
which is in charge of reporting completion events to the driver. On
driver load, EQs are allocated and their affinity is set to a single
cpu. When a user app creates a CQ with a completion channel, the
completion vector number is converted to a EQ number, which is in charge
of reporting the CQ events.
In addition, the CQ creation admin command now returns an offset for the
CQ doorbell, which is mapped to the userspace provider and is used to arm
the CQ when requested by the user.
The EQs use a single doorbell (located on the registers BAR), which
encodes the EQ number and arm as part of the doorbell value. The EQs are
polled by the driver on each new EQE, and arm it when the poll is
completed.
Link: https://lore.kernel.org/r/20211003105605.29222-1-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
As ibv_poll_cq()'s manual said, only partial attributes are valid when
completion status != IBV_WC_SUCCESS.
Link: https://lore.kernel.org/r/20210930094813.226888-4-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The is_user members of struct rxe_sq/rxe_rq/rxe_srq are unsed since
commit ae6e843fe0 ("RDMA/rxe: Add memory barriers to kernel queues").
In this case, it is fine to remove them directly.
Link: https://lore.kernel.org/r/20210930094813.226888-2-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
There are a couple of subtle error path bugs related to mapping the sgls:
- In rdma_rw_ctx_init(), dma_unmap would be called with an sg that could
have been incremented from the original call, as well as an nents that
is the dma mapped entries not the original number of nents called when
mapped.
- Similarly in rdma_rw_ctx_signature_init, both sg and prot_sg were
unmapped with the incorrect number of nents.
To fix this, switch to the sgtable interface for mapping which
conveniently stores the original nents for unmapping. This will get
cleaned up further once the dma mapping interface supports P2PDMA and
pci_p2pdma_map_sg() can be removed.
Fixes: 0e353e34e1 ("IB/core: add RW API support for signature MRs")
Fixes: a060b5629a ("IB/core: generic RDMA READ/WRITE API")
Link: https://lore.kernel.org/r/20211001213215.3761-1-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently, if a cache entry is empty, the driver will try to take MRs
from larger cache entries. This behavior consumes a lot of memory.
In addition, when searching for an mkey in an entry, the entry is locked.
When using a multithreaded application with the old behavior, the threads
will block each other more often, which can hurt performance as can be
seen in the table below.
Therefore, avoid it by creating a new mkey when the requested cache entry
is empty.
The test was performed on a machine with
Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 44 cores.
Here are the time measures for allocating MRs of 2^6 pages. The search in
the cache started from entry 6.
+------------+---------------------+---------------------+
| | Old behavior | New behavior |
| +----------+----------+----------+----------+
| | 1 thread | 5 thread | 1 thread | 5 thread |
+============+==========+==========+==========+==========+
| 1,000 MRs | 14 ms | 30 ms | 14 ms | 80 ms |
+------------+----------+----------+----------+----------+
| 10,000 MRs | 135 ms | 6 sec | 173 ms | 880 ms |
+------------+----------+----------+----------+----------+
|100,000 MRs | 11.2 sec | 57 sec | 1.74 sec | 8.8 sec |
+------------+----------+----------+----------+----------+
Link: https://lore.kernel.org/r/71af2770c737b936f7b10f457f0ef303ffcf7ad7.1632644527.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This commit divides the sysfs entry cpu_migration into 2 different entries
One for "from cpus" and the other for "to cpus".
Link: https://lore.kernel.org/r/20210922125333.351454-8-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Allowing these characters in sessname can lead to unexpected results,
particularly because / is used as a separator between files in a path, and
. points to the current directory.
Link: https://lore.kernel.org/r/20210922125333.351454-7-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The same code snip used twice, to avoid duplicate, replace it with a
destroy_cq helper.
Link: https://lore.kernel.org/r/20210922125333.351454-6-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
if (con->cid >= con->sess->irq_con_num) check can be replaced with a
is_pollqueue helper.
Link: https://lore.kernel.org/r/20210922125333.351454-5-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When testing with poll mode, it will fail and lead to warning below on
client side:
$ echo "sessname=bla path=gid:fe80::2:c903:4e:d0b3@gid:fe80::2:c903:8:ca17 device_path=/dev/nullb2 nr_poll_queues=-1" | \
sudo tee /sys/devices/virtual/rnbd-client/ctl/map_device
rnbd_client L597: Mapping device /dev/nullb2 on session bla, (access_mode: rw, nr_poll_queues: 8)
WARNING: CPU: 3 PID: 9886 at drivers/infiniband/core/cq.c:447 ib_cq_pool_get+0x26f/0x2a0 [ib_core]
The problem is in case of poll queue, we need to still call
ib_alloc_cq/ib_free_cq, we can't use cq_poll api for poll queue.
As both client and server use shared function from rtrs, set irq_con_num
to con_num on server side, which is number of total connection of the
session, this way we can differ if the rtrs_con requires pollqueue.
Following up patches will replace the duplicate code with helpers.
Link: https://lore.kernel.org/r/20210922125333.351454-4-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Since we have changed all sysfs show functions to use sysfs_emit, we do
not require the len (PAGE_SIZE) in our helper print functions. So remove
it from the function parameter.
Link: https://lore.kernel.org/r/20210922125333.351454-3-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
sysfs_emit function was added to be aware of the PAGE_SIZE maximum of the
temporary buffer used for outputting sysfs content, so there is no
possible overruns. So replace the uses of any s*printf functions for the
sysfs show functions with sysfs_emit.
Link: https://lore.kernel.org/r/20210922125333.351454-2-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Two list heads in the rdma_id_private are being used for multiple
purposes, to save a few bytes of memory. Give the different purposes
different names and union the memory that is clearly exclusive.
list splits into device_item and listen_any_item. device_item is threaded
onto the cma_device's list and listen_any goes onto the
listen_any_list. IDs doing any listen cannot have devices.
listen_list splits into listen_item and listen_list. listen_list is on the
parent listen any rdma_id_private and listen_item is on child listen that
is bound to a specific cma_dev.
Which name should be used in which case depends on the state and other
factors of the rdma_id_private. Remap all the confusing references to make
sense with the new names, so at least there is some hope of matching the
necessary preconditions with each access.
Link: https://lore.kernel.org/r/0-v1-a5ead4a0c19d+c3a-cma_list_head_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmFaG98eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGosAH/jqy5B2BIEE39O+8
QTr3vO54SyRRuY/d98wZ+O4SPjfqfpCHuyjKt9YJpEdmzH754NC9gSPOOBegnvHI
DfrWaivmJ5mdjN2h7+JVqjs58krUv98wWNa5xfvqUp5H7wF3WQg3AxsaMKS1PePD
kFHfeFbxsg2gYhyhPK6gHtwLn6dEsx9bGny2bKvCh6KuJQEiUXoEcgnFzjFgLNxp
T5zI1cNSCNUzwRIe+vqQRlfVR2JlSI4tiy0zNJWy9dQ5Z4HOSbFcEz5Df2N7qNYn
/MqruaASmyREgo9yLHpR1BSyzrea8MCckY04ycYqKZb7gDwcrpAe4QVw2I/Fuzu9
q//PV4I=
=+mYg
-----END PGP SIGNATURE-----
Merge tag 'v5.15-rc4' into rdma.get for-next
Merged due to dependencies in following patches.
Conflict in drivers/infiniband/hw/hfi1/ipoib_tx.c resolved by hand to take
the %p change and txq stats rename together.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In our internal testing we have found that default maximum values are too
small. Ideally there should be no limits, but since maximum values are
reported via ibv_query_device, we have to return some value. So, the
default maximums have been changed to large values.
Link: https://lore.kernel.org/r/20210915011220.307585-1-Rao.Shoaib@oracle.com
Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In commit b212921b13 ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") we still leave MAP_FIXED_NOREPLACE in place for
load_elf_interp.
Unfortunately, this will cause kernel to fail to start with:
1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already
Failed to execute /init (error -17)
The reason is that the elf interpreter (ld.so) has overlapping segments.
readelf -l ld-2.31.so
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000002c94c 0x000000000002c94c R E 0x10000
LOAD 0x000000000002dae0 0x000000000003dae0 0x000000000003dae0
0x00000000000021e8 0x0000000000002320 RW 0x10000
LOAD 0x000000000002fe00 0x000000000003fe00 0x000000000003fe00
0x00000000000011ac 0x0000000000001328 RW 0x10000
The reason for this problem is the same as described in commit
ad55eac74f ("elf: enforce MAP_FIXED on overlaying elf segments").
Not only executable binaries, elf interpreters (e.g. ld.so) can have
overlapping elf segments, so we better drop MAP_FIXED_NOREPLACE and go
back to MAP_FIXED in load_elf_interp.
Fixes: 4ed2863951 ("fs, elf: drop MAP_FIXED usage from elf_map")
Cc: <stable@vger.kernel.org> # v4.19
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>