There is a spelling mistake in several dev_err messages, fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the e-switch driver requires going to legacy mode before
changing to the offloads mode. This makes sense for regular case as
the legacy mode is done by creating VFs.
However, it's problematic when ECPF is the eswitch manager. In such
case, ECPF will control the vports on peer host including the peer
PF and VFs. But ECPF doesn't need and shall not create VFs as the
VFs are created in the peer PF host.
Grant ECPF the ability to change from none to the offloads mode. Note
that currently the only way to go back to none mode is by unloading
the ECPF driver.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When host PF changes the number of VFs, the ECPF esw driver will get
a FW event. It should query the number of VFs enabled by host PF and
update the VF reps accordingly. Note that host PF can't change the
number of VFs dynamically, it has to reset the number of VFs to 0
before changing to a new positive number.
The host event is registered when driver is moving to switchdev mode,
and it's the last step to do in esw_offloads_init. It's unregistered
and the work queue is flushed when driver quits from switchdev mode.
In this way, the host event and devlink command are serialized.
When driver is enabling switchdev mode, pay attention to the following
two facts:
1. Host PF must not have VF initialized as the flow table in ECPF has
ENCAP enabled as default. Such flow table can't be created with
existing initialized VFs.
2. ECPF doesn't know how many VFs the host PF will enable, ECPF
offloads flow steering shall create the flow table/groups based on
the max number of VFs possibly supported by host PF.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
ECPF connects to the eswitch through vport 0xfffe. ECPF may or may
not be the eswitch manager depending on firmware configuration.
1. If ECPF is eswitch manager: ECPF will take over the eswitch manager
responsibility. A rep of the host PF shall be created at the ECPF
side for the eswitch manager to control.
2. If ECPF is not eswitch manager: host PF will be the eswitch manager,
ECPF acts similar as a VF to the host PF. Host PF will be aware
of the ECPF vport presence and control it's rep.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In offloads mode, the current implementation puts the uplink
representor at index zero of the vport reps array. It is not "natural"
to place it at index 0 since we want to put the representor for vport
0 at index 0 with the introduction of SmartNIC. A separate patch will
handle the case whether a rep is needed for vport 0 (PF vport).
So, we want to have a different placeholder for uplink vport and
representor. It was placed at the end of vport and rep array. Since
vport number can no longer act as an index into the vport or
representors arrays, use functions to map vport numbers to indices
when accessing the vports or representors arrays, and vice versa.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eswitch has two users: IB and ETH. They both register repersentors
when mlx5 interface is added, and unregister the repersentors when
mlx5 interface is removed. Ideally, each driver should only deal with
the entities which are unique to itself. However, current IB and ETH
drivers have to perform the following eswitch operations:
1. When registering, specify how many vports to register. This number
is the same for both drivers which is the total available vport
numbers.
2. When unregistering, specify the number of registered vports to do
unregister. Also, unload the repersentors which are already loaded.
It's unnecessary for eswitch driver to hands out the control of above
operations to individual driver users, as they're not unique to each
driver. Instead, such operations should be centralized to eswitch
driver. This consolidates eswitch control flow, and simplified IB and
ETH driver.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently the driver loads and unloads all reps in an unbreakable
group. However, with ECPF, the reps of special vports such as uplink
and host PF should always be loaded in switchdev mode where the reps
for VFs will be loaded on-demand and unloaded on no-demand. This is
a pre-step for that change.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently the eswitch vport reps have a valid indicator, which is
set on register and unset on unregister. However, a rep can be loaded
or not loaded when doing unregister, current driver checks if the
vport of that rep is enabled as a flag to imply the rep is loaded.
However, for ECPF, this is not valid as the host PF will enable the
vports for its VFs instead.
Add three states: {unregistered, registered, loaded}, with the
following state changes across different operations:
create: (none) -> unregistered
reg: unregistered -> registered
load: registered -> loaded
unload: loaded -> registered
unreg: registered -> unregistered
Note that the state shall only be updated inside eswitch driver rather
than individual drivers such as ETH or IB.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Suggested-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
With only PF and VF, it is sufficient to have the vport/rep array
index as the vport number. This is because PF and VF vports numbers
are consecutive serial numbers. In downstream patches with
introducing of ECPF and UPLINK vports, it's not consecutive any more.
Use getter to get specific vport/rep, and use iterator to traversal
a list of vport/rep. This hides the translation between array index
and vport number, and provides flexibility of using different
translation mechanism in the future.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Suggested-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When driver is entering offloads mode, there are two major tasks to
do: initialize flow steering and create representors. Flow steering
should make sure enough flow table/group spaces are reserved for all
reps. Representors will be created in a group, all or none.
With the introduction of ECPF, flow steering should still reserve the
same spaces. But, the representors are not always loaded/unloaded in a
single piece. Once ECPF is in offloads mode, it will get the number
of VF changing event from host PF. In such scenario, only the VF reps
should be loaded/unloaded, not the reps for special vports (such as
the uplink vport).
Thus, when entering offloads mode, driver should specify the total
number of reps, and the number of VF reps separately. When leaving
offloads mode, the cleanup should use the information self-contained
in eswitch such as number of VFs.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
E-switch offloads mode initialize/cleanup multiple steering related
entities (flow table/group). Refactor these operations to internal
helper functions for better block design.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Commands referring to vports use the following scheme:
1. When referring to my own vport, put 0 in vport and 0 in other_vport.
2. When referring to another vport, put the vport number of the
referred vport and put 1 in other_vport. It was assumed that driver
is accessing other vport when vport number is greater than 0.
With the above scheme, the case that ECPF eswitch manager is trying
to access host PF vport will fall over with scheme 1 as the vport
number is 0. This is apparently wrong as driver is trying to refer
other vport.
As such usage can only happen in the eswitch context, change relevant
functions to provide other vport input properly.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In SmartNIC mode, the eswitch manager is not necessarily the PF
(vport 0). Use a helper function to get the correct eswitch manager
vport number and cache on the eswitch instance for fast reference.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When bonding is added, driver assumes that it's RoCE LAG if no VF is
enabled. This is not enough for ECPF as the VF is enabled in host PF
side. LAG should only choose RoCE mode when both slave devices meet
conditions below:
1. E-Switch offloads mode is NONE.
2. No VF is enabled.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Merge mlx5-next shared branched into net-next,
From Bodong Wang:
1) Introduction of ECPF (Embedded CPU Physical Function), and low level
bits for mlx5 SmartNic capabilities support.
2) Vport enumeration refactoring that affect mlx5_ib and mlx5_core
From Aya Levin,
3) Add support for 50Gbps per lane link modes in the Port Type and Speed
register (PTYS)
4) Refactor low level query functions for PTYS register
5) Add support for 50Gbps per lane link modes to mlx5_ib
Note: due to a change in API in mlx5/core and a later patch from net-next,
a fixup was squashed with this merge commit that replaces FDB_UPLINK_VPORT
with MLX5_VPORT_UPLINK which exists only in upstream net-next.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Lightweight tunnels are L3 constructs that are used with IP/IP6.
For example, lwtunnel_xmit is called from ip_output.c and
ip6_output.c only.
Make the dependency explicit at least for LWT-BPF, as now they
call into IP routing.
V2: added "Reported-by" below.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Oskolkov <posk@google.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The netfilter conflicts were rather simple overlapping
changes.
However, the cls_tcindex.c stuff was a bit more complex.
On the 'net' side, Cong is fixing several races and memory
leaks. Whilst on the 'net-next' side we have Vlad adding
the rtnl-ness support.
What I've decided to do, in order to resolve this, is revert the
conversion over to using a workqueue that Cong did, bringing us back
to pure RCU. I did it this way because I believe that either Cong's
races don't apply with have Vlad did things, or Cong will have to
implement the race fix slightly differently.
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlxm7pAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpl6JEACM5qHp7HEf7muuLKDUoX16G2eDOjacVxbL
q1kqyHNvrYD/aGo+8vcshCef6xno9fL1akIxTyaTcMwYJUk9JSMicsVimxC1OvI6
a5ZiWItX2L8Nh/heJe+FtutWbrT+Nd+3Q8DqI+U0YkRnjnXaRVgLFtBmjLOxBrqJ
Ps/VepB4GaxA0oWdPbhos/N3wa42uFy3ixdv3Kv6WmHdqraB9uagt8PwwUti3WzQ
uxWL6J+JOBSDha8l3fp68Okib1bm/6Nmmc9l8Yz1eFwf+Y+gVgw7wPQxkUD/XaFW
bDJGwp3NawK07EanIAIzfXUEGfLvgeRJBEP3OGwV/TAiHX5q9zQo/tbM6x8j4aT9
zGlwU/EnwFixgbRW/hOT5Ox4usBlfB1j0ZiNmgUm8QphHrELFnc35Kd+PR/KONNX
sI6ZiifEAMR+4S99kTZ5YjHUqcUVm9ndd8iQGW9mvM6vt3o1L6QKeOeEKBMlhMcx
V+JtViC50ojidYc82kEtQFY9OKRkc5x3k1wBsH49LGMT+fvEwETallOXHTarQKrv
QAZNN1NINkMmrL5bgBXFqf0qpOy4xHnhis5AilUHNZwa4G8iAe8oqz/2eUCydiV1
Ogx20a8T1ifeSkI2NXrwnBjVzqnfiO9wOb9py98BiLR6k59x3GYtbCdGtpIXfSFv
hG79KKoz3Q==
=8mjO
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20190215' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Ensure we insert into the hctx dispatch list, if a request is marked
as DONTPREP (Jianchao)
- NVMe pull request, single missing unlock on error fix (Keith)
- MD pull request, single fix for a potentially data corrupting issue
(Nate)
- Floppy check_events regression fix (Yufen)
* tag 'for-linus-20190215' of git://git.kernel.dk/linux-block:
md/raid1: don't clear bitmap bits on interrupted recovery.
floppy: check_events callback should not return a negative number
nvme-pci: add missing unlock for reset error
blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
resulting in less memory use when DM crypt layers on DM integrity.
- Fix a long-standing DM thinp crash consistency bug that was due to
improper handling of FUA. This issue is specific to writes that
fill an entire thinp block which needs to be allocated.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJcZjtbAAoJEMUj8QotnQNaTxMIANdjCyW0LlpNNuDX8hyVzXAc
HNqyFxfNk7LD4ck5jn3HuQo5nCRvne+ltjol0vOqBokITXe1a9t+GB/fWSz0yZd9
69NvwgLoaZZ0pcxeddvUQ2TAOBxCdP8O4JokQL5QgnCt4nvUOWbGQBlSQNBf/8KO
9xa+0z36pMAC2dCnClKSQgwj+ZRZOBwOKSDVl7SiM7SvbNcirtBEgtvjr8gOrKvl
SbLtoFwj8hwJFCpllwIE4ec+bHw9XsCeFEBwGiSnp6GF2sgfLbx0/EpHj09M18Vt
QCXtYxcm8IMsh0w2y4YnmSWDk8yV7P/vVyoBmMjzv/gYx+6Eyxynk8pk32LNnEc=
=jC/2
-----END PGP SIGNATURE-----
Merge tag 'for-5.0/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix bug in DM crypt's sizing of its block integrity tag space,
resulting in less memory use when DM crypt layers on DM integrity.
- Fix a long-standing DM thinp crash consistency bug that was due to
improper handling of FUA. This issue is specific to writes that fill
an entire thinp block which needs to be allocated.
* tag 'for-5.0/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: fix bug where bio that overwrites thin block ignores FUA
dm crypt: don't overallocate the integrity tag space
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcZi5LAAoJEAx081l5xIa+tSQQAKblf/Ca7QryDUbAN8JIxeJp
NuJiNP4jvvZbCod/ISktU+0zY2uKnW09//ljkEdGVg+Ilww2HHm/drs1HRUMP6QP
U6EKhoJQ99OfsBYy3J+PloBz9uS/ziGJB6YN0qcJkTZ1tvAenNqO88MWJittDZCu
ao92sB0mwW3s+R/36OtCMce3LDGPuMst98z2+tN+C4JWZW4tktYyGo/fSsZ60Gry
Hxo/X/K0F4qn5vPfPL41fH0DzXpKiuztp7WsK97YS3Wa2VeNynKaORdcWyBBQq/n
t2NvLXyW58/wzHG0u1lbWEUEor2LJZ9Cd5aVl+i8e8giR2RogTEBVbqR+hXciTPe
3lUfSXKwwus1tsiX8amcIVPIpIyZ5Hk5igfql/EHFki3zdbOESVgrFtzwas2oW6b
GljURKcNe40ZK0btaogB8m1lZ1sN2poDgB3QYrIVFywzhv+Bm3TUexmWxbho/lYW
jho3OkP3UlxHyN4TF0CjG2lEU+kCo8EPzkOoRCzjTo49DUorqjtAMcZrbFr3z5wK
oq0m9G+itUqnKmsAA9ElHsGHK4pwRpA6hwFMnOZTWb6sVZGoT4fU6pAucZoZXE8j
wlADT/HRZECes9PVqrxeqA4h8EVlira6fdV1S9YzuMHrMpwo335Suf8eghQpFkST
oOXYb8/6GfKGyz3JDxb8
=dYoy
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2019-02-15-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Usual pull request, little larger than I'd like but nothing too
strange in it. Willy found an bug in the lease ioctl calculations, but
it's a drm master only ioctl which makes it harder to mess with.
i915:
- combo phy programming fix
- opregion version check fix for VBT RVDA lookup
- gem mmap ioctl race fix
- fbdev hpd during suspend fix
- array size bounds check fix in pmu
amdgpu:
- Vega20 psp fix
- Add vrr range to debugfs for freesync debugging
sched:
- Scheduler race fix
vkms:
- license header fixups
imx:
- Fix CSI register offsets for i.MX51 and i.MX53.
- Fix delayed page flip completion events on i.MX6QP due to
unexpected behaviour of the PRE when issuing NOP buffer updates to
the same buffer address.
- Stop throwing errors for plane updates on disabled CRTCs when a
userspace process is killed while a plane update is pending.
- Add missing of_node_put cleanup in imx_ldb_bind"
* tag 'drm-fixes-2019-02-15-1' of git://anongit.freedesktop.org/drm/drm:
drm: Use array_size() when creating lease
drm/amdgpu/psp11: TA firmware is optional (v3)
drm/i915/opregion: rvda is relative from opregion base in opregion 2.1+
drm/i915/opregion: fix version check
drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set
drm/i915: Block fbdev HPD processing during suspend
drm/i915/pmu: Fix enable count array size and bounds checking
drm/i915/cnl: Fix CNL macros for Voltage Swing programming
drm/i915/icl: combo port vswing programming changes per BSPEC
drm/vkms: Fix license inconsistent
drm/amd/display: Expose connector VRR range via debugfs
drm/sched: Always trace the dependencies we wait on, to fix a race.
gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change
gpu: ipu-v3: Fix CSI offsets for imx53
drm/imx: imx-ldb: add missing of_node_puts
gpu: ipu-v3: Fix i.MX51 CSI control registers offset
drm/imx: ignore plane updates on disabled crtcs
Pull crypto fix from Herbert Xu:
"This fixes a crash on resume in the ccree driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccree - fix resume race condition on init
Pull networking fixes from David Miller:
1) Fix MAC address setting in mac80211 pmsr code, from Johannes Berg.
2) Probe SFP modules after being attached, from Russell King.
3) Byte ordering bug in SMC rx_curs_confirmed code, from Ursula Braun.
4) Revert some r8169 changes that are causing regressions, from Heiner
Kallweit.
5) Fix spurious connection timeouts in netfilter nat code, from Florian
Westphal.
6) SKB leak in tipc, from Hoang Le.
7) Short packet checkum issue in mlx4, similar to a previous mlx5
change, from Saeed Mahameed. The issue is that whilst padding bytes
are usually zero, it is not guarateed and the hardware doesn't take
the padding bytes into consideration when generating the checksum.
8) Fix various races in cls_tcindex, from Cong Wang.
9) Need to set stream ext to NULL before freeing in SCTP code, from Xin
Long.
10) Fix locking in phy_is_started, from Heiner Kallweit.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
net: ethernet: freescale: set FEC ethtool regs version
net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
net: phy: fix potential race in the phylib state machine
net: phy: don't use locking in phy_is_started
selftests: fix timestamping Makefile
net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
net: fix possible overflow in __sk_mem_raise_allocated()
dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
net: phy: fix interrupt handling in non-started states
sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
net/mlx5e: XDP, fix redirect resources availability check
net/mlx5: Fix a compilation warning in events.c
net/mlx5: No command allowed when command interface is not ready
net/mlx5e: Fix NULL pointer derefernce in set channels error flow
netfilter: nft_compat: use-after-free when deleting targets
team: avoid complex list operations in team_nl_cmd_options_set()
net_sched: fix two more memory leaks in cls_tcindex
net_sched: fix a memory leak in cls_tcindex
...
Pull signal fix from Eric Biederman:
"Just a single patch that restores PTRACE_EVENT_EXIT functionality that
was accidentally broken by last weeks fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Restore the stop PTRACE_EVENT_EXIT
Add new accessor for bpf_object to get opaque struct btf * from it.
struct btf * is needed for all operations with BTF and it's present in
bpf_object. The only thing missing is a way to get it.
Example use-case is to get BTF key_type_id and value_type_id for a map in
bpf_object. It can be done with btf__get_map_kv_tids() but that function
requires struct btf *.
Similar API can be added for struct btf_ext but no use-case for it yet.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add bpf_map__resize() to change max_entries for a map.
Quite often necessary map size is unknown at compile time and can be
calculated only at run time.
Currently the following approach is used to do so:
* bpf_object__open_buffer() to open Elf file from a buffer;
* bpf_object__find_map_by_name() to find relevant map;
* bpf_map__def() to get map attributes and create struct
bpf_create_map_attr from them;
* update max_entries in bpf_create_map_attr;
* bpf_create_map_xattr() to create new map with updated max_entries;
* bpf_map__reuse_fd() to replace the map in bpf_object with newly
created one.
And after all this bpf_object can finally be loaded. The map will have
new size.
It 1) is quite a lot of steps; 2) doesn't take BTF into account.
For "2)" even more steps should be made and some of them require changes
to libbpf (e.g. to get struct btf * from bpf_object).
Instead the whole problem can be solved by introducing simple
bpf_map__resize() API that checks the map and sets new max_entries if
the map is not loaded yet.
So the new steps are:
* bpf_object__open_buffer() to open Elf file from a buffer;
* bpf_object__find_map_by_name() to find relevant map;
* bpf_map__resize() to update max_entries.
That's much simpler and works with BTF.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commit c9b47cc1fa ("xsk: fix bug when trying to use both copy and
zero-copy on one queue id") moved the umem query code to the AF_XDP
core, and therefore removed the need to query the netdevice for a
umem.
This patch removes XDP_QUERY_XSK_UMEM and all code that implement that
behavior, which is just dead code.
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Passing an object_count of sufficient size will make
object_count * 4 wrap around to be very small, then a later function
will happily iterate off the end of the object_ids array. Using
array_size() will saturate at SIZE_MAX, the kmalloc() will fail and
we'll return an -ENOMEM to the norty userspace.
Fixes: 62884cd386 ("drm: Add four ioctls for managing drm mode object leases [v7]")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org> # v4.15+
Signed-off-by: Dave Airlie <airlied@redhat.com>
When provisioning a new data block for a virtual block, either because
the block was previously unallocated or because we are breaking sharing,
if the whole block of data is being overwritten the bio that triggered
the provisioning is issued immediately, skipping copying or zeroing of
the data block.
When this bio completes the new mapping is inserted in to the pool's
metadata by process_prepared_mapping(), where the bio completion is
signaled to the upper layers.
This completion is signaled without first committing the metadata. If
the bio in question has the REQ_FUA flag set and the system crashes
right after its completion and before the next metadata commit, then the
write is lost despite the REQ_FUA flag requiring that I/O completion for
this request must only be signaled after the data has been committed to
non-volatile storage.
Fix this by deferring the completion of overwrite bios, with the REQ_FUA
flag set, until after the metadata has been committed.
Cc: stable@vger.kernel.org
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Syncing if_link.h that got out of sync.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bzero() call is deprecated and superseded by memset().
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Reported-by: David Laight <david.laight@aculab.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
On error the skb should be freed. Tested with diff/steps
provided by David Ahern.
v2: surface routing errors to the user instead of a generic EINVAL,
as suggested by David Ahern.
Reported-by: David Ahern <dsahern@gmail.com>
Fixes: 3bd0b15281 ("bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c")
Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This reverts commit 8099b047ec.
It turns out that people do actually depend on the shebang string being
truncated, and on the fact that an interpreter (like perl) will often
just re-interpret it entirely to get the full argument list.
Reported-by: Samuel Dionne-Riel <samuel@dionne-riel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Driver now supports new link modes: 50Gbps per lane support for
50G/100G/200G. This patch reads the correct field (legacy vs. extended)
based on a FW indication bit, and adds a translation function (link
modes to IB width and speed) to the new link modes.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch exposes new link modes (including 50Gbps per lane), and ext_*
fields which describes the new link modes in Port Type and Speed
register (PTYS).
Access functions, translation functions (speed <-> HW bits) and
link max speed function were modified.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Register Port Type and Speed (PTYS) introduces three new fields
extending the speed/protocols the can be reported and configured.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch fascicles queries to speed related fields in Port Type and
Speed register (PTYS) into a single API. I addition, this patch
refactors functions which serves only Ethernet driver: remove the
protocol type as an input parameter, move code from 'core' directory
into 'en' directory and add 'eth' prefix to the function's name. The
patch also encapsulates functions that are not used outside the Ethernet
driver removes redundant include files.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When dealing with the offloads mode initialization, driver refers to
the number of VFs and add magic number one (1) to take account of the
uplink. This is not clear and will make the code less readable after
adding other vports (e.g. host PF). As these are special vports
compared to VF vports, add a helper macro to denote such special
vports and eliminate the use of magic number.
Moreover, when creating offloads flow table and groups, the driver
reserves two more slots for UC and MC miss rules. Replace this magic
number with a helper macro as well.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
These are two macros in the driver general header which deal with the
number of total vports and if a vport is vport manager. Such macros
are vport entities, better to place them at the vport header file.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Driver used to name uplink vport as FDB_UPLINK_VPORT, it's hard to
comply with the same naming convention along with the introduction of
other vports. Use MLX5_VPORT as the prefix for such vports and
relocate the uplink vport definition to public header file for the
benefits of both net and IB drivers.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
ECPF doesn't support SR-IOV, but an ECPF E-Switch manager shall know
the max VFs supported by its peer host PF in order to control those
VF vports.
The current driver implementation uses the total vfs quantity as
provided by the pci sub-system for an upper bound of the VF vports
the e-switch code needs to deal with. This obviously can't work as
is on ECPF e-switch manager. For now, we use a hard coded value of
128 on such systems.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In Embedded CPU (EC) configurations, the EC driver needs to know when
the number of virtual functions change on the corresponding PF at the
host side. This is required so the EC driver can create or destroy
representor net devices that represent the VFs ports.
Whenever a change in the number of VFs occurs, firmware will generate an
event towards the EC which will trigger a work to complete the rest of
the handling. The specifics of the handling will be introduced in a
downstream patch.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The QUERY_HOST_PARAMS command is used by an Embedded CPU Physical
Function (ECPF) driver to identify and retrieve information about the
PF on the host side. E.g, number of virtual functions and PCI BDF.
The number of VFs can be changed on the fly, a function is added to
query current number of VFs and will be used in downstream patches.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
With the introduction of ECPF, we require that the ECPF driver will
aways call enable/disable HCA for that PF in the same way a PF does
this for its VFs. The PF is still responsible for calling enable and
disable HCA for its VFs.
To distinguish between the ECPF executing enable/disable HCA for
itself or for the PF, it sets the embedded CPU function bit in the
input params struct of these commands. When the bit is cleared and
function ID is zero, it refers to the peer PF.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mellanox's SmartNIC combines embedded CPU(e.g, ARM) processing power
with advanced network offloads to accelerate a multitude of security,
networking and storage applications.
With the introduction of the SmartNIC, there is a new PCI function
called Embedded CPU Physical Function(ECPF). And it's possible for a
PF to get its ICM pages from the ECPF PCI function. Driver shall
identify if it is running on such a function by reading a bit in
the initialization segment.
When firmware asks for pages, it would issue a page request event
specifying how many pages it requests and for which function. That
driver responds with a manage_pages command providing the requested
pages along with an indication for which function it is providing these
pages.
The encoding before this patch was as follows:
function_id == 0: pages are requested for the function receiving
the EQE.
function_id != 0: pages are requested for VF identified by the
function_id value
A new one bit field in the EQE identifies that pages are requested for
the ECPF.
The notion of page_supplier can be introduced here and to support that,
manage pages and query pages were modified so firmware can distinguish
the following cases:
1. Function provides pages for itself
2. PF provides pages for its VF
3. ECPF provides pages to itself
4. ECPF provides pages for another function
This distinction is possible through the introduction of the bit
"embedded_cpu_function" in query_pages, manage_pages and page request
EQE.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
IB driver maintains different registration and load function calls
for uplink and VF vports. This is not necessary as they only differ
with each other on their profiles.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>