Commit Graph

935214 Commits

Author SHA1 Message Date
Ido Schimmel
ec4f5b3617 mlxsw: spectrum: Use different trap group for externally routed packets
Cited commit mistakenly removed the trap group for externally routed
packets (e.g., via the management interface) and grouped locally routed
and externally routed packet traps under the same group, thereby
subjecting them to the same policer.

This can result in problems, for example, when FRR is restarted and
suddenly all transient traffic is trapped to the CPU because of a
default route through the management interface. Locally routed packets
required to re-establish a BGP connection will never reach the CPU and
the routing tables will not be re-populated.

Fix this by using a different trap group for externally routed packets.

Fixes: 8110668ecd ("mlxsw: spectrum_trap: Register layer 3 control traps")
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-29 12:16:21 -07:00
Ido Schimmel
89ab533135 mlxsw: spectrum_router: Allow programming link-local host routes
Cited commit added the ability to program link-local prefix routes to
the ASIC so that relevant packets are routed and trapped correctly.

However, host routes were not included in the change and thus not
programmed to the ASIC. This can result in packets being trapped via an
external route trap instead of a local route trap as in IPv4.

Fix this by programming all the link-local routes to the ASIC.

Fixes: 10d3757fcb ("mlxsw: spectrum_router: Allow programming link-local prefix routes")
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-29 12:16:21 -07:00
Ido Schimmel
83f3522860 ipv4: Silence suspicious RCU usage warning
fib_trie_unmerge() is called with RTNL held, but not from an RCU
read-side critical section. This leads to the following warning [1] when
the FIB alias list in a leaf is traversed with
hlist_for_each_entry_rcu().

Since the function is always called with RTNL held and since
modification of the list is protected by RTNL, simply use
hlist_for_each_entry() and silence the warning.

[1]
WARNING: suspicious RCU usage
5.8.0-rc4-custom-01520-gc1f937f3f83b #30 Not tainted
-----------------------------
net/ipv4/fib_trie.c:1867 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/164:
 #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0

stack backtrace:
CPU: 0 PID: 164 Comm: ip Not tainted 5.8.0-rc4-custom-01520-gc1f937f3f83b #30
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x100/0x184
 lockdep_rcu_suspicious+0x153/0x15d
 fib_trie_unmerge+0x608/0xdb0
 fib_unmerge+0x44/0x360
 fib4_rule_configure+0xc8/0xad0
 fib_nl_newrule+0x37a/0x1dd0
 rtnetlink_rcv_msg+0x4f7/0xbd0
 netlink_rcv_skb+0x17a/0x480
 rtnetlink_rcv+0x22/0x30
 netlink_unicast+0x5ae/0x890
 netlink_sendmsg+0x98a/0xf40
 ____sys_sendmsg+0x879/0xa00
 ___sys_sendmsg+0x122/0x190
 __sys_sendmsg+0x103/0x1d0
 __x64_sys_sendmsg+0x7d/0xb0
 do_syscall_64+0x54/0xa0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fc80a234e97
Code: Bad RIP value.
RSP: 002b:00007ffef8b66798 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc80a234e97
RDX: 0000000000000000 RSI: 00007ffef8b66800 RDI: 0000000000000003
RBP: 000000005f141b1c R08: 0000000000000001 R09: 0000000000000000
R10: 00007fc80a2a8ac0 R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000000000 R14: 00007ffef8b67008 R15: 0000556fccb10020

Fixes: 0ddcf43d5d ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-29 12:12:50 -07:00
Ido Schimmel
b5141915b5 vxlan: Ensure FDB dump is performed under RCU
The commit cited below removed the RCU read-side critical section from
rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked
without RCU protection.

This results in the following warning [1] in the VXLAN driver, which
relied on the callback being invoked from an RCU read-side critical
section.

Fix this by calling rcu_read_lock() in the VXLAN driver, as already done
in the bridge driver.

[1]
WARNING: suspicious RCU usage
5.8.0-rc4-custom-01521-g481007553ce6 #29 Not tainted
-----------------------------
drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by bridge/166:
 #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090

stack backtrace:
CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x100/0x184
 lockdep_rcu_suspicious+0x153/0x15d
 vxlan_fdb_dump+0x51e/0x6d0
 rtnl_fdb_dump+0x4dc/0xad0
 netlink_dump+0x540/0x1090
 __netlink_dump_start+0x695/0x950
 rtnetlink_rcv_msg+0x802/0xbd0
 netlink_rcv_skb+0x17a/0x480
 rtnetlink_rcv+0x22/0x30
 netlink_unicast+0x5ae/0x890
 netlink_sendmsg+0x98a/0xf40
 __sys_sendto+0x279/0x3b0
 __x64_sys_sendto+0xe6/0x1a0
 do_syscall_64+0x54/0xa0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe14fa2ade0
Code: Bad RIP value.
RSP: 002b:00007fff75bb5b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005614b1ba0020 RCX: 00007fe14fa2ade0
RDX: 000000000000011c RSI: 00007fff75bb5b90 RDI: 0000000000000003
RBP: 00007fff75bb5b90 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00005614b1b89160
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Fixes: 5e6d243587 ("bridge: netlink dump interface at par with brctl")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-29 12:04:54 -07:00
Mike Marciniszyn
54a485e9ec IB/rdmavt: Fix RQ counting issues causing use of an invalid RWQE
The lookaside count is improperly initialized to the size of the
Receive Queue with the additional +1.  In the traces below, the
RQ size is 384, so the count was set to 385.

The lookaside count is then rarely refreshed.  Note the high and
incorrect count in the trace below:

rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9008 wr_id 55c7206d75a0 qpn c
	qpt 2 pid 3018 num_sge 1 head 1 tail 0, count 385
rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1] <- rvt_get_rwqe) ret=0x1

The head,tail indicate there is only one RWQE posted although the count
says 385 and we correctly return the element 0.

The next call to rvt_get_rwqe with the decremented count:

rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9058 wr_id 0 qpn c
	qpt 2 pid 3018 num_sge 0 head 1 tail 1, count 384
rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1] <- rvt_get_rwqe) ret=0x1

Note that the RQ is empty (head == tail) yet we return the RWQE at tail 1,
which is not valid because of the bogus high count.

Best case, the RWQE has never been posted and the rc logic sees an RWQE
that is too small (all zeros) and puts the QP into an error state.

In the worst case, a server slow at posting receive buffers might fool
rvt_get_rwqe() into fetching an old RWQE and corrupt memory.

Fix by deleting the faulty initialization code and creating an
inline to fetch the posted count and convert all callers to use
new inline.

Fixes: f592ae3c99 ("IB/rdmavt: Fracture single lock used for posting and processing RWQEs")
Link: https://lore.kernel.org/r/20200728183848.22226.29132.stgit@awfm-01.aw.intel.com
Reported-by: Zhaojuan Guo <zguo@redhat.com>
Cc: <stable@vger.kernel.org> # 5.4.x
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 15:54:36 -03:00
Linus Torvalds
c2f3850df7 drm fixes for 5.8-rc8
core:
 - fix possible use-after-free
 
 drm_fb_helper:
 - regression fix to use memcpy_io on bochs' sparc64
 
 nouveau:
 - format modifiers fixes
 - HDA regression fix
 - turing modesetting race fix
 
 of:
 - fix a double free
 dbi:
 - fix SPI Type 1 transfer
 
 mcde:
 - fix screen stability crash
 
 panel:
 - panel: fix display noise on auo,kd101n80-45na
 - panel: delay HPD checks for boe_nv133fhm_n61
 
 bridge:
 - bridge: drop connector check in nwl-dsi bridge
 - bridge: set proper bridge type for adv7511
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJfIP13AAoJEAx081l5xIa+FSYP/AhLOGtHCzoXdaKKRrhIzRsP
 AGBpV6GCsc5VLwIokaTAf4PaRzQHQp1Kymy19vyqO+1Zp/vv0kNRohMi6ITQCkHS
 HK/mCmSE5ZBlMmzfbPJ5teZXz6z+fjaFDoVzvX4+G2mnI60uze50qUQbbp/iPAfy
 bfZUPhMt4eybaUq1zjdYff/ubeaz94y0x6JcaEfE1bwQYHs9iq0CQNZrRKHwdem4
 We1UzEUXQqrL67V3Bto5LN3pkw0qPbGcZAmxO5Te5h0mrpWd0ldtFxJ4Xe0xE1Ga
 pYPHU8Es88dMvqr8poYM0IDxQ/otGM0Derj/7N9chKQ677e3G0WZ7KQpctnyc8ge
 6WZOQxtksyja1nKqjfgYcoVOHlJG4miztuceZMO4S3XwQrRVUAQ/LZQT8X7mB0Ad
 9e4sDMQPhP4p9msFnGADwaTLFJRk2TKiyeTq0Qc8UdCFMU4IQ8m2GBFPuF/5CLj1
 948Y7LhoBWL/h657Fns+OeIfmvxiXIySzGNJWmScZrVcRsEISLo5u9Zbn4MrsgxK
 yX5S7fBM/yZAcATRQxxYhauB9wUu78wucy+vrRu23Xd/32oIuRRHXkaUk8ielX31
 VhdFObWf4D4yZtR5JvqpjkcB0p9kOOBlTN59PtMrEpXObmr9oOrdYQJlpQfwNT5k
 xRz1lyMwDPUJVT28rqt/
 =45sE
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2020-07-29' of git://anongit.freedesktop.org/drm/drm into master

Pull drm fixes from Dave Airlie:
 "The nouveau fixes missed the last pull by a few hours, and we had a
  few arm driver/panel/bridge fixes come in.

  This is possibly a bit more than I'm comfortable sending at this
  stage, but I've looked at each patch, the core + nouveau patches fix
  regressions, and the arm related ones are all around screens turning
  on and working, and are mostly trivial patches, the line count is
  mostly in comments.

  core:
   - fix possible use-after-free

  drm_fb_helper:
   - regression fix to use memcpy_io on bochs' sparc64

  nouveau:
   - format modifiers fixes
   - HDA regression fix
   - turing modesetting race fix

  of:
   - fix a double free

  dbi:
   - fix SPI Type 1 transfer

  mcde:
   - fix screen stability crash

  panel:
   - panel: fix display noise on auo,kd101n80-45na
   - panel: delay HPD checks for boe_nv133fhm_n61

  bridge:
   - bridge: drop connector check in nwl-dsi bridge
   - bridge: set proper bridge type for adv7511"

* tag 'drm-fixes-2020-07-29' of git://anongit.freedesktop.org/drm/drm:
  drm: hold gem reference until object is no longer accessed
  drm/dbi: Fix SPI Type 1 (9-bit) transfer
  drm/drm_fb_helper: fix fbdev with sparc64
  drm/mcde: Fix stability issue
  drm/bridge: nwl-dsi: Drop DRM_BRIDGE_ATTACH_NO_CONNECTOR check.
  drm/panel: Fix auo, kd101n80-45na horizontal noise on edges of panel
  drm: panel: simple: Delay HPD checking on boe_nv133fhm_n61 for 15 ms
  drm/bridge/adv7511: set the bridge type properly
  drm: of: Fix double-free bug
  drm/nouveau/fbcon: zero-initialise the mode_cmd2 structure
  drm/nouveau/fbcon: fix module unload when fbcon init has failed for some reason
  drm/nouveau/kms/tu102: wait for core update to complete when assigning windows
  drm/nouveau/kms/gf100: use correct format modifiers
  drm/nouveau/disp/gm200-: fix regression from HDA SOR selection changes
2020-07-29 11:39:20 -07:00
Willy Tarreau
f227e3ec3b random32: update the net random state on interrupt and activity
This modifies the first 32 bits out of the 128 bits of a random CPU's
net_rand_state on interrupt or CPU activity to complicate remote
observations that could lead to guessing the network RNG's internal
state.

Note that depending on some network devices' interrupt rate moderation
or binding, this re-seeding might happen on every packet or even almost
never.

In addition, with NOHZ some CPUs might not even get timer interrupts,
leaving their local state rarely updated, while they are running
networked processes making use of the random state.  For this reason, we
also perform this update in update_process_times() in order to at least
update the state when there is user or system activity, since it's the
only case we care about.

Reported-by: Amit Klein <aksecurity@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-29 10:35:37 -07:00
Michael S. Tsirkin
168c358af2 virtio_balloon: fix up endian-ness for free cmd id
free cmd id is read using virtio endian, spec says all fields
in balloon are LE. Fix it up.

Fixes: 86a559787e ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Wei Wang <wei.w.wang@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
2020-07-29 13:24:30 -04:00
Alexander Duyck
ca72cc3483 virtio-balloon: Document byte ordering of poison_val
The poison_val field in the virtio_balloon_config is treated as a
little-endian field by the host. Since we are currently only having to deal
with a single byte poison value this isn't a problem, however if the value
should ever expand it would cause byte ordering issues. Document that in
the code so that we know that if the value should ever expand we need to
byte swap the value on big-endian architectures.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Link: https://lore.kernel.org/r/20200713203539.17140.71425.stgit@localhost.localdomain
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
2020-07-29 13:24:30 -04:00
Michael S. Tsirkin
295c1b9852 vhost/scsi: fix up req type endian-ness
vhost/scsi doesn't handle type conversion correctly
for request type when using virtio 1.0 and up for BE,
or cross-endian platforms.

Fix it up using vhost_32_to_cpu.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-07-29 13:24:30 -04:00
Jens Axboe
d6364a867c Merge branch 'nvme-5.8' of git://git.infradead.org/nvme into block-5.8
Pull NVMe fixes from Christoph.

* 'nvme-5.8' of git://git.infradead.org/nvme:
  nvme: add a Identify Namespace Identification Descriptor list quirk
  nvme-pci: prevent SK hynix PC400 from using Write Zeroes command
  nvme-tcp: fix possible hang waiting for icresp response
2020-07-29 11:21:14 -06:00
Leon Romanovsky
81530ab08e RDMA/mlx5: Allow providing extra scatter CQE QP flag
Scatter CQE feature relies on two flags MLX5_QP_FLAG_SCATTER_CQE and
MLX5_QP_FLAG_ALLOW_SCATTER_CQE, both of them can be provided without
relation to device capability.

Relax global validity check to allow MLX5_QP_FLAG_ALLOW_SCATTER_CQE QP
flag.

Existing user applications are failing on this new validity check.

Fixes: 90ecb37a75 ("RDMA/mlx5: Change scatter CQE flag to be set like other vendor flags")
Fixes: 37518fa49f ("RDMA/mlx5: Process all vendor flags in one place")
Link: https://lore.kernel.org/r/20200728120255.805733-1-leon@kernel.org
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:19:01 -03:00
Qiushi Wu
fe3c606843 firmware: Fix a reference count leak.
kobject_init_and_add() takes reference even when it fails.
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object.
Callback function fw_cfg_sysfs_release_entry() in kobject_put()
can handle the pointer "entry" properly.

Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Link: https://lore.kernel.org/r/20200613190533.15712-1-wu000273@umn.edu
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-07-29 13:13:50 -04:00
Thomas Gleixner
bdd6558959 x86/i8259: Use printk_deferred() to prevent deadlock
0day reported a possible circular locking dependency:

Chain exists of:
  &irq_desc_lock_class --> console_owner --> &port_lock_key

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&port_lock_key);
                               lock(console_owner);
                               lock(&port_lock_key);
  lock(&irq_desc_lock_class);

The reason for this is a printk() in the i8259 interrupt chip driver
which is invoked with the irq descriptor lock held, which reverses the
lock operations vs. printk() from arbitrary contexts.

Switch the printk() to printk_deferred() to avoid that.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87365abt2v.fsf@nanos.tec.linutronix.de
2020-07-29 16:27:16 +02:00
Paul Moore
8ac68dc455 revert: 1320a4052e ("audit: trigger accompanying records when no rules present")
Unfortunately the commit listed in the subject line above failed
to ensure that the task's audit_context was properly initialized/set
before enabling the "accompanying records".  Depending on the
situation, the resulting audit_context could have invalid values in
some of it's fields which could cause a kernel panic/oops when the
task/syscall exists and the audit records are generated.

We will revisit the original patch, with the necessary fixes, in a
future kernel but right now we just want to fix the kernel panic
with the least amount of added risk.

Cc: stable@vger.kernel.org
Fixes: 1320a4052e ("audit: trigger accompanying records when no rules present")
Reported-by: j2468h@googlemail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-07-29 10:00:36 -04:00
Ranjani Sridharan
7fcd9bb5ac ALSA: hda: fix NULL pointer dereference during suspend
When the ASoC card registration fails and the codec component driver
never probes, the codec device is not initialized and therefore
memory for codec->wcaps is not allocated. This results in a NULL pointer
dereference when the codec driver suspend callback is invoked during
system suspend. Fix this by returning without performing any actions
during codec suspend/resume if the card was not registered successfully.

Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://lore.kernel.org/r/20200728231011.1454066-1-ranjani.sridharan@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-07-29 09:54:49 +02:00
Christoph Hellwig
5bedd3afee nvme: add a Identify Namespace Identification Descriptor list quirk
Add a quirk for a device that does not support the Identify Namespace
Identification Descriptor list despite claiming 1.3 compliance.

Fixes: ea43d9709f ("nvme: fix identify error status silent ignore")
Reported-by: Ingo Brunberg <ingo_brunberg@web.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ingo Brunberg <ingo_brunberg@web.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2020-07-29 08:05:44 +02:00
Dave Airlie
a4a2739beb * drm: fix possible use-after-free
* dbi: fix SPI Type 1 transfer
  * drm_fb_helper: use memcpy_io on bochs' sparc64
  * mcde: fix stability
  * panel: fix display noise on auo,kd101n80-45na
  * panel: delay HPD checks for boe_nv133fhm_n61
  * bridge: drop connector check in nwl-dsi bridge
  * bridge: set proper bridge type for adv7511
  * of: fix a double free
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAl8gBZkACgkQaA3BHVML
 eiMf4Af+ITzLTKmaaWfQyiaE9KsMNa0dzv2bBpG/H15RevJ40O2qEgY2R4hYmONZ
 zMSXLfT8fgj0ZVEac9jE2VoLi6QtAcB+cB9k0jfIL4kT5aG33sek9go/LmAtL9FB
 tyqS3k4lt8wxnVjVJs+Cqiz4BpnKHC9RxxGB8l83kPRbSE+Ifq3sciB0HJx3x6eI
 K2FQqphsYuXyIdewJNCoZ5RKHaS9UjQutargnwWi2Tb3OAmUblZxvojbjAtNlHhx
 PkOD8/iCygsL87GCawoopLnWaPJTDmOEKmxttzLs37Dqw2rhTsRU47/E6MlBZuwe
 LBuXCAAdNs4iRDj9HUoIXnup4YGXOw==
 =gfQ2
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2020-07-28' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

 * drm: fix possible use-after-free
 * dbi: fix SPI Type 1 transfer
 * drm_fb_helper: use memcpy_io on bochs' sparc64
 * mcde: fix stability
 * panel: fix display noise on auo,kd101n80-45na
 * panel: delay HPD checks for boe_nv133fhm_n61
 * bridge: drop connector check in nwl-dsi bridge
 * bridge: set proper bridge type for adv7511
 * of: fix a double free

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728110446.GA8076@linux-uq9g
2020-07-29 12:46:58 +10:00
Martin Varghese
1ed06dbc21 Documentation: bareudp: Corrected description of bareudp module.
Removed redundant words.

Fixes: 571912c69f ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:53:03 -07:00
Guillaume Nault
302d201b5c bareudp: forbid mixing IP and MPLS in multiproto mode
In multiproto mode, bareudp_xmit() accepts sending multicast MPLS and
IPv6 packets regardless of the bareudp ethertype. In practice, this
let an IP tunnel send multicast MPLS packets, or an MPLS tunnel send
IPv6 packets.

We need to restrict the test further, so that the multiproto mode only
enables
  * IPv6 for IPv4 tunnels,
  * or multicast MPLS for unicast MPLS tunnels.

To improve clarity, the protocol validation is moved to its own
function, where each logical test has its own condition.

v2: s/ntohs/htons/

Fixes: 4b5f67232d ("net: Special handling for IP & MPLS.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:30:25 -07:00
Xiyu Yang
706ec91916 ipv6: Fix nexthop refcnt leak when creating ipv6 route info
ip6_route_info_create() invokes nexthop_get(), which increases the
refcount of the "nh".

When ip6_route_info_create() returns, local variable "nh" becomes
invalid, so the refcount should be decreased to keep refcount balanced.

The reference counting issue happens in one exception handling path of
ip6_route_info_create(). When nexthops can not be used with source
routing, the function forgets to decrease the refcnt increased by
nexthop_get(), causing a refcnt leak.

Fix this issue by pulling up the error source routing handling when
nexthops can not be used with source routing.

Fixes: f88d8ea67f ("ipv6: Plumb support for nexthop object in a fib6_info")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:24:08 -07:00
David S. Miller
fa662d7816 Merge branch 'Fix-bugs-in-Octeontx2-netdev-driver'
Subbaraya Sundeep says:

====================
Fix bugs in Octeontx2 netdev driver

There are problems in the existing Octeontx2
netdev drivers like missing cancel_work for the
reset task, missing lock in reset task and
missing unergister_netdev in driver remove.
This patch set fixes the above problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:14:48 -07:00
Subbaraya Sundeep
ed543f5c6a octeontx2-pf: Unregister netdev at driver remove
Added unregister_netdev in the driver remove
function. Generally unregister_netdev is called
after disabling all the device interrupts but here
it is called before disabling device mailbox
interrupts. The reason behind this is VF needs
mailbox interrupt to communicate with its PF to
clean up its resources during otx2_stop.
otx2_stop disables packet I/O and queue interrupts
first and by using mailbox interrupt communicates
to PF to free VF resources. Hence this patch
calls unregister_device just before
disabling mailbox interrupts.

Fixes: 3184fb5ba9 ("octeontx2-vf: Virtual function driver support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:14:48 -07:00
Subbaraya Sundeep
c0376f473c octeontx2-pf: cancel reset_task work
During driver exit cancel the queued
reset_task work in VF driver.

Fixes: 3184fb5ba9 ("octeontx2-vf: Virtual function driver support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:14:48 -07:00
Subbaraya Sundeep
948a66338f octeontx2-pf: Fix reset_task bugs
Two bugs exist in the code related to reset_task
in PF driver one is the missing protection
against network stack ndo_open and ndo_close.
Other one is the missing cancel_work.
This patch fixes those problems.

Fixes: 4ff7d1488a ("octeontx2-pf: Error handling support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:14:48 -07:00
Jakub Kicinski
3cab8c6552 mlx4: disable device on shutdown
It appears that not disabling a PCI device on .shutdown may lead to
a Hardware Error with particular (perhaps buggy) BIOS versions:

    mlx4_en: eth0: Close port called
    mlx4_en 0000:04:00.0: removed PHC
    reboot: Restarting system
    {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
    {1}[Hardware Error]: event severity: fatal
    {1}[Hardware Error]:  Error 0, type: fatal
    {1}[Hardware Error]:   section_type: PCIe error
    {1}[Hardware Error]:   port_type: 4, root port
    {1}[Hardware Error]:   version: 1.16
    {1}[Hardware Error]:   command: 0x4010, status: 0x0143
    {1}[Hardware Error]:   device_id: 0000:00:02.2
    {1}[Hardware Error]:   slot: 0
    {1}[Hardware Error]:   secondary_bus: 0x04
    {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f06
    {1}[Hardware Error]:   class_code: 000604
    {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
    {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00000000
    {1}[Hardware Error]:   aer_uncor_severity: 0x00062030
    {1}[Hardware Error]:   TLP Header: 40000018 040000ff 791f4080 00000000
[hw error repeats]
    Kernel panic - not syncing: Fatal hardware error!
    CPU: 0 PID: 2189 Comm: reboot Kdump: loaded Not tainted 5.6.x-blabla #1
    Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 05/05/2017

Fix the mlx4 driver.

This is a very similar problem to what had been fixed in:
commit 0d98ba8d70 ("scsi: hpsa: disable device during shutdown")
to address https://bugzilla.kernel.org/show_bug.cgi?id=199779.

Fixes: 2ba5fbd62b ("net/mlx4_core: Handle AER flow properly")
Reported-by: Jake Lawrence <lawja@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:10:56 -07:00
David S. Miller
a7ef23e568 Merge branch 'rhashtable-Fix-unprotected-RCU-dereference-in-__rht_ptr'
Herbert Xu says:

====================
rhashtable: Fix unprotected RCU dereference in __rht_ptr

This patch series fixes an unprotected dereference in __rht_ptr.
The first patch is a minimal fix that does not use the correct
RCU markings but is suitable for backport, and the second patch
cleans up the RCU markings.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:09:49 -07:00
Herbert Xu
ce9b362bf6 rhashtable: Restore RCU marking on rhash_lock_head
This patch restores the RCU marking on bucket_table->buckets as
it really does need RCU protection.  Its removal had led to a fatal
bug.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:09:49 -07:00
Herbert Xu
1748f6a2cb rhashtable: Fix unprotected RCU dereference in __rht_ptr
The rcu_dereference call in rht_ptr_rcu is completely bogus because
we've already dereferenced the value in __rht_ptr and operated on it.
This causes potential double readings which could be fatal.  The RCU
dereference must occur prior to the comparison in __rht_ptr.

This patch changes the order of RCU dereference so that it is done
first and the result is then fed to __rht_ptr.  The RCU marking
changes have been minimised using casts which will be removed in
a follow-up patch.

Fixes: ba6306e3f6 ("rhashtable: Remove RCU marking from...")
Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:09:49 -07:00
René van Dorst
19016d93bf net: ethernet: mtk_eth_soc: Always call mtk_gmac0_rgmii_adjust() for mt7623
Modify mtk_gmac0_rgmii_adjust() so it can always be called.
mtk_gmac0_rgmii_adjust() sets-up the TRGMII clocks.

Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-By: David Woodhouse <dwmw2@infradead.org>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:04:30 -07:00
David S. Miller
b5cd55b334 mlx5-fixes-2020-07-28
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl8ggswACgkQSD+KveBX
 +j7yOgf8DjzPtSpVfUA7Iq28WO6YxJy208oUdLjKNuRNr74vXulHegNlP6cFDHU3
 QIddvTBNMTp2BpJpoFMbEod7sVTBq5KVQZvpIuFM2JU4h76vL4cYbWeBuT6rFIoJ
 m5vuuUyAB+16QbJzagY/rqfQMs0w7KnR+Zhv18JzwyHhBiRLaPzYdmSWM2kkF8HZ
 3DrY8RWgkeaI9vTpE6Fau7BRNDUOMgjIahiUrojJuyPsYZpJf5g+KaMj4xvgcqMa
 vaPaw8iHN7+N3KIdcf6MJhfzx3SHP5YNieU/MfvE9sLvdPvfLETdpexYPB8b0/vs
 L9w2D8j0uZyXek30fIiIwHaibQGZPw==
 =W0Fn
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2020-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes-2020-07-28

This series introduces some fixes to mlx5 driver.
v1->v2:
 - Drop the "Hold reference on mirred devices" patch, until Or's
   comments are addressed.
 - Imporve "Modify uplink state" patch commit message per Or's request.

Please pull and let me know if there is any problem.

For -Stable:

For -stable v4.9
 ('net/mlx5e: Fix error path of device attach')

For -stable v4.15
 ('net/mlx5: Verify Hardware supports requested ptp function on a given
pin')

For -stable v5.3
 ('net/mlx5e: Modify uplink state on interface up/down')

For -stable v5.4
 ('net/mlx5e: Fix kernel crash when setting vf VLANID on a VF dev')
 ('net/mlx5: E-switch, Destroy TSAR when fail to enable the mode')

For -stable v5.5
 ('net/mlx5: E-switch, Destroy TSAR after reload interface')

For -stable v5.7
 ('net/mlx5: Fix a bug of using ptp channel index as pin index')
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 16:55:13 -07:00
David S. Miller
2ff34c909f Merge branch 'net-lan78xx-fix-NULL-deref-and-memory-leak'
Johan Hovold says:

====================
net: lan78xx: fix NULL deref and memory leak

The first two patches fix a NULL-pointer dereference at probe that can
be triggered by a malicious device and a small transfer-buffer memory
leak, respectively.

For another subsystem I would have marked them:

	Cc: stable@vger.kernel.org	# 4.3

The third one replaces the driver's current broken endpoint lookup
helper, which could end up accepting incomplete interfaces and whose
results weren't even useeren
Johan
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:35:44 -07:00
Johan Hovold
ea060b3526 net: lan78xx: replace bogus endpoint lookup
Drop the bogus endpoint-lookup helper which could end up accepting
interfaces based on endpoints belonging to unrelated altsettings.

Note that the returned bulk pipes and interrupt endpoint descriptor
were never actually used. Instead the bulk-endpoint numbers are
hardcoded to 1 and 2 (matching the specification), while the interrupt-
endpoint descriptor was assumed to be the third descriptor created by
USB core.

Try to bring some order to this by dropping the bogus lookup helper and
adding the missing endpoint sanity checks while keeping the interrupt-
descriptor assumption for now.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:35:44 -07:00
Johan Hovold
63634aa679 net: lan78xx: fix transfer-buffer memory leak
The interrupt URB transfer-buffer was never freed on disconnect or after
probe errors.

Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Cc: Woojung.Huh@microchip.com <Woojung.Huh@microchip.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:35:44 -07:00
Johan Hovold
8d8e95fd6d net: lan78xx: add missing endpoint sanity check
Add the missing endpoint sanity check to prevent a NULL-pointer
dereference should a malicious device lack the expected endpoints.

Note that the driver has a broken endpoint-lookup helper,
lan78xx_get_endpoints(), which can end up accepting interfaces in an
altsetting without endpoints as long as *some* altsetting has a bulk-in
and a bulk-out endpoint.

Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Cc: Woojung.Huh@microchip.com <Woojung.Huh@microchip.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:35:44 -07:00
Rustam Kovhaev
e911e99a07 usb: hso: check for return value in hso_serial_common_create()
in case of an error tty_register_device_attr() returns ERR_PTR(),
add IS_ERR() check

Reported-and-tested-by: syzbot+67b2bd0e34f952d0321e@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=67b2bd0e34f952d0321e
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:01:51 -07:00
Alaa Hleihel
350a63249d net/mlx5e: Fix kernel crash when setting vf VLANID on a VF dev
After the cited commit, function 'mlx5_eswitch_set_vport_vlan' started
to acquire esw->state_lock.
However, esw is not defined for VF devices, hence attempting to set vf
VLANID on a VF dev will cause a kernel panic.

Fix it by moving up the (redundant) esw validation from function
'__mlx5_eswitch_set_vport_vlan' since the rest of the callers now have
and use a valid esw.

For example with vf device eth4:
 # ip link set dev eth4 vf 0 vlan 0

Trace of the panic:
 [  411.409842] BUG: unable to handle page fault for address: 00000000000011b8
 [  411.449745] #PF: supervisor read access in kernel mode
 [  411.452348] #PF: error_code(0x0000) - not-present page
 [  411.454938] PGD 80000004189c9067 P4D 80000004189c9067 PUD 41899a067 PMD 0
 [  411.458382] Oops: 0000 [#1] SMP PTI
 [  411.460268] CPU: 4 PID: 5711 Comm: ip Not tainted 5.8.0-rc4_for_upstream_min_debug_2020_07_08_22_04 #1
 [  411.462447] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
 [  411.464158] RIP: 0010:__mutex_lock+0x4e/0x940
 [  411.464928] Code: fd 41 54 49 89 f4 41 52 53 89 d3 48 83 ec 70 44 8b 1d ee 03 b0 01 65 48 8b 04 25 28 00 00 00 48 89 45 c8 31 c0 45 85 db 75 0a <48> 3b 7f 60 0f 85 7e 05 00 00 49 8d 45 68 41 56 41 b8 01 00 00 00
 [  411.467678] RSP: 0018:ffff88841fcd74b0 EFLAGS: 00010246
 [  411.468562] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 [  411.469715] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000001158
 [  411.470812] RBP: ffff88841fcd7550 R08: ffffffffa00fa1ce R09: 0000000000000000
 [  411.471835] R10: ffff88841fcd7570 R11: 0000000000000000 R12: 0000000000000002
 [  411.472862] R13: 0000000000001158 R14: ffffffffa00fa1ce R15: 0000000000000000
 [  411.474004] FS:  00007faee7ca6b80(0000) GS:ffff88846fc00000(0000) knlGS:0000000000000000
 [  411.475237] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  411.476129] CR2: 00000000000011b8 CR3: 000000041909c006 CR4: 0000000000360ea0
 [  411.477260] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [  411.478340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [  411.479332] Call Trace:
 [  411.479760]  ? __nla_validate_parse.part.6+0x57/0x8f0
 [  411.482825]  ? mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core]
 [  411.483804]  mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core]
 [  411.484733]  mlx5e_set_vf_vlan+0x41/0x50 [mlx5_core]
 [  411.485545]  do_setlink+0x613/0x1000
 [  411.486165]  __rtnl_newlink+0x53d/0x8c0
 [  411.486791]  ? mark_held_locks+0x49/0x70
 [  411.487429]  ? __lock_acquire+0x8fe/0x1eb0
 [  411.488085]  ? rcu_read_lock_sched_held+0x52/0x60
 [  411.488998]  ? kmem_cache_alloc_trace+0x16d/0x2d0
 [  411.489759]  rtnl_newlink+0x47/0x70
 [  411.490357]  rtnetlink_rcv_msg+0x24e/0x450
 [  411.490978]  ? netlink_deliver_tap+0x92/0x3d0
 [  411.491631]  ? validate_linkmsg+0x330/0x330
 [  411.492262]  netlink_rcv_skb+0x47/0x110
 [  411.492852]  netlink_unicast+0x1ac/0x270
 [  411.493551]  netlink_sendmsg+0x336/0x450
 [  411.494209]  sock_sendmsg+0x30/0x40
 [  411.494779]  ____sys_sendmsg+0x1dd/0x1f0
 [  411.495378]  ? copy_msghdr_from_user+0x5c/0x90
 [  411.496082]  ___sys_sendmsg+0x87/0xd0
 [  411.496683]  ? lock_acquire+0xb9/0x3a0
 [  411.497322]  ? lru_cache_add+0x5/0x170
 [  411.497944]  ? find_held_lock+0x2d/0x90
 [  411.498568]  ? handle_mm_fault+0xe46/0x18c0
 [  411.499205]  ? __sys_sendmsg+0x51/0x90
 [  411.499784]  __sys_sendmsg+0x51/0x90
 [  411.500341]  do_syscall_64+0x59/0x2e0
 [  411.500938]  ? asm_exc_page_fault+0x8/0x30
 [  411.501609]  ? rcu_read_lock_sched_held+0x52/0x60
 [  411.502350]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 [  411.503093] RIP: 0033:0x7faee73b85a7
 [  411.503654] Code: Bad RIP value.

Fixes: 0e18134f4f ("net/mlx5e: Eswitch, use state_lock to synchronize vlan change")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:53 -07:00
Ron Diskin
7d0314b11c net/mlx5e: Modify uplink state on interface up/down
When setting the PF interface up/down, notify the firmware to update
uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled.

This behavior will prevent sending traffic out on uplink port when PF is
down, such as sending traffic from a VF interface which is still up.
Currently when calling mlx5e_open/close(), the driver only sends PAOS
command to notify the firmware to set the physical port state to
up/down, however, it is not sufficient. When VF is in "auto" state, it
follows the uplink state, which was not updated on mlx5e_open/close()
before this patch.

When switchdev mode is enabled and uplink representor is first enabled,
set the uplink port state value back to its FW default "AUTO".

Fixes: 63bfd399de ("net/mlx5e: Send PAOS command on interface up/down")
Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:51 -07:00
Eran Ben Elisha
ed56d749c3 net/mlx5: Query PPS pin operational status before registering it
In a special configuration, a ConnectX6-Dx pin pps-out might be activated
when driver is loaded. Fix the driver to always read the operational pin
mode when registering it, and advertise it accordingly.

Fixes: ee7f12205a ("net/mlx5e: Implement 1PPS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:48 -07:00
Raed Salem
21083309ca net/mlx5e: Fix slab-out-of-bounds in mlx5e_rep_is_lag_netdev
mlx5e_rep_is_lag_netdev is used as first check as part of netdev events
handler for bond device of non-uplink representors, this handler can get
any netdevice under the same network namespace of mlx5e netdevice. Current
code treats the netdev as mlx5e netdev and only later on verifies this,
hence causes the following Kasan trace:
[15402.744990] ==================================================================
[15402.746942] BUG: KASAN: slab-out-of-bounds in mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.749009] Read of size 8 at addr ffff880391f3f6b0 by task ovs-vswitchd/5347

[15402.752065] CPU: 7 PID: 5347 Comm: ovs-vswitchd Kdump: loaded Tainted: G    B      O     --------- -t - 4.18.0-g3dcc204d291d-dirty #1
[15402.755349] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[15402.757600] Call Trace:
[15402.758968]  dump_stack+0x71/0xab
[15402.760427]  print_address_description+0x6a/0x270
[15402.761969]  kasan_report+0x179/0x2d0
[15402.763445]  ? mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.765121]  mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.766782]  mlx5e_rep_esw_bond_netevent+0x129/0x620 [mlx5_core]

Fix by deferring the violating access to be post the netdev verify check.

Fixes: 7e51891a23 ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:45 -07:00
Eran Ben Elisha
071995c877 net/mlx5: Verify Hardware supports requested ptp function on a given pin
Fix a bug where driver did not verify Hardware pin capabilities for
PTP functions.

Fixes: ee7f12205a ("net/mlx5e: Implement 1PPS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:43 -07:00
Eran Ben Elisha
88c8cf92db net/mlx5: Fix a bug of using ptp channel index as pin index
On PTP mlx5_ptp_enable(on=0) flow, driver mistakenly used channel index
as pin index.

After ptp patch marked in fixes tag was introduced, driver can freely
call ptp_find_pin() as part of the .enable() callback.

Fix driver mlx5_ptp_enable(on=0) flow to always use ptp_find_pin(). With
that, Driver will use the correct pin index in mlx5_ptp_enable(on=0) flow.

In addition, when initializing the pins, always set channel to zero. As
all pins can be attached to all channels, let ptp_set_pinfunc() to move
them between the channels.

For stable branches, this fix to be applied only on kernels that includes
both patches in fixes tag. Otherwise, mlx5_ptp_enable(on=0) will be stuck
on pincfg_mux.

Fixes: 62582a7ee7 ("ptp: Avoid deadlocks in the programmable pin code.")
Fixes: ee7f12205a ("net/mlx5e: Implement 1PPS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:40 -07:00
Maor Dickman
0e2e7aa57b net/mlx5e: Fix missing cleanup of ethtool steering during rep rx cleanup
The cited commit add initialization of ethtool steering during
representor rx initializations without cleaning it up in representor
rx cleanup, this may cause for stale ethtool flows to remain after
moving back from switchdev mode to legacy mode.

Fixed by calling ethtool steering cleanup during rep rx cleanup.

Fixes: 6783e8b29f ("net/mlx5e: Init ethtool steering for representors")
Signed-off-by: Maor Dickman <maord@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:37 -07:00
Aya Levin
5cd39b6e9a net/mlx5e: Fix error path of device attach
On failure to attach the netdev, fix the rollback by re-setting the
device's state back to MLX5E_STATE_DESTROYING.

Failing to attach doesn't stop statistics polling via .ndo_get_stats64.
In this case, although the device is not attached, it falsely continues
to query the firmware for counters. Setting the device's state back to
MLX5E_STATE_DESTROYING prevents the firmware counters query.

Fixes: 26e59d8077 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:35 -07:00
Maor Gottlieb
59f8f7c84c net/mlx5: Fix forward to next namespace
The steering tree is as follow (nic RX as example):
		   ---------
                   |root_ns|
		   ---------
			|
      	--------------------------------
    	|		|	       |
   ---------- 	   ----------      ---------
   |p(prio)0|	   |   p1   |      |   pn  |
   ----------	   ----------	   ---------
        |		|
 ----------------  ---------------
 |ns(e.g bypass)|  |ns(e.g. lag) |
 ----------------  ---------------
  |     |    |
----  ----  ----
|p0|  |p1|  |pn|
----  ----  ----
 |
----
|FT|
----

find_next_chained_ft(prio) returns the first flow table in the next
priority. If prio is a parent of a flow table then it returns the first
flow table in the next priority in the same namespace, else if prio
is parent of namespace, then it should return the first flow table
in the next namespace. Currently if the user requests to forward to
next namespace, the code calls to find_next_chained_ft with the prio
of the next namespace and not the prio of the namesapce itself.

Fixes: 9254f8ed15 ("net/mlx5: Add support in forward to namespace")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:32 -07:00
Parav Pandit
0c2600c619 net/mlx5: E-switch, Destroy TSAR after reload interface
When eswitch offloads is enabled, TSAR is created before reloading
the interfaces.
However when eswitch offloads mode is disabled, TSAR is disabled before
reloading the interfaces.

To keep the eswitch enable/disable sequence as mirror, destroy TSAR
after reloading the interfaces.

Fixes: 1bd27b11c1 ("net/mlx5: Introduce E-switch QoS management")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:30 -07:00
Parav Pandit
2b8e9c7c3f net/mlx5: E-switch, Destroy TSAR when fail to enable the mode
When either esw_legacy_enable() or esw_offloads_enable() fails,
code missed to destroy the created TSAR.

Hence, add the missing call to destroy the TSAR.

Fixes: 610090ebce ("net/mlx5: E-switch, Initialize TSAR Qos hardware block before its user vports")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:27 -07:00
David S. Miller
7a9212d178 Merge branch 'hns3-fixes'
Huazhong Tan says:

====================
net: hns3: fixes for -net

There are some bugfixes for the HNS3 ethernet driver. patch#1 fixes
a desc filling bug, patch#2 fixes a false TX timeout issue, and
patch#3~#5 fixes some bugs related to VLAN and FD.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 12:54:48 -07:00
Guojia Liao
b7b5d25bdd net: hns3: fix for VLAN config when reset failed
When device is resetting or reset failed, firmware is unable to
handle mailbox. VLAN should not be configured in this case.

Fixes: fe4144d47e ("net: hns3: sync VLAN filter entries when kill VLAN ID failed")
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 12:54:48 -07:00
Guojia Liao
efe3fa45f7 net: hns3: fix aRFS FD rules leftover after add a user FD rule
When user had created a FD rule, all the aRFS rules should be clear up.
HNS3 process flow as below:
1.get spin lock of fd_ruls_list
2.clear up all aRFS rules
3.release lock
4.get spin lock of fd_ruls_list
5.creat a rules
6.release lock;

There is a short period of time between step 3 and step 4, which would
creatting some new aRFS FD rules if driver was receiving packet.
So refactor the fd_rule_lock to fix it.

Fixes: 4412288757 ("net: hns3: refine the flow director handle")
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 12:54:48 -07:00