In es58x_rx_err_msg(), if can->do_set_mode() fails, the function
directly returns without calling netif_rx(skb). This means that the
skb previously allocated by alloc_can_err_skb() is not freed. In other
terms, this is a memory leak.
This patch simply removes the return statement in the error branch and
let the function continue.
Issue was found with GCC -fanalyzer, please follow the link below for
details.
Fixes: 8537257874 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces")
Link: https://lore.kernel.org/all/20211026180740.1953265-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
fix the remaining build issues reported by patchwork
in firmware v4.0 support commit which has been already
merged.
Fix patchwork issues:
- source inline
- checkpatch
Fixes: bb5dbf2cc6 ("net: marvell: prestera: add firmware v4.0 support")
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'net/protocol.h' included in 'drivers/net/amt.c' is duplicated.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mii ioctls are now handled by the ndo_eth_ioctl() callback,
not the old ndo_do_ioctl(), but octeontx2-nicvf introduced the
function for the old way.
Move it over to ndo_eth_ioctl() to actually allow calling it from
user space.
Fixes: 43510ef4dd ("octeontx2-nicvf: Add PTP hardware clock support to NIX VF")
Fixes: a76053707d ("dev_ioctl: split out ndo_eth_ioctl")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The timestamp ioctls are now handled by the ndo_eth_ioctl() callback,
not the old ndo_do_ioctl(), but oax88796 introduced the
function for the old way.
Move it over to ndo_eth_ioctl() to actually allow calling it from
user space.
Fixes: a97c69ba4f ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver")
Fixes: a76053707d ("dev_ioctl: split out ndo_eth_ioctl")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Lukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate the following coccicheck warning:
./drivers/net/amt.c:2795:6-9: ERROR: amt is NULL but dereferenced.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Build bot says:
>> drivers/net/ethernet/asix/ax88796c_main.c:1116:34: warning: unused variable 'ax88796c_dt_ids' [-Wunused-const-variable]
static const struct of_device_id ax88796c_dt_ids[] = {
^
The only reference to this array is wrapped in of_match_ptr().
Reported-by: kernel test robot <lkp@intel.com>
Fixes: a97c69ba4f ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The print function dev_err() is redundant because
platform_get_irq() already prints an error.
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Reviewed-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The octeontx2 pf nic driver failsz to link when the devlink support
is not reachable:
aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_dl_mcam_count_get':
otx2_devlink.c:(.text+0x10): undefined reference to `devlink_priv'
aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_dl_mcam_count_validate':
otx2_devlink.c:(.text+0x50): undefined reference to `devlink_priv'
aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_dl_mcam_count_set':
otx2_devlink.c:(.text+0xd0): undefined reference to `devlink_priv'
aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_devlink_info_get':
otx2_devlink.c:(.text+0x150): undefined reference to `devlink_priv'
This is already selected by the admin function driver, but not the
actual nic, which might be built-in when the af driver is not.
Fixes: 2da4894327 ("octeontx2-pf: devlink params support to set mcam entry count")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
The assignment of err will be overwritten next, so this statement
should be deleted.
The clang_analyzer complains as follows:
drivers/net/ethernet/broadcom/tg3.c:5506:2: warning:
Value stored to 'expected_sg_dig_ctrl' is never read
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I do fuzz test for bonding device interface, I got the following
use-after-free Calltrace:
==================================================================
BUG: KASAN: use-after-free in bond_enslave+0x1521/0x24f0
Read of size 8 at addr ffff88825bc11c00 by task ifenslave/7365
CPU: 5 PID: 7365 Comm: ifenslave Tainted: G E 5.15.0-rc1+ #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
Call Trace:
dump_stack_lvl+0x6c/0x8b
print_address_description.constprop.0+0x48/0x70
kasan_report.cold+0x82/0xdb
__asan_load8+0x69/0x90
bond_enslave+0x1521/0x24f0
bond_do_ioctl+0x3e0/0x450
dev_ifsioc+0x2ba/0x970
dev_ioctl+0x112/0x710
sock_do_ioctl+0x118/0x1b0
sock_ioctl+0x2e0/0x490
__x64_sys_ioctl+0x118/0x150
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f19159cf577
Code: b3 66 90 48 8b 05 11 89 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 78
RSP: 002b:00007ffeb3083c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffeb3084bca RCX: 00007f19159cf577
RDX: 00007ffeb3083ce0 RSI: 0000000000008990 RDI: 0000000000000003
RBP: 00007ffeb3084bc4 R08: 0000000000000040 R09: 0000000000000000
R10: 00007ffeb3084bc0 R11: 0000000000000246 R12: 00007ffeb3083ce0
R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffeb3083cb0
Allocated by task 7365:
kasan_save_stack+0x23/0x50
__kasan_kmalloc+0x83/0xa0
kmem_cache_alloc_trace+0x22e/0x470
bond_enslave+0x2e1/0x24f0
bond_do_ioctl+0x3e0/0x450
dev_ifsioc+0x2ba/0x970
dev_ioctl+0x112/0x710
sock_do_ioctl+0x118/0x1b0
sock_ioctl+0x2e0/0x490
__x64_sys_ioctl+0x118/0x150
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 7365:
kasan_save_stack+0x23/0x50
kasan_set_track+0x20/0x30
kasan_set_free_info+0x24/0x40
__kasan_slab_free+0xf2/0x130
kfree+0xd1/0x5c0
slave_kobj_release+0x61/0x90
kobject_put+0x102/0x180
bond_sysfs_slave_add+0x7a/0xa0
bond_enslave+0x11b6/0x24f0
bond_do_ioctl+0x3e0/0x450
dev_ifsioc+0x2ba/0x970
dev_ioctl+0x112/0x710
sock_do_ioctl+0x118/0x1b0
sock_ioctl+0x2e0/0x490
__x64_sys_ioctl+0x118/0x150
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Last potentially related work creation:
kasan_save_stack+0x23/0x50
kasan_record_aux_stack+0xb7/0xd0
insert_work+0x43/0x190
__queue_work+0x2e3/0x970
delayed_work_timer_fn+0x3e/0x50
call_timer_fn+0x148/0x470
run_timer_softirq+0x8a8/0xc50
__do_softirq+0x107/0x55f
Second to last potentially related work creation:
kasan_save_stack+0x23/0x50
kasan_record_aux_stack+0xb7/0xd0
insert_work+0x43/0x190
__queue_work+0x2e3/0x970
__queue_delayed_work+0x130/0x180
queue_delayed_work_on+0xa7/0xb0
bond_enslave+0xe25/0x24f0
bond_do_ioctl+0x3e0/0x450
dev_ifsioc+0x2ba/0x970
dev_ioctl+0x112/0x710
sock_do_ioctl+0x118/0x1b0
sock_ioctl+0x2e0/0x490
__x64_sys_ioctl+0x118/0x150
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff88825bc11c00
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 0 bytes inside of
1024-byte region [ffff88825bc11c00, ffff88825bc12000)
The buggy address belongs to the page:
page:ffffea00096f0400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x25bc10
head:ffffea00096f0400 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x57ff00000010200(slab|head|node=1|zone=2|lastcpupid=0x7ff)
raw: 057ff00000010200 ffffea0009a71c08 ffff888240001968 ffff88810004dbc0
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88825bc11b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88825bc11b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88825bc11c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88825bc11c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88825bc11d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Put new_slave in bond_sysfs_slave_add() will cause use-after-free problems
when new_slave is accessed in the subsequent error handling process. Since
new_slave will be put in the subsequent error handling process, remove the
unnecessary put to fix it.
In addition, when sysfs_create_file() fails, if some files have been crea-
ted successfully, we need to call sysfs_remove_file() to remove them.
Since there are sysfs_create_files() & sysfs_remove_files() can be used,
use these two functions instead.
Fixes: 7afcaec496 (bonding: use kobject_put instead of _del after kobject_add)
Signed-off-by: Huang Guobin <huangguobin4@huawei.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-11-03
Brett fixes issues with promiscuous mode settings not being properly
enabled and removes setting of VF antispoof along with promiscuous
mode. He also ensures that VF Tx queues are always disabled and resolves
a race between virtchnl handling and VF related ndo ops.
Sylwester fixes an issue where a VF MAC could not be set to its primary
MAC if the address is already present.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix race conditions between virtchnl handling and VF ndo ops
ice: Fix not stopping Tx queues for VFs
ice: Fix replacing VF hardware MAC to existing MAC filter
ice: Remove toggling of antispoof for VF trusted promiscuous mode
ice: Fix VF true promiscuous mode
====================
Link: https://lore.kernel.org/r/20211103161935.2997369-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As reported by Zhang there's a small issue if in forced mode the duplex
mode changes with the link staying up [0]. In this case the MAC isn't
notified about the change.
The proposed patch relies on the phylib state machine and ignores the
fact that there are drivers that uses phylib but not the phylib state
machine. So let's don't change the behavior for such drivers and fix
it w/o re-adding state PHY_FORCING for the case that phylib state
machine is used.
[0] https://lore.kernel.org/netdev/a5c26ffd-4ee4-a5e6-4103-873208ce0dc5@huawei.com/T/
Fixes: 2bd229df5e ("net: phy: remove state PHY_FORCING")
Reported-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Tested-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/7b8b9456-a93f-abbc-1dc5-a2c2542f932c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The VF can be configured via the PF's ndo ops at the same time the PF is
receiving/handling virtchnl messages. This has many issues, with
one of them being the ndo op could be actively resetting a VF (i.e.
resetting it to the default state and deleting/re-adding the VF's VSI)
while a virtchnl message is being handled. The following error was seen
because a VF ndo op was used to change a VF's trust setting while the
VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by making sure the virtchnl handling and VF ndo ops that
trigger VF resets cannot run concurrently. This is done by adding a
struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
will be locked around the critical operations and VFR. Since the ndo ops
will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
is done because if any other thread (i.e. VF ndo op) has the mutex, then
that means the current VF message being handled is no longer valid, so
just ignore it.
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 7c710869d6 ("ice: Add handlers for VF netdevice operations")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When a VF is removed and/or reset its Tx queues need to be
stopped from the PF. This is done by calling the ice_dis_vf_qs()
function, which calls ice_vsi_stop_lan_tx_rings(). Currently
ice_dis_vf_qs() is protected by the VF state bit ICE_VF_STATE_QS_ENA.
Unfortunately, this is causing the Tx queues to not be disabled in some
cases and when the VF tries to re-enable/reconfigure its Tx queues over
virtchnl the op is failing. This is because a VF can be reset and/or
removed before the ICE_VF_STATE_QS_ENA bit is set, but the Tx queues
were already configured via ice_vsi_cfg_single_txq() in the
VIRTCHNL_OP_CONFIG_VSI_QUEUES op. However, the ICE_VF_STATE_QS_ENA bit
is set on a successful VIRTCHNL_OP_ENABLE_QUEUES, which will always
happen after the VIRTCHNL_OP_CONFIG_VSI_QUEUES op.
This was causing the following error message when loading the ice
driver, creating VFs, and modifying VF trust in an endless loop:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by always calling ice_dis_vf_qs() and silencing the error
message in ice_vsi_stop_tx_ring() since the calling code ignores the
return anyway. Also, all other places that call ice_vsi_stop_tx_ring()
catch the error, so this doesn't affect those flows since there was no
change to the values the function returns.
Other solutions were considered (i.e. tracking which VF queues had been
"started/configured" in VIRTCHNL_OP_CONFIG_VSI_QUEUES, but it seemed
more complicated than it was worth. This solution also brings in the
chance for other unexpected conditions due to invalid state bit checks.
So, the proposed solution seemed like the best option since there is no
harm in failing to stop Tx queues that were never started.
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 77ca27c417 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
VF was not able to change its hardware MAC address in case
the new address was already present in the MAC filter list.
Change the handling of VF add mac request to not return
if requested MAC address is already present on the list
and check if its hardware MAC needs to be updated in this case.
Fixes: ed4c068d46 ("ice: Enable ip link show on the PF to display VF unicast MAC(s)")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently when a trusted VF enables promiscuous mode spoofchk will be
disabled. This is wrong and should only be modified from the
ndo_set_vf_spoofchk callback. Fix this by removing the call to toggle
spoofchk for trusted VFs.
Fixes: 01b5e89aab ("ice: Add VF promiscuous support")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When a VF requests promiscuous mode and it's trusted and true promiscuous
mode is enabled the PF driver attempts to enable unicast and/or
multicast promiscuous mode filters based on the request. This is fine,
but there are a couple issues with the current code.
[1] The define to configure the unicast promiscuous mode mask also
includes bits to configure the multicast promiscuous mode mask, which
causes multicast to be set/cleared unintentionally.
[2] All 4 cases for enable/disable unicast/multicast mode are not
handled in the promiscuous mode message handler, which causes
unexpected results regarding the current promiscuous mode settings.
To fix [1] make sure any promiscuous mask defines include the correct
bits for each of the promiscuous modes.
To fix [2] make sure that all 4 cases are handled since there are 2 bits
(FLAG_VF_UNICAST_PROMISC and FLAG_VF_MULTICAST_PROMISC) that can be
either set or cleared. Also, since either unicast and/or multicast
promiscuous configuration can fail, introduce two separate error values
to handle each of these cases.
Fixes: 01b5e89aab ("ice: Add VF promiscuous support")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Normally it is expected that the dsa_device_ops :: rcv() method finishes
parsing the DSA tag and consumes it, then never looks at it again.
But commit c0bcf53766 ("net: dsa: ocelot: add hardware timestamping
support for Felix") added support for RX timestamping in a very
unconventional way. On this switch, a partial timestamp is available in
the DSA header, but the driver got away with not parsing that timestamp
right away, but instead delayed that parsing for a little longer:
dsa_switch_rcv():
nskb = cpu_dp->rcv(skb, dev); <------------- not here
-> ocelot_rcv()
...
skb = nskb;
skb_push(skb, ETH_HLEN);
skb->pkt_type = PACKET_HOST;
skb->protocol = eth_type_trans(skb, skb->dev);
...
if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here
-> felix_rxtstamp()
return 0;
When in felix_rxtstamp(), this driver accounted for the fact that
eth_type_trans() happened in the meanwhile, so it got a hold of the
extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes
from the current skb->data.
This worked for quite some time but was quite fragile from the very
beginning. Not to mention that having DSA tag parsing split in two
different files, under different folders (net/dsa/tag_ocelot.c vs
drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to
come that they might break this.
Finally, the blamed commit does the following: at the end of
ocelot_rcv(), it checks whether the skb payload contains a VLAN header.
If it does, and this port is under a VLAN-aware bridge, that VLAN ID
might not be correct in the sense that the packet might have suffered
VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID
from the skb payload using __skb_vlan_pop(), and take the classified
VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the
classified VLAN, and the skb payload is VLAN-untagged.
The big problem is that __skb_vlan_pop() does:
memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);
__skb_pull(skb, VLAN_HLEN);
aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes
from the skb headroom (effectively also moving skb->data, by definition).
So for felix_rxtstamp()'s fragile logic, all bets are off now.
Instead of having the "extraction" pointer point to the DSA header,
it actually points to 4 bytes _inside_ the extraction header.
Corollary, the last 4 bytes of the "extraction" header are in fact 4
stale bytes of the destination MAC address from the Ethernet header,
from prior to the __skb_vlan_pop() movement.
So of course, RX timestamps are completely bogus when the system is
configured in this way.
The fix is actually very simple: just don't structure the code like that.
For better or worse, the DSA PTP timestamping API does not offer a
straightforward way for drivers to present their RX timestamps, but
other drivers (sja1105) have established a simple mechanism to carry
their RX timestamp from dsa_device_ops :: rcv() all the way to
dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to
simply save the partial timestamp to the skb->cb, and complete it later.
Question: why don't we simply populate the skb's struct
skb_shared_hwtstamps from ocelot_rcv(), and bother with this
complication of propagating the timestamp to felix_rxtstamp()?
Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether
PTP packets need sleepable context to retrieve the full RX timestamp.
Currently felix_rxtstamp() answers "no, thanks" to that question, and
calls ocelot_ptp_gettime64() from softirq atomic context. This is
understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so
hardware access does not require sleeping. But the felix driver is
preparing for the introduction of other switches where hardware access
is over a slow bus like SPI or MDIO:
https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/
So I would like to keep this code structure, so the rework needed when
that driver will need PTP support will be minimal (answer "yes, I need
deferred context for this skb's RX timestamp", then the partial
timestamp will still be found in the skb->cb.
Fixes: ea440cd2d9 ("net: dsa: tag_ocelot: use VLAN information from tagging header when available")
Reported-by: Po Liu <po.liu@nxp.com>
Cc: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some device set MAC06 exchange in the bootloader. This cause some
problem as we don't support this strange mode and we just set the port6
as the primary CPU port. With MAC06 exchange, PAD0 reg configure port6
instead of port0. Add an extra check and explicitly disable MAC06 exchange
to correctly configure the port PAD config.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Fixes: 3fcf734aa4 ("net: dsa: qca8k: add support for cpu port 6")
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up the following includecheck warning:
./drivers/net/amt.c: net/protocol.h is included more than once.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On m68k:
In function ‘prestera_hw_build_tests’,
inlined from ‘prestera_hw_switch_init’ at drivers/net/ethernet/marvell/prestera/prestera_hw.c:788:2:
././include/linux/compiler_types.h:335:38: error: call to ‘__compiletime_assert_345’ declared with attribute error: BUILD_BUG_ON failed: sizeof(struct prestera_msg_switch_attr_req) != 16
...
The driver assumes structure members are naturally aligned, but does not
add explicit padding, thus breaking architectures where integral values
are not always naturally aligned (e.g. on m68k, __alignof(int) is 2, not
4).
Fixes: bb5dbf2cc6 ("net: marvell: prestera: add firmware v4.0 support")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211102082433.3820514-1-geert@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix following coccicheck warning:
./drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:446:8-56: WARNING
avoid newline at end of message in NL_SET_ERR_MSG_MOD.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Link: https://lore.kernel.org/r/20211102020312.16567-1-wanjiabing@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch allows to use 0 for `coal->rx_coalesce_usecs` param to
disable rx irq coalescing.
Previously we could enable rx irq coalescing via ethtool
(For ex: `ethtool -C eth0 rx-usecs 2000`) but we couldn't disable
it because this part rejects 0 value:
if (!coal->rx_coalesce_usecs)
return -EINVAL;
Fixes: 84da2658a6 ("TI DaVinci EMAC : Implement interrupt pacing functionality.")
Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Link: https://lore.kernel.org/r/20211101152343.4193233-1-bigunclemax@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a function to initialize phy rgmii delay according to phydev->interface.
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20211101162119.29275-1-yuiko.oshino@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Remove socket skb caches
- Add a SO_RESERVE_MEM socket op to forward allocate buffer space
and avoid memory accounting overhead on each message sent
- Introduce managed neighbor entries - added by control plane and
resolved by the kernel for use in acceleration paths (BPF / XDP
right now, HW offload users will benefit as well)
- Make neighbor eviction on link down controllable by userspace
to work around WiFi networks with bad roaming implementations
- vrf: Rework interaction with netfilter/conntrack
- fq_codel: implement L4S style ce_threshold_ect1 marking
- sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap()
BPF:
- Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging
as implemented in LLVM14
- Introduce bpf_get_branch_snapshot() to capture Last Branch Records
- Implement variadic trace_printk helper
- Add a new Bloomfilter map type
- Track <8-byte scalar spill and refill
- Access hw timestamp through BPF's __sk_buff
- Disallow unprivileged BPF by default
- Document BPF licensing
Netfilter:
- Introduce egress hook for looking at raw outgoing packets
- Allow matching on and modifying inner headers / payload data
- Add NFT_META_IFTYPE to match on the interface type either from
ingress or egress
Protocols:
- Multi-Path TCP:
- increase default max additional subflows to 2
- rework forward memory allocation
- add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS
- MCTP flow support allowing lower layer drivers to configure msg
muxing as needed
- Automatic Multicast Tunneling (AMT) driver based on RFC7450
- HSR support the redbox supervision frames (IEC-62439-3:2018)
- Support for the ip6ip6 encapsulation of IOAM
- Netlink interface for CAN-FD's Transmitter Delay Compensation
- Support SMC-Rv2 eliminating the current same-subnet restriction,
by exploiting the UDP encapsulation feature of RoCE adapters
- TLS: add SM4 GCM/CCM crypto support
- Bluetooth: initial support for link quality and audio/codec
offload
Driver APIs:
- Add a batched interface for RX buffer allocation in AF_XDP
buffer pool
- ethtool: Add ability to control transceiver modules' power mode
- phy: Introduce supported interfaces bitmap to express MAC
capabilities and simplify PHY code
- Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks
New drivers:
- WiFi driver for Realtek 8852AE 802.11ax devices (rtw89)
- Ethernet driver for ASIX AX88796C SPI device (x88796c)
Drivers:
- Broadcom PHYs
- support 72165, 7712 16nm PHYs
- support IDDQ-SR for additional power savings
- PHY support for QCA8081, QCA9561 PHYs
- NXP DPAA2: support for IRQ coalescing
- NXP Ethernet (enetc): support for software TCP segmentation
- Renesas Ethernet (ravb) - support DMAC and EMAC blocks of
Gigabit-capable IP found on RZ/G2L SoC
- Intel 100G Ethernet
- support for eswitch offload of TC/OvS flow API, including
offload of GRE, VxLAN, Geneve tunneling
- support application device queues - ability to assign Rx and Tx
queues to application threads
- PTP and PPS (pulse-per-second) extensions
- Broadcom Ethernet (bnxt)
- devlink health reporting and device reload extensions
- Mellanox Ethernet (mlx5)
- offload macvlan interfaces
- support HW offload of TC rules involving OVS internal ports
- support HW-GRO and header/data split
- support application device queues
- Marvell OcteonTx2:
- add XDP support for PF
- add PTP support for VF
- Qualcomm Ethernet switch (qca8k): support for QCA8328
- Realtek Ethernet DSA switch (rtl8366rb)
- support bridge offload
- support STP, fast aging, disabling address learning
- support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch
- Mellanox Ethernet/IB switch (mlxsw)
- multi-level qdisc hierarchy offload (e.g. RED, prio and shaping)
- offload root TBF qdisc as port shaper
- support multiple routing interface MAC address prefixes
- support for IP-in-IP with IPv6 underlay
- MediaTek WiFi (mt76)
- mt7921 - ASPM, 6GHz, SDIO and testmode support
- mt7915 - LED and TWT support
- Qualcomm WiFi (ath11k)
- include channel rx and tx time in survey dump statistics
- support for 80P80 and 160 MHz bandwidths
- support channel 2 in 6 GHz band
- spectral scan support for QCN9074
- support for rx decapsulation offload (data frames in 802.3
format)
- Qualcomm phone SoC WiFi (wcn36xx)
- enable Idle Mode Power Save (IMPS) to reduce power consumption
during idle
- Bluetooth driver support for MediaTek MT7922 and MT7921
- Enable support for AOSP Bluetooth extension in Qualcomm WCN399x
and Realtek 8822C/8852A
- Microsoft vNIC driver (mana)
- support hibernation and kexec
- Google vNIC driver (gve)
- support for jumbo frames
- implement Rx page reuse
Refactor:
- Make all writes to netdev->dev_addr go thru helpers, so that we
can add this address to the address rbtree and handle the updates
- Various TCP cleanups and optimizations including improvements
to CPU cache use
- Simplify the gnet_stats, Qdisc stats' handling and remove
qdisc->running sequence counter
- Driver changes and API updates to address devlink locking
deficiencies
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGAzX4ACgkQMUZtbf5S
IrvW3g//Q0ZLrOuHK9pZ8sCXMMhDj8qL6ajm0otMddHWA/+1UglwVBKFhsajfxOf
wJ/5LZis+XKLpLqKTU5chKVfn39HuDGe/D3l+egi01Gv5BW0+XzEhagfyR5tJX5z
wsGG5CXO/we/laVSzRiFtwwVEKHKN20YC+tIQwYOYP5Wy3q4G7qDsFhT7GqgsGCS
n74QUEAIB5Tz0ODWFqLtbsySzIurXrskibwt5T9bvAAlPw/lCU68mmG+NVJ7VddO
lBbNkLMOo8yW9Ci20H09SrYd4jZTmMARo9tsFO1tAvAMk7qpn0Wd8pnOYTjFFoMD
+qjiFSVMh7E0JGb8Y7NCvwaB99suAK5rfGP68Xwe62DfP7vYWEx4pZGxBP19F4ld
6Kn1ME33BX9rUF9tBecf0bdKfJUwB2Q2Xou/b9laG04bwiqsc9iG5FQq1C46lnLZ
QdzNiS1My4dJMczkWt66HF3Kx30ibwHfvKMIHjf4PqkzEatkv6Y6SBZ57KXL+Lde
0BQSFhbf0tm2Gf55etzrczLElI3uqHSFWUNZZ2Bt6WmzO1e6tpV9nAtRWF4C/dFg
QDpLJtOOOY65uq+qz09zoPfv2lem868SrCAuFrVn99bEpYjx/CGNFDeEI02l6jyr
84eUxd364UcbIk3fc+eTGdXHLQNVk30G0AHVBBxaWNIidwfqXeE=
=srde
-----END PGP SIGNATURE-----
Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Remove socket skb caches
- Add a SO_RESERVE_MEM socket op to forward allocate buffer space and
avoid memory accounting overhead on each message sent
- Introduce managed neighbor entries - added by control plane and
resolved by the kernel for use in acceleration paths (BPF / XDP
right now, HW offload users will benefit as well)
- Make neighbor eviction on link down controllable by userspace to
work around WiFi networks with bad roaming implementations
- vrf: Rework interaction with netfilter/conntrack
- fq_codel: implement L4S style ce_threshold_ect1 marking
- sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap()
BPF:
- Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging
as implemented in LLVM14
- Introduce bpf_get_branch_snapshot() to capture Last Branch Records
- Implement variadic trace_printk helper
- Add a new Bloomfilter map type
- Track <8-byte scalar spill and refill
- Access hw timestamp through BPF's __sk_buff
- Disallow unprivileged BPF by default
- Document BPF licensing
Netfilter:
- Introduce egress hook for looking at raw outgoing packets
- Allow matching on and modifying inner headers / payload data
- Add NFT_META_IFTYPE to match on the interface type either from
ingress or egress
Protocols:
- Multi-Path TCP:
- increase default max additional subflows to 2
- rework forward memory allocation
- add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS
- MCTP flow support allowing lower layer drivers to configure msg
muxing as needed
- Automatic Multicast Tunneling (AMT) driver based on RFC7450
- HSR support the redbox supervision frames (IEC-62439-3:2018)
- Support for the ip6ip6 encapsulation of IOAM
- Netlink interface for CAN-FD's Transmitter Delay Compensation
- Support SMC-Rv2 eliminating the current same-subnet restriction, by
exploiting the UDP encapsulation feature of RoCE adapters
- TLS: add SM4 GCM/CCM crypto support
- Bluetooth: initial support for link quality and audio/codec offload
Driver APIs:
- Add a batched interface for RX buffer allocation in AF_XDP buffer
pool
- ethtool: Add ability to control transceiver modules' power mode
- phy: Introduce supported interfaces bitmap to express MAC
capabilities and simplify PHY code
- Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks
New drivers:
- WiFi driver for Realtek 8852AE 802.11ax devices (rtw89)
- Ethernet driver for ASIX AX88796C SPI device (x88796c)
Drivers:
- Broadcom PHYs
- support 72165, 7712 16nm PHYs
- support IDDQ-SR for additional power savings
- PHY support for QCA8081, QCA9561 PHYs
- NXP DPAA2: support for IRQ coalescing
- NXP Ethernet (enetc): support for software TCP segmentation
- Renesas Ethernet (ravb) - support DMAC and EMAC blocks of
Gigabit-capable IP found on RZ/G2L SoC
- Intel 100G Ethernet
- support for eswitch offload of TC/OvS flow API, including
offload of GRE, VxLAN, Geneve tunneling
- support application device queues - ability to assign Rx and Tx
queues to application threads
- PTP and PPS (pulse-per-second) extensions
- Broadcom Ethernet (bnxt)
- devlink health reporting and device reload extensions
- Mellanox Ethernet (mlx5)
- offload macvlan interfaces
- support HW offload of TC rules involving OVS internal ports
- support HW-GRO and header/data split
- support application device queues
- Marvell OcteonTx2:
- add XDP support for PF
- add PTP support for VF
- Qualcomm Ethernet switch (qca8k): support for QCA8328
- Realtek Ethernet DSA switch (rtl8366rb)
- support bridge offload
- support STP, fast aging, disabling address learning
- support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch
- Mellanox Ethernet/IB switch (mlxsw)
- multi-level qdisc hierarchy offload (e.g. RED, prio and shaping)
- offload root TBF qdisc as port shaper
- support multiple routing interface MAC address prefixes
- support for IP-in-IP with IPv6 underlay
- MediaTek WiFi (mt76)
- mt7921 - ASPM, 6GHz, SDIO and testmode support
- mt7915 - LED and TWT support
- Qualcomm WiFi (ath11k)
- include channel rx and tx time in survey dump statistics
- support for 80P80 and 160 MHz bandwidths
- support channel 2 in 6 GHz band
- spectral scan support for QCN9074
- support for rx decapsulation offload (data frames in 802.3
format)
- Qualcomm phone SoC WiFi (wcn36xx)
- enable Idle Mode Power Save (IMPS) to reduce power consumption
during idle
- Bluetooth driver support for MediaTek MT7922 and MT7921
- Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and
Realtek 8822C/8852A
- Microsoft vNIC driver (mana)
- support hibernation and kexec
- Google vNIC driver (gve)
- support for jumbo frames
- implement Rx page reuse
Refactor:
- Make all writes to netdev->dev_addr go thru helpers, so that we can
add this address to the address rbtree and handle the updates
- Various TCP cleanups and optimizations including improvements to
CPU cache use
- Simplify the gnet_stats, Qdisc stats' handling and remove
qdisc->running sequence counter
- Driver changes and API updates to address devlink locking
deficiencies"
* tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits)
Revert "net: avoid double accounting for pure zerocopy skbs"
selftests: net: add arp_ndisc_evict_nocarrier
net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter
net: arp: introduce arp_evict_nocarrier sysctl parameter
libbpf: Deprecate AF_XDP support
kbuild: Unify options for BTF generation for vmlinux and modules
selftests/bpf: Add a testcase for 64-bit bounds propagation issue.
bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c
net: avoid double accounting for pure zerocopy skbs
tcp: rename sk_wmem_free_skb
netdevsim: fix uninit value in nsim_drv_configure_vfs()
selftests/bpf: Fix also no-alu32 strobemeta selftest
bpf: Add missing map_delete_elem method to bloom filter map
selftests/bpf: Add bloom map success test for userspace calls
bpf: Add alignment padding for "map_extra" + consolidate holes
bpf: Bloom filter map naming fixups
selftests/bpf: Add test cases for struct_ops prog
bpf: Add dummy BPF STRUCT_OPS for test purpose
...
The end goal of the current buffer overflow detection work[0] is to gain
full compile-time and run-time coverage of all detectable buffer overflows
seen via array indexing or memcpy(), memmove(), and memset(). The str*()
family of functions already have full coverage.
While much of the work for these changes have been on-going for many
releases (i.e. 0-element and 1-element array replacements, as well as
avoiding false positives and fixing discovered overflows[1]), this series
contains the foundational elements of several related buffer overflow
detection improvements by providing new common helpers and FORTIFY_SOURCE
changes needed to gain the introspection required for compiler visibility
into array sizes. Also included are a handful of already Acked instances
using the helpers (or related clean-ups), with many more waiting at the
ready to be taken via subsystem-specific trees[2]. The new helpers are:
- struct_group() for gaining struct member range introspection.
- memset_after() and memset_startat() for clearing to the end of structures.
- DECLARE_FLEX_ARRAY() for using flex arrays in unions or alone in structs.
Also included is the beginning of the refactoring of FORTIFY_SOURCE to
support memcpy() introspection, fix missing and regressed coverage under
GCC, and to prepare to fix the currently broken Clang support. Finishing
this work is part of the larger series[0], but depends on all the false
positives and buffer overflow bug fixes to have landed already and those
that depend on this series to land.
As part of the FORTIFY_SOURCE refactoring, a set of both a compile-time
and run-time tests are added for FORTIFY_SOURCE and the mem*()-family
functions respectively. The compile time tests have found a legitimate
(though corner-case) bug[6] already.
Please note that the appearance of "panic" and "BUG" in the
FORTIFY_SOURCE refactoring are the result of relocating existing code,
and no new use of those code-paths are expected nor desired.
Finally, there are two tree-wide conversions for 0-element arrays and
flexible array unions to gain sane compiler introspection coverage that
result in no known object code differences.
After this series (and the changes that have now landed via netdev
and usb), we are very close to finally being able to build with
-Warray-bounds and -Wzero-length-bounds. However, due corner cases in
GCC[3] and Clang[4], I have not included the last two patches that turn
on these options, as I don't want to introduce any known warnings to
the build. Hopefully these can be solved soon.
[0] https://lore.kernel.org/lkml/20210818060533.3569517-1-keescook@chromium.org/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=FORTIFY_SOURCE
[2] https://lore.kernel.org/lkml/202108220107.3E26FE6C9C@keescook/
[3] https://lore.kernel.org/lkml/3ab153ec-2798-da4c-f7b1-81b0ac8b0c5b@roeck-us.net/
[4] https://bugs.llvm.org/show_bug.cgi?id=51682
[5] https://lore.kernel.org/lkml/202109051257.29B29745C0@keescook/
[6] https://lore.kernel.org/lkml/20211020200039.170424-1-keescook@chromium.org/
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmGAFWcWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJmKFD/45MJdnvW5MhIEeW5tc5UjfcIPS
ae+YvlEX/2ZwgSlTxocFVocE6hz7b6eCiX3dSAChPkPxsSfgeiuhjxsU+4ROnELR
04RqTA/rwT6JXfJcXbDPXfxDL4huUkgktAW3m1sT771AZspeap2GrSwFyttlTqKA
+kTiZ3lXJVFcw10uyhfp3Lk6eFJxdf5iOjuEou5kBOQfpNKEOduRL2K15hSowOwB
lARiAC+HbmN+E+npvDE7YqK4V7ZQ0/dtB0BlfqgTkn1spQz8N21kBAMpegV5vvIk
A+qGHc7q2oyk4M14TRTidQHGQ4juW1Kkvq3NV6KzwQIVD+mIfz0ESn3d4tnp28Hk
Y+OXTI1BRFlApQU9qGWv33gkNEozeyqMLDRLKhDYRSFPA9UKkpgXQRzeTzoLKyrQ
4B6n5NnUGcu7I6WWhpyZQcZLDsHGyy0vHzjQGs/NXtb1PzXJ5XIGuPdmx9pVMykk
IVKnqRcWyGWahfh3asOnoXvdhi1No4NSHQ/ZHfUM+SrIGYjBMaUisw66qm3Fe8ZU
lbO2CFkCsfGSoKNPHf0lUEGlkyxAiDolazOfflDNxdzzlZo2X1l/a7O/yoO4Pqul
cdL0eDjiNoQ2YR2TSYPnXq5KSL1RI0tlfS8pH8k1hVhZsQx0wpAQ+qki0S+fLePV
PdA9XB82G2tmqKc9cQ==
=9xbT
-----END PGP SIGNATURE-----
Merge tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull overflow updates from Kees Cook:
"The end goal of the current buffer overflow detection work[0] is to
gain full compile-time and run-time coverage of all detectable buffer
overflows seen via array indexing or memcpy(), memmove(), and
memset(). The str*() family of functions already have full coverage.
While much of the work for these changes have been on-going for many
releases (i.e. 0-element and 1-element array replacements, as well as
avoiding false positives and fixing discovered overflows[1]), this
series contains the foundational elements of several related buffer
overflow detection improvements by providing new common helpers and
FORTIFY_SOURCE changes needed to gain the introspection required for
compiler visibility into array sizes. Also included are a handful of
already Acked instances using the helpers (or related clean-ups), with
many more waiting at the ready to be taken via subsystem-specific
trees[2].
The new helpers are:
- struct_group() for gaining struct member range introspection
- memset_after() and memset_startat() for clearing to the end of
structures
- DECLARE_FLEX_ARRAY() for using flex arrays in unions or alone in
structs
Also included is the beginning of the refactoring of FORTIFY_SOURCE to
support memcpy() introspection, fix missing and regressed coverage
under GCC, and to prepare to fix the currently broken Clang support.
Finishing this work is part of the larger series[0], but depends on
all the false positives and buffer overflow bug fixes to have landed
already and those that depend on this series to land.
As part of the FORTIFY_SOURCE refactoring, a set of both a
compile-time and run-time tests are added for FORTIFY_SOURCE and the
mem*()-family functions respectively. The compile time tests have
found a legitimate (though corner-case) bug[6] already.
Please note that the appearance of "panic" and "BUG" in the
FORTIFY_SOURCE refactoring are the result of relocating existing code,
and no new use of those code-paths are expected nor desired.
Finally, there are two tree-wide conversions for 0-element arrays and
flexible array unions to gain sane compiler introspection coverage
that result in no known object code differences.
After this series (and the changes that have now landed via netdev and
usb), we are very close to finally being able to build with
-Warray-bounds and -Wzero-length-bounds.
However, due corner cases in GCC[3] and Clang[4], I have not included
the last two patches that turn on these options, as I don't want to
introduce any known warnings to the build. Hopefully these can be
solved soon"
Link: https://lore.kernel.org/lkml/20210818060533.3569517-1-keescook@chromium.org/ [0]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=FORTIFY_SOURCE [1]
Link: https://lore.kernel.org/lkml/202108220107.3E26FE6C9C@keescook/ [2]
Link: https://lore.kernel.org/lkml/3ab153ec-2798-da4c-f7b1-81b0ac8b0c5b@roeck-us.net/ [3]
Link: https://bugs.llvm.org/show_bug.cgi?id=51682 [4]
Link: https://lore.kernel.org/lkml/202109051257.29B29745C0@keescook/ [5]
Link: https://lore.kernel.org/lkml/20211020200039.170424-1-keescook@chromium.org/ [6]
* tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
fortify: strlen: Avoid shadowing previous locals
compiler-gcc.h: Define __SANITIZE_ADDRESS__ under hwaddress sanitizer
treewide: Replace 0-element memcpy() destinations with flexible arrays
treewide: Replace open-coded flex arrays in unions
stddef: Introduce DECLARE_FLEX_ARRAY() helper
btrfs: Use memset_startat() to clear end of struct
string.h: Introduce memset_startat() for wiping trailing members and padding
xfrm: Use memset_after() to clear padding
string.h: Introduce memset_after() for wiping trailing members/padding
lib: Introduce CONFIG_MEMCPY_KUNIT_TEST
fortify: Add compile-time FORTIFY_SOURCE tests
fortify: Allow strlen() and strnlen() to pass compile-time known lengths
fortify: Prepare to improve strnlen() and strlen() warnings
fortify: Fix dropped strcpy() compile-time write overflow check
fortify: Explicitly disable Clang support
fortify: Move remaining fortify helpers into fortify-string.h
lib/string: Move helper functions out of string.c
compiler_types.h: Remove __compiletime_object_size()
cm4000_cs: Use struct_group() to zero struct cm4000_dev region
can: flexcan: Use struct_group() to zero struct flexcan_regs regions
...
In one if branch, (ec->rx_coalesce_usecs != 0) is checked. When it is
checked again in two more places, it is always false and has no effect
on the whole check expression. We should remove it in both places.
In another if branch, (ec->use_adaptive_rx_coalesce != 0) is checked.
When it is checked again, it is always false. We should remove the
entire branch with it.
In addition we might as well let C precedence dictate by getting rid of
two pairs of parentheses in the neighboring lines in order to keep
expressions on both sides of '||' in balance with checkpatch warning
silenced.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Link: https://lore.kernel.org/r/20211031012728.8325-1-sakiwit@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Build bot points out that I missed initializing ret
after refactoring.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 1c401078bc ("netdevsim: move details of vf config to dev")
Link: https://lore.kernel.org/r/20211101221845.3188490-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the previous patch, igmp report handler was added.
That handler can be used for mld too.
So, it uses that common code to parse mld report message.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
amt 'Relay' interface manages multicast groups(igmp/mld) and sources.
In order to manage, it should have the function to parse igmp/mld
report messages. So, this adds the logic for parsing igmp report messages
and saves them on their own data structure.
struct amt_group_node means one group(igmp/mld).
struct amt_source_node means one source.
The same source can't exist in the same group.
The same group can exist in the same tunnel because it manages
the host address too.
The group information is used when forwarding multicast data.
If there are no groups in the specific tunnel, Relay doesn't forward it.
Although Relay manages sources, it doesn't support the source filtering
feature. Because the reason to manage sources is just that in order
to manage group more correctly.
In the next patch, MLD part will be added.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before forwarding multicast traffic, the amt interface establishes between
gateway and relay. In order to establish, amt defined some message type
and those message flow looks like the below.
Gateway Relay
------- -----
: Request :
[1] | N |
|---------------------->|
| Membership Query | [2]
| N,MAC,gADDR,gPORT |
|<======================|
[3] | Membership Update |
| ({G:INCLUDE({S})}) |
|======================>|
| |
---------------------:-----------------------:---------------------
| | | |
| | *Multicast Data | *IP Packet(S,G) |
| | gADDR,gPORT |<-----------------() |
| *IP Packet(S,G) |<======================| |
| ()<-----------------| | |
| | | |
---------------------:-----------------------:---------------------
~ ~
~ Request ~
[4] | N' |
|---------------------->|
| Membership Query | [5]
| N',MAC',gADDR',gPORT' |
|<======================|
[6] | |
| Teardown |
| N,MAC,gADDR,gPORT |
|---------------------->|
| | [7]
| Membership Update |
| ({G:INCLUDE({S})}) |
|======================>|
| |
---------------------:-----------------------:---------------------
| | | |
| | *Multicast Data | *IP Packet(S,G) |
| | gADDR',gPORT' |<-----------------() |
| *IP Packet (S,G) |<======================| |
| ()<-----------------| | |
| | | |
---------------------:-----------------------:---------------------
| |
: :
1. Discovery
- Sent by Gateway to Relay
- To find Relay unique ip address
2. Advertisement
- Sent by Relay to Gateway
- Contains the unique IP address
3. Request
- Sent by Gateway to Relay
- Solicit to receive 'Query' message.
4. Query
- Sent by Relay to Gateway
- Contains General Query message.
5. Update
- Sent by Gateway to Relay
- Contains report message.
6. Multicast Data
- Sent by Relay to Gateway
- encapsulated multicast traffic.
7. Teardown
- Not supported at this time.
Except for the Teardown message, it supports all messages.
In the next patch, IGMP/MLD logic will be added.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It adds definitions and control plane code for AMT.
this is very similar to udp tunneling interfaces such as gtp, vxlan, etc.
In the next patch, data plane code will be added.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename functions serving as driver entry points
from nsim_dev_... to nsim_drv_... this makes the
API boundary between bus and dev clearer.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
max_vfs is a strange little beast because the file
hangs off of nsim's debugfs, but it configures a field
in the bus device. Move it to dev.c, let's look at it
as if the device driver was imposing VF limit based
on FW info (like pci_sriov_set_totalvfs()).
Again, when moving refactor the function not to hold
the vfs lock pointlessly while parsing the input.
Wrap the access from the read side in READ_ONCE()
to appease concurrency checkers. Do not check if
return value from snprintf() is negative...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since "eswitch" configuration was added bus.c contains
a lot of device details which really belong to dev.c.
Restructure the code while moving it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When netdevsim got split into the faux bus vfconfig ended
up in the bus device (think pci_dev) which is strange because
it contains very networky not to say netdevy information.
Move it to nsim_dev, which is the driver "priv" structure
for the device.
To make sure we don't race with probe/remove take
the device lock (much like PCI).
While at it remove the NULL-checking of vfconfigs.
It appears to be pointless.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Legacy VF NDOs look at num_vfs and then based on that
index into vfconfig. If we don't rtnl_lock() num_vfs
may get set to 0 and vfconfig freed/replaced while
the NDO is running.
We don't need to protect replacing vfconfig since it's
only done when num_vfs is 0.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the suspend/resume/shutdown callbacks for hibernation/kexec.
Add mana_gd_setup() and mana_gd_cleanup() for some common code, and
use them in the mand_gd_* callbacks.
Reuse mana_probe/remove() for the hibernation path.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when the HWC creation fails, the error handling is flawed,
e.g. if mana_hwc_create_channel() -> mana_hwc_establish_channel() fails,
the resources acquired in mana_hwc_init_queues() is not released.
Enhance mana_hwc_destroy_channel() to do the proper cleanup work and
call it accordingly.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PF driver might use the OS info for statistical purposes.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the correct port index rather than 0.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we get CRQ_INIT, we set errno to -EIO and first call complete() to
notify the waiter. Then we try to schedule a FAILOVER reset. If this
occurs while adapter is in PROBING state, ibmvnic_reset() changes the
error code to EAGAIN and returns without scheduling the FAILOVER. The
purpose of setting error code to EAGAIN is to ask the waiter to retry.
But due to the earlier complete() call, the waiter may already have seen
the -EIO response and decided not to retry. This can cause intermittent
failures when bringing up ibmvnic adapters during boot, specially in
in kexec/kdump kernels.
Defer the complete() call until after scheduling the reset.
Also streamline the error code to EAGAIN. Don't see why we need EIO
sometimes. All 3 callers of ibmvnic_reset_init() can handle EAGAIN.
Fixes: 17c8705838 ("ibmvnic: Return error code if init interrupted by transport event")
Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Soon after registering a CRQ it is possible that we get a fail over or
maybe a CRQ_INIT from the VIOS while interrupts were disabled.
Look for any such CRQs after enabling interrupts.
Otherwise we can intermittently fail to bring up ibmvnic adapters during
boot, specially in kexec/kdump kernels.
Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If adapter's resetting bit is on, discard the packet but don't stop the
transmit queue - instead leave that to the reset code. With this change,
it is possible that we may get several calls to ibmvnic_xmit() that simply
discard packets and return.
But if we stop the queue here, we might end up doing so just after
__ibmvnic_open() started the queues (during a hard/soft reset) and before
the ->resetting bit was cleared. If that happens, there will be no one to
restart queue and transmissions will be blocked indefinitely.
This can cause a TIMEOUT reset and with auto priority failover enabled,
an unnecessary FAILOVER reset to less favored backing device and then a
FAILOVER back to the most favored backing device. If we hit the window
repeatedly, we can get stuck in a loop of TIMEOUT, FAILOVER, FAILOVER
resets leaving the adapter unusable for extended periods of time.
Fixes: 7f5b030830 ("ibmvnic: Free skb's in cases of failure in transmit")
Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tunnel_type check only allows for "netif_is_gretap", but for
OVS the port is actually "netif_is_ip6gretap" when setting up GRE
for ipv6, which means offloading request was rejected before.
Therefore, adding "netif_is_ip6gretap" allow ipv6gretap interface
for offloading.
Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>