korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 16:24:13 +08:00

Author	SHA1	Message	Date
Nikolay Aleksandrov	2dba407f99	net: bridge: multicast: make tracked EHT hosts limit configurable Add two new port attributes which make EHT hosts limit configurable and export the current number of tracked EHT hosts: - IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT: configure/retrieve current limit - IFLA_BRPORT_MCAST_EHT_HOSTS_CNT: current number of tracked hosts Setting IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT to 0 is currently not allowed. Note that we have to increase RTNL_SLAVE_MAX_TYPE to 38 minimum, I've increased it to 40 to have space for two more future entries. v2: move br_multicast_eht_set_hosts_limit() to br_multicast_eht.c, no functional change Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:40:35 -08:00
Nikolay Aleksandrov	89268b056e	net: bridge: multicast: add per-port EHT hosts limit Add a default limit of 512 for number of tracked EHT hosts per-port. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:40:35 -08:00
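The two commits above describe a per-port cap on tracked EHT hosts: a default of 512, a configurable limit, and a rejected value of 0. A minimal userspace sketch of that bookkeeping follows. The struct, the function names and the error codes are hypothetical and chosen only for illustration; this is not the kernel's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Default taken from the commit message; the macro name is hypothetical. */
#define SKETCH_EHT_HOSTS_LIMIT_DEFAULT 512u

/* Hypothetical stand-in for the per-port state, not the kernel's struct. */
struct sketch_port_eht {
	uint32_t hosts_limit; /* maximum number of tracked hosts */
	uint32_t hosts_cnt;   /* number of hosts tracked right now */
};

static void sketch_port_eht_init(struct sketch_port_eht *p)
{
	assert(p != NULL);
	p->hosts_limit = SKETCH_EHT_HOSTS_LIMIT_DEFAULT;
	p->hosts_cnt = 0;
}

/* A limit of 0 is refused, mirroring "setting ... to 0 is not allowed". */
static int sketch_port_eht_set_limit(struct sketch_port_eht *p, uint32_t limit)
{
	assert(p != NULL);
	if (limit == 0)
		return -EINVAL; /* error code is an assumption */
	p->hosts_limit = limit;
	return 0;
}

/* Track one more host unless the port already sits at its limit. */
static int sketch_port_eht_host_add(struct sketch_port_eht *p)
{
	assert(p != NULL);
	if (p->hosts_cnt >= p->hosts_limit)
		return -E2BIG; /* error code is an assumption */
	p->hosts_cnt++;
	return 0;
}
```

The counter and the limit are kept side by side so that both can be reported back, which is roughly what the two attributes IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT and IFLA_BRPORT_MCAST_EHT_HOSTS_CNT expose.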
Pablo Neira Ayuso	345023b0db	netfilter: nftables: add nft_parse_register_store() and use it This new function combines the netlink register attribute parser and the store validation function. This update requires to replace: enum nft_registers dreg:8; in many of the expression private areas otherwise compiler complains with: error: cannot take address of bit-field ‘dreg’ when passing the register field as reference. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-01-27 23:16:02 +01:00
Nikolay Aleksandrov	3e841bacf7	net: bridge: multicast: fix br_multicast_eht_set_entry_lookup indentation Fix the messed up indentation in br_multicast_eht_set_entry_lookup(). Fixes: `baa74d39ca` ("net: bridge: multicast: add EHT source set handling functions") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210125082040.13022-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-26 16:26:50 -08:00
Jiapeng Zhong	8d21c882ab	bridge: Use PTR_ERR_OR_ZERO instead if(IS_ERR(...)) + PTR_ERR coccicheck suggested using PTR_ERR_OR_ZERO() and looking at the code. Fix the following coccicheck warnings: ./net/bridge/br_multicast.c:1295:7-13: WARNING: PTR_ERR_OR_ZERO can be used. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/1611542381-91178-1-git-send-email-abaci-bugfix@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 18:23:07 -08:00
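The commit above swaps an `if (IS_ERR(...)) return PTR_ERR(...); return 0;` sequence for `PTR_ERR_OR_ZERO()`. The sketch below re-creates the error-pointer helpers in userspace so the equivalence can be checked outside the kernel. The helper bodies follow the well-known pattern from the kernel's err.h, but they are a re-creation for illustration, and `sketch_check_long()` / `sketch_check_short()` are hypothetical callers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Decode the errno value stored in an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* Shape of the code before the cleanup. */
static int sketch_check_long(const void *obj)
{
	if (IS_ERR(obj))
		return (int)PTR_ERR(obj);
	return 0;
}

/* Shape of the code after the cleanup. */
static int sketch_check_short(const void *obj)
{
	return PTR_ERR_OR_ZERO(obj);
}
```

Both callers return the same value for valid pointers and for error pointers, which is why coccicheck can suggest the replacement mechanically.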
Rasmus Villemoes	6781939054	net: mrp: move struct definitions out of uapi None of these are actually used in the kernel/userspace interface - there's a userspace component of implementing MRP, and userspace will need to construct certain frames to put on the wire, but there's no reason the kernel should provide the relevant definitions in a UAPI header. In fact, some of those definitions were broken until previous commit, so only keep the few that are actually referenced in the kernel code, and move them to the br_private_mrp.h header. Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 12:38:42 -08:00
Nikolay Aleksandrov	d5a1022283	net: bridge: multicast: mark IGMPv3/MLDv2 fast-leave deletes Mark groups which were deleted due to fast leave/EHT. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	e87e4b5caa	net: bridge: multicast: handle block pg delete for all cases A block report can result in empty source and host sets for both include and exclude groups so if there are no hosts left we can safely remove the group. Pull the block group handling so it can cover both cases and add a check if EHT requires the delete. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	c9739016a0	net: bridge: multicast: add EHT host filter_mode handling We should be able to handle host filter mode changing. For exclude mode we must create a zero-src entry so the group will be kept even without any S,G entries (non-zero source sets). That entry doesn't count to the entry limit and can always be created, its timer is refreshed on new exclude reports and if we change the host filter mode to include then it gets removed and we rely only on the non-zero source sets. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	b66bf55bbc	net: bridge: multicast: optimize TO_INCLUDE EHT timeouts This is an optimization specifically for TO_INCLUDE which sends queries for the older entries and thus lowers the S,G timers to LMQT. If we have the following situation for a group in either include or exclude mode: - host A was interested in srcs X and Y, but is timing out - host B sends TO_INCLUDE src Z, the bridge lowers X and Y's timeouts to LMQT - host B sends BLOCK src Z after LMQT time has passed => since host B is the last host we can delete the group, but if we still have host A's EHT entries for X and Y (i.e. if they weren't lowered to LMQT previously) then we'll have to wait another LMQT time before deleting the group, with this optimization we can directly remove it regardless of the group mode as there are no more interested hosts Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	ddc255d993	net: bridge: multicast: add EHT include and exclude handling Add support for IGMPv3/MLDv2 include and exclude EHT handling. Similar to how the reports are processed we have 2 cases when the group is in include or exclude mode, these are processed as follows: - group include - is_include: create missing entries - to_include: flush existing entries and create a new set from the report, obviously if the src set is empty then we delete the group - group exclude - is_exclude: create missing entries - to_exclude: flush existing entries and create a new set from the report, any empty source set entries are removed If the group is in a different mode then we just flush all entries reported by the host and we create a new set with the new mode entries created from the report. If the report is include type, the source list is empty and the group has empty sources' set then we remove it. Any source set entries which are empty are removed as well. If the group is in exclude mode it can exist without any S,G entries (allowing for all traffic to pass). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	474ddb37fa	net: bridge: multicast: add EHT allow/block handling Add support for IGMPv3/MLDv2 allow/block EHT handling. Similar to how the reports are processed we have 2 cases when the group is in include or exclude mode, these are processed as follows: - group include - allow: create missing entries - block: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries, then possibly delete the whole group if there are no more S,G entries - group exclude - allow - host include: create missing entries - host exclude: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries - block - host include: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries, then possibly delete the whole group if there are no more S,G entries - host exclude: create missing entries Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
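The allow/block rules listed in the commit above reduce to a small decision table keyed on the group mode, the host mode and the report type. The sketch below encodes only that table. All enum and function names are hypothetical, the follow-up cleanup (removing S,G entries or the whole group) is left out, and the host mode is ignored for groups in include mode because the commit message does not distinguish it there.

```c
#include <assert.h>

enum sketch_mode   { SKETCH_INCLUDE, SKETCH_EXCLUDE };
enum sketch_report { SKETCH_ALLOW, SKETCH_BLOCK };
enum sketch_action { SKETCH_CREATE_ENTRIES, SKETCH_REMOVE_ENTRIES };

/*
 * Map (group mode, host mode, report type) to the set-entry action
 * described in the commit message.
 */
static enum sketch_action
sketch_eht_allow_block(enum sketch_mode group, enum sketch_mode host,
		       enum sketch_report report)
{
	if (group == SKETCH_INCLUDE) {
		/* group include: allow creates, block removes */
		return report == SKETCH_ALLOW ? SKETCH_CREATE_ENTRIES
					      : SKETCH_REMOVE_ENTRIES;
	}

	/* group exclude: the host's own filter mode decides */
	if (report == SKETCH_ALLOW)
		return host == SKETCH_INCLUDE ? SKETCH_CREATE_ENTRIES
					      : SKETCH_REMOVE_ENTRIES;

	return host == SKETCH_INCLUDE ? SKETCH_REMOVE_ENTRIES
				      : SKETCH_CREATE_ENTRIES;
}

/* Spot-check the exclude-mode rows of the table. */
static void sketch_eht_allow_block_demo(void)
{
	assert(sketch_eht_allow_block(SKETCH_EXCLUDE, SKETCH_EXCLUDE,
				      SKETCH_ALLOW) == SKETCH_REMOVE_ENTRIES);
	assert(sketch_eht_allow_block(SKETCH_EXCLUDE, SKETCH_EXCLUDE,
				      SKETCH_BLOCK) == SKETCH_CREATE_ENTRIES);
}
```

For a group in exclude mode, a host that is itself in exclude mode inverts the meaning of allow and block, which is the part of the table that is easiest to get wrong.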
Nikolay Aleksandrov	dba6b0a5ca	net: bridge: multicast: add EHT host delete function Now that we can delete set entries, we can use that to remove EHT hosts. Since the group's host set entries exist only when there are related source set entries we just have to flush all source set entries joined by the host set entry and it will be automatically removed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	baa74d39ca	net: bridge: multicast: add EHT source set handling functions Add EHT source set and set-entry create, delete and lookup functions. These allow to manipulate source sets which contain their own host sets with entries which joined that S,G. We're limiting the maximum number of tracked S,G entries per host to PG_SRC_ENT_LIMIT (currently 32) which is the current maximum of S,G entries for a group. There's a per-set timer which will be used to destroy the whole set later. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	5b16328879	net: bridge: multicast: add EHT host handling functions Add functions to create, destroy and lookup an EHT host. These are per-host entries contained in the eht_host_tree in net_bridge_port_group which are used to store a list of all sources (S,G) entries joined for that group by each host, the host's current filter mode and total number of joined entries. No functional changes yet, these would be used in later patches. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	8f07b83119	net: bridge: multicast: add EHT structures and definitions Add EHT structures for tracking hosts and sources per group. We keep one set for each host which has all of the host's S,G entries, and one set for each multicast source which has all hosts that have joined that S,G. For each host, source entry we record the filter_mode and we keep an expiry timer. There is also one global expiry timer per source set, it is updated with each set entry update, it will be later used to lower the set's timer instead of lowering each entry's timer separately. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	e7cfcf2c18	net: bridge: multicast: calculate idx position without changing ptr We need to preserve the srcs pointer since we'll be passing it for EHT handling later. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	0ad57c99e8	net: bridge: multicast: __grp_src_block_incl can modify pg Prepare __grp_src_block_incl() for being able to cause a notification due to changes. Currently it cannot happen, but EHT would change that since we'll be deleting sources immediately. Make sure that if the pg is deleted we don't return true as that would cause the caller to access freed pg. This patch shouldn't cause any functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	54bea72196	net: bridge: multicast: pass host src address to IGMPv3/MLDv2 functions We need to pass the host address so later it can be used for explicit host tracking. No functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	9e10b9e656	net: bridge: multicast: rename src_size to addr_size Rename src_size argument to addr_size in preparation for passing host address as an argument to IGMPv3/MLDv2 functions. No functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Menglong Dong	a98c0c4742	net: bridge: check vlan with eth_type_vlan() method Replace some checks for ETH_P_8021Q and ETH_P_8021AD with eth_type_vlan(). Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210117080950.122761-1-dong.menglong@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-18 14:27:33 -08:00
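The commit above replaces open-coded comparisons against ETH_P_8021Q and ETH_P_8021AD with a single helper. A userspace sketch of such a helper is shown below. The `sketch_` names are hypothetical, the two EtherType values (0x8100 and 0x88A8) are the standard 802.1Q and 802.1ad values, and the kernel's own `eth_type_vlan()` may be written differently.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETH_P_8021Q  0x8100 /* 802.1Q VLAN tag */
#define SKETCH_ETH_P_8021AD 0x88A8 /* 802.1ad service VLAN tag */
#define SKETCH_ETH_P_IP     0x0800 /* IPv4, used as a non-VLAN example */

/* Return true if the network-byte-order EtherType is a VLAN type. */
static bool sketch_eth_type_vlan(uint16_t ethertype_be)
{
	return ethertype_be == htons(SKETCH_ETH_P_8021Q) ||
	       ethertype_be == htons(SKETCH_ETH_P_8021AD);
}

/* One call replaces the two comparisons that used to be open-coded. */
static void sketch_eth_type_vlan_demo(void)
{
	assert(sketch_eth_type_vlan(htons(SKETCH_ETH_P_8021Q)));
	assert(sketch_eth_type_vlan(htons(SKETCH_ETH_P_8021AD)));
	assert(!sketch_eth_type_vlan(htons(SKETCH_ETH_P_IP)));
}
```

The argument is kept in network byte order, as it would be when read straight from a frame header, so callers do not need a conversion before the check.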
Vladimir Oltean	b7a9e0da2d	net: switchdev: remove vid_begin -> vid_end range from VLAN objects The call path of a switchdev VLAN addition to the bridge looks something like this today: [call-path diagram flattened in extraction; functions shown, top to bottom: nbp_vlan_init, __br_vlan_set_default_pvid, br_afspec, br_process_vlan_info, br_vlan_info, nbp_vlan_add, br_vlan_add, br_vlan_get_master, br_vlan_add_existing, __vlan_add, __vlan_vid_add, br_switchdev_port_vlan_add] The ranges UAPI was introduced to the bridge in commit `bdced7ef78` ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec) have always been passed one by one, through struct bridge_vlan_info tmp_vinfo, to br_vlan_info. So the range never went too far in depth. Then Scott Feldman introduced the switchdev_port_bridge_setlink function in commit `47f8328bb1` ("switchdev: add new switchdev bridge setlink"). That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made full use of the range. But switchdev_port_bridge_setlink was called like this: br_setlink -> br_afspec -> switchdev_port_bridge_setlink Basically, the switchdev and the bridge code were not tightly integrated. Then commit `41c498b935` ("bridge: restore br_setlink back to original") came, and switchdev drivers were required to implement .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while. In the meantime, commits such as `0944d6b5a2` ("bridge: try switchdev op first in __vlan_vid_add/del") finally made switchdev penetrate the br_vlan_info() barrier and start to develop the call path we have today. But remember, br_vlan_info() still receives VLANs one by one. 
Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit `29ab586c3d` ("net: switchdev: Remove bridge bypass support from switchdev") so that drivers would not implement .ndo_bridge_setlink any longer. The switchdev_port_bridge_setlink also got deleted. This refactoring removed the parallel bridge_setlink implementation from switchdev, and left the only switchdev VLAN objects to be the ones offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add (the latter coming from commit `9c86ce2c1a` ("net: bridge: Notify about bridge VLANs")). That is to say, today the switchdev VLAN object ranges are not used in the kernel. Refactoring the above call path is a bit complicated, when the bridge VLAN call path is already a bit complicated. Let's go off and finish the job of commit `29ab586c3d` by deleting the bogus iteration through the VLAN ranges from the drivers. Some aspects of this feature never made too much sense in the first place. For example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID flag supposed to mean, when a port can obviously have a single pvid? This particular configuration _is_ denied as of commit `6623c60dc2` ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API perspective, the driver still has to play pretend, and only offload the vlan->vid_end as pvid. And the addition of a switchdev VLAN object can modify the flags of another, completely unrelated, switchdev VLAN object! (a VLAN that is PVID will invalidate the PVID flag from whatever other VLAN had previously been offloaded with switchdev and had that flag. Yet switchdev never notifies about that change, drivers are supposed to guess). Nonetheless, having a VLAN range in the API makes error handling look scarier than it really is - unwinding on errors and all of that. When in reality, no one really calls this API with more than one VLAN. It is all unnecessary complexity. 
And despite appearing pretentious (two-phase transactional model and all), the switchdev API is really sloppy because the VLAN addition and removal operations are not paired with one another (you can add a VLAN 100 times and delete it just once). The bridge notifies through switchdev of a VLAN addition not only when the flags of an existing VLAN change, but also when nothing changes. There are switchdev drivers out there who don't like adding a VLAN that has already been added, and those checks don't really belong at driver level. But the fact that the API contains ranges is yet another factor that prevents this from being addressed in the future. Of the existing switchdev pieces of hardware, it appears that only Mellanox Spectrum supports offloading more than one VLAN at a time, through mlxsw_sp_port_vlan_set. I have kept that code internal to the driver, because there is some more bookkeeping that makes use of it, but I deleted it from the switchdev API. But since the switchdev support for ranges has already been de facto deleted by a Mellanox employee and nobody noticed for 4 years, I'm going to assume it's not a biggie. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-11 16:00:56 -08:00
Menglong Dong	efb5b338da	net: bridge: fix misspellings using codespell tool Some typos are found out by codespell tool: $ codespell ./net/bridge/ ./net/bridge/br_stp.c:604: permanant ==> permanent ./net/bridge/br_stp.c:605: persistance ==> persistence ./net/bridge/br.c:125: underlaying ==> underlying ./net/bridge/br_input.c:43: modue ==> mode ./net/bridge/br_mrp.c:828: Determin ==> Determine ./net/bridge/br_mrp.c:848: Determin ==> Determine ./net/bridge/br_mrp.c:897: Determin ==> Determine Fix typos found by codespell. Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210108025332.52480-1-dong.menglong@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-09 13:54:47 -08:00
Vladimir Oltean	90dc8fd360	net: bridge: notify switchdev of disappearance of old FDB entry upon migration Currently the bridge emits atomic switchdev notifications for dynamically learnt FDB entries. Monitoring these notifications works wonders for switchdev drivers that want to keep their hardware FDB in sync with the bridge's FDB. For example station A wants to talk to station B in the diagram below, and we are concerned with the behavior of the bridge on the DUT device: [topology diagram flattened in extraction: the DUT with bridge br0 and ports swp0, swp1, swp2 and eth0; Station A; two other switches, each with bridge br0 and ports swp0 and swp1; Station B] Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has the following property: frames injected from its control interface bypass the internal address analyzer logic, and therefore, this hardware does not learn from the source address of packets transmitted by the network stack through it. So, since bridging between eth0 (where Station B is attached) and swp0 (where Station A is attached) is done in software, the switchdev hardware will never learn the source address of Station B. So the traffic towards that destination will be treated as unknown, i.e. flooded. This is where the bridge notifications come in handy. When br0 on the DUT sees frames with Station B's MAC address on eth0, the switchdev driver gets these notifications and can install a rule to send frames towards Station B's address that are incoming from swp0, swp1, swp2, only towards the control interface. 
This is all switchdev driver private business, which the notification makes possible. All is fine until someone unplugs Station B's cable and moves it to the other switch: [topology diagram flattened in extraction: the same DUT, ports and two other switches as before, with Station B now attached to the other switch] Luckily for the use cases we care about, Station B is noisy enough that the DUT hears it (on swp1 this time). swp1 receives the frames and delivers them to the bridge, who enters the unlikely path in br_fdb_update of updating an existing entry. It moves the entry in the software bridge to swp1 and emits an addition notification towards that. As far as the switchdev driver is concerned, all that it needs to ensure is that traffic between Station A and Station B is not forever broken. If it does nothing, then the stale rule to send frames for Station B towards the control interface remains in place. But Station B is no longer reachable via the control interface, but via a port that can offload the bridge port learning attribute. It's just that the port is prevented from learning this address, since the rule overrides FDB updates. So the rule needs to go. The question is via what mechanism. It sure would be possible for this switchdev driver to keep track of all addresses which are sent to the control interface, and then also listen for bridge notifier events on its own ports, searching for the ones that have a MAC address which was previously sent to the control interface. But this is cumbersome and inefficient. 
Instead, with one small change, the bridge could notify of the address deletion from the old port, in a symmetrical manner with how it did for the insertion. Then the switchdev driver would not be required to monitor learn/forget events for its own ports. It could just delete the rule towards the control interface upon bridge entry migration. This would make hardware address learning be possible again. Then it would take a few more packets until the hardware and software FDB would be in sync again. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:45 -08:00
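The change described in the commit above can be pictured as follows: when an existing FDB entry moves to a new port, a deletion is announced for the old port before the addition is announced for the new one. The sketch below models this with a small in-memory event log. Every type and function name here is hypothetical, ports are plain integers, and nothing in it is the bridge's actual notifier API.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_fdb_event_type { SKETCH_FDB_DEL, SKETCH_FDB_ADD };

struct sketch_fdb_event {
	enum sketch_fdb_event_type type;
	int port;
};

/* One learnt address, reduced to the port it currently points at. */
struct sketch_fdb_entry {
	int port;
};

#define SKETCH_MAX_EVENTS 8

/* Stand-in for the notifier chain: records events in order. */
struct sketch_event_log {
	struct sketch_fdb_event ev[SKETCH_MAX_EVENTS];
	size_t n;
};

static void sketch_notify(struct sketch_event_log *log,
			  enum sketch_fdb_event_type type, int port)
{
	assert(log->n < SKETCH_MAX_EVENTS);
	log->ev[log->n].type = type;
	log->ev[log->n].port = port;
	log->n++;
}

/* Called when a frame with the entry's address arrives on source_port. */
static void sketch_fdb_update(struct sketch_fdb_entry *fdb, int source_port,
			      struct sketch_event_log *log)
{
	if (fdb->port == source_port)
		return; /* no migration, nothing to announce */

	/* Symmetrical to the insertion: tell listeners the old port lost it. */
	sketch_notify(log, SKETCH_FDB_DEL, fdb->port);
	fdb->port = source_port;
	sketch_notify(log, SKETCH_FDB_ADD, fdb->port);
}
```

With the deletion event in place, a listener that installed a rule for the old port can remove it on that event alone, without tracking learn and forget events on its own ports.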
Wang Hai	989a1db06e	net: bridge: Fix a warning when del bridge sysfs I got a warning report: br_sysfs_addbr: can't create group bridge4/bridge ------------[ cut here ]------------ sysfs group 'bridge' not found for kobject 'bridge4' WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group fs/sysfs/group.c:279 [inline] WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group+0x153/0x1b0 fs/sysfs/group.c:270 Modules linked in: iptable_nat ... Call Trace: br_dev_delete+0x112/0x190 net/bridge/br_if.c:384 br_dev_newlink net/bridge/br_netlink.c:1381 [inline] br_dev_newlink+0xdb/0x100 net/bridge/br_netlink.c:1362 __rtnl_newlink+0xe11/0x13f0 net/core/rtnetlink.c:3441 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500 rtnetlink_rcv_msg+0x385/0x980 net/core/rtnetlink.c:5562 netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x793/0xc80 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0x139/0x170 net/socket.c:671 ____sys_sendmsg+0x658/0x7d0 net/socket.c:2353 ___sys_sendmsg+0xf8/0x170 net/socket.c:2407 __sys_sendmsg+0xd3/0x190 net/socket.c:2440 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In br_device_event(), if the bridge sysfs fails to be added, br_device_event() should return error. This can prevent the warning when removing a bridge sysfs group that does not exist. Fixes: `bb900b27a2` ("bridge: allow creating bridge devices with netlink") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Tested-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20201211122921.40386-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-14 18:27:49 -08:00
Jakub Kicinski	7bca5021a4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next 1) Missing dependencies in NFT_BRIDGE_REJECT, from Randy Dunlap. 2) Use atomic_inc_return() instead of atomic_add_return() in IPVS, from Yejune Deng. 3) Simplify check for overquota in xt_nfacct, from Kaixu Xia. 4) Move nfnl_acct_list away from struct net, from Miao Wang. 5) Pass actual sk in reject actions, from Jan Engelhardt. 6) Add timeout and protoinfo to ctnetlink destroy events, from Florian Westphal. 7) Four patches to generalize set infrastructure to support for multiple expressions per set element. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: netfilter: nftables: netlink support for several set element expressions netfilter: nftables: generalize set extension to support for several expressions netfilter: nftables: move nft_expr before nft_set netfilter: nftables: generalize set expressions support netfilter: ctnetlink: add timeout and protoinfo to destroy events netfilter: use actual socket sk for REJECT action netfilter: nfnl_acct: remove data from struct net netfilter: Remove unnecessary conversion to bool ipvs: replace atomic_add_return() netfilter: nft_reject_bridge: fix build errors due to code movement ==================== Link: https://lore.kernel.org/r/20201212230513.3465-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-14 15:43:21 -08:00
Jakub Kicinski	46d5e62dd3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-11 22:29:38 -08:00
Joseph Huang	851d0a73c9	bridge: Fix a deadlock when enabling multicast snooping When enabling multicast snooping, bridge module deadlocks on multicast_lock if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 network. The deadlock was caused by the following sequence: While holding the lock, br_multicast_open calls br_multicast_join_snoopers, which eventually causes IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). Since the destination Ethernet address is a multicast address, br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, which in turn calls br_multicast_add_group, which then deadlocks on multicast_lock. The fix is to move the call br_multicast_join_snoopers outside of the critical section. This works since br_multicast_join_snoopers only deals with IP and does not modify any multicast data structures of the bridge, so there's no need to hold the lock. Steps to reproduce: 1. sysctl net.ipv6.conf.all.force_mld_version=1 2. have another querier 3. 
ip link set dev bridge type bridge mcast_snooping 0 && \ ip link set dev bridge type bridge mcast_snooping 1 < deadlock > A typical call trace looks like the following: [ 936.251495] _raw_spin_lock+0x5c/0x68 [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge] [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge] [ 936.265322] br_dev_xmit+0x140/0x368 [bridge] [ 936.269689] dev_hard_start_xmit+0x94/0x158 [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8 [ 936.277890] dev_queue_xmit+0x10/0x18 [ 936.281563] neigh_resolve_output+0xec/0x198 [ 936.285845] ip6_finish_output2+0x240/0x710 [ 936.290039] __ip6_finish_output+0x130/0x170 [ 936.294318] ip6_output+0x6c/0x1c8 [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8 [ 936.301834] igmp6_send+0x358/0x558 [ 936.305326] igmp6_join_group.part.0+0x30/0xf0 [ 936.309774] igmp6_group_added+0xfc/0x110 [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290 [ 936.317885] ipv6_dev_mc_inc+0x10/0x18 [ 936.321677] br_multicast_open+0xbc/0x110 [bridge] [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge] Fixes: `4effd28c12` ("bridge: join all-snoopers multicast address") Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-07 17:14:43 -08:00
Zhang Changzhong	ee4f52a8de	net: bridge: vlan: fix error return code in __vlan_add() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `f8ed289fab` ("bridge: vlan: use br_vlan_(get\|put)_master to deal with refcounts") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/1607071737-33875-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 15:41:06 -08:00
Jakub Kicinski	55fd59b003	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 15:44:09 -08:00
Danielle Ratson	22ec19f3ae	bridge: switchdev: Notify about VLAN protocol changes Drivers that support bridge offload need to be notified about changes to the bridge's VLAN protocol so that they could react accordingly and potentially veto the change. Add a new switchdev attribute to communicate the change to drivers. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:13 -08:00
Antoine Tenart	44f64f23ba	netfilter: bridge: reset skb->pkt_type after NF_INET_POST_ROUTING traversal Netfilter changes PACKET_OTHERHOST to PACKET_HOST before invoking the hooks as, while it's an expected value for a bridge, routing expects PACKET_HOST. The change is undone later on after hook traversal. This can be seen with pairs of functions updating skb->pkt_type and then reverting it to its original value: For hook NF_INET_PRE_ROUTING: setup_pre_routing / br_nf_pre_routing_finish For hook NF_INET_FORWARD: br_nf_forward_ip / br_nf_forward_finish But the third case where netfilter does this, for hook NF_INET_POST_ROUTING, the packet type is changed in br_nf_post_routing but never reverted. A comment says: /* We assume any code from br_dev_queue_push_xmit onwards doesn't care * about the value of skb->pkt_type. */ But when having a tunnel (say vxlan) attached to a bridge we have the following call trace: br_nf_pre_routing br_nf_pre_routing_ipv6 br_nf_pre_routing_finish br_nf_forward_ip br_nf_forward_finish br_nf_post_routing <- pkt_type is updated to PACKET_HOST br_nf_dev_queue_xmit <- but not reverted to its original value vxlan_xmit vxlan_xmit_one skb_tunnel_check_pmtu <- a check on pkt_type is performed In this specific case, this creates issues such as when an ICMPv6 PTB should be sent back. When CONFIG_BRIDGE_NETFILTER is enabled, the PTB isn't sent (as skb_tunnel_check_pmtu checks if pkt_type is PACKET_HOST and returns early). If the comment is right and no one cares about the value of skb->pkt_type after br_dev_queue_push_xmit (which isn't true), resetting it to its original value should be safe. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20201123174902.622102-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-28 11:46:51 -08:00
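The commit above is about pairing every temporary rewrite of the packet type with a restore once the hook has run. A minimal sketch of that save-and-restore pairing is given below. The struct, the flag and the function names are hypothetical stand-ins, and the numeric packet-type values are local to the sketch rather than taken from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the packet types named in the commit message. */
enum sketch_pkt_type {
	SKETCH_PACKET_HOST,
	SKETCH_PACKET_OTHERHOST,
};

/* Hypothetical, heavily reduced packet descriptor. */
struct sketch_skb {
	enum sketch_pkt_type pkt_type;
	bool was_otherhost; /* set while the type is temporarily rewritten */
};

/* Before the hook: routing code expects HOST, so rewrite and remember. */
static void sketch_before_hook(struct sketch_skb *skb)
{
	assert(skb != NULL);
	if (skb->pkt_type == SKETCH_PACKET_OTHERHOST) {
		skb->pkt_type = SKETCH_PACKET_HOST;
		skb->was_otherhost = true;
	}
}

/* After the hook: undo the rewrite so later code sees the original type. */
static void sketch_after_hook(struct sketch_skb *skb)
{
	assert(skb != NULL);
	if (skb->was_otherhost) {
		skb->pkt_type = SKETCH_PACKET_OTHERHOST;
		skb->was_otherhost = false;
	}
}
```

Skipping the second call is the failure mode the commit describes for the post-routing hook: later code, such as a tunnel's PMTU check, then sees a packet type that was only meant to be visible to the hook.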
Horatiu Vultur	bfd042321a	bridge: mrp: Implement LC mode for MRP Extend MRP to support LC mode(link check) for the interconnect port. This applies only to the interconnect ring. Opposite to RC mode(ring check) the LC mode is using CFM frames to detect when the link goes up or down and based on that the userspace will need to react. One advantage of the LC mode over RC mode is that there will be fewer frames in the normal rings. Because RC mode generates InTest on all ports while LC mode sends CFM frame only on the interconnect port. All 4 nodes part of the interconnect ring needs to have the same mode. And it is not possible to have running LC and RC mode at the same time on a node. Whenever the MIM starts it needs to detect the status of the other 3 nodes in the interconnect ring so it would send a frame called InLinkStatus, on which the clients needs to reply with their link status. This patch adds InLinkStatus frame type and extends existing rules on how to forward this frame. Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20201124082525.273820-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 13:33:35 -08:00
Randy Dunlap	fd2d6bc4c2	netfilter: nft_reject_bridge: fix build errors due to code movement Fix build errors in net/bridge/netfilter/nft_reject_bridge.ko by selecting NF_REJECT_IPV4, which provides the missing symbols. ERROR: modpost: "nf_reject_skb_v4_tcp_reset" [net/bridge/netfilter/nft_reject_bridge.ko] undefined! ERROR: modpost: "nf_reject_skb_v4_unreach" [net/bridge/netfilter/nft_reject_bridge.ko] undefined! Fixes: `fa538f7cf0` ("netfilter: nf_reject: add reject skbuff creation helpers") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-11-22 13:44:51 +01:00
Heiner Kallweit	7609ecb2ed	net: bridge: switch to net core statistics counters handling Use netdev->tstats instead of a member of net_bridge for storing a pointer to the per-cpu counters. This allows us to use core functionality for statistics handling. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/9bad2be2-fd84-7c6e-912f-cee433787018@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 14:40:50 -08:00
Jakub Kicinski	56495a2442	Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-19 19:08:46 -08:00
Heiner Kallweit	281cc2843b	net: bridge: replace struct br_vlan_stats with pcpu_sw_netstats Struct br_vlan_stats duplicates pcpu_sw_netstats (apart from br_vlan_stats not defining an alignment requirement), therefore switch to using the latter one. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/04d25c3d-c5f6-3611-6d37-c2f40243dae2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 17:23:16 -08:00
Heiner Kallweit	7a30ecc923	net: bridge: add missing counters to ndo_get_stats64 callback In br_forward.c and br_input.c fields dev->stats.tx_dropped and dev->stats.multicast are populated, but they are ignored in ndo_get_stats64. Fixes: `28172739f0` ("net: fix 64 bit counters on 32 bit arches") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/58ea9963-77ad-a7cf-8dfd-fc95ab95f606@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-16 15:47:50 -08:00
Horatiu Vultur	0169b82054	bridge: mrp: Use hlist_head instead of list_head for mrp Replace list_head with hlist_head for MRP list under the bridge. There is no need for a circular list when a linear list will work. This will also decrease the size of 'struct net_bridge'. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20201106215049.1448185-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-09 16:42:12 -08:00
Jakub Kicinski	b65ca4c388	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Move existing bridge packet reject infra to nf_reject_{ipv4,ipv6}.c from Jose M. Guisado. 2) Consolidate nft_reject_inet initialization and dump, also from Jose. 3) Add the netdev reject action, from Jose. 4) Allow to combine the exist flag and the destroy command in ipset, from Joszef Kadlecsik. 5) Expose bucket size parameter for hashtables, also from Jozsef. 6) Expose the init value for reproducible ipset listings, from Jozsef. 7) Use __printf attribute in nft_request_module, from Andrew Lunn. 8) Allow to use reject from the inet ingress chain. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: netfilter: nft_reject_inet: allow to use reject from inet ingress netfilter: nftables: Add __printf() attribute netfilter: ipset: Expose the initval hash parameter to userspace netfilter: ipset: Add bucketsize parameter to all hash types netfilter: ipset: Support the -exist flag with the destroy command netfilter: nft_reject: add reject verdict support for netdev netfilter: nft_reject: unify reject init and dump into nft_reject netfilter: nf_reject: add reject skbuff creation helpers ==================== Link: https://lore.kernel.org/r/20201104141149.30082-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-04 18:05:56 -08:00
Vladimir Oltean	c43fd36f7f	net: bridge: mcast: fix stub definition of br_multicast_querier_exists The commit cited below has changed only the functional prototype of br_multicast_querier_exists, but forgot to do that for the stub prototype (the one where CONFIG_BRIDGE_IGMP_SNOOPING is disabled). Fixes: `955062b03f` ("net: bridge: mcast: add support for raw L2 multicast groups") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201101000845.190009-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-31 17:23:19 -07:00
Jose M. Guisado Gomez	312ca575a5	netfilter: nft_reject: unify reject init and dump into nft_reject Bridge family is using the same static init and dump function as inet. This patch removes duplicate code unifying these functions body into nft_reject.c so they can be reused in the rest of families supporting reject verdict. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-10-31 10:40:42 +01:00
Jose M. Guisado Gomez	fa538f7cf0	netfilter: nf_reject: add reject skbuff creation helpers Adds reject skbuff creation helper functions to ipv4/6 nf_reject infrastructure. Use these functions for reject verdict in bridge family. Can be reused by all different families that support reject and will not inject the reject packet through ip local out. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-10-31 10:40:22 +01:00
Vladimir Oltean	0e761ac08f	net: bridge: explicitly convert between mdb entry state and port group flags When creating a new multicast port group, there is implicit conversion between the __u8 state member of struct br_mdb_entry and the unsigned char flags member of struct net_bridge_port_group. This implicit conversion relies on the fact that MDB_PERMANENT is equal to MDB_PG_FLAGS_PERMANENT. Let's be more explicit and convert the state to flags manually. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201028234815.613226-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 17:58:16 -07:00
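A minimal sketch of what an explicit state-to-flags conversion can look like, as opposed to assigning one field to the other and relying on the constants being numerically equal. The `EX_` constants and `state_to_flags()` are hypothetical names chosen for this illustration; they are not the kernel's definitions or the code from the commit.

```c
#include <stdint.h>

/* Hypothetical stand-ins: a uapi-style state value and an internal flag bit. */
#define EX_MDB_PERMANENT          1u
#define EX_MDB_PG_FLAGS_PERMANENT (1u << 0)

/*
 * Translate the user-visible state into internal flags one value at a
 * time, so the result stays correct even if the two constants diverge.
 */
static uint8_t state_to_flags(uint8_t state)
{
	uint8_t flags = 0;

	if (state == EX_MDB_PERMANENT)
		flags |= EX_MDB_PG_FLAGS_PERMANENT;

	return flags;
}
```

The design point is that the mapping is written out in one place, so a later change to either numbering only has to be reflected there.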
Nikolay Aleksandrov	955062b03f	net: bridge: mcast: add support for raw L2 multicast groups Extend the bridge multicast control and data path to configure routes for L2 (non-IP) multicast groups. The uapi struct br_mdb_entry union u is extended with another variant, mac_addr, which does not change the structure size, and which is valid when the proto field is zero. To be compatible with the forwarding code that is already in place, which acts as an IGMP/MLD snooping bridge with querier capabilities, we need to declare that for L2 MDB entries (for which there exists no such thing as IGMP/MLD snooping/querying), that there is always a querier. Otherwise, these entries would be flooded to all bridge ports and not just to those that are members of the L2 multicast group. Needless to say, only permanent L2 multicast groups can be installed on a bridge port. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201028233831.610076-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 17:49:19 -07:00
Henrik Bjoernlund	b6d0425b81	bridge: cfm: Netlink Notifications. This is the implementation of Netlink notifications out of CFM. Notifications are initiated whenever a state change happens in CFM. IFLA_BRIDGE_CFM: Points to the CFM information. IFLA_BRIDGE_CFM_MEP_STATUS_INFO: This indicate that the MEP instance status are following. IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO: This indicate that the peer MEP status are following. CFM nested attribute has the following attributes in next level. IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is NLA_U32. IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected Opcode. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected version. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN: The MEP instance received CCM PDU with MD level lower than configured level. This frame is discarded. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is NLA_U32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID: The added Peer MEP ID of the delivered status. The type is NLA_U32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT: The CCM defect status. The type is NLA_U32 (bool). True means no CCM frame is received for 3.25 intervals. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL. IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI: The last received CCM PDU RDI. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE: The last received CCM PDU Port Status TLV value field. The type is NLA_U8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE: The last received CCM PDU Interface Status TLV value field. The type is NLA_U8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN: A CCM frame has been received from Peer MEP. The type is NLA_U32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. 
IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN: A CCM frame with TLV has been received from Peer MEP. The type is NLA_U32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN: A CCM frame with unexpected sequence number has been received from Peer MEP. The type is NLA_U32 (bool). When a sequence number is not one higher than previously received then it is unexpected. This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:44 -07:00
Henrik Bjoernlund	e77824d81d	bridge: cfm: Netlink GET status Interface. This is the implementation of CFM netlink status get information interface. Add new nested netlink attributes. These attributes are used by the user space to get status information. GETLINK: Request filter RTEXT_FILTER_CFM_STATUS: Indicating that CFM status information must be delivered. IFLA_BRIDGE_CFM: Points to the CFM information. IFLA_BRIDGE_CFM_MEP_STATUS_INFO: This indicate that the MEP instance status are following. IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO: This indicate that the peer MEP status are following. CFM nested attribute has the following attributes in next level. GETLINK RTEXT_FILTER_CFM_STATUS: IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is u32. IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected Opcode. The type is u32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected version. The type is u32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN: The MEP instance received CCM PDU with MD level lower than configured level. This frame is discarded. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID: The added Peer MEP ID of the delivered status. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT: The CCM defect status. The type is u32 (bool). True means no CCM frame is received for 3.25 intervals. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL. IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI: The last received CCM PDU RDI. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE: The last received CCM PDU Port Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE: The last received CCM PDU Interface Status TLV value field. The type is u8. 
IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN: A CCM frame has been received from Peer MEP. The type is u32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN: A CCM frame with TLV has been received from Peer MEP. The type is u32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN: A CCM frame with unexpected sequence number has been received from Peer MEP. The type is u32 (bool). When a sequence number is not one higher than previously received then it is unexpected. This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:44 -07:00
Henrik Bjoernlund	5e312fc0e7	bridge: cfm: Netlink GET configuration Interface. This is the implementation of CFM netlink configuration get information interface. Add new nested netlink attributes. These attributes are used by the user space to get configuration information. GETLINK: Request filter RTEXT_FILTER_CFM_CONFIG: Indicating that CFM configuration information must be delivered. IFLA_BRIDGE_CFM: Points to the CFM information. IFLA_BRIDGE_CFM_MEP_CREATE_INFO: This indicate that MEP instance create parameters are following. IFLA_BRIDGE_CFM_MEP_CONFIG_INFO: This indicate that MEP instance config parameters are following. IFLA_BRIDGE_CFM_CC_CONFIG_INFO: This indicate that MEP instance CC functionality parameters are following. IFLA_BRIDGE_CFM_CC_RDI_INFO: This indicate that CC transmitted CCM PDU RDI parameters are following. IFLA_BRIDGE_CFM_CC_CCM_TX_INFO: This indicate that CC transmitted CCM PDU parameters are following. IFLA_BRIDGE_CFM_CC_PEER_MEP_INFO: This indicate that the added peer MEP IDs are following. CFM nested attribute has the following attributes in next level. GETLINK RTEXT_FILTER_CFM_CONFIG: IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE: The created MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN: The created MEP domain. The type is u32 (br_cfm_domain). It must be BR_CFM_PORT. This means that CFM frames are transmitted and received directly on the port - untagged. Not in a VLAN. IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION: The created MEP direction. The type is u32 (br_cfm_mep_direction). It must be BR_CFM_MEP_DIRECTION_DOWN. This means that CFM frames are transmitted and received on the port. Not in the bridge. IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX: The created MEP residence port ifindex. The type is u32 (ifindex). IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE: The deleted MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. 
IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC: The configured MEP unicast MAC address. The type is 6*u8 (array). This is used as SMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL: The configured MEP unicast MD level. The type is u32. It must be in the range 1-7. No CFM frames are passing through this MEP on lower levels. IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID: The configured MEP ID. The type is u32. It must be in the range 0-0x1FFF. This MEP ID is inserted in any transmitted CCM frame. IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE: The Continuity Check (CC) functionality is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL: The CC expected receive interval of CCM frames. The type is u32 (br_cfm_ccm_interval). This is also the transmission interval of CCM frames when enabled. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID: The CC expected receive MAID in CCM frames. The type is CFM_MAID_LENGTH*u8. This MAID is also inserted in transmitted CCM frames. IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_MEPID: The CC Peer MEP ID added. The type is u32. When a Peer MEP ID is added and CC is enabled it is expected to receive CCM frames from that Peer MEP. IFLA_BRIDGE_CFM_CC_RDI_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_RDI_RDI: The RDI that is inserted in transmitted CCM PDU. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC: The transmitted CCM frame destination MAC address. The type is 6*u8 (array). This is used as DMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE: The transmitted CCM frame update (increment) of sequence number is enabled or disabled. The type is u32 (bool). 
IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD: The period of time where CCM frame are transmitted. The type is u32. The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX must be done before timeout to keep transmission alive. When period is zero any ongoing CCM frame transmission will be stopped. IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV: The transmitted CCM frame update with Interface Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE: The transmitted Interface Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV: The transmitted CCM frame update with Port Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE: The transmitted Port Status TLV value field. The type is u8. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	2be665c394	bridge: cfm: Netlink SET configuration Interface. This is the implementation of CFM netlink configuration set information interface. Add new nested netlink attributes. These attributes are used by the user space to create/delete/configure CFM instances. SETLINK: IFLA_BRIDGE_CFM: Indicate that the following attributes are CFM. IFLA_BRIDGE_CFM_MEP_CREATE: This indicate that a MEP instance must be created. IFLA_BRIDGE_CFM_MEP_DELETE: This indicate that a MEP instance must be deleted. IFLA_BRIDGE_CFM_MEP_CONFIG: This indicate that a MEP instance must be configured. IFLA_BRIDGE_CFM_CC_CONFIG: This indicate that a MEP instance Continuity Check (CC) functionality must be configured. IFLA_BRIDGE_CFM_CC_PEER_MEP_ADD: This indicate that a CC Peer MEP must be added. IFLA_BRIDGE_CFM_CC_PEER_MEP_REMOVE: This indicate that a CC Peer MEP must be removed. IFLA_BRIDGE_CFM_CC_CCM_TX: This indicate that the CC transmitted CCM PDU must be configured. IFLA_BRIDGE_CFM_CC_RDI: This indicate that the CC transmitted CCM PDU RDI must be configured. CFM nested attribute has the following attributes in next level. SETLINK RTEXT_FILTER_CFM_CONFIG: IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE: The created MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN: The created MEP domain. The type is u32 (br_cfm_domain). It must be BR_CFM_PORT. This means that CFM frames are transmitted and received directly on the port - untagged. Not in a VLAN. IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION: The created MEP direction. The type is u32 (br_cfm_mep_direction). It must be BR_CFM_MEP_DIRECTION_DOWN. This means that CFM frames are transmitted and received on the port. Not in the bridge. IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX: The created MEP residence port ifindex. The type is u32 (ifindex). IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE: The deleted MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. 
IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC: The configured MEP unicast MAC address. The type is 6*u8 (array). This is used as SMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL: The configured MEP unicast MD level. The type is u32. It must be in the range 1-7. No CFM frames are passing through this MEP on lower levels. IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID: The configured MEP ID. The type is u32. It must be in the range 0-0x1FFF. This MEP ID is inserted in any transmitted CCM frame. IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE: The Continuity Check (CC) functionality is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL: The CC expected receive interval of CCM frames. The type is u32 (br_cfm_ccm_interval). This is also the transmission interval of CCM frames when enabled. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID: The CC expected receive MAID in CCM frames. The type is CFM_MAID_LENGTH*u8. This MAID is also inserted in transmitted CCM frames. IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_MEPID: The CC Peer MEP ID added. The type is u32. When a Peer MEP ID is added and CC is enabled it is expected to receive CCM frames from that Peer MEP. IFLA_BRIDGE_CFM_CC_RDI_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_RDI_RDI: The RDI that is inserted in transmitted CCM PDU. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC: The transmitted CCM frame destination MAC address. The type is 6*u8 (array). This is used as DMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE: The transmitted CCM frame update (increment) of sequence number is enabled or disabled. The type is u32 (bool). 
IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD: The period of time where CCM frame are transmitted. The type is u32. The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX must be done before timeout to keep transmission alive. When period is zero any ongoing CCM frame transmission will be stopped. IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV: The transmitted CCM frame update with Interface Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE: The transmitted Interface Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV: The transmitted CCM frame update with Port Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE: The transmitted Port Status TLV value field. The type is u8. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	dc32cbb3db	bridge: cfm: Kernel space implementation of CFM. CCM frame RX added. This is the third commit of the implementation of the CFM protocol according to 802.1Q section 12.14. Functionality is extended with CCM frame reception. The MEP instance now contains CCM based status information. Most important is the CCM defect status indicating if correct CCM frames are received with the expected interval. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	a806ad8ee2	bridge: cfm: Kernel space implementation of CFM. CCM frame TX added. This is the second commit of the implementation of the CFM protocol according to 802.1Q section 12.14. Functionality is extended with CCM frame transmission. Interface is extended with these functions: br_cfm_cc_rdi_set() br_cfm_cc_ccm_tx() br_cfm_cc_config_set() A MEP Continuity Check feature can be configured by br_cfm_cc_config_set() The Continuity Check parameters can be configured to be used when transmitting CCM. A MEP can be configured to start or stop transmission of CCM frames by br_cfm_cc_ccm_tx() The CCM will be transmitted for a selected period in seconds. Must call this function before timeout to keep transmission alive. A MEP transmitting CCM can be configured with inserted RDI in PDU by br_cfm_cc_rdi_set() Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	86a14b79e1	bridge: cfm: Kernel space implementation of CFM. MEP create/delete. This is the first commit of the implementation of the CFM protocol according to 802.1Q section 12.14. It contains MEP instance create, delete and configuration. Connectivity Fault Management (CFM) comprises capabilities for detecting, verifying, and isolating connectivity failures in Virtual Bridged Networks. These capabilities can be used in networks operated by multiple independent organizations, each with restricted management access to each others equipment. CFM functions are partitioned as follows: - Path discovery - Fault detection - Fault verification and isolation - Fault notification - Fault recovery Interface consists of these functions: br_cfm_mep_create() br_cfm_mep_delete() br_cfm_mep_config_set() br_cfm_cc_config_set() br_cfm_cc_peer_mep_add() br_cfm_cc_peer_mep_remove() A MEP instance is created by br_cfm_mep_create() -It is the Maintenance association End Point described in 802.1Q section 19.2. -It is created on a specific level (1-7) and is assuring that no CFM frames are passing through this MEP on lower levels. -It initiates and validates CFM frames on its level. -It can only exist on a port that is related to a bridge. -Attributes given cannot be changed until the instance is deleted. A MEP instance can be deleted by br_cfm_mep_delete(). A created MEP instance has attributes that can be configured by br_cfm_mep_config_set(). A MEP Continuity Check feature can be configured by br_cfm_cc_config_set() The Continuity Check Receiver state machine can be enabled and disabled. According to 802.1Q section 19.2.8 A MEP can have Peer MEPs added and removed by br_cfm_cc_peer_mep_add() and br_cfm_cc_peer_mep_remove() The Continuity Check feature can maintain connectivity status on each added Peer MEP. 
Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	f323aa54be	bridge: cfm: Add BRIDGE_CFM to Kconfig. This makes it possible to include or exclude the CFM protocol according to 802.1Q section 12.14. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	90c628dd47	net: bridge: extend the process of special frames This patch extends the processing of frames in the bridge. Currently MRP frames needs special processing and the current implementation doesn't allow a nice way to process different frame types. Therefore try to improve this by adding a list that contains frame types that need special processing. This list is iterated for each input frame and if there is a match based on frame type then these functions will be called and decide what to do with the frame. It can process the frame then the bridge doesn't need to do anything or don't process so then the bridge will do normal forwarding. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Timothée COCAULT	63137bc588	netfilter: ebtables: Fixes dropping of small packets in bridge nat Fixes an error causing small packets to get dropped. skb_ensure_writable expects the second parameter to be a length in the ethernet payload. If we want to write the ethernet header (src, dst), we should pass 0. Otherwise, packets with small payloads (< ETH_ALEN) will get dropped. Fixes: `c1a8311679` ("netfilter: bridge: convert skb_make_writable to skb_ensure_writable") Signed-off-by: Timothée COCAULT <timothee.cocault@orange.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-10-20 13:54:53 +02:00
Heiner Kallweit	f3f04f0f3a	net: bridge: use new function dev_fetch_sw_netstats Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/d1c3ff29-5691-9d54-d164-16421905fa59@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-13 17:33:49 -07:00
Jakub Kicinski	9d49aea13f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Small conflict around locking in rxrpc_process_event() - channel_lock moved to bundle in next, while state lock needs _bh() from net. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-08 15:44:50 -07:00
Henrik Bjoernlund	b6c02ef549	bridge: Netlink interface fix. This commit is correcting NETLINK br_fill_ifinfo() to be able to handle 'filter_mask' with multiple flags asserted. Fixes: `36a8e8e265` ("bridge: Extend br_fill_ifinfo to return MPR status") Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com> Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-08 12:05:07 -07:00
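One way a filter mask can mishandle several asserted flags is by comparing the whole mask for equality instead of testing individual bits. The sketch below contrasts the two; it is an assumption about the general class of problem, not a reproduction of br_fill_ifinfo(), and the `EX_FILTER_*` names and both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request-filter bits; a caller may set more than one. */
#define EX_FILTER_A (1u << 0)
#define EX_FILTER_B (1u << 1)

/* Fragile: only matches when EX_FILTER_A is the sole bit set. */
static bool wants_a_strict(uint32_t mask)
{
	return mask == EX_FILTER_A;
}

/* Robust: matches whenever EX_FILTER_A is set, whatever else is set. */
static bool wants_a(uint32_t mask)
{
	return (mask & EX_FILTER_A) != 0;
}
```

With `mask = EX_FILTER_A | EX_FILTER_B`, `wants_a_strict()` returns false while `wants_a()` returns true, so only the bit test honours a request that combines both filters.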
David S. Miller	8b0308fe31	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-05 18:40:01 -07:00
Taehee Yoo	eff7423365	net: core: introduce struct netdev_nested_priv for nested interface infrastructure Functions related to nested interface infrastructure such as netdev_walk_all_{ upper | lower }_dev() pass both private functions and "data" pointer to handle their own things. At this point, the data pointer type is void *. In order to make it easier to expand common variables and functions, this new netdev_nested_priv structure is added. In the following patch, a new member variable will be added into this struct to fix the lockdep issue. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:00:15 -07:00
Nikolay Aleksandrov	f2f3729fb6	net: bridge: fdb: don't flush ext_learn entries When a user-space software manages fdb entries externally it should set the ext_learn flag which marks the fdb entry as externally managed and avoids expiring it (they're treated as static fdbs). Unfortunately on events where fdb entries are flushed (STP down, netlink fdb flush etc) these fdbs are also deleted automatically by the bridge. That in turn causes trouble for the managing user-space software (e.g. in MLAG setups we lose remote fdb entries on port flaps). These entries are completely externally managed so we should avoid automatically deleting them, the only exception are offloaded entries (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED). They are flushed as before. Fixes: `eb100e0e24` ("net: bridge: allow to add externally learned entries from user-space") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 12:47:43 -07:00
Nikolay Aleksandrov	7470558240	net: bridge: mcast: remove only S,G port groups from sg_port hash We should remove a group from the sg_port hash only if it's an S,G entry. This makes it correct and more symmetric with group add. Also since *,G groups are not added to that hash we can hide a bug. Fixes: `085b53c8be` ("net: bridge: mcast: add sg_port rhashtable") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 16:50:19 -07:00
Nikolay Aleksandrov	36cfec7359	net: bridge: mcast: when forwarding handle filter mode and blocked flag We need to avoid forwarding to ports in MCAST_INCLUDE filter mode when the mdst entry is a *,G or when the port has the blocked flag. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:35 -07:00
Nikolay Aleksandrov	094b82fd53	net: bridge: mcast: handle host state Since host joins are considered as EXCLUDE {} joins we need to reflect that in all of *,G ports' S,G entries. Since the S,Gs can have host_joined == true only set automatically we can safely set it to false when removing all automatically added entries upon S,G delete. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	9116ffbf1d	net: bridge: mcast: add support for blocked port groups When excluding S,G entries we need a way to block a particular S,G,port. The new port group flag is managed based on the source's timer as per RFCs 3376 and 3810. When a source expires and its port group is in EXCLUDE mode, it will be blocked. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	8266a0491e	net: bridge: mcast: handle port group filter modes We need to handle group filter mode transitions and initial state. To change a port group's INCLUDE -> EXCLUDE mode (or when we have added a new port group in EXCLUDE mode) we need to add that port to all of *,G ports' S,G entries for proper replication. When the EXCLUDE state is changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be called after the source list processing because the assumption is that all of the group's S,G entries will be created before transitioning to EXCLUDE mode, i.e. most importantly its blocked entries will already be added so it will not get automatically added to them. The transition EXCLUDE -> INCLUDE happens only when a port group timer expires, it requires us to remove that port from all of *,G ports' S,G entries where it was automatically added previously. Finally when we are adding a new S,G entry we must add all of *,G's EXCLUDE ports to it. In order to distinguish automatically added *,G EXCLUDE ports we have a new port group flag - MDB_PG_FLAGS_STAR_EXCL. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	b08123684b	net: bridge: mcast: install S,G entries automatically based on reports This patch adds support for automatic install of S,G mdb entries based on the port group's source list and the source entry's timer. Once installed the S,G will be used when forwarding packets if the appropriate multicast/mld versions are set. A new source flag called BR_SGRP_F_INSTALLED denotes if the source has a forwarding mdb entry installed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	085b53c8be	net: bridge: mcast: add sg_port rhashtable To speedup S,G forward handling we need to be able to quickly find out if a port is a member of an S,G group. To do that add a global S,G port rhashtable with key: source addr, group addr, protocol, vid (all br_ip fields) and port pointer. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	8f8cb77e0b	net: bridge: mcast: add rt_protocol field to the port group struct We need to be able to differentiate between pg entries created by user-space and the kernel when we start generating S,G entries for IGMPv3/MLDv2's fast path. User-space entries are created by default as RTPROT_STATIC and the kernel entries are RTPROT_KERNEL. Later we can allow user-space to provide the entry rt_protocol so we can differentiate between who added the entries specifically (e.g. clag, admin, frr etc). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	7d07a68c25	net: bridge: mcast: when igmpv3/mldv2 are enabled lookup (S,G) first, then (*,G) If (S,G) entries are enabled (igmpv3/mldv2) then look them up first. If there isn't a present (S,G) entry then try to find (*,G). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	88d4bd1804	net: bridge: mdb: add support for add/del/dump of entries with source Add new mdb attributes (MDBE_ATTR_SOURCE for setting, MDBA_MDB_EATTR_SOURCE for dumping) to allow add/del and dump of mdb entries with a source address (S,G). New S,G entries are created with filter mode of MCAST_INCLUDE. The same attributes are used for IPv4 and IPv6, they're validated and parsed based on their protocol. S,G host joined entries which are added by user are not allowed yet. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	9c4258c78a	net: bridge: mdb: add support to extend add/del commands Since the MDB add/del code expects an exact struct br_mdb_entry we can't really add any extensions, thus add a new nested attribute at the level of MDBA_SET_ENTRY called MDBA_SET_ENTRY_ATTRS which will be used to pass all new options via netlink attributes. This patch doesn't change anything functionally since the new attribute is not used yet, only parsed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	eab3227b12	net: bridge: mcast: rename br_ip's u member to dst Since now we have src in br_ip, u no longer makes sense so rename it to dst. No functional changes. v2: fix build with CONFIG_BATMAN_ADV_MCAST CC: Marek Lindner <mareklindner@neomailbox.ch> CC: Simon Wunderlich <sw@simonwunderlich.de> CC: Antonio Quartulli <a@unstable.cc> CC: Sven Eckelmann <sven@narfation.org> CC: b.a.t.m.a.n@lists.open-mesh.org Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	deb965662d	net: bridge: mcast: use br_ip's src for src groups and querier address Now that we have src and dst in br_ip it is logical to use the src field for the cases where we need to work with a source address such as querier source address and group source address. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	83f7398ea5	net: bridge: mdb: use extack in br_mdb_add() and br_mdb_add_group() Pass and use extack all the way down to br_mdb_add_group(). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	7eea629d07	net: bridge: mdb: move all port and bridge checks to br_mdb_add To avoid doing duplicate device checks and searches (the same were done in br_mdb_add and __br_mdb_add) pass the already found port to __br_mdb_add and pull the bridge's netif_running and enabled multicast checks to br_mdb_add. This would also simplify the future extack errors. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	2ac95dfe25	net: bridge: mdb: use extack in br_mdb_parse() We can drop the pr_info() calls and just use extack to return a meaningful error to user-space when br_mdb_parse() fails. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
David S. Miller	3ab0a7a0c3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing its initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-22 16:45:34 -07:00
Vladimir Oltean	99f62a7460	net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU When calling the RCU brother of br_vlan_get_pvid(), lockdep warns: ============================= WARNING: suspicious RCU usage 5.9.0-rc3-01631-g13c17acb8e38-dirty #814 Not tainted ----------------------------- net/bridge/br_private.h:1054 suspicious rcu_dereference_protected() usage! Call trace: lockdep_rcu_suspicious+0xd4/0xf8 __br_vlan_get_pvid+0xc0/0x100 br_vlan_get_pvid_rcu+0x78/0x108 The warning is because br_vlan_get_pvid_rcu() calls nbp_vlan_group() which calls rtnl_dereference() instead of rcu_dereference(). In turn, rtnl_dereference() calls rcu_dereference_protected() which assumes operation under an RCU write-side critical section, which obviously is not the case here. So, when the incorrect primitive is used to access the RCU-protected VLAN group pointer, READ_ONCE() is not used, which may cause various unexpected problems. I'm sad to say that br_vlan_get_pvid() and br_vlan_get_pvid_rcu() cannot share the same implementation. So fix the bug by splitting the 2 functions, and making br_vlan_get_pvid_rcu() retrieve the VLAN groups under proper locking annotations. Fixes: `7582f5b70f` ("bridge: add br_vlan_get_pvid_rcu()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-21 17:37:44 -07:00
Randy Dunlap	4bbd026cb9	net: bridge: delete duplicated words Drop repeated words in net/bridge/. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: bridge@lists.linux-foundation.org Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-18 14:12:43 -07:00
Nikolay Aleksandrov	d5bf31ddd8	net: bridge: mcast: don't ignore return value of __grp_src_toex_excl When we're handling TO_EXCLUDE report in EXCLUDE filter mode we should not ignore the return value of __grp_src_toex_excl() as we'll miss sending notifications about group changes. Fixes: `5bf1e00b68` ("net: bridge: mcast: support for IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-16 17:13:25 -07:00
Alexandra Winter	d05e8e68b0	bridge: Add SWITCHDEV_FDB_FLUSH_TO_BRIDGE notifier so the switchdev can notify the bridge to flush non-permanent fdb entries for this port. This is useful whenever the hardware fdb of the switchdev is reset, but the netdev and the bridgeport are not deleted. Note that this has the same effect as the IFLA_BRPORT_FLUSH attribute. CC: Jiri Pirko <jiri@resnulli.us> CC: Ivan Vecera <ivecera@redhat.com> CC: Roopa Prabhu <roopa@nvidia.com> CC: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Ido Schimmel	12913f7459	bridge: mcast: Fix incomplete MDB dump Each MDB entry is encoded in a nested netlink attribute called 'MDBA_MDB_ENTRY'. In turn, this attribute contains another nested attribute called 'MDBA_MDB_ENTRY_INFO', which encodes a single port group entry within the MDB entry. The cited commit added the ability to restart a dump from a specific port group entry. However, on failure to add a port group entry to the dump the entire MDB entry (stored in 'nest2') is removed, resulting in missing port group entries. Fix this by finalizing the MDB entry with the partial list of already encoded port group entries. Fixes: `5205e919c9` ("net: bridge: mcast: add support for src list and filter mode dumping") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-11 14:49:47 -07:00
David S. Miller	d85427e3c8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT, from Michael Zhou. 2) do_ip_vs_set_ctl() dereferences uninitialized value, from Peilin Ye. 3) Support for userdata in tables, from Jose M. Guisado. 4) Do not increment ct error and invalid stats at the same time, from Florian Westphal. 5) Remove ct ignore stats, also from Florian. 6) Add ct stats for clash resolution, from Florian Westphal. 7) Bump reference counter bump on ct clash resolution only, this is safe because bucket lock is held, again from Florian. 8) Use ip_is_fragment() in xt_HMARK, from YueHaibing. 9) Add wildcard support for nft_socket, from Balazs Scheidler. 10) Remove superfluous IPVS dependency on iptables, from Yaroslav Bolyukin. 11) Remove unused definition in ebt_stp, from Wang Hai. 12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT in selftests/net, from Fabian Frederick. 13) Add userdata support for nft_object, from Jose M. Guisado. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-09 11:21:19 -07:00
Nikolay Aleksandrov	071445c605	net: bridge: mcast: fix unused br var when lockdep isn't defined Stephen reported the following warning: net/bridge/br_multicast.c: In function 'br_multicast_find_port': net/bridge/br_multicast.c:1818:21: warning: unused variable 'br' [-Wunused-variable] 1818 | struct net_bridge *br = mp->br; | ^~ It happens due to bridge's mlock_dereference() when lockdep isn't defined. Silence the warning by annotating the variable as __maybe_unused. Fixes: `0436862e41` ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 20:11:57 -07:00
Wang Hai	36c3be8a2c	netfilter: ebt_stp: Remove unused macro BPDU_TYPE_TCN BPDU_TYPE_TCN is never used after it was introduced. So better to remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-08 12:56:38 +02:00
Nikolay Aleksandrov	e12cec65b5	net: bridge: mcast: destroy all entries via gc Since each entry type has timers that can be running simultaneously we need to make sure that entries are not freed before their timers have finished. In order to do that generalize the src gc work to mcast gc work and use a callback to free the entries (mdb, port group or src). v3: add IPv6 support v2: force mcast gc on port del to make sure all port group timers have finished before freeing the bridge port Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:36 -07:00
Nikolay Aleksandrov	23550b8313	net: bridge: mcast: improve IGMPv3/MLDv2 query processing When an IGMPv3/MLDv2 query is received and we're operating in such mode then we need to avoid updating group timers if the suppress flag is set. Also we should update only timers for groups in exclude mode. v3: add IPv6/MLDv2 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:36 -07:00
Nikolay Aleksandrov	109865fe12	net: bridge: mcast: support for IGMPV3/MLDv2 BLOCK_OLD_SOURCES report We already have all necessary helpers, so process IGMPV3/MLDv2 BLOCK_OLD_SOURCES as per the RFCs. v3: add IPv6/MLDv2 support v2: directly do flag bit operations Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:36 -07:00
Nikolay Aleksandrov	5bf1e00b68	net: bridge: mcast: support for IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report In order to process IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report types we need new helpers which allow us to mark entries based on their timer state and to query only marked entries. v3: add IPv6/MLDv2 support, fix other_query checks v2: directly do flag bit operations Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	e6231bca6a	net: bridge: mcast: support for IGMPV3/MLDv2 MODE_IS_INCLUDE/EXCLUDE report In order to process IGMPV3/MLDv2_MODE_IS_INCLUDE/EXCLUDE report types we need some new helpers which allow us to set/clear flags for all current entries and later delete marked entries after the report sources have been processed. v3: add IPv6/MLDv2 support v2: drop flag helpers and directly do flag bit operations Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	0436862e41	net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report This patch adds handling for the ALLOW_NEW_SOURCES IGMPv3/MLDv2 report types and limits them only when multicast_igmp_version == 3 or multicast_mld_version == 2 respectively. Now that IGMPv3/MLDv2 handling functions will be managing timers we need to delay their activation, thus a new argument is added which controls if the timer should be updated. We also disable host IGMPv3/MLDv2 handling as it's not yet implemented and could cause inconsistent group state, the host can only join a group as EXCLUDE {} or leave it. v4: rename update_timer to igmpv2_mldv1 and use the passed value from br_multicast_add_group's callers v3: Add IPv6/MLDv2 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	d6c33d67a8	net: bridge: mcast: delete expired port groups without srcs If an expired port group is in EXCLUDE mode, then we have to turn it into INCLUDE mode, remove all srcs with zero timer and finally remove the group itself if there are no more srcs with an active timer. For IGMPv2 use there would be no sources, so this will reduce to just removing the group as before. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	81f1983852	net: bridge: mdb: use mdb and port entries in notifications We have to use mdb and port entries when sending mdb notifications in order to fill in all group attributes properly. Before this change we would've used a fake br_mdb_entry struct to fill in only partial information about the mdb. Now we can also reuse the mdb dump fill function and thus have only a single central place which fills the mdb attributes. v3: add IPv6 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	79abc87505	net: bridge: mdb: push notifications in __br_mdb_add/del This change is in preparation for using the mdb port group entries when sending a notification, so their full state and additional attributes can be filled in. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	42c11ccfe8	net: bridge: mcast: add support for group query retransmit We need to be able to retransmit group-specific and group-and-source specific queries. The new timer takes care of those. v3: add IPv6 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	438ef2d027	net: bridge: mcast: add support for group-and-source specific queries Allows br_multicast_alloc_query to build queries with the port group's source lists and sends a query for sources over and under lmqt when necessary as per RFCs 3376 and 3810 with the suppress flag set appropriately. v3: add IPv6 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov	5205e919c9	net: bridge: mcast: add support for src list and filter mode dumping Support per port group src list (address and timer) and filter mode dumping. Protected by either multicast_lock or rcu. v3: add IPv6 support v2: require RCU or multicast_lock to traverse src groups Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov	8b671779b7	net: bridge: mcast: add support for group source list Initial functions for group source lists which are needed for IGMPv3 and MLDv2 include/exclude lists. Both IPv4 and IPv6 sources are supported. User-added mdb entries are created with exclude filter mode, we can extend that later to allow user-supplied mode. When group src entries are deleted, they're freed from a workqueue to make sure their timers are not still running. Source entries are protected by the multicast_lock and rcu. The number of src groups per port group is limited to 32. v4: use the new port group del function directly add igmpv2/mldv1 bool to denote if the entry was added in those modes, it will later replace the old update_timer bool v3: add IPv6 support v2: allow src groups to be traversed under rcu Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov	681590bd4c	net: bridge: mcast: factor out port group del In order to avoid future errors and reduce code duplication we should factor out the port group del sequence. This allows us to have one function which takes care of all details when removing a port group. v4: set pg's fast leave flag when deleting due to fast leave move the patch before adding source lists Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:34 -07:00
Nikolay Aleksandrov	6ec0d0ee66	net: bridge: mdb: arrange internal structs so fast-path fields are close Before this patch we'd need 2 cache lines for fast-path, now all used fields are in the first cache line. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:34 -07:00
Johannes Berg	8140860c81	netlink: consistently use NLA_POLICY_EXACT_LEN() Change places that open-code NLA_POLICY_EXACT_LEN() to use the macro instead, giving us flexibility in how we handle the details of the macro. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-18 12:28:45 -07:00
Florian Westphal	5c04da55c7	netfilter: ebtables: reject bogus getopt len value syzkaller reports splat: ------------[ cut here ]------------ Buffer overflow detected (80 < 137)! Call Trace: do_ebt_get_ctl+0x2b4/0x790 net/bridge/netfilter/ebtables.c:2317 nf_getsockopt+0x72/0xd0 net/netfilter/nf_sockopt.c:116 ip_getsockopt net/ipv4/ip_sockglue.c:1778 [inline] caused by a copy-to-user with a too-large "len" value. This adds an argument check on len just like in the non-compat version of the handler. Before the "Fixes" commit, the reproducer fails with -EINVAL as expected: 1. core calls the "compat" getsockopt version 2. compat getsockopt version detects the len value is possibly in 64-bit layout (len != compat_len) 3. compat getsockopt version delegates everything to native getsockopt version 4. native getsockopt rejects invalid len -> compat handler only sees len == sizeof(compat_struct) for GET_ENTRIES. After the refactor, event sequence is: 1. getsockopt calls "compat" version (len != native_len) 2. compat version attempts to copy len bytes, where *len is random value from userspace Fixes: `fc66de8e16` ("netfilter/ebtables: clean up compat {get, set}sockopt handling") Reported-by: syzbot+5accb5c62faa1d346480@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-14 11:59:08 +02:00
Florian Westphal	2404b73c3f	netfilter: avoid ipv6 -> nf_defrag_ipv6 module dependency nf_ct_frag6_gather is part of nf_defrag_ipv6.ko, not ipv6 core. The current use of the netfilter ipv6 stub indirections causes a module dependency between ipv6 and nf_defrag_ipv6. This prevents nf_defrag_ipv6 module from being removed because ipv6 can't be unloaded. Remove the indirection and always use a direct call. This creates a dependency from nf_conntrack_bridge to nf_defrag_ipv6 instead: modinfo nf_conntrack depends: nf_conntrack,nf_defrag_ipv6,bridge .. and nf_conntrack already depends on nf_defrag_ipv6 anyway. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-08-13 04:16:15 +02:00
Linus Torvalds	47ec5303d7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from David Miller: 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan. 2) Support UDP segmentation in code TSO code, from Eric Dumazet. 3) Allow flashing different flash images in cxgb4 driver, from Vishal Kulkarni. 4) Add drop frames counter and flow status to tc flower offloading, from Po Liu. 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni. 6) Various new indirect call avoidance, from Eric Dumazet and Brian Vazquez. 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from Yonghong Song. 8) Support querying and setting hardware address of a port function via devlink, use this in mlx5, from Parav Pandit. 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson. 10) Switch qca8k driver over to phylink, from Jonathan McDowell. 11) In bpftool, show list of processes holding BPF FD references to maps, programs, links, and btf objects. From Andrii Nakryiko. 12) Several conversions over to generic power management, from Vaibhav Gupta. 13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry Yakunin. 14) Various https url conversions, from Alexander A. Klimov. 15) Timestamping and PHC support for mscc PHY driver, from Antoine Tenart. 16) Support bpf iterating over tcp and udp sockets, from Yonghong Song. 17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov. 18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan. 19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several drivers. From Luc Van Oostenryck. 20) XDP support for xen-netfront, from Denis Kirjanov. 21) Support receive buffer autotuning in MPTCP, from Florian Westphal. 22) Support EF100 chip in sfc driver, from Edward Cree. 23) Add XDP support to mvpp2 driver, from Matteo Croce. 24) Support MPTCP in sock_diag, from Paolo Abeni. 
25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic infrastructure, from Jakub Kicinski. 26) Several pci_ --> dma_ API conversions, from Christophe JAILLET. 27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel. 28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki. 29) Refactor a lot of networking socket option handling code in order to avoid set_fs() calls, from Christoph Hellwig. 30) Add rfc4884 support to icmp code, from Willem de Bruijn. 31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei. 32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin. 33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin. 34) Support TCP syncookies in MPTCP, from Florian Westphal. 35) Fix several tricky cases of PMTU handling wrt. bridging, from Stefano Brivio. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits) net: thunderx: initialize VF's mailbox mutex before first usage usb: hso: remove bogus check for EINPROGRESS usb: hso: no complaint about kmalloc failure hso: fix bailout in error case of probe ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM selftests/net: relax cpu affinity requirement in msg_zerocopy test mptcp: be careful on subflow creation selftests: rtnetlink: make kci_test_encap() return sub-test result selftests: rtnetlink: correct the final return value for the test net: dsa: sja1105: use detected device id instead of DT one on mismatch tipc: set ub->ifindex for local ipv6 address ipv6: add ipv6_dev_find() net: openvswitch: silence suspicious RCU usage warning Revert "vxlan: fix tos value before xmit" ptp: only allow phase values lower than 1 period farsync: switch from 'pci_' to 'dma_' API wan: wanxl: switch from 'pci_' to 'dma_' API hv_netvsc: do not use VF device if link is down dpaa2-eth: Fix passing zero to 'PTR_ERR' warning net: macb: Properly handle phylink on at91sam9x ...	2020-08-05 20:13:21 -07:00
Linus Torvalds	fd76a74d94	audit/stable-5.9 PR 20200803 Merge tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Aside from some smaller bug fixes, here are the highlights: - add a new backlog wait metric to the audit status message, this is intended to help admins determine how long processes have been waiting for the audit backlog queue to clear - generate audit records for nftables configuration changes - generate CWD audit records for the relevant LSM audit records" * tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: report audit wait metric in audit status reply audit: purge audit_log_string from the intra-kernel audit API audit: issue CWD record to accompany LSM_AUDIT_DATA_* records audit: use the proper gfp flags in the audit_log_nfcfg() calls audit: remove unused !CONFIG_AUDITSYSCALL __audit_inode* stubs audit: add gfp parameter to audit_log_nfcfg audit: log nftables configuration change events audit: Use struct_size() helper in alloc_chunk	2020-08-04 14:20:26 -07:00
David S. Miller	f2e0b29a9a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) UAF in chain binding support from previous batch, from Dan Carpenter. 2) Queue up delayed work to expire connections with no destination, from Andrew Sy Kim. 3) Use fallthrough pseudo-keyword, from Gustavo A. R. Silva. 4) Replace HTTP links with HTTPS, from Alexander A. Klimov. 5) Remove superfluous null header checks in ip6tables, from Gaurav Singh. 6) Add extended netlink error reporting for expression. 7) Report EEXIST on overlapping chain, set elements and flowtable devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 16:03:18 -07:00
Nikolay Aleksandrov	fd65e5a95d	net: bridge: clear bridge's private skb space on xmit We need to clear all of the bridge private skb variables as they can be stale due to the packet being recirculated through the stack and then transmitted through the bridge device. Similar memset is already done on bridge's input. We've seen cases where proxyarp_replied was 1 on routed multicast packets transmitted through the bridge to ports with neigh suppress which were getting dropped. Same thing can in theory happen with the port isolation bit as well. Fixes: `821f1b21ca` ("bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-03 15:26:46 -07:00
Christoph Hellwig	c2f12630c6	netfilter: switch nf_setsockopt to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:54 -07:00
Christoph Hellwig	7e4b9dbabb	netfilter: remove the unused user argument to do_update_counters Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Gustavo A. R. Silva	954d82979b	netfilter: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-22 01:18:05 +02:00
Christoph Hellwig	fc66de8e16	netfilter/ebtables: clean up compat {get, set}sockopt handling Merge the native and compat {get,set}sockopt handlers using in_compat_syscall(). Note that this required moving a fair amount of code around to be done sanely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Horatiu Vultur	ffb3adba64	net: bridge: Add port attribute IFLA_BRPORT_MRP_IN_OPEN This patch adds a new port attribute, IFLA_BRPORT_MRP_IN_OPEN, which allows userspace to be notified when the node loses the continuity of MRP_InTest frames. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-14 13:46:43 -07:00
Horatiu Vultur	4fc4871fc2	bridge: mrp: Extend br_mrp_fill_info This patch extends the function br_mrp_fill_info to return also the status for the interconnect ring. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-14 13:46:43 -07:00
Horatiu Vultur	7ab1748e4c	bridge: mrp: Extend MRP netlink interface for configuring MRP interconnect This patch extends the existing MRP netlink interface with the following attributes: IFLA_BRIDGE_MRP_IN_ROLE, IFLA_BRIDGE_MRP_IN_STATE and IFLA_BRIDGE_MRP_START_IN_TEST. These attributes are similar to their ring counterparts, but they apply to the interconnect port. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-14 13:46:43 -07:00
Horatiu Vultur	537ed5676d	bridge: mrp: Implement the MRP Interconnect API This patch adds support for MRP Interconnect. As with the MRP ring, if the HW can't generate MRP_InTest frames, the SW will try to generate them. If the SW also fails to generate the frames, an error is returned to userspace. The forwarding/termination of MRP_In frames happens in the kernel and is done by MRP instances. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-14 13:46:43 -07:00
Horatiu Vultur	f23f0db360	bridge: switchdev: mrp: Extend MRP API for switchdev for MRP Interconnect Implement the MRP API for interconnect switchdev. Like the other br_mrp_switchdev functions, these functions simply end up calling the switchdev functions switchdev_port_obj_add/del. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-14 13:46:43 -07:00
Horatiu Vultur	4139d4b51a	bridge: mrp: Add br_mrp_in_port_open function This function notifies the userspace when the node lost the continuity of MRP_InTest frames. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-14 13:46:42 -07:00
Horatiu Vultur	4cc625c63a	bridge: mrp: Rename br_mrp_port_open to br_mrp_ring_port_open This patch renames the function br_mrp_port_open to br_mrp_ring_port_open. This makes it clearer that a ring port lost continuity, because there will also be a br_mrp_in_port_open. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-14 13:46:42 -07:00
Horatiu Vultur	78c1b4fb0e	bridge: mrp: Extend br_mrp for MRP interconnect This patch extends 'struct br_mrp' to contain information regarding the MRP interconnect. It contains the following: - the interconnect port 'i_port', which is NULL if the node doesn't have an interconnect role - the interconnect id, which is similar to the ring id, but this field is also part of the MRP_InTest frames. - the interconnect role, which can be MIM or MIC. - the interconnect state, which can be open or closed. - the interconnect delayed_work for sending MRP_InTest frames and checking for loss of continuity. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-14 13:46:42 -07:00
Nikolay Aleksandrov	528ae84a34	net: bridge: fix undefined br_vlan_can_enter_range in tunnel code If bridge vlan filtering is not defined we won't have br_vlan_can_enter_range and thus will get a compile error as was reported by Stephen and the build bot. So let's define a stub for when vlan filtering is not used. Fixes: `9433944368` ("net: bridge: notify on vlan tunnel changes done via the old api") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-13 11:22:55 -07:00
Nikolay Aleksandrov	9433944368	net: bridge: notify on vlan tunnel changes done via the old api If someone uses the old vlan API to configure tunnel mappings we'll only generate the old-style full port notification. That would be a problem if we are monitoring the new vlan notifications for changes. The patch resolves the issue by adding vlan notifications to the old tunnel netlink code. As usual we try to compress the notifications for as many vlans in a range as possible, thus a vlan tunnel change is considered able to enter the "current" vlan notification range if: 1. vlan exists 2. it has actually changed (curr_change == true) 3. it passes all standard vlan notification range checks done by br_vlan_can_enter_range() such as option equality, id continuity etc Note that vlan tunnel changes (add/del) are considered a part of vlan options so only RTM_NEWVLAN notification is generated with the relevant information inside. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-12 15:18:24 -07:00
David S. Miller	71930d6102	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net All conflicts seemed rather trivial, with some guidance from Saeed Mahameed on the tc_ct.c one. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-11 00:46:00 -07:00
Linus Lüssing	5fc6266af7	bridge: mcast: Fix MLD2 Report IPv6 payload length check Commit `e57f61858b` ("net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling") introduced a bug in the IPv6 header payload length check which would potentially lead to rejecting a valid MLD2 Report: The check needs to take into account the 2 bytes for the "Number of Sources" field in the "Multicast Address Record" before reading it. And not the size of a pointer to this field. Fixes: `e57f61858b` ("net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling") Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-07 15:37:57 -07:00
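The distinction drawn in the entry above, between the two bytes of the "Number of Sources" field and the size of a pointer to that field, can be illustrated in isolation. This is a minimal sketch, not the bridge code: the struct `mcast_record` and all helper names are hypothetical stand-ins chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a multicast address record header. */
struct mcast_record {
	uint8_t  type;
	uint8_t  auxwords;
	uint16_t nsrcs;		/* "Number of Sources": 2 bytes on the wire */
	uint8_t  addr[16];
};

/* Bytes that must be present before nsrcs can be read safely. */
static size_t needed_correct(void)
{
	return offsetof(struct mcast_record, nsrcs) + sizeof(uint16_t);
}

/* Buggy variant: adds the size of a pointer to the field instead. */
static size_t needed_buggy(void)
{
	uint16_t *nsrcs = NULL;

	return offsetof(struct mcast_record, nsrcs) + sizeof(nsrcs);
}

/* Length check performed before touching the field. */
static int can_read_nsrcs(size_t payload_len, size_t needed)
{
	return payload_len >= needed;
}
```

With the buggy bound, a payload that ends right after the `nsrcs` field is rejected even though the field is fully present, which matches the "rejecting a valid MLD2 Report" symptom described in the message.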
Horatiu Vultur	36a8e8e265	bridge: Extend br_fill_ifinfo to return MRP status This patch extends the function br_fill_ifinfo to also return the MRP status for each instance on a bridge. It also adds a new filter RTEXT_FILTER_MRP to return the MRP status only when this is set, so as not to interfere with the vlans. The MRP status is returned only on the bridge interfaces. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-02 14:19:15 -07:00
Horatiu Vultur	df42ef227d	bridge: mrp: Add br_mrp_fill_info Add the function br_mrp_fill_info which populates the MRP attributes regarding the status of each MRP instance. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-02 14:19:15 -07:00
Richard Guy Briggs	142240398e	audit: add gfp parameter to audit_log_nfcfg Fixed an inconsistent use of GFP flags in nft_obj_notify() that used GFP_KERNEL when a GFP flag was passed in to that function. Given this allocated memory was then used in audit_log_nfcfg() it led to an audit of all other GFP allocations in net/netfilter/nf_tables_api.c and a modification of audit_log_nfcfg() to accept a GFP parameter. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2020-06-29 19:14:47 -04:00
Horatiu Vultur	9b14d1f8a7	bridge: mrp: Fix endian conversion and some other warnings The following sparse warnings are fixed: net/bridge/br_mrp.c:106:18: warning: incorrect type in assignment (different base types) net/bridge/br_mrp.c:106:18: expected unsigned short [usertype] net/bridge/br_mrp.c:106:18: got restricted __be16 [usertype] net/bridge/br_mrp.c:281:23: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:281:23: expected struct list_head entry net/bridge/br_mrp.c:281:23: got struct list_head [noderef] net/bridge/br_mrp.c:332:28: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:332:28: expected struct list_head new net/bridge/br_mrp.c:332:28: got struct list_head [noderef] net/bridge/br_mrp.c:332:40: warning: incorrect type in argument 2 (different modifiers) net/bridge/br_mrp.c:332:40: expected struct list_head head net/bridge/br_mrp.c:332:40: got struct list_head [noderef] net/bridge/br_mrp.c:682:29: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:682:29: expected struct list_head const head net/bridge/br_mrp.c:682:29: got struct list_head [noderef] Reported-by: kernel test robot <lkp@intel.com> Fixes: `2f1a11ae11` ("bridge: mrp: Add MRP interface.") Fixes: `4b8d7d4c59` ("bridge: mrp: Extend bridge interface") Fixes: `9a9f26e8f7` ("bridge: mrp: Connect MRP API with the switchdev API") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-28 20:44:10 -07:00
David S. Miller	7bed145516	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Minor overlapping changes in xfrm_device.c, between the double ESP trailing bug fix setting the XFRM_INIT flag and the changes in net-next preparing for bonding encryption support. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-25 19:29:51 -07:00
David S. Miller	f4926d513b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net, they are: 1) Unaligned atomic access in ipset, from Russell King. 2) Missing module description, from Rob Gill. 3) Patches to fix a module unload causing NULL pointer dereference in xtables, from David Wilder. For the record, I am posting here his cover letter explaining the problem: A crash happened on ppc64le when running ltp network tests triggered by "rmmod iptable_mangle". See previous discussion in this thread: https://lists.openwall.net/netdev/2020/06/03/161 . In the crash I found in iptable_mangle_hook() that state->net->ipv4.iptable_mangle=NULL causing a NULL pointer dereference. net->ipv4.iptable_mangle is set to NULL in iptable_mangle_net_exit(), which is called when the ip_mangle module is unloaded. A rmmod task was found running in the crash dump. A 2nd crash showed the same problem when running "rmmod iptable_filter" (net->ipv4.iptable_filter=NULL). To fix this I added .pre_exit hook in all iptable_foo.c. The pre_exit will un-register the underlying hook and exit would do the table freeing. The netns core does an unconditional synchronize_rcu after the pre_exit hooks, ensuring no packets are in flight that have picked up the pointer before completing the un-register. These patches include changes for both iptables and ip6tables. We tested this fix with ltp running iptables01.sh and iptables01.sh -6 in a loop for 72 hours. 4) Add a selftest for conntrack helper assignment, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-25 12:52:41 -07:00
Thomas Martitz	206e732323	net: bridge: enforce alignment for ethernet address The eth_addr member is passed to ether_addr functions that require 2-byte alignment, therefore the member must be properly aligned to avoid unaligned accesses. The problem has been present since the initial merge of multicast to unicast: commit `6db6f0eae6` bridge: multicast to unicast Fixes: `6db6f0eae6` ("bridge: multicast to unicast") Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Felix Fietkau <nbd@nbd.name> Signed-off-by: Thomas Martitz <t.martitz@avm.de> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-25 12:38:16 -07:00
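The alignment requirement in the entry above can be shown with two toy structures. This is only a sketch in standard C11: the struct names and the `flags` member are hypothetical, and `alignas(2)` is used here as a portable stand-in for whatever alignment annotation the actual patch applies.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>

/* A byte-sized member in front leaves the address at an odd offset. */
struct entry_unaligned {
	unsigned char flags;
	unsigned char eth_addr[6];
};

/* Requesting 2-byte alignment makes the compiler insert padding. */
struct entry_aligned {
	unsigned char flags;
	alignas(2) unsigned char eth_addr[6];
};

/* True if a member at this offset starts on a 2-byte boundary. */
static bool is_two_byte_aligned(size_t offset)
{
	return offset % 2 == 0;
}
```

Checking `offsetof(struct entry_unaligned, eth_addr)` against `offsetof(struct entry_aligned, eth_addr)` with the helper shows the difference: only the annotated member is safe to hand to code that reads the address in 16-bit units.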
Rob Gill	4cacc39516	netfilter: Add MODULE_DESCRIPTION entries to kernel modules The user tool modinfo is used to get information on kernel modules, including a description where it is available. This patch adds a brief MODULE_DESCRIPTION to netfilter kernel modules (descriptions taken from Kconfig file or code comments) Signed-off-by: Rob Gill <rrobgill@protonmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-06-25 00:50:31 +02:00
Nikolay Aleksandrov	b5f1d9ec28	net: bridge: add a flag to avoid refreshing fdb when changing/adding When we modify or create a new fdb entry sometimes we want to avoid refreshing its activity in order to track it properly. One example is when a mac is received from EVPN multi-homing peer by FRR, which doesn't want to change local activity accounting. It makes it static and sets a flag to track its activity. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-24 14:36:33 -07:00
Nikolay Aleksandrov	31cbc39b63	net: bridge: add option to allow activity notifications for any fdb entries This patch adds the ability to notify about activity of any entries (static, permanent or ext_learn). EVPN multihoming peers need it to properly and efficiently handle mac sync (peer active/locally active). We add a new NFEA_ACTIVITY_NOTIFY attribute which is used to dump the current activity state and to control if static entries should be monitored at all. We use 2 bits - one to activate fdb entry tracking (disabled by default) and the second to denote that an entry is inactive. We need the second bit in order to avoid multiple notifications of inactivity. Obviously this makes no difference for dynamic entries since at the time of inactivity they get deleted, while the tracked non-dynamic entries get the inactive bit set and get a notification. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-24 14:36:33 -07:00
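The two-bit scheme described in the entry above (one bit to enable tracking, one to remember that inactivity was already reported) can be sketched as follows. The flag names, the struct and both helpers are hypothetical, and the assumption that a notification is also sent when a tracked entry becomes active again is mine, not something the message states.

```c
#include <stdbool.h>

#define TRACK_ACTIVITY	(1u << 0)	/* hypothetical: tracking enabled for this entry */
#define ENTRY_INACTIVE	(1u << 1)	/* hypothetical: inactivity already reported */

struct fdb_entry {
	unsigned int flags;
};

/* Called when the entry ages out; returns true if a notification is due. */
static bool mark_inactive(struct fdb_entry *e)
{
	if (!(e->flags & TRACK_ACTIVITY))
		return false;		/* tracking is off by default */
	if (e->flags & ENTRY_INACTIVE)
		return false;		/* second bit suppresses repeat notifications */
	e->flags |= ENTRY_INACTIVE;
	return true;
}

/* Called when traffic is seen again (assumed behaviour, see lead-in). */
static bool mark_active(struct fdb_entry *e)
{
	if (!(e->flags & TRACK_ACTIVITY) || !(e->flags & ENTRY_INACTIVE))
		return false;
	e->flags &= ~ENTRY_INACTIVE;
	return true;
}
```

Without the second bit, every expiry check on an idle tracked entry would produce another inactivity notification, which is the situation the message says it wants to avoid.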
Nikolay Aleksandrov	0592ff8834	net: bridge: fdb_add_entry takes ndm as argument We can just pass ndm as an argument instead of its fields separately. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-24 14:36:33 -07:00
Horatiu Vultur	7882c895b7	bridge: mrp: Validate when setting the port role This patch adds specific checks for primary(0x0) and secondary(0x1) when setting the port role. For any other value the function 'br_mrp_set_port_role' will return -EINVAL. Fixes: `20f6a05ef6` ("bridge: mrp: Rework the MRP netlink interface") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-23 14:38:05 -07:00
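The validation described in the entry above amounts to accepting exactly two values and rejecting everything else with -EINVAL. A minimal sketch, assuming a hypothetical `set_port_role()` helper and enum names rather than the kernel's own definitions:

```c
#include <errno.h>

/* Values taken from the commit message: primary is 0x0, secondary is 0x1. */
enum port_role {
	ROLE_PRIMARY = 0x0,
	ROLE_SECONDARY = 0x1,
};

/* Stores the role if it is valid, otherwise leaves *stored untouched. */
static int set_port_role(unsigned int *stored, unsigned int role)
{
	switch (role) {
	case ROLE_PRIMARY:
	case ROLE_SECONDARY:
		*stored = role;
		return 0;
	default:
		return -EINVAL;
	}
}
```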
Linus Torvalds	96144c58ab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Fix cfg80211 deadlock, from Johannes Berg. 2) RXRPC fails to send notifications, from David Howells. 3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from Geliang Tang. 4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu. 5) The ucc_geth driver needs __netdev_watchdog_up exported, from Valentin Longchamp. 6) Fix hashtable memory leak in dccp, from Wang Hai. 7) Fix how nexthops are marked as FDB nexthops, from David Ahern. 8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni. 9) Fix crashes in tipc_disc_rcv(), from Tuong Lien. 10) Fix link speed reporting in iavf driver, from Brett Creeley. 11) When a channel is used for XSK and then reused again later for XSK, we forget to clear out the relevant data structures in mlx5 which causes all kinds of problems. Fix from Maxim Mikityanskiy. 12) Fix memory leak in genetlink, from Cong Wang. 13) Disallow sockmap attachments to UDP sockets, it simply won't work. From Lorenz Bauer. 
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) net: ethernet: ti: ale: fix allmulti for nu type ale net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init net: atm: Remove the error message according to the atomic context bpf: Undo internal BPF_PROBE_MEM in BPF insns dump libbpf: Support pre-initializing .bss global variables tools/bpftool: Fix skeleton codegen bpf: Fix memlock accounting for sock_hash bpf: sockmap: Don't attach programs to UDP sockets bpf: tcp: Recv() should return 0 when the peer socket is closed ibmvnic: Flush existing work items before device removal genetlink: clean up family attributes allocations net: ipa: header pad field only valid for AP->modem endpoint net: ipa: program upper nibbles of sequencer type net: ipa: fix modem LAN RX endpoint id net: ipa: program metadata mask differently ionic: add pcie_print_link_status rxrpc: Fix race between incoming ACK parser and retransmitter net/mlx5: E-Switch, Fix some error pointer dereferences net/mlx5: Don't fail driver on failure to create debugfs net/mlx5e: CT: Fix ipv6 nat header rewrite actions ...	2020-06-13 16:27:13 -07:00
Masahiro Yamada	a7f7f6248d	treewide: replace '---help---' in Kconfig files with 'help' Since commit `84af7a6194` ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following command: $ find . -name 'Kconfig' | xargs sed -i 's/^[[:space:]]---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-06-14 01:57:21 +09:00
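The normalization that the entry above performs with sed can be sketched as a small C helper. It assumes the substitution is meant to swallow any run of leading whitespace before '---help---', as the indentation variants a) to g) suggest; the function name `normalize_help_line` is hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Rewrites "<any leading whitespace>---help---<rest>" as "\thelp<rest>".
 * Lines that do not start with the marker are copied unchanged.
 */
static void normalize_help_line(const char *in, char *out, size_t out_size)
{
	static const char marker[] = "---help---";
	const char *p = in;

	/* Skip the leading indentation, whatever mix of spaces and tabs it is. */
	while (*p != '\0' && isspace((unsigned char)*p))
		p++;

	if (strncmp(p, marker, sizeof(marker) - 1) == 0)
		snprintf(out, out_size, "\thelp%s", p + sizeof(marker) - 1);
	else
		snprintf(out, out_size, "%s", in);
}
```

Every listed indentation style collapses to the same single-tab form, while other Kconfig lines pass through untouched.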
Cong Wang	845e0ebb44	net: change addr_list_lock back to static key The dynamic key update for addr_list_lock still causes troubles, for example the following race condition still exists: CPU 0: CPU 1: (RCU read lock) (RTNL lock) dev_mc_seq_show() netdev_update_lockdep_key() -> lockdep_unregister_key() -> netif_addr_lock_bh() because lockdep doesn't provide an API to update it atomically. Therefore, we have to move it back to static keys and use subclass for nest locking like before. In commit `1a33e10e4a` ("net: partially revert dynamic lockdep key changes"), I already reverted most parts of commit `ab92d68fc2` ("net: core: add generic lockdep keys"). This patch reverts the rest and also part of commit `f3b0a18bb6` ("net: remove unnecessary variables and callback"). After this patch, addr_list_lock changes back to using static keys and subclasses to satisfy lockdep. Thanks to dev->lower_level, we do not have to change back to ->ndo_get_lock_subclass(). And hopefully this reduces some syzbot lockdep noises too. Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com Cc: Taehee Yoo <ap420073@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-09 12:59:45 -07:00
Linus Torvalds	cb8e59cc87	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from David Miller: 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz Augusto von Dentz. 2) Add GSO partial support to igc, from Sasha Neftin. 3) Several cleanups and improvements to r8169 from Heiner Kallweit. 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a device self-test. From Andrew Lunn. 5) Start moving away from custom driver versions, use the globally defined kernel version instead, from Leon Romanovsky. 6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin. 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet. 8) Add sriov and vf support to hinic, from Luo bin. 9) Support Media Redundancy Protocol (MRP) in the bridging code, from Horatiu Vultur. 10) Support netmap in the nft_nat code, from Pablo Neira Ayuso. 11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina Dubroca. Also add ipv6 support for espintcp. 12) Lots of ReST conversions of the networking documentation, from Mauro Carvalho Chehab. 13) Support configuration of ethtool rxnfc flows in bcmgenet driver, from Doug Berger. 14) Allow to dump cgroup id and filter by it in inet_diag code, from Dmitry Yakunin. 15) Add infrastructure to export netlink attribute policies to userspace, from Johannes Berg. 16) Several optimizations to sch_fq scheduler, from Eric Dumazet. 17) Fallback to the default qdisc if qdisc init fails because otherwise a packet scheduler init failure will make a device inoperative. From Jesper Dangaard Brouer. 18) Several RISCV bpf jit optimizations, from Luke Nelson. 19) Correct the return type of the ->ndo_start_xmit() method in several drivers, it's netdev_tx_t but many drivers were using 'int'. From Yunjian Wang. 20) Add an ethtool interface for PHY master/slave config, from Oleksij Rempel. 21) Add BPF iterators, from Yonghong Song. 
22) Add cable test infrastructure, including ethtool interfaces, from Andrew Lunn. Marvell PHY driver is the first to support this facility. 23) Remove zero-length arrays all over, from Gustavo A. R. Silva. 24) Calculate and maintain an explicit frame size in XDP, from Jesper Dangaard Brouer. 25) Add CAP_BPF, from Alexei Starovoitov. 26) Support terse dumps in the packet scheduler, from Vlad Buslov. 27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei. 28) Add devm_register_netdev(), from Bartosz Golaszewski. 29) Minimize qdisc resets, from Cong Wang. 30) Get rid of kernel_getsockopt and kernel_setsockopt in order to eliminate set_fs/get_fs calls. From Christoph Hellwig. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits) selftests: net: ip_defrag: ignore EPERM net_failover: fixed rollback in net_failover_open() Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv" Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" vmxnet3: allow rx flow hash ops only when rss is enabled hinic: add set_channels ethtool_ops support selftests/bpf: Add a default $(CXX) value tools/bpf: Don't use $(COMPILE.c) bpf, selftests: Use bpf_probe_read_kernel s390/bpf: Use bcr 0,%0 as tail call nop filler s390/bpf: Maintain 8-byte stack alignment selftests/bpf: Fix verifier test selftests/bpf: Fix sample_cnt shared between two threads bpf, selftests: Adapt cls_redirect to call csum_level helper bpf: Add csum_level helper for fixing up csum levels bpf: Fix up bpf_skb_adjust_room helper's skb csum setting sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf() crypto/chtls: IPv6 support for inline TLS Crypto/chcr: Fixes a coccinile check error Crypto/chcr: Fixes compilations warnings ...	2020-06-03 16:27:18 -07:00
Linus Torvalds	9d99b1647f	audit/stable-5.8 PR 20200601 -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl7VnKEUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMbHA/+PQmrPdzPvkLAjjf1y3LXvyEIAXIQ h2r8SxHa7iGyF6vVPz+ya7ux0KAm8wCVdfkokWG5jxjwK7pysS6gx9JzBVK7dbhD FsKBSoq9+to9fYlaCyX7vn85C7kK5oGrwS/ECos0BHBpij8ukLgvPQu+PDs7d4xW 1X2Nrgqnc7M4L8ayzXTQX0fDWcOkapzaN86+R+Lavb4hO/FownaYbuCFn+1mdzux ZNBpt3/y1pM6vi5YBkI1rkauBCmkl/YSX/mf/EwDNlQ0XmcadGQ6z7iwjyiE826g etCHWD3cgQH7Zzz6CxBNX8Xbq0nIQueHHiFYpVyy9lf4xleFvnfFDebrs8Q9TB6G jTWU8okioUKPZyRDaRuIAmCf/LBQRsMkIYTU3w6J0ZqsBycTw3NXPiQArmlxZESM HquxWpKoZytRiw581hiSGKNqY+R3FvA+Jroc/7bWfNOE3IdFxegvCsC3giKJf1rY AlQitehql9a5jp7A57+477WRYOygYRnd+ntMD5KqR90QSIcQXeg0/lFKhco+zc2p bXbWLE+aaOTGCeC+3Eow3T7FMWmrIn6ccKgM84+WT7YQYtRqUYu3RIZbnlYXN7uH 8xGXT6ccPcEwIjgyF87J0KyGhrbT1N91Jd2jMJkEry9OLAn/yr+pUBQtAa456MMi JYevS4atZaUqgvw= =iLfC -----END PGP SIGNATURE----- Merge tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Summary of the significant patches: - Record information about binds/unbinds to the audit multicast socket. This helps identify which processes have/had access to the information in the audit stream. - Cleanup and add some additional information to the netfilter configuration events collected by audit. - Fix some of the audit error handling code so we don't leak network namespace references" * tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: add subj creds to NETFILTER_CFG record to audit: Replace zero-length array with flexible-array audit: make symbol 'audit_nfcfgs' static netfilter: add audit table unregister actions audit: tidy and extend netfilter_cfg x_tables audit: log audit netlink multicast bind and unbind audit: fix a net reference leak in audit_list_rules_send() audit: fix a net reference leak in audit_send_reply()	2020-06-02 17:13:37 -07:00
Christoph Hellwig	88dca4ca5a	mm: remove the pgprot argument to __vmalloc The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv] Acked-by: Gao Xiang <xiang@kernel.org> [erofs] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-02 10:59:11 -07:00
Horatiu Vultur	c6676e7d62	bridge: mrp: Add support for role MRA A node that has the MRA role can behave as MRM or MRC. Initially it starts as MRM and sends MRP_Test frames on both ring ports. If it detects MRP_Test frames sent by another MRM, it checks whether these frames have a lower priority than its own. In that case it sends MRP_Nack frames to notify the other node that it needs to stop sending MRP_Test frames. If it receives an MRP_Nack frame, it stops sending MRP_Test frames and starts to behave as an MRC, but it continues to monitor the MRP_Test frames sent by the MRM. If at some point the MRM stops sending MRP_Test frames, it takes over the MRM role and starts to send MRP_Test frames. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:56:11 -07:00
Horatiu Vultur	4b3a61b030	bridge: mrp: Set the priority of MRP instance Each MRP instance has a priority; a lower value means a higher priority. The priority of an MRP instance is stored in the MRP_Test frame, so all the MRP nodes in the ring can see the other nodes' priority. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:56:11 -07:00
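The priority rule in the entry above, combined with the MRA behaviour described in the entry before it, reduces to a single comparison. This sketch uses hypothetical names and a 16-bit priority type as an assumption; treating equal priorities as "ignore" is also an assumption, since the messages do not say how ties are resolved.

```c
#include <stdbool.h>
#include <stdint.h>

enum mra_action {
	MRA_IGNORE,	/* the other manager wins or ties: do nothing here */
	MRA_SEND_NACK,	/* we outrank the sender: ask it to stop testing */
};

/* A lower numeric value means a higher priority. */
static bool has_higher_priority(uint16_t own_prio, uint16_t other_prio)
{
	return own_prio < other_prio;
}

/* Decision taken when an MRP_Test frame from another manager is seen. */
static enum mra_action on_foreign_test_frame(uint16_t own_prio, uint16_t frame_prio)
{
	return has_higher_priority(own_prio, frame_prio) ? MRA_SEND_NACK : MRA_IGNORE;
}
```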
Ido Schimmel	53fc685243	bridge: Avoid infinite loop when suppressing NS messages with invalid options When neighbor suppression is enabled the bridge device might reply to Neighbor Solicitation (NS) messages on behalf of remote hosts. In case the NS message includes the "Source link-layer address" option [1], the bridge device will use the specified address as the link-layer destination address in its reply. To avoid an infinite loop, break out of the options parsing loop when encountering an option with length zero and disregard the NS message. This is consistent with the IPv6 ndisc code and RFC 4861 which states that "Nodes MUST silently discard an ND packet that contains an option with length zero" [2]. [1] https://tools.ietf.org/html/rfc4861#section-4.3 [2] https://tools.ietf.org/html/rfc4861#section-4.6 Fixes: `ed842faeb2` ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alla Segal <allas@mellanox.com> Tested-by: Alla Segal <allas@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:08:41 -07:00
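The loop hazard described in the entry above comes from advancing through the options by each option's own length field: a length of zero never advances. The following is a minimal standalone sketch of such a parser with the guard in place. The function name `find_source_lladdr` is hypothetical and this is not the bridge code; the option layout (type byte, then length byte in units of 8 octets) and the type value 1 for the Source Link-Layer Address option follow RFC 4861.

```c
#include <stddef.h>
#include <stdint.h>

#define ND_OPT_SOURCE_LL_ADDR 1	/* option type from RFC 4861 */

/*
 * Returns a pointer to the link-layer address inside the first Source
 * Link-Layer Address option, or NULL if there is none or the options
 * are malformed.
 */
static const uint8_t *find_source_lladdr(const uint8_t *opts, size_t len)
{
	size_t i = 0;

	while (i + 2 <= len) {
		uint8_t type = opts[i];
		size_t opt_len = (size_t)opts[i + 1] * 8;	/* units of 8 octets */

		if (opt_len == 0)	/* would loop forever: discard the message */
			return NULL;
		if (opt_len > len - i)	/* option runs past the end of the buffer */
			return NULL;
		if (type == ND_OPT_SOURCE_LL_ADDR)
			return &opts[i + 2];
		i += opt_len;
	}
	return NULL;
}
```

Dropping the `opt_len == 0` check turns the second case into `i += 0` on every iteration, which is the infinite loop the fix removes.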
David S. Miller	1806c13dc2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-31 17:48:46 -07:00
Arnd Bergmann	b3b6a84c6a	bridge: multicast: work around clang bug Clang-10 and clang-11 run into a corner case of the register allocator on 32-bit ARM, leading to excessive stack usage from register spilling: net/bridge/br_multicast.c:2422:6: error: stack frame size of 1472 bytes in function 'br_multicast_get_stats' [-Werror,-Wframe-larger-than=] Work around this by marking one of the internal functions as noinline_for_stack. Link: https://bugs.llvm.org/show_bug.cgi?id=45802#c9 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 11:34:48 -07:00
Horatiu Vultur	20f6a05ef6	bridge: mrp: Rework the MRP netlink interface This patch reworks the MRP netlink interface. Before, each attribute represented a binary structure which made it hard to be extended. Therefore update the MRP netlink interface such that each existing attribute to be a nested attribute which contains the fields of the binary structures. In this way the MRP netlink interface can be extended without breaking the backwards compatibility. It is also using strict checking for attributes under the MRP top attribute. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 11:30:43 -07:00
Horatiu Vultur	617504c67e	bridge: mrp: Fix out-of-bounds read in br_mrp_parse The issue was reported by syzbot. When the function br_mrp_parse was called with a valid net_bridge_port, the net_bridge was an invalid pointer. Therefore the check br->stp_enabled could pass/fail depending where it was pointing in memory. The fix consists of setting the net_bridge pointer if the port is a valid pointer. Reported-by: syzbot+9c6f0f1f8e32223df9a4@syzkaller.appspotmail.com Fixes: `6536993371` ("bridge: mrp: Integrate MRP into the bridge") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-25 18:09:26 -07:00
Michael Braun	e9c284ec4b	netfilter: nft_reject_bridge: enable reject with bridge vlan Currently, using the bridge reject target with tagged packets results in untagged packets being sent back. Fix this by mirroring the vlan id as well. Fixes: `85f5b3086a` ("netfilter: bridge: add reject support") Signed-off-by: Michael Braun <michael-dev@fami-braun.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-05-25 20:39:05 +02:00
Horatiu Vultur	4fb13499d3	bridge: mrp: Restore port state when deleting MRP instance When a MRP instance is deleted, then restore the port according to the bridge state. If the bridge is up then the ports will be in forwarding state otherwise will be in disabled state. Fixes: `9a9f26e8f7` ("bridge: mrp: Connect MRP API with the switchdev API") Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 16:17:15 -07:00
Horatiu Vultur	7aa38018be	bridge: mrp: Add br_mrp_unique_ifindex function It is not allowed to have the same net bridge port be part of multiple MRP rings. Therefore add a check whether the port is already used in a different MRP. In that case return failure. Fixes: `9a9f26e8f7` ("bridge: mrp: Connect MRP API with the switchdev API") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 16:17:15 -07:00
Vladimir Oltean	9eb8eff0cf	net: bridge: allow enslaving some DSA master network devices Commit `8db0a2ee2c` ("net: bridge: reject DSA-enabled master netdevices as bridge members") added a special check in br_if.c in order to check for a DSA master network device with a tagging protocol configured. This was done because back then, such devices, once enslaved in a bridge would become inoperative and would not pass DSA tagged traffic anymore due to br_handle_frame returning RX_HANDLER_CONSUMED. But right now we have valid use cases which do require bridging of DSA masters. One such example is when the DSA master ports are DSA switch ports themselves (in a disjoint tree setup). This should be completely equivalent, functionally speaking, from having multiple DSA switches hanging off of the ports of a switchdev driver. So we should allow the enslaving of DSA tagged master network devices. Instead of the regular br_handle_frame(), install a new function br_handle_frame_dummy() on these DSA masters, which returns RX_HANDLER_PASS in order to call into the DSA specific tagging protocol handlers, and lift the restriction from br_add_if. Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-05-10 19:52:33 -07:00
Eric Dumazet	f78ed2204d	netpoll: accept NULL np argument in netpoll_send_skb() netpoll_send_skb() callers seem to leak skb if the np pointer is NULL. While this should not happen, we can make the code more robust. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-07 18:11:07 -07:00
Jacob Keller	c75a33c84b	net: remove newlines in NL_SET_ERR_MSG_MOD The NL_SET_ERR_MSG_MOD macro is used to report a string describing an error message to userspace via the netlink extended ACK structure. It should not have a trailing newline. Add a cocci script which catches cases where the newline marker is present. Using this script, fix the handful of cases which accidentally included a trailing new line. I couldn't figure out a way to get a patch mode working, so this script only implements context, report, and org. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-07 17:56:14 -07:00
David S. Miller	3793faad7b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts were all overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-06 22:10:13 -07:00
Jason Yan	8741e18419	net: bridge: return false in br_mrp_enabled() Fix the following coccicheck warning: net/bridge/br_private.h:1334:8-9: WARNING: return of 0/1 in function 'br_mrp_enabled' with return type bool Fixes: `6536993371` ("bridge: mrp: Integrate MRP into the bridge") Signed-off-by: Jason Yan <yanaijie@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-06 13:57:30 -07:00
David S. Miller	115506fea4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-01 (v2) The following pull-request contains BPF updates for your net-next tree. We've added 61 non-merge commits during the last 6 day(s) which contain a total of 153 files changed, 6739 insertions(+), 3367 deletions(-). The main changes are: 1) pulled work.sysctl from vfs tree with sysctl bpf changes. 2) bpf_link observability, from Andrii. 3) BTF-defined map in map, from Andrii. 4) asan fixes for selftests, from Andrii. 5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub. 6) production cloudflare classifier as a selftest, from Lorenz. 7) bpf_ktime_get_*_ns() helper improvements, from Maciej. 8) unprivileged bpftool feature probe, from Quentin. 9) BPF_ENABLE_STATS command, from Song. 10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav. 11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs, from Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-01 17:02:27 -07:00
Ido Schimmel	7979457b1d	net: bridge: vlan: Add a schedule point during VLAN processing User space can request to delete a range of VLANs from a bridge slave in one netlink request. For each deleted VLAN the FDB needs to be traversed in order to flush all the affected entries. If a large range of VLANs is deleted and the number of FDB entries is large or the FDB lock is contended, it is possible for the kernel to loop through the deleted VLANs for a long time. In case preemption is disabled, this can result in a soft lockup. Fix this by adding a schedule point after each VLAN is deleted to yield the CPU, if needed. This is safe because the VLANs are traversed in process context. Fixes: `bdced7ef78` ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Tested-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 17:45:41 -07:00
Richard Guy Briggs	a45d88530b	netfilter: add audit table unregister actions Audit the action of unregistering ebtables and x_tables. See: https://github.com/linux-audit/audit-kernel/issues/44 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2020-04-28 18:11:36 -04:00
Richard Guy Briggs	c4dad0aab3	audit: tidy and extend netfilter_cfg x_tables NETFILTER_CFG record generation was inconsistent for x_tables and ebtables configuration changes. The call was needlessly messy and there were supporting records missing at times while they were produced when not requested. Simplify the logging call into a new audit_log_nfcfg call. Honour the audit_enabled setting while more consistently recording information including supporting records by tidying up dummy checks. Add an op= field that indicates the operation being performed (register or replace). Here is the enhanced sample record: type=NETFILTER_CFG msg=audit(1580905834.919:82970): table=filter family=2 entries=83 op=replace Generate audit NETFILTER_CFG records on ebtables table registration. Previously this was being done for x_tables registration and replacement operations and ebtables table replacement only. See: https://github.com/linux-audit/audit-kernel/issues/25 See: https://github.com/linux-audit/audit-kernel/issues/35 See: https://github.com/linux-audit/audit-kernel/issues/43 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2020-04-28 17:52:42 -04:00
Horatiu Vultur	419dba8a49	net: bridge: Add checks for enabling the STP. It is not possible to have the MRP and STP running at the same time on the bridge, therefore add check when enabling the STP to check if MRP is already enabled. In that case return error. Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 11:40:25 -07:00
Horatiu Vultur	6536993371	bridge: mrp: Integrate MRP into the bridge To integrate MRP into the bridge, the bridge needs to do the following: - detect if the MRP frame was received on MRP ring port in that case it would be processed otherwise just forward it as usual. - enable parsing of MRP - before whenever the bridge was set up, it would set all the ports in forwarding state. Add an extra check to not set ports in forwarding state if the port is an MRP ring port. The reason of this change is that if the MRP instance initially sets the port in blocked state by setting the bridge up it would overwrite this setting. Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 11:40:25 -07:00
Horatiu Vultur	4d02b8f075	bridge: mrp: Implement netlink interface to configure MRP Implement netlink interface to configure MRP. The implementation will do sanity checks over the attributes and then eventually call the MRP interface. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 11:40:25 -07:00
Horatiu Vultur	9a9f26e8f7	bridge: mrp: Connect MRP API with the switchdev API Implement the MRP API. In case the HW can't generate MRP Test frames then the SW will try to generate the frames. In case the SW also fails to generate the frames, an error is returned to the userspace. The userspace is responsible for generating all the other MRP frames regardless of whether the test frames are generated by HW or SW. The forwarding/termination of MRP frames happens in the kernel and is done by the MRP instance. The userspace application doesn't do the forwarding. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 11:40:25 -07:00
Horatiu Vultur	fadd409136	bridge: switchdev: mrp: Implement MRP API for switchdev Implement the MRP api for switchdev. These functions will just eventually call the switchdev functions: switchdev_port_obj_add/del and switchdev_port_attr_set. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 11:40:25 -07:00
Horatiu Vultur	2f1a11ae11	bridge: mrp: Add MRP interface. Define the MRP interface. This interface is used by the netlink to update the MRP instances and by the MRP to make the calls to switchdev to offload it to HW. It defines an MRP instance 'struct br_mrp' which is a list of MRP instances. Which will be part of the 'struct net_bridge'. Each instance has 2 ring ports, a bridge and an ID. In case the HW can't generate MRP Test frames then the SW will generate those. br_mrp_add - adds a new MRP instance. br_mrp_del - deletes an existing MRP instance. Each instance has an ID(ring_id). br_mrp_set_port_state - changes the port state. The port can be in forwarding state, which means that the frames can pass through or in blocked state which means that the frames can't pass through except MRP frames. This will eventually call the switchdev API to notify the HW. This information is used also by the SW bridge to know how to forward frames in case the HW doesn't have this capability. br_mrp_set_port_role - a port role can be primary or secondary. This information is required to be pushed to HW in case the HW can generate MRP_Test frames. Because the MRP_Test frames contain a field with this information. Otherwise the HW will not be able to generate the frames correctly. br_mrp_set_ring_state - a ring can be in state open or closed. State open means that the mrp port stopped receiving MRP_Test frames, while closed means that the mrp port received MRP_Test frames. Similar with br_mrp_port_role, this information is pushed in HW because the MRP_Test frames contain this information. br_mrp_set_ring_role - a ring can have the following roles MRM or MRC. For the role MRM it is expected that the HW can terminate the MRP frames, notify the SW that it stopped receiving MRP_Test frames and trap all the other MRP frames. While for MRC mode it is expected that the HW can forward the MRP frames only between the MRP ports and copy MRP_Topology frames to CPU. In case the HW doesn't support a role it needs to return an error code different than -EOPNOTSUPP. br_mrp_start_test - this starts/stops the generation of MRP_Test frames. To stop the generation of frames the interval needs to have a value of 0. In this case the userspace needs to know if the HW supports this or not. Not to have duplicate frames(generated by HW and SW). Because if the HW supports this then the SW will not generate anymore frames and will expect that the HW will notify when it stopped receiving MRP frames using the function br_mrp_port_open. br_mrp_port_open - this function is used by drivers to notify the userspace via a netlink callback that one of the ports stopped receiving MRP_Test frames. This function is called only when the node has the role MRM. It is not supposed to be called from userspace. br_mrp_port_switchdev_add - this corresponds to the function br_mrp_add, and will notify the HW that a MRP instance is added. The function gets as parameter the MRP instance. br_mrp_port_switchdev_del - this corresponds to the function br_mrp_del, and will notify the HW that a MRP instance is removed. The function gets as parameter the ID of the MRP instance that is removed. br_mrp_port_switchdev_set_state - this corresponds to the function br_mrp_set_port_state. It would notify the HW if it should block or not non-MRP frames. br_mrp_port_switchdev_set_port - this corresponds to the function br_mrp_set_port_role. It would set the port role, primary or secondary. br_mrp_switchdev_set_role - this corresponds to the function br_mrp_set_ring_role and would set one of the role MRM or MRC. br_mrp_switchdev_set_ring_state - this corresponds to the function br_mrp_set_ring_state and would set the ring to be open or closed. br_mrp_switchdev_send_ring_test - this corresponds to the function br_mrp_start_test. This will notify the HW to start or stop generating MRP_Test frames. Value 0 for the interval parameter means to stop generating the frames. br_mrp_port_open - this function is used to notify the userspace that the port lost the continuity of MRP Test frames. Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 11:40:25 -07:00
Horatiu Vultur	3e54442c93	net: bridge: Add port attribute IFLA_BRPORT_MRP_RING_OPEN This patch adds a new port attribute, IFLA_BRPORT_MRP_RING_OPEN, which allows to notify the userspace when the port lost the continuity of MRP frames. This attribute is set by kernel whenever the SW or HW detects that the ring is being open or closed. Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 11:40:25 -07:00
Horatiu Vultur	4b8d7d4c59	bridge: mrp: Extend bridge interface To integrate MRP into the bridge, first the bridge needs to be aware of ports that are part of an MRP ring and which rings are on the bridge. Therefore extend bridge interface with the following: - add new flag(BR_MRP_AWARE) to the net bridge ports, this bit will be set when the port is added to an MRP instance. In this way it knows if the frame was received on MRP ring port - add new flag(BR_MRP_LOST_CONT) to the net bridge ports, this bit will be set when the port lost the continuity of MRP Test frames. - add a list of MRP instances Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 11:40:25 -07:00
Horatiu Vultur	2cc974f83f	bridge: mrp: Update Kconfig Add the option BRIDGE_MRP to allow to build in or not MRP support. The default value is N. Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 11:40:25 -07:00
Christoph Hellwig	32927393dc	sysctl: pass kernel pointers to ->proc_handler Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handlers just pass through the data to one of the common handlers a lot of the changes are mechanical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-27 02:07:40 -04:00
Nikolay Aleksandrov	c443758b21	net: bridge: vlan options: move the tunnel command to the nested attribute Now that we have a nested tunnel info attribute we can add a separate one for the tunnel command and require it explicitly from user-space. It must be one of RTM_SETLINK/DELLINK. Only RTM_SETLINK requires a valid tunnel id, DELLINK just removes it if it was set before. This allows us to have all tunnel attributes and control in one place, thus removing the need for an outside vlan info flag. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-20 08:52:20 -07:00
Nikolay Aleksandrov	fa388f29a9	net: bridge: vlan options: nest the tunnel id into a tunnel info attribute While discussing the new API, Roopa mentioned that we'll be adding more tunnel attributes and options in the future, so it's better to make it a nested attribute, since this is still in net-next we can easily change it and nest the tunnel id attribute under BRIDGE_VLANDB_ENTRY_TUNNEL_INFO. The new format is: [BRIDGE_VLANDB_ENTRY] [BRIDGE_VLANDB_ENTRY_TUNNEL_INFO] [BRIDGE_VLANDB_TINFO_ID] Any new tunnel attributes can be nested under BRIDGE_VLANDB_ENTRY_TUNNEL_INFO. Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-20 08:52:20 -07:00
Nikolay Aleksandrov	56d099761a	net: bridge: vlan: include stats in dumps if requested This patch adds support for vlan stats to be included when dumping vlan information. We have to dump them only when explicitly requested (thus the flag below) because that disables the vlan range compression and will make the dump significantly larger. In order to request the stats to be included we add a new dump attribute called BRIDGE_VLANDB_DUMP_FLAGS which can affect dumps with the following first flag: - BRIDGE_VLANDB_DUMPF_STATS The stats are intentionally nested and put into separate attributes to make it easier for extending later since we plan to add per-vlan mcast stats, drop stats and possibly STP stats. This is the last missing piece from the new vlan API which makes the dumped vlan information complete. A dump request which should include stats looks like: [BRIDGE_VLANDB_DUMP_FLAGS] |= BRIDGE_VLANDB_DUMPF_STATS A vlandb entry attribute with stats looks like: [BRIDGE_VLANDB_ENTRY] = { [BRIDGE_VLANDB_ENTRY_STATS] = { [BRIDGE_VLANDB_STATS_RX_BYTES] [BRIDGE_VLANDB_STATS_RX_PACKETS] ... } } Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-19 20:21:47 -07:00
David S. Miller	a58741ef1e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Use nf_flow_offload_tuple() to fetch flow stats, from Paul Blakey. 2) Add new xt_IDLETIMER hard mode, from Manoj Basapathi. Follow up patch to clean up this new mode, from Dan Carpenter. 3) Add support for geneve tunnel options, from Xin Long. 4) Make sets built-in and remove modular infrastructure for sets, from Florian Westphal. 5) Remove unused TEMPLATE_NULLS_VAL, from Li RongQing. 6) Statify nft_pipapo_get, from Chen Wandun. 7) Use C99 flexible-array member, from Gustavo A. R. Silva. 8) More descriptive variable names for bitwise, from Jeremy Sowden. 9) Four patches to add tunnel device hardware offload to the flowtable infrastructure, from wenxu. 10) pipapo set supports for 8-bit grouping, from Stefano Brivio. 11) pipapo can switch between nibble and byte grouping, also from Stefano. 12) Add AVX2 vectorized version of pipapo, from Stefano Brivio. 13) Update pipapo to be use it for single ranges, from Stefano. 14) Add stateful expression support to elements via control plane, eg. counter per element. 15) Re-visit sysctls in unprivileged namespaces, from Florian Westphal. 16) Add new egress hook, from Lukas Wunner. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-17 23:51:31 -07:00
Nikolay Aleksandrov	569da08228	net: bridge: vlan options: add support for tunnel mapping set/del This patch adds support for manipulating vlan/tunnel mappings. The tunnel ids are globally unique and are one per-vlan. There were two trickier issues - first in order to support vlan ranges we have to compute the current tunnel id in the following way: - base tunnel id (attr) + current vlan id - starting vlan id This is in line how the old API does vlan/tunnel mapping with ranges. We already have the vlan range present, so it's redundant to add another attribute for the tunnel range end. It's simply base tunnel id + vlan range. And second to support removing mappings we need an out-of-band way to tell the option manipulating function because there are no special/reserved tunnel id values, so we use a vlan flag to denote the operation is tunnel mapping removal. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-17 22:47:12 -07:00
Nikolay Aleksandrov	188c67dd19	net: bridge: vlan options: add support for tunnel id dumping Add a new option - BRIDGE_VLANDB_ENTRY_TUNNEL_ID which is used to dump the tunnel id mapping. Since they're unique per vlan they can enter a vlan range if they're consecutive, thus we can calculate the tunnel id range map simply as: vlan range end id - vlan range start id. The starting point is the tunnel id in BRIDGE_VLANDB_ENTRY_TUNNEL_ID. This is similar to how the tunnel entries can be created in a range via the old API (a vlan range maps to a tunnel range). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-17 22:47:12 -07:00
Nikolay Aleksandrov	53e96632ab	net: bridge: vlan tunnel: constify bridge and port arguments The vlan tunnel code changes vlan options, it shouldn't touch port or bridge options so we can constify the port argument. This would later help us to re-use these functions from the vlan options code. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-17 22:47:12 -07:00
Nikolay Aleksandrov	99f7c5e096	net: bridge: vlan options: rename br_vlan_opts_eq to br_vlan_opts_eq_range It is more appropriate name as it shows the intent of why we need to check the options' state. It also allows us to give meaning to the two arguments of the function: the first is the current vlan (v_curr) being checked if it could enter the range ending in the second one (range_end). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-17 22:47:12 -07:00
Gustavo A. R. Silva	6daf141401	netfilter: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] Lastly, fix checkpatch.pl warning WARNING: __aligned(size) is preferred over __attribute__((aligned(size))) in net/bridge/netfilter/ebtables.c This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-03-15 15:20:16 +01:00
Nikolay Aleksandrov	823d81b0fa	net: bridge: fix stale eth hdr pointer in br_dev_xmit In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but if the packet has the vlan header inside (e.g. bridge with disabled tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag() to extract the vid before filtering which in turn calls pskb_may_pull() and we may end up with a stale eth pointer. Moreover the cached eth header pointer will generally be wrong after that operation. Remove the eth header caching and just use eth_hdr() directly, the compiler does the right thing and calculates it only once so we don't lose anything. Fixes: `057658cb33` ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-24 11:11:19 -08:00
Madhuparna Bhowmik	33c4acbe2f	bridge: br_stp: Use built-in RCU list checking list_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-19 11:13:43 -08:00
Nikolay Aleksandrov	a580c76d53	net: bridge: vlan: add per-vlan state The first per-vlan option added is state, it is needed for EVPN and for per-vlan STP. The state allows to control the forwarding on per-vlan basis. The vlan state is considered only if the port state is forwarding in order to avoid conflicts and be consistent. br_allowed_egress is called only when the state is forwarding, but the ingress case is a bit more complicated due to the fact that we may have the transition between port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still allow the bridge to learn from the packet after vlan filtering and it will be dropped after that. Also to optimize the pvid state check we keep a copy in the vlan group to avoid one lookup. The state members are modified with *_ONCE() to annotate the lockless access. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-24 12:58:14 +01:00
Nikolay Aleksandrov	a5d29ae226	net: bridge: vlan: add basic option setting support This patch adds support for option modification of single vlans and ranges. It allows to only modify options, i.e. skip create/delete by using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range option changes we try to pack the notifications as much as possible. v2: do full port (all vlans) notification only when creating/deleting vlans for compatibility, rework the range detection when changing options, add more verbose extack errors and check if a vlan should be used (br_vlan_should_use checks) Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-24 12:58:14 +01:00
Nikolay Aleksandrov	7a53e718c5	net: bridge: vlan: add basic option dumping support We'll be dumping the options for the whole range if they're equal. The first range vlan will be used to extract the options. The commit doesn't change anything yet it just adds the skeleton for the support. The dump will happen when the first option is added. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-24 12:58:14 +01:00
Nikolay Aleksandrov	ac0e932d0e	net: bridge: check port state before br_allowed_egress If we make sure that br_allowed_egress is called only when we have BR_STATE_FORWARDING state then we can avoid a test later when we add per-vlan state. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-24 12:58:14 +01:00
Nikolay Aleksandrov	f545923b4a	net: bridge: vlan: notify on vlan add/delete/change flags Now that we can notify, send a notification on add/del or change of flags. Notifications are also compressed when possible to reduce their number and relieve user-space of extra processing, due to that we have to manually notify after each add/del in order to avoid double notifications. We try hard to notify only about the vlans which actually changed, thus a single command can result in multiple notifications about disjoint ranges if there were vlans which didn't change inside. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 13:48:18 +01:00
Nikolay Aleksandrov	cf5bddb95c	net: bridge: vlan: add rtnetlink group and notify support Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN and add support for sending vlan notifications (both single and ranges). No functional changes intended, the notification support will be used by later patches. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 13:48:18 +01:00
Nikolay Aleksandrov	0ab5587951	net: bridge: vlan: add rtm range support Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes RTM_NEWVLAN/DELVLAN to act on a range. Dumps now automatically compress similar vlans into ranges. This will also be used when per-vlan options are introduced and vlans' options match, they will be put into a single range which is encapsulated in one netlink attribute. We need to run similar checks as br_process_vlan_info() does because these ranges will be used for options setting and they'll be able to skip br_process_vlan_info(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 13:48:18 +01:00
Nikolay Aleksandrov	adb3ce9bcb	net: bridge: vlan: add del rtm message support Adding RTM_DELVLAN support similar to RTM_NEWVLAN is simple, just need to map DELVLAN to DELLINK and register the handler. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 13:48:17 +01:00
Nikolay Aleksandrov	f26b296585	net: bridge: vlan: add new rtm message support Add initial RTM_NEWVLAN support which can only create vlans, operating similar to the current br_afspec(). We will use it later to also change per-vlan options. Old-style (flag-based) vlan ranges are not allowed when using RTM messages, we will introduce vlan ranges later via a new nested attribute which would allow us to have all the information about a range encapsulated into a single nl attribute. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 13:48:17 +01:00
Nikolay Aleksandrov	8dcea18708	net: bridge: vlan: add rtm definitions and dump support This patch adds vlan rtm definitions: - NEWVLAN: to be used for creating vlans, setting options and notifications - DELVLAN: to be used for deleting vlans - GETVLAN: used for dumping vlan information Dumping vlans which can span multiple messages is added now with basic information (vid and flags). We use nlmsg_parse() to validate the header length in order to be able to extend the message with filtering attributes later. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 13:48:17 +01:00
Nikolay Aleksandrov	8f4cc940a1	net: bridge: netlink: add extack error messages when processing vlans Add extack messages on vlan processing errors. We need to move the flags missing check after the "last" check since we may have "last" set but lack a range end flag in the next entry. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 13:48:17 +01:00
Nikolay Aleksandrov	5a46facbbc	net: bridge: vlan: add helpers to check for vlan id/range validity Add helpers to check if a vlan id or range are valid. The range helper must be called when range start or end are detected. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 13:48:17 +01:00
David S. Miller	31d518f35e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Simple overlapping changes in bpf land wrt. bpf_helper_defs.h handling. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-31 13:37:13 -08:00
David S. Miller	ec34c01575	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix endianness issue in flowtable TCP flags dissector, from Arnd Bergmann. 2) Extend flowtable test script with dnat rules, from Florian Westphal. 3) Reject padding in ebtables user entries and validate computed user offset, reported by syzbot, from Florian Westphal. 4) Fix endianness in nft_tproxy, from Phil Sutter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-26 13:11:40 -08:00
Hangbin Liu	bd085ef678	net: add bool confirm_neigh parameter for dst_ops.update_pmtu The MTU update code is supposed to be invoked in response to real networking events that update the PMTU. In IPv6 PMTU update function __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor confirmed time. But for tunnel code, it will call pmtu before xmit, like: - tnl_update_pmtu() - skb_dst_update_pmtu() - ip6_rt_update_pmtu() - __ip6_rt_update_pmtu() - dst_confirm_neigh() If the tunnel remote dst mac address changed and we still do the neigh confirm, we will not be able to update neigh cache and ping6 remote will fail. So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we should not be invoking dst_confirm_neigh() as we have no evidence of successful two-way communication at this point. On the other hand it is also important to keep the neigh reachability fresh for TCP flows, so we cannot remove this dst_confirm_neigh() call. To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu to choose whether we should do neigh update or not. I will add the parameter in this patch and set all the callers to true to comply with the previous way, and fix the tunnel code one by one on later patches. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Suggested-by: David Miller <davem@davemloft.net> Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-24 22:28:54 -08:00
David S. Miller	ac80010fc9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Mere overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-22 15:15:05 -08:00
Linus Torvalds	78bac77b52	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso, including adding a missing ipv6 match description. 2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi Bhat. 3) Fix uninit value in bond_neigh_init(), from Eric Dumazet. 4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold. 5) Fix use after free in tipc_disc_rcv(), from Tuong Lien. 6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul Chaignon. 7) Multicast MAC limit test is off by one in qede, from Manish Chopra. 8) Fix established socket lookup race when socket goes from TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening RCU grace period. From Eric Dumazet. 9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet. 10) Fix active backup transition after link failure in bonding, from Mahesh Bandewar. 11) Avoid zero sized hash table in gtp driver, from Taehee Yoo. 12) Fix wrong interface passed to ->mac_link_up(), from Russell King. 13) Fix DSA egress flooding settings in b53, from Florian Fainelli. 14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost. 15) Fix double free in dpaa2-ptp code, from Ioana Ciornei. 16) Reject invalid MTU values in stmmac, from Jose Abreu. 17) Fix refcount leak in error path of u32 classifier, from Davide Caratti. 18) Fix regression causing iwlwifi firmware crashes on boot, from Anders Kaseorg. 19) Fix inverted return value logic in llc2 code, from Chan Shu Tak. 20) Disable hardware GRO when XDP is attached to qede, from Manish Chopra. 21) Since we encode state in the low pointer bits, dst metrics must be at least 4 byte aligned, which is not necessarily true on m68k. Add annotations to fix this, from Geert Uytterhoeven. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits) sfc: Include XDP packet headroom in buffer step size. 
sfc: fix channel allocation with brute force net: dst: Force 4-byte alignment of dst_metrics selftests: pmtu: fix init mtu value in description hv_netvsc: Fix unwanted rx_table reset net: phy: ensure that phy IDs are correctly typed mod_devicetable: fix PHY module format qede: Disable hardware gro when xdp prog is installed net: ena: fix issues in setting interrupt moderation params in ethtool net: ena: fix default tx interrupt moderation interval net/smc: unregister ib devices in reboot_event net: stmmac: platform: Fix MDIO init for platforms without PHY llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c) net: hisilicon: Fix a BUG trigered by wrong bytes_compl net: dsa: ksz: use common define for tag len s390/qeth: don't return -ENOTSUPP to userspace s390/qeth: fix promiscuous mode after reset s390/qeth: handle error due to unsupported transport mode cxgb4: fix refcount init for TC-MQPRIO offload tc-testing: initial tdc selftests for cls_u32 ...	2019-12-22 09:54:33 -08:00
Florian Westphal	e608f631f0	netfilter: ebtables: compat: reject all padding in matches/watchers syzbot reported following splat: BUG: KASAN: vmalloc-out-of-bounds in size_entry_mwt net/bridge/netfilter/ebtables.c:2063 [inline] BUG: KASAN: vmalloc-out-of-bounds in compat_copy_entries+0x128b/0x1380 net/bridge/netfilter/ebtables.c:2155 Read of size 4 at addr ffffc900004461f4 by task syz-executor267/7937 CPU: 1 PID: 7937 Comm: syz-executor267 Not tainted 5.5.0-rc1-syzkaller #0 size_entry_mwt net/bridge/netfilter/ebtables.c:2063 [inline] compat_copy_entries+0x128b/0x1380 net/bridge/netfilter/ebtables.c:2155 compat_do_replace+0x344/0x720 net/bridge/netfilter/ebtables.c:2249 compat_do_ebt_set_ctl+0x22f/0x27e net/bridge/netfilter/ebtables.c:2333 [..] Because padding isn't considered during computation of ->buf_user_offset, "total" is decremented by fewer bytes than it should. Therefore, the first part of if (total < sizeof(entry) \|\| entry->next_offset < sizeof(*entry)) will pass, -- it should not have. This causes oob access: entry->next_offset is past the vmalloced size. Reject padding and check that computed user offset (sum of ebt_entry structure plus all individual matches/watchers/targets) is same value that userspace gave us as the offset of the next entry. Reported-by: syzbot+f68108fed972453a0ad4@syzkaller.appspotmail.com Fixes: `81e675c227` ("netfilter: ebtables: add CONFIG_COMPAT support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-12-20 02:12:27 +01:00
Vivien Didelot	de1799667b	net: bridge: add STP xstats This adds rx_bpdu, tx_bpdu, rx_tcn, tx_tcn, transition_blk, transition_fwd xstats counters to the bridge ports copied over via netlink, providing useful information for STP. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-12-14 20:02:36 -08:00
David S. Miller	7da538c1e1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Wait for rcu grace period after releasing netns in ctnetlink, from Florian Westphal. 2) Incorrect command type in flowtable offload ndo invocation, from wenxu. 3) Incorrect callback type in flowtable offload flow tuple updates, also from wenxu. 4) Fix compile warning on flowtable offload infrastructure due to possible reference to uninitialized variable, from Nathan Chancellor. 5) Do not inline nf_ct_resolve_clash(), this is called from slow path / stress situations. From Florian Westphal. 6) Missing IPv6 flow selector description in flowtable offload. 7) Missing check for NETDEV_UNREGISTER in nf_tables offload infrastructure, from wenxu. 8) Update NAT selftest to use randomized netns names, from Florian Westphal. 9) Restore nfqueue bridge support, from Marco Oliverio. 10) Compilation warning in SCTP_CHUNKMAP_*() on xt_sctp header. From Phil Sutter. 11) Fix bogus lookup/get match for non-anonymous rbtree sets. 12) Missing netlink validation for NFT_SET_ELEM_INTERVAL_END elements. 13) Missing netlink validation for NFT_DATA_VALUE after nft_data_init(). 14) If rule specifies no actions, offload infrastructure returns EOPNOTSUPP. 15) Module refcount leak in object updates. 16) Missing sanitization for ARP traffic from br_netfilter, from Eric Dumazet. 17) Compilation breakage on big-endian due to incorrect memcpy() size in the flowtable offload infrastructure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-09 14:03:33 -08:00
Pankaj Bharadiya	c593642c8b	treewide: Use sizeof_field() macro Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h\|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" \| while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net	2019-12-09 10:36:44 -08:00
Eric Dumazet	5604285839	netfilter: bridge: make sure to pull arp header in br_nf_forward_arp() syzbot is kind enough to remind us we need to call skb_may_pull() BUG: KMSAN: uninit-value in br_nf_forward_arp+0xe61/0x1230 net/bridge/br_netfilter_hooks.c:665 CPU: 1 PID: 11631 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108 __msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245 br_nf_forward_arp+0xe61/0x1230 net/bridge/br_netfilter_hooks.c:665 nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline] nf_hook_slow+0x18b/0x3f0 net/netfilter/core.c:512 nf_hook include/linux/netfilter.h:260 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] __br_forward+0x78f/0xe30 net/bridge/br_forward.c:109 br_flood+0xef0/0xfe0 net/bridge/br_forward.c:234 br_handle_frame_finish+0x1a77/0x1c20 net/bridge/br_input.c:162 nf_hook_bridge_pre net/bridge/br_input.c:245 [inline] br_handle_frame+0xfb6/0x1eb0 net/bridge/br_input.c:348 __netif_receive_skb_core+0x20b9/0x51a0 net/core/dev.c:4830 __netif_receive_skb_one_core net/core/dev.c:4927 [inline] __netif_receive_skb net/core/dev.c:5043 [inline] process_backlog+0x610/0x13c0 net/core/dev.c:5874 napi_poll net/core/dev.c:6311 [inline] net_rx_action+0x7a6/0x1aa0 net/core/dev.c:6379 __do_softirq+0x4a1/0x83a kernel/softirq.c:293 do_softirq_own_stack+0x49/0x80 arch/x86/entry/entry_64.S:1091 </IRQ> do_softirq kernel/softirq.c:338 [inline] __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:190 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32 rcu_read_unlock_bh include/linux/rcupdate.h:688 [inline] __dev_queue_xmit+0x38e8/0x4200 net/core/dev.c:3819 dev_queue_xmit+0x4b/0x60 net/core/dev.c:3825 packet_snd net/packet/af_packet.c:2959 [inline] packet_sendmsg+0x8234/0x9100 net/packet/af_packet.c:2984 
sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg net/socket.c:657 [inline] __sys_sendto+0xc44/0xc70 net/socket.c:1952 __do_sys_sendto net/socket.c:1964 [inline] __se_sys_sendto+0x107/0x130 net/socket.c:1960 __x64_sys_sendto+0x6e/0x90 net/socket.c:1960 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45a679 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f0a3c9e5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 000000000045a679 RDX: 000000000000000e RSI: 0000000020000200 RDI: 0000000000000003 RBP: 000000000075bf20 R08: 00000000200000c0 R09: 0000000000000014 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f0a3c9e66d4 R13: 00000000004c8ec1 R14: 00000000004dfe28 R15: 00000000ffffffff Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline] kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132 kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86 slab_alloc_node mm/slub.c:2773 [inline] __kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x306/0xa10 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] alloc_skb_with_frags+0x18c/0xa80 net/core/skbuff.c:5662 sock_alloc_send_pskb+0xafd/0x10a0 net/core/sock.c:2244 packet_alloc_skb net/packet/af_packet.c:2807 [inline] packet_snd net/packet/af_packet.c:2902 [inline] packet_sendmsg+0x63a6/0x9100 net/packet/af_packet.c:2984 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg net/socket.c:657 [inline] __sys_sendto+0xc44/0xc70 net/socket.c:1952 __do_sys_sendto net/socket.c:1964 [inline] __se_sys_sendto+0x107/0x130 net/socket.c:1960 __x64_sys_sendto+0x6e/0x90 net/socket.c:1960 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291 
entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `c4e70a87d9` ("netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-12-09 13:14:06 +01:00
Nikolay Aleksandrov	c4b4c42185	net: bridge: deny dev_set_mac_address() when unregistering We have an interesting memory leak in the bridge when it is being unregistered and is a slave to a master device which would change the mac of its slaves on unregister (e.g. bond, team). This is a very unusual setup but we do end up leaking 1 fdb entry because dev_set_mac_address() would cause the bridge to insert the new mac address into its table after all fdbs are flushed, i.e. after dellink() on the bridge has finished and we call NETDEV_UNREGISTER the bond/team would release it and will call dev_set_mac_address() to restore its original address and that in turn will add an fdb in the bridge. One fix is to check for the bridge dev's reg_state in its ndo_set_mac_address callback and return an error if the bridge is not in NETREG_REGISTERED. Easy steps to reproduce: 1. add bond in mode != A/B 2. add any slave to the bond 3. add bridge dev as a slave to the bond 4. destroy the bridge device Trace: unreferenced object 0xffff888035c4d080 (size 128): comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s) hex dump (first 32 bytes): 41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00 A..6............ d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00 ...^?........... 
backtrace: [<00000000ddb525dc>] kmem_cache_alloc+0x155/0x26f [<00000000633ff1e0>] fdb_create+0x21/0x486 [bridge] [<0000000092b17e9c>] fdb_insert+0x91/0xdc [bridge] [<00000000f2a0f0ff>] br_fdb_change_mac_address+0xb3/0x175 [bridge] [<000000001de02dbd>] br_stp_change_bridge_id+0xf/0xff [bridge] [<00000000ac0e32b1>] br_set_mac_address+0x76/0x99 [bridge] [<000000006846a77f>] dev_set_mac_address+0x63/0x9b [<00000000d30738fc>] __bond_release_one+0x3f6/0x455 [bonding] [<00000000fc7ec01d>] bond_netdev_event+0x2f2/0x400 [bonding] [<00000000305d7795>] notifier_call_chain+0x38/0x56 [<0000000028885d4a>] call_netdevice_notifiers+0x1e/0x23 [<000000008279477b>] rollback_registered_many+0x353/0x6a4 [<0000000018ef753a>] unregister_netdevice_many+0x17/0x6f [<00000000ba854b7a>] rtnl_delete_link+0x3c/0x43 [<00000000adf8618d>] rtnl_dellink+0x1dc/0x20a [<000000009b6395fd>] rtnetlink_rcv_msg+0x23d/0x268 Fixes: `4359881338` ("bridge: add local MAC address to forwarding table (v2)") Reported-by: syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-03 11:21:20 -08:00
Matthias Schiffer	542575fe4b	bridge: implement get_link_ksettings ethtool method We return the maximum speed of all active ports. This matches how the link speed would give an upper limit for traffic to/from any single peer if the bridge were replaced with a hardware switch. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-12 19:52:15 -08:00
David S. Miller	14684b9301	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net One conflict in the BPF samples Makefile, some fixes in 'net' whilst we were converting over to Makefile.target rules in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-09 11:04:37 -08:00
Florian Westphal	b23c0742c2	bridge: ebtables: don't crash when using dnat target in output chains xt_in() returns NULL in the output hook, skip the pkt_type change for that case, redirection only makes sense in broute/prerouting hooks. Reported-by: Tom Yan <tom.ty89@gmail.com> Cc: Linus Lüssing <linus.luessing@c0d3.blue> Fixes: `cf3cb246e2` ("bridge: ebtables: fix reception of frames DNAT-ed to bridge device/port") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-11-04 20:58:34 +01:00
Nikolay Aleksandrov	5d1fcaf35d	net: bridge: fdb: eliminate extra port state tests from fast-path When commit `df1c0b8468` ("[BRIDGE]: Packets leaking out of disabled/blocked ports.") introduced the port state tests in br_fdb_update() it was to avoid learning/refreshing from STP BPDUs, it was also used to avoid learning/refreshing from user-space with NTF_USE. Those two tests are done for every packet entering the bridge if it's learning, but for the fast-path we already have them checked in br_handle_frame() and it is unnecessary to do it again. Thus push the checks to the unlikely cases and drop them from br_fdb_update(), the new nbp_state_should_learn() helper is used to determine if the port state allows br_fdb_update() to be called. The two places which need to do it manually are: - user-space add call with NTF_USE set - link-local packet learning done in __br_handle_local_finish() Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-04 11:15:27 -08:00
David S. Miller	d31e95585c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The only slightly tricky merge conflict was the netdevsim because the mutex locking fix overlapped a lot of driver reload reorganization. The rest were (relatively) trivial in nature. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-02 13:54:56 -07:00
Nikolay Aleksandrov	58ec1ea637	net: bridge: fdb: restore unlikely() when taking over externally added entries Taking over hw-learned entries is not a likely scenario so restore the unlikely() use for the case of SW taking over externally learned entries. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-01 10:32:43 -07:00
Nikolay Aleksandrov	31f1155bdc	net: bridge: fdb: avoid two atomic bitops in br_fdb_external_learn_add() If we setup the fdb flags prior to calling fdb_create() we can avoid two atomic bitops when learning a new entry. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-01 10:32:43 -07:00
Nikolay Aleksandrov	be0c567797	net: bridge: fdb: br_fdb_update can take flags directly If we modify br_fdb_update() to take flags directly we can get rid of one test and one atomic bitop in the learning path. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-01 10:32:43 -07:00
Nikolay Aleksandrov	3fb01a31af	net: bridge: fdb: set flags directly in fdb_create No need to have separate arguments for each flag, just set the flags to whatever was passed to fdb_create() before the fdb is published. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov	d38c6e3db0	net: bridge: fdb: convert offloaded to use bitops Convert the offloaded field to a flag and use bitops. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov	b5cd9f7c42	net: bridge: fdb: convert added_by_external_learn to use bitops Convert the added_by_external_learn field to a flag and use bitops. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov	ac3ca6af44	net: bridge: fdb: convert added_by_user to bitops Straight-forward convert of the added_by_user field to bitops. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov	e0458d9a73	net: bridge: fdb: convert is_sticky to bitops Straight-forward convert of the is_sticky field to bitops. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov	29e63fffd6	net: bridge: fdb: convert is_static to bitops Convert the is_static to bitops, make use of the combined test_and_set/clear_bit to simplify expressions in fdb_add_entry. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-29 18:12:49 -07:00
Nikolay Aleksandrov	6869c3b02b	net: bridge: fdb: convert is_local to bitops The patch adds a new fdb flags field in the hole between the two cache lines and uses it to convert is_local to bitops. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-29 18:12:49 -07:00
Taehee Yoo	ab92d68fc2	net: core: add generic lockdep keys Some interface types can be nested (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc.). These interface types should set a lockdep class because, without a lockdep class key, lockdep always warns about nonexistent circular locking. In the current code, these interfaces have their own lockdep class keys and manage them themselves, so there is a lot of duplicate code around /drivers/net and /net/. This patch adds new generic lockdep keys and some helper functions for them. This patch makes the changes below. a) Add lockdep class keys in struct net_device - qdisc_running, xmit, addr_list, qdisc_busylock - these keys are used as dynamic lockdep keys. b) When net_device is being allocated, lockdep keys are registered. - alloc_netdev_mqs() c) When net_device is being freed, lockdep keys are unregistered. - free_netdev() d) Add generic lockdep key helper functions - netdev_register_lockdep_key() - netdev_unregister_lockdep_key() - netdev_update_lockdep_key() e) Remove unnecessary generic lockdep macro and functions f) Remove unnecessary lockdep code from each interface. After this patch, interface modules don't need to maintain their own lockdep keys. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-24 14:53:48 -07:00
Eric Dumazet	e7a409c3f4	ipv4: fix IPSKB_FRAG_PMTU handling with fragmentation This patch removes the iph field from the state structure, which is not properly initialized. Instead, add a new field to make the "do we want to set DF" be the state bit and move the code to set the DF flag from ip_frag_next(). Joint work with Pablo and Linus. Fixes: `19c3401a91` ("net: ipv4: place control buffer handling away from fragmentation iterators") Reported-by: Patrick Schönthaler <patrick@notvads.ovh> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-21 10:46:42 -07:00
Eric Dumazet	9669fffc14	net: ensure correct skb->tstamp in various fragmenters Thomas found that some forwarded packets would be stuck in FQ packet scheduler because their skb->tstamp contained timestamps far in the future. We thought we addressed this point in commit `8203e2d844` ("net: clear skb->tstamp in forwarding paths") but there is still an issue when/if a packet needs to be fragmented. In order to meet EDT requirements, we have to make sure all fragments get the original skb->tstamp. Note that this original skb->tstamp should be zero in forwarding path, but might have a non zero value in output path if user decided so. Fixes: `fb420d5d91` ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Thomas Bartschies <Thomas.Bartschies@cvk.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-18 10:02:37 -07:00
David S. Miller	aa2eaa8c27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Minor overlapping changes in the btusb and ixgbe drivers. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-15 14:17:27 +02:00
Jeremy Sowden	46705b070c	netfilter: move nf_bridge_frag_data struct definition to a more appropriate header. There is a struct definition in nf_conntrack_bridge.h which is not specific to conntrack and is used elsewhere in netfilter. Move it into netfilter_bridge.h. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 12:35:33 +02:00
Jeremy Sowden	40d102cde0	netfilter: update include directives. Include some headers in files which require them, and remove others which are not required. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 12:33:06 +02:00
Jeremy Sowden	85cfbc25e5	netfilter: inline xt_hashlimit, ebt_802_3 and xt_physdev headers Three netfilter headers are only included once. Inline their contents at those sites and remove them. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-13 12:32:48 +02:00
Nicolas Dichtel	94a72b3f02	bridge/mdb: remove wrong use of NLM_F_MULTI NLM_F_MULTI must be used only when a NLMSG_DONE message is sent at the end. In fact, NLMSG_DONE is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: `949f1e39a6` ("bridge: mdb: notify on router port add and del") CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-10 09:10:53 +01:00
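To illustrate why a stray NLM_F_MULTI flag can make a client wait forever, here is a toy Python model of a netlink reader. The two constants mirror the values in linux/netlink.h; the `MDB_NOTIFY` message type and the `reader_still_waiting` helper are hypothetical names used only for this sketch, and real libraries such as libnl are more involved.

```python
# Toy model of a netlink reader; not taken from libnl or the kernel.
NLM_F_MULTI = 0x2   # flag: message is one part of a multipart sequence
NLMSG_DONE = 0x3    # message type that terminates a multipart sequence
MDB_NOTIFY = 100    # hypothetical message type standing in for an mdb notification


def reader_still_waiting(messages):
    """Return True if the reader would still expect more messages.

    `messages` is a list of (msg_type, flags) tuples, a simplified
    stand-in for the fields of a parsed netlink header.
    """
    waiting = False
    for msg_type, flags in messages:
        if msg_type == NLMSG_DONE:
            waiting = False      # the multipart sequence is finished
        elif flags & NLM_F_MULTI:
            waiting = True       # more parts were promised
        else:
            waiting = False      # standalone message, nothing pending
    return waiting


# A notification wrongly flagged NLM_F_MULTI, with no NLMSG_DONE after it,
# leaves the reader waiting; the same notification without the flag does not.
print(reader_still_waiting([(MDB_NOTIFY, NLM_F_MULTI)]))
print(reader_still_waiting([(MDB_NOTIFY, 0)]))
```

In this model the first call reports that the reader is still waiting, which corresponds to the hang described in the commit message.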
Leonardo Bras	48bd0d68cd	netfilter: bridge: Drops IPv6 packets if IPv6 module is not loaded A kernel panic can happen if a host has disabled IPv6 on boot and has to process guest packets (coming from a bridge) using its ip6tables. IPv6 packets need to be dropped if the IPv6 module is not loaded, and the host ip6tables will be used. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-09-02 23:19:27 +02:00
David S. Miller	765b7590c9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net r8152 conflicts are the NAPI fixes in 'net' overlapping with some tasklet stuff in net-next Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-02 11:20:17 -07:00
Vladimir Oltean	f40d9b2086	net: bridge: Populate the pvid flag in br_vlan_get_info Currently this simplified code snippet fails: br_vlan_get_pvid(netdev, &pvid); br_vlan_get_info(netdev, pvid, &vinfo); ASSERT(!(vinfo.flags & BRIDGE_VLAN_INFO_PVID)); It is intuitive that the pvid of a netdevice should have the BRIDGE_VLAN_INFO_PVID flag set. However I can't seem to pinpoint a commit where this behavior was introduced. It seems like it's been like that since forever. At a first glance it would make more sense to just handle the BRIDGE_VLAN_INFO_PVID flag in __vlan_add_flags. However, as Nikolay explains: There are a few reasons why we don't do it, most importantly because we need to have only one visible pvid at any single time, even if it's stale - it must be just one. Right now that rule will not be violated by this change, but people will try using this flag and could see two pvids simultaneously. You can see that the pvid code is even using memory barriers to propagate the new value faster and everywhere the pvid is read only once. That is the reason the flag is set dynamically when dumping entries, too. A second (weaker) argument against would be given the above we don't want another way to do the same thing, specifically if it can provide us with two pvids (e.g. if walking the vlan list) or if it can provide us with a pvid different from the one set in the vg. [Obviously, I'm talking about RCU pvid/vlan use cases similar to the dumps. The locked cases are fine. I would like to avoid explaining why this shouldn't be relied upon without locking] So instead of introducing the above change and making sure of the pvid uniqueness under RCU, simply dynamically populate the pvid flag in br_vlan_get_info(). Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-31 13:21:19 -07:00
wenxu	daf1de9078	netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorder Get the vlan_proto of ingress bridge in network byteorder as userspace expects. Otherwise this is inconsistent with NFT_META_PROTOCOL. Fixes: `2a3a93ef0b` ("netfilter: nft_meta_bridge: Add NFT_META_BRI_IIFVPROTO support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-30 02:49:04 +02:00
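As a userspace illustration of the byte-order point in the commit above, the following Python sketch packs a VLAN protocol value in network (big-endian) order and in little-endian order. It is not the kernel code; the helper names are hypothetical, and 0x8100 is the well-known 802.1Q EtherType.

```python
import struct

ETH_P_8021Q = 0x8100  # 802.1Q VLAN EtherType


def vlan_proto_network_order(proto):
    """Pack a 16-bit protocol value in network (big-endian) byte order."""
    return struct.pack("!H", proto)


def vlan_proto_little_endian(proto):
    """Pack the same value in little-endian order, as a little-endian host stores it."""
    return struct.pack("<H", proto)


# The two encodings differ, so a consumer expecting network order
# misreads a value that was stored in host order on a little-endian machine.
print(vlan_proto_network_order(ETH_P_8021Q).hex())
print(vlan_proto_little_endian(ETH_P_8021Q).hex())
```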
David S. Miller	68aaf44595	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Minor conflict in r8169, bug fix had two versions in net and net-next, take the net-next hunks. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-27 14:23:31 -07:00
Todd Seidelmann	f20faa06d8	netfilter: ebtables: Fix argument order to ADD_COUNTER The ordering of arguments to the x_tables ADD_COUNTER macro appears to be wrong in ebtables (cf. ip_tables.c, ip6_tables.c, and arp_tables.c). This causes data corruption in the ebtables userspace tools because they get incorrect packet & byte counts from the kernel. Fixes: `d72133e628` ("netfilter: ebtables: use ADD_COUNTER macro") Signed-off-by: Todd Seidelmann <tseidelmann@linode.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-19 09:34:20 +02:00
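The effect of swapped counter arguments can be sketched with a small Python model. It assumes the x_tables macro takes its arguments in the order (counter, bytes, packets), which is not stated in the commit message itself; the `Counter` class and `add_counter` function are hypothetical stand-ins, not kernel code.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    pcnt: int = 0  # packet count
    bcnt: int = 0  # byte count


def add_counter(c, b, p):
    """Model of a counter update; assumed order is (counter, bytes, packets)."""
    c.bcnt += b
    c.pcnt += p


# Correct call for one 1500-byte packet: bytes first, then packets.
good = Counter()
add_counter(good, 1500, 1)

# Swapped call: the packet length lands in the packet counter.
bad = Counter()
add_counter(bad, 1, 1500)

print(good)
print(bad)
```

With the swapped call, the model reports 1500 packets and 1 byte for a single packet, which is the kind of incorrect packet and byte count the commit message attributes to the ebtables userspace tools.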
Nikolay Aleksandrov	1bc844ee0f	net: bridge: mdb: allow add/delete for host-joined groups Currently this is needed only for user-space compatibility, so similar object adds/deletes as the dumped ones would succeed. Later it can be used for L2 mcast MAC add/delete. v3: fix compiler warning (DaveM) v2: don't send a notification when used from user-space, arm the group timer if no ports are left after host entry del Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-17 12:36:57 -07:00
Nikolay Aleksandrov	e77b0c84e3	net: bridge: mdb: dump host-joined entries as well Currently we dump only the port mdb entries but we can have host-joined entries on the bridge itself and they should be treated as normal temp mdbs, they're already notified: $ bridge monitor all [MDB]dev br0 port br0 grp ff02::8 temp The group will not be shown in the bridge mdb output, but it takes 1 slot and it's timing out. If it's only host-joined then the mdb show output can even be empty. After this patch we show the host-joined groups: $ bridge mdb show dev br0 port br0 grp ff02::8 temp Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-17 12:36:57 -07:00
Nikolay Aleksandrov	6545916ed9	net: bridge: mdb: factor out mdb filling We have to factor out the mdb fill portion in order to re-use it later for the bridge mdb entries. No functional changes intended. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-17 12:36:56 -07:00
Nikolay Aleksandrov	f59783f5bb	net: bridge: mdb: move vlan comments Trivial patch to move the vlan comments in their proper places above the vid 0 checks. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-17 12:36:56 -07:00
David S. Miller	13dfb3fa49	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Just minor overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-06 18:44:57 -07:00
Nikolay Aleksandrov	091adf9ba6	net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER Most of the bridge device's vlan init bugs come from the fact that its default pvid is created at the wrong time, way too early in ndo_init() before the device is even assigned an ifindex. It introduces a bug when the bridge's dev_addr is added as fdb during the initial default pvid creation the notification has ifindex/NDA_MASTER both equal to 0 (see example below) which really makes no sense for user-space[0] and is wrong. Usually user-space software would ignore such entries, but they are actually valid and will eventually have all necessary attributes. It makes much more sense to send a notification after the device has registered and has a proper ifindex allocated rather than before when there's a chance that the registration might still fail or to receive it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush from br_vlan_flush() since that case can no longer happen. At NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything depending why it was called (if called due to NETDEV_REGISTER error it'll still be == 1, otherwise it could be any value changed during the device life time). For the demonstration below a small change to iproute2 for printing all fdb notifications is added, because it contained a workaround not to show entries with ifindex == 0. 
Command executed while monitoring: $ ip l add br0 type bridge Before (both ifindex and master == 0): $ bridge monitor fdb 36:7e:8a:b3:56:ba dev * vlan 1 master * permanent After (proper br0 ifindex): $ bridge monitor fdb e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER v3: send the correct v2 patch with all changes (stub should return 0) v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in the br_vlan_bridge_event stub when bridge vlans are disabled [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389 Reported-by: michael-dev <michael-dev@fami-braun.de> Fixes: `5be5a2df40` ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-05 13:32:53 -07:00
Nikolay Aleksandrov	3247b27204	net: bridge: mcast: add delete due to fast-leave mdb flag In user-space there's no way to distinguish why an mdb entry was deleted and that is a problem for daemons which would like to keep the mdb in sync with remote ends (e.g. mlag) but would also like to converge faster. In almost all cases we'd like to age-out the remote entry for performance and convergence reasons except when fast-leave is enabled. In that case we want explicit immediate remote delete, thus add mdb flag which is set only when the entry is being deleted due to fast-leave. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-31 19:13:40 -04:00
Nikolay Aleksandrov	5c725b6b65	net: bridge: mcast: don't delete permanent entries when fast leave is enabled When permanent entries were introduced by the commit below, they were exempt from timing out and thus igmp leave wouldn't affect them unless fast leave was enabled on the port which was added before permanent entries existed. It shouldn't matter if fast leave is enabled or not if the user added a permanent entry it shouldn't be deleted on igmp leave. Before: $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent < join and leave 229.1.1.1 on eth4 > $ bridge mdb show $ After: $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent < join and leave 229.1.1.1 on eth4 > $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent Fixes: `ccb1c31a7a` ("bridge: add flags to distinguish permanent mdb entires") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-31 19:03:01 -04:00
David S. Miller	fa9586aff9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains Netfilter fixes for your net tree: 1) memleak in ebtables from the error path for the 32/64 compat layer, from Florian Westphal. 2) Fix inverted meta ifname/ifidx matching when no interface is set on either from the input/output path, from Phil Sutter. 3) Remove goto label in nft_meta_bridge, also from Phil. 4) Missing include guard in xt_connlabel, from Masahiro Yamada. 5) Two patches to fix ipset destination MAC matching coming from Stephano Brivio, via Jozsef Kadlecsik. 6) Fix set rename and listing concurrency problem, from Shijie Luo. Patch also coming via Jozsef Kadlecsik. 7) ebtables 32/64 compat missing base chain policy in rule count, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-31 08:49:09 -07:00
Florian Westphal	3b48300d5c	netfilter: ebtables: also count base chain policies ebtables doesn't include the base chain policies in the rule count, so we need to add them manually when we call into the x_tables core to allocate space for the compat offset table. This led syzbot to trigger: WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649 xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649 Reported-by: syzbot+276ddebab3382bbf72db@syzkaller.appspotmail.com Fixes: `2035f3ff8e` ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-07-30 13:37:44 +02:00
Nikolay Aleksandrov	d7bae09fa0	net: bridge: delete local fdb on device init failure On initialization failure we have to delete the local fdb which was inserted due to the default pvid creation. This problem has been present since the inception of default_pvid. Note that currently there are 2 cases: 1) in br_dev_init() when br_multicast_init() fails 2) if register_netdevice() fails after calling ndo_init() This patch takes care of both since br_vlan_flush() is called on both occasions. Also the new fdb delete would be a no-op on normal bridge device destruction since the local fdb would've been already flushed by br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is called last when adding a port thus nothing can fail after it. Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com Fixes: `5be5a2df40` ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-29 09:50:05 -07:00
Phil Sutter	67d8683584	netfilter: nft_meta_bridge: Eliminate 'out' label The label is used just once and the code it points at is not reused, no point in keeping it. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-07-25 08:38:29 +02:00
Phil Sutter	cb81572e8c	netfilter: nf_tables: Make nft_meta expression more robust nft_meta_get_eval()'s tendency to bail out setting NFT_BREAK verdict in situations where required data is missing leads to unexpected behaviour with inverted checks like so: \| meta iifname != eth0 accept This rule will never match if there is no input interface (or it is not known) which is not intuitive and, what's worse, breaks consistency of iptables-nft with iptables-legacy. Fix this by falling back to placing a value in dreg which never matches (avoiding accidental matches), i.e. zero for interface index and an empty string for interface name. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-07-25 08:37:20 +02:00
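The difference between bailing out and falling back to a never-matching value can be modelled in a few lines of Python. This is a toy sketch of the rule `meta iifname != eth0 accept` under the two behaviours described in the commit above; the function names are hypothetical, and `None` stands in for "no input interface known".

```python
def eval_with_break(iifname, unwanted="eth0"):
    """Modelled old behaviour: missing data aborts evaluation, so the rule never matches."""
    if iifname is None:
        return False  # evaluation stops before the comparison is made
    return iifname != unwanted


def eval_with_fallback(iifname, unwanted="eth0"):
    """Modelled new behaviour: an empty string is compared when no interface is known."""
    value = iifname if iifname is not None else ""
    return value != unwanted


# With no input interface, only the fallback variant lets the inverted check match.
print(eval_with_break(None))
print(eval_with_fallback(None))
```

Both variants agree whenever an interface name is present; they differ only in the missing-interface case, which is the inconsistency the commit addresses.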
Wenwen Wang	15a78ba184	netfilter: ebtables: fix a memory leak bug in compat In compat_do_replace(), a temporary buffer is allocated through vmalloc() to hold entries copied from the user space. The buffer address is first saved to 'newinfo->entries', and later on assigned to 'entries_tmp'. Then the entries in this temporary buffer are copied to the internal kernel structure through compat_copy_entries(). If this copy process fails, compat_do_replace() should be terminated. However, the allocated temporary buffer is not freed on this path, leading to a memory leak. To fix the bug, free the buffer before returning from compat_do_replace(). Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-07-21 21:00:15 +02:00
Arnd Bergmann	dfee0e99bc	netfilter: bridge: make NF_TABLES_BRIDGE tristate The new nft_meta_bridge code fails to link as built-in when NF_TABLES is a loadable module. net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_eval': nft_meta_bridge.c:(.text+0x1e8): undefined reference to `nft_meta_get_eval' net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_init': nft_meta_bridge.c:(.text+0x468): undefined reference to `nft_meta_get_init' nft_meta_bridge.c:(.text+0x49c): undefined reference to `nft_parse_register' nft_meta_bridge.c:(.text+0x4cc): undefined reference to `nft_validate_register_store' net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_exit': nft_meta_bridge.c:(.exit.text+0x14): undefined reference to `nft_unregister_expr' net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_init': nft_meta_bridge.c:(.init.text+0x14): undefined reference to `nft_register_expr' net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x60): undefined reference to `nft_meta_get_dump' net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x88): undefined reference to `nft_meta_set_eval' This can happen because the NF_TABLES_BRIDGE dependency itself is just a 'bool'. Make the symbol a 'tristate' instead so Kconfig can propagate the dependencies correctly. Fixes: `30e103fe24` ("netfilter: nft_meta: move bridge meta keys into nft_meta_bridge") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-07-19 18:08:14 +02:00
Pablo Neira Ayuso	fc2f14f8f7	netfilter: bridge: NF_CONNTRACK_BRIDGE does not depend on NF_TABLES_BRIDGE Place NF_CONNTRACK_BRIDGE away from the NF_TABLES_BRIDGE dependency. Fixes: `3c171f496e` ("netfilter: bridge: add connection tracking system") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-07-18 20:55:54 +02:00

2329 Commits