Commit Graph

1043129 Commits

Author SHA1 Message Date
Davide Caratti
3f4a08909e mptcp: allow changing the 'backup' bit when no sockets are open
current Linux refuses to change the 'backup' bit of MPTCP endpoints, i.e.
using MPTCP_PM_CMD_SET_FLAGS, unless it finds (at least) one subflow that
matches the endpoint address. There is no reason for that, so we can just
ignore the return value of mptcp_nl_addr_backup(). In this way, endpoints
can reconfigure their 'backup' flag even if no MPTCP sockets are open (or
more generally, in case the MP_PRIO message is not sent out).

Fixes: 0f9f696a50 ("mptcp: add set_flags command in PM netlink")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:51:36 +01:00
Florian Westphal
ea1300b9df mptcp: don't return sockets in foreign netns
mptcp_token_get_sock() may return a mptcp socket that is in
a different net namespace than the socket that received the token value.

The mptcp syncookie code path had an explicit check for this,
this moves the test into mptcp_token_get_sock() function.

Eventually token.c should be converted to pernet storage, but
such change is not suitable for net tree.

Fixes: 2c5ebd001d ("mptcp: refactor token container")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:51:36 +01:00
Xin Long
f7e745f8e9 sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
We should always check if skb_header_pointer's return is NULL before
using it, otherwise it may cause null-ptr-deref, as syzbot reported:

  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
  RIP: 0010:sctp_rcv_ootb net/sctp/input.c:705 [inline]
  RIP: 0010:sctp_rcv+0x1d84/0x3220 net/sctp/input.c:196
  Call Trace:
  <IRQ>
   sctp6_rcv+0x38/0x60 net/sctp/ipv6.c:1109
   ip6_protocol_deliver_rcu+0x2e9/0x1ca0 net/ipv6/ip6_input.c:422
   ip6_input_finish+0x62/0x170 net/ipv6/ip6_input.c:463
   NF_HOOK include/linux/netfilter.h:307 [inline]
   NF_HOOK include/linux/netfilter.h:301 [inline]
   ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:472
   dst_input include/net/dst.h:460 [inline]
   ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
   NF_HOOK include/linux/netfilter.h:307 [inline]
   NF_HOOK include/linux/netfilter.h:301 [inline]
   ipv6_rcv+0x28c/0x3c0 net/ipv6/ip6_input.c:297

Fixes: 3acb50c18d ("sctp: delay as much as possible skb_linearize")
Reported-by: syzbot+581aff2ae6b860625116@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:36:19 +01:00
Linus Torvalds
9bc62afe03 Networking fixes for 5.15-rc3.
Current release - regressions:
 
  - dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports()
 
 Previous releases - regressions:
 
  - introduce a shutdown method to mdio device drivers, and make DSA
    switch drivers compatible with masters disappearing on shutdown;
    preventing infinite reference wait
 
  - fix issues in mdiobus users related to ->shutdown vs ->remove
 
  - virtio-net: fix pages leaking when building skb in big mode
 
  - xen-netback: correct success/error reporting for the SKB-with-fraglist
 
  - dsa: tear down devlink port regions when tearing down the devlink
         port on error
 
  - nexthop: fix division by zero while replacing a resilient group
 
  - hns3: check queue, vf, vlan ids range before using
 
 Previous releases - always broken:
 
  - napi: fix race against netpoll causing NAPI getting stuck
 
  - mlx4_en: ensure link operstate is updated even if link comes up
             before netdev registration
 
  - bnxt_en: fix TX timeout when TX ring size is set to the smallest
 
  - enetc: fix illegal access when reading affinity_hint;
           prevent oops on sysfs access
 
  - mtk_eth_soc: avoid creating duplicate offload entries
 
 Misc:
 
  - core: correct the sock::sk_lock.owned lockdep annotations
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmFMr4MACgkQMUZtbf5S
 Irv6Ag/+Ml4q6/IVO0jBppztZO1RrSalb3YE9JjPQyMchauVcdcADNpYF+Jo/gcH
 4q+/Oikfp6gkQpJTFd0Y9X7UhwA4Jm4wWtEisqc6PJeHOagDZmVUn353WtgpnCNL
 CgYBfa5k3msGudkqgeXIyiP/2sekBevTy+fOptubLZClyBrEwNUUUZBlpT9aI9Sj
 ru1eMYklfcxP60AQgNhqq6ZwJnRELgN75fSR6ypVCGcRnTK4UGL/b6TvnPYn8uYY
 zeNuMZZzYZK5B73tC6rWpteHWZ7VW3Km0WvIKs+ORM8nYchz/EprKZ0HCLPYrWvf
 ib5Wi7HyL7/n9k9NUTCGrQY3tkOWNzXOepjpiBZPqCG9r2hc3JSR7Q2lFwL+gKv/
 sh2y+T2xfp0WFGmG2XiU2MgnkypMSKah1sC/XRE7YLw02vPAnWQxxl/KVNek4j7M
 CH/Tg9ErVKDRLN7KO/kKl3s8I8N4hdctms/YUt9QD5J9Rw/Jqwr/79bq1uLy6d4o
 //ipmCTHex57Nvy80PtgcuKJhoeqGwR/Av6BvBMRZ1SOYs/C6q45skHTlYyiNY3+
 Dyj9+nfrhsyE835GKPe8lqBFZONBXpXw+EUNXeYRiv0Pcd+JKek07bbajSQVSpd8
 8nqQwylpGII0iPGyOc9wKajzh7O5W2odFIdOwtY/5yVjrcFgBd0=
 =VcL3
 -----END PGP SIGNATURE-----

Merge tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Current release - regressions:

   - dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports()

  Previous releases - regressions:

   - introduce a shutdown method to mdio device drivers, and make DSA
     switch drivers compatible with masters disappearing on shutdown;
     preventing infinite reference wait

   - fix issues in mdiobus users related to ->shutdown vs ->remove

   - virtio-net: fix pages leaking when building skb in big mode

   - xen-netback: correct success/error reporting for the
     SKB-with-fraglist

   - dsa: tear down devlink port regions when tearing down the devlink
     port on error

   - nexthop: fix division by zero while replacing a resilient group

   - hns3: check queue, vf, vlan ids range before using

  Previous releases - always broken:

   - napi: fix race against netpoll causing NAPI getting stuck

   - mlx4_en: ensure link operstate is updated even if link comes up
     before netdev registration

   - bnxt_en: fix TX timeout when TX ring size is set to the smallest

   - enetc: fix illegal access when reading affinity_hint; prevent oops
     on sysfs access

   - mtk_eth_soc: avoid creating duplicate offload entries

  Misc:

   - core: correct the sock::sk_lock.owned lockdep annotations"

* tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
  atlantic: Fix issue in the pm resume flow.
  net/mlx4_en: Don't allow aRFS for encapsulated packets
  net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
  net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries
  nfc: st-nci: Add SPI ID matching DT compatible
  MAINTAINERS: remove Guvenc Gulce as net/smc maintainer
  nexthop: Fix memory leaks in nexthop notification chain listeners
  mptcp: ensure tx skbs always have the MPTCP ext
  qed: rdma - don't wait for resources under hw error recovery flow
  s390/qeth: fix deadlock during failing recovery
  s390/qeth: Fix deadlock in remove_discipline
  s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
  net: dsa: realtek: register the MDIO bus under devres
  net: dsa: don't allocate the slave_mii_bus using devres
  Doc: networking: Fox a typo in ice.rst
  net: dsa: fix dsa_tree_setup error path
  net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
  net/smc: add missing error check in smc_clc_prfx_set()
  net: hns3: fix a return value error in hclge_get_reset_status()
  net: hns3: check vlan id before using it
  ...
2021-09-23 10:30:31 -07:00
Shakeel Butt
1f828223b7 memcg: flush lruvec stats in the refault
Prior to the commit 7e1c0d6f58 ("memcg: switch lruvec stats to rstat")
and the commit aa48e47e39 ("memcg: infrastructure to flush memcg
stats"), each lruvec memcg stats can be off by (nr_cgroups * nr_cpus *
32) at worst and for unbounded amount of time.  The commit aa48e47e39
moved the lruvec stats to rstat infrastructure and the commit
7e1c0d6f58 bounded the error for all the lruvec stats to (nr_cpus *
32) at worst for at most 2 seconds.  More specifically it decoupled the
number of stats and the number of cgroups from the error rate.

However this reduction in error comes with the cost of triggering the
slowpath of stats update more frequently.  Previously in the slowpath
the kernel adds the stats up the memcg tree.  After aa48e47e39, the
kernel triggers the asyn lruvec stats flush through queue_work().  This
causes regression reports from 0day kernel bot [1] as well as from
phoronix test suite [2].

We tried two options to fix the regression:

 1) Increase the threshold to trigger the slowpath in lruvec stats
    update codepath from 32 to 512.

 2) Remove the slowpath from lruvec stats update codepath and instead
    flush the stats in the page refault codepath. The assumption is that
    the kernel timely flush the stats, so, the update tree would be
    small in the refault codepath to not cause the preformance impact.

Following are the results of will-it-scale/page_fault[1|2|3] benchmark
on four settings i.e.  (1) 5.15-rc1 as baseline (2) 5.15-rc1 with
aa48e47e39 and 7e1c0d6f58 reverted (3) 5.15-rc1 with option-1
(4) 5.15-rc1 with option-2.

  test       (1)      (2)               (3)               (4)
  pg_f1   368563   406277 (10.23%)   399693  (8.44%)   416398 (12.97%)
  pg_f2   338399   372133  (9.96%)   369180  (9.09%)   381024 (12.59%)
  pg_f3   500853   575399 (14.88%)   570388 (13.88%)   576083 (15.02%)

From the above result, it seems like the option-2 not only solves the
regression but also improves the performance for at least these
benchmarks.

Feng Tang (intel) ran the aim7 benchmark with these two options and
confirms that option-1 reduces the regression but option-2 removes the
regression.

Michael Larabel (phoronix) ran multiple benchmarks with these options
and reported the results at [3] and it shows for most benchmarks
option-2 removes the regression introduced by the commit aa48e47e39
("memcg: infrastructure to flush memcg stats").

Based on the experiment results, this patch proposed the option-2 as the
solution to resolve the regression.

Link: https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020 [1]
Link: https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress [2]
Link: https://openbenchmarking.org/result/2109226-DEBU-LINUX5104 [3]
Fixes: aa48e47e39 ("memcg: infrastructure to flush memcg stats")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Tested-by: Michael Larabel <Michael@phoronix.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>,
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-23 10:09:13 -07:00
Sudarsana Reddy Kalluru
4d88c339c4 atlantic: Fix issue in the pm resume flow.
After fixing hibernation resume flow, another usecase was found which
should be explicitly handled - resume when device is in "down" state.
Invoke aq_nic_init jointly with aq_nic_start only if ndev was already
up during suspend/hibernate. We still need to perform nic_deinit() if
caller requests for it, to handle the freeze/resume scenarios.

Fixes: 57f780f1c4 ("atlantic: Fix driver resume flow.")
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23 13:24:14 +01:00
Aya Levin
fdbccea419 net/mlx4_en: Don't allow aRFS for encapsulated packets
Driver doesn't support aRFS for encapsulated packets, return early error
in such a case.

Fixes: 1eb8c695bd ("net/mlx4_en: Add accelerated RFS support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23 13:17:39 +01:00
Vladimir Oltean
acc64f52af net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
The blamed commit made the fatally incorrect assumption that ports which
aren't in the FORWARDING STP state should not have packets forwarded
towards them, and that is all that needs to be done.

However, that logic alone permits BLOCKING ports to forward to
FORWARDING ports, which of course allows packet storms to occur when
there is an L2 loop.

The ocelot_get_bridge_fwd_mask should not only ask "what can the bridge
do for you", but "what can you do for the bridge". This way, only
FORWARDING ports forward to the other FORWARDING ports from the same
bridging domain, and we are still compatible with the idea of multiple
bridges.

Fixes: df291e54cc ("net: ocelot: support multiple bridges")
Suggested-by: Colin Foster <colin.foster@in-advantage.com>
Reported-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23 13:15:31 +01:00
Felix Fietkau
e68daf61ed net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries
Sometimes multiple CLS_REPLACE calls are issued for the same connection.
rhashtable_insert_fast does not check for these duplicates, so multiple
hardware flow entries can be created.
Fix this by checking for an existing entry early

Fixes: 502e84e238 ("net: ethernet: mtk_eth_soc: add flow offloading support")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23 13:14:19 +01:00
Mark Brown
31339440b2 nfc: st-nci: Add SPI ID matching DT compatible
Currently autoloading for SPI devices does not use the DT ID table, it uses
SPI modalises. Supporting OF modalises is going to be difficult if not
impractical, an attempt was made but has been reverted, so ensure that
module autoloading works for this driver by adding the part name used in
the compatible to the list of SPI IDs.

Fixes: 96c8395e21 ("spi: Revert modalias changes")
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23 12:53:06 +01:00
Guvenc Gulce
5b099870c8 MAINTAINERS: remove Guvenc Gulce as net/smc maintainer
Remove myself as net/smc maintainer, as I am
leaving IBM soon and can not maintain net/smc anymore.

Cc: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23 12:51:42 +01:00
Ido Schimmel
3106a08475 nexthop: Fix memory leaks in nexthop notification chain listeners
syzkaller discovered memory leaks [1] that can be reduced to the
following commands:

 # ip nexthop add id 1 blackhole
 # devlink dev reload pci/0000:06:00.0

As part of the reload flow, mlxsw will unregister its netdevs and then
unregister from the nexthop notification chain. Before unregistering
from the notification chain, mlxsw will receive delete notifications for
nexthop objects using netdevs registered by mlxsw or their uppers. mlxsw
will not receive notifications for nexthops using netdevs that are not
dismantled as part of the reload flow. For example, the blackhole
nexthop above that internally uses the loopback netdev as its nexthop
device.

One way to fix this problem is to have listeners flush their nexthop
tables after unregistering from the notification chain. This is
error-prone as evident by this patch and also not symmetric with the
registration path where a listener receives a dump of all the existing
nexthops.

Therefore, fix this problem by replaying delete notifications for the
listener being unregistered. This is symmetric to the registration path
and also consistent with the netdev notification chain.

The above means that unregister_nexthop_notifier(), like
register_nexthop_notifier(), will have to take RTNL in order to iterate
over the existing nexthops and that any callers of the function cannot
hold RTNL. This is true for mlxsw and netdevsim, but not for the VXLAN
driver. To avoid a deadlock, change the latter to unregister its nexthop
listener without holding RTNL, making it symmetric to the registration
path.

[1]
unreferenced object 0xffff88806173d600 (size 512):
  comm "syz-executor.0", pid 1290, jiffies 4295583142 (age 143.507s)
  hex dump (first 32 bytes):
    41 9d 1e 60 80 88 ff ff 08 d6 73 61 80 88 ff ff  A..`......sa....
    08 d6 73 61 80 88 ff ff 01 00 00 00 00 00 00 00  ..sa............
  backtrace:
    [<ffffffff81a6b576>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
    [<ffffffff81a6b576>] slab_post_alloc_hook+0x96/0x490 mm/slab.h:522
    [<ffffffff81a716d3>] slab_alloc_node mm/slub.c:3206 [inline]
    [<ffffffff81a716d3>] slab_alloc mm/slub.c:3214 [inline]
    [<ffffffff81a716d3>] kmem_cache_alloc_trace+0x163/0x370 mm/slub.c:3231
    [<ffffffff82e8681a>] kmalloc include/linux/slab.h:591 [inline]
    [<ffffffff82e8681a>] kzalloc include/linux/slab.h:721 [inline]
    [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_group_create drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4918 [inline]
    [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_new drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5054 [inline]
    [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_event+0x59a/0x2910 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5239
    [<ffffffff813ef67d>] notifier_call_chain+0xbd/0x210 kernel/notifier.c:83
    [<ffffffff813f0662>] blocking_notifier_call_chain kernel/notifier.c:318 [inline]
    [<ffffffff813f0662>] blocking_notifier_call_chain+0x72/0xa0 kernel/notifier.c:306
    [<ffffffff8384b9c6>] call_nexthop_notifiers+0x156/0x310 net/ipv4/nexthop.c:244
    [<ffffffff83852bd8>] insert_nexthop net/ipv4/nexthop.c:2336 [inline]
    [<ffffffff83852bd8>] nexthop_add net/ipv4/nexthop.c:2644 [inline]
    [<ffffffff83852bd8>] rtm_new_nexthop+0x14e8/0x4d10 net/ipv4/nexthop.c:2913
    [<ffffffff833e9a78>] rtnetlink_rcv_msg+0x448/0xbf0 net/core/rtnetlink.c:5572
    [<ffffffff83608703>] netlink_rcv_skb+0x173/0x480 net/netlink/af_netlink.c:2504
    [<ffffffff833de032>] rtnetlink_rcv+0x22/0x30 net/core/rtnetlink.c:5590
    [<ffffffff836069de>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
    [<ffffffff836069de>] netlink_unicast+0x5ae/0x7f0 net/netlink/af_netlink.c:1340
    [<ffffffff83607501>] netlink_sendmsg+0x8e1/0xe30 net/netlink/af_netlink.c:1929
    [<ffffffff832fde84>] sock_sendmsg_nosec net/socket.c:704 [inline]
    [<ffffffff832fde84>] sock_sendmsg net/socket.c:724 [inline]
    [<ffffffff832fde84>] ____sys_sendmsg+0x874/0x9f0 net/socket.c:2409
    [<ffffffff83304a44>] ___sys_sendmsg+0x104/0x170 net/socket.c:2463
    [<ffffffff83304c01>] __sys_sendmsg+0x111/0x1f0 net/socket.c:2492
    [<ffffffff83304d5d>] __do_sys_sendmsg net/socket.c:2501 [inline]
    [<ffffffff83304d5d>] __se_sys_sendmsg net/socket.c:2499 [inline]
    [<ffffffff83304d5d>] __x64_sys_sendmsg+0x7d/0xc0 net/socket.c:2499

Fixes: 2a014b200b ("mlxsw: spectrum_router: Add support for nexthop objects")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23 12:33:22 +01:00
Johannes Berg
313bbd1990 mac80211-hwsim: fix late beacon hrtimer handling
Thomas explained in https://lore.kernel.org/r/87mtoeb4hb.ffs@tglx
that our handling of the hrtimer here is wrong: If the timer fires
late (e.g. due to vCPU scheduling, as reported by Dmitry/syzbot)
then it tries to actually rearm the timer at the next deadline,
which might be in the past already:

 1          2          3          N          N+1
 |          |          |   ...    |          |

 ^ intended to fire here (1)
            ^ next deadline here (2)
                                      ^ actually fired here

The next time it fires, it's later, but will still try to schedule
for the next deadline (now 3), etc. until it catches up with N,
but that might take a long time, causing stalls etc.

Now, all of this is simulation, so we just have to fix it, but
note that the behaviour is wrong even per spec, since there's no
value then in sending all those beacons unaligned - they should be
aligned to the TBTT (1, 2, 3, ... in the picture), and if we're a
bit (or a lot) late, then just resume at that point.

Therefore, change the code to use hrtimer_forward_now() which will
ensure that the next firing of the timer would be at N+1 (in the
picture), i.e. the next interval point after the current time.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com
Fixes: 01e59e467e ("mac80211_hwsim: hrtimer beacon")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210915112936.544f383472eb.I3f9712009027aa09244b65399bf18bf482a8c4f1@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23 13:25:12 +02:00
Johannes Berg
b9731062ce mac80211: mesh: fix potentially unaligned access
The pointer here points directly into the frame, so the
access is potentially unaligned. Use get_unaligned_le16
to avoid that.

Fixes: 3f52b7e328 ("mac80211: mesh power save basics")
Link: https://lore.kernel.org/r/20210920154009.3110ff75be0c.Ib6a2ff9e9cc9bc6fca50fce631ec1ce725cc926b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23 13:25:09 +02:00
Lorenzo Bianconi
13cb6d826e mac80211: limit injected vht mcs/nss in ieee80211_parse_tx_radiotap
Limit max values for vht mcs and nss in ieee80211_parse_tx_radiotap
routine in order to fix the following warning reported by syzbot:

WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_rate_set_vht include/net/mac80211.h:989 [inline]
WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244
Modules linked in:
CPU: 0 PID: 10717 Comm: syz-executor.5 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:ieee80211_rate_set_vht include/net/mac80211.h:989 [inline]
RIP: 0010:ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244
RSP: 0018:ffffc9000186f3e8 EFLAGS: 00010216
RAX: 0000000000000618 RBX: ffff88804ef76500 RCX: ffffc900143a5000
RDX: 0000000000040000 RSI: ffffffff888f478e RDI: 0000000000000003
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000100
R10: ffffffff888f46f9 R11: 0000000000000000 R12: 00000000fffffff8
R13: ffff88804ef7653c R14: 0000000000000001 R15: 0000000000000004
FS:  00007fbf5718f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2de23000 CR3: 000000006a671000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Call Trace:
 ieee80211_monitor_select_queue+0xa6/0x250 net/mac80211/iface.c:740
 netdev_core_pick_tx+0x169/0x2e0 net/core/dev.c:4089
 __dev_queue_xmit+0x6f9/0x3710 net/core/dev.c:4165
 __bpf_tx_skb net/core/filter.c:2114 [inline]
 __bpf_redirect_no_mac net/core/filter.c:2139 [inline]
 __bpf_redirect+0x5ba/0xd20 net/core/filter.c:2162
 ____bpf_clone_redirect net/core/filter.c:2429 [inline]
 bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2401
 bpf_prog_eeb6f53a69e5c6a2+0x59/0x234
 bpf_dispatcher_nop_func include/linux/bpf.h:717 [inline]
 __bpf_prog_run include/linux/filter.h:624 [inline]
 bpf_prog_run include/linux/filter.h:631 [inline]
 bpf_test_run+0x381/0xa30 net/bpf/test_run.c:119
 bpf_prog_test_run_skb+0xb84/0x1ee0 net/bpf/test_run.c:663
 bpf_prog_test_run kernel/bpf/syscall.c:3307 [inline]
 __sys_bpf+0x2137/0x5df0 kernel/bpf/syscall.c:4605
 __do_sys_bpf kernel/bpf/syscall.c:4691 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:4689 [inline]
 __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4689
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665f9

Reported-by: syzbot+0196ac871673f0c20f68@syzkaller.appspotmail.com
Fixes: 646e76bb5d ("mac80211: parse VHT info in injected frames")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c26c3f02dcb38ab63b2f2534cb463d95ee81bb13.1632141760.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23 13:22:57 +02:00
YueHaibing
a6555f8445 mac80211: Drop frames from invalid MAC address in ad-hoc mode
WARNING: CPU: 1 PID: 9 at net/mac80211/sta_info.c:554
sta_info_insert_rcu+0x121/0x12a0
Modules linked in:
CPU: 1 PID: 9 Comm: kworker/u8:1 Not tainted 5.14.0-rc7+ #253
Workqueue: phy3 ieee80211_iface_work
RIP: 0010:sta_info_insert_rcu+0x121/0x12a0
...
Call Trace:
 ieee80211_ibss_finish_sta+0xbc/0x170
 ieee80211_ibss_work+0x13f/0x7d0
 ieee80211_iface_work+0x37a/0x500
 process_one_work+0x357/0x850
 worker_thread+0x41/0x4d0

If an Ad-Hoc node receives packets with invalid source MAC address,
it hits a WARN_ON in sta_info_insert_check(), this can spam the log.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20210827144230.39944-1-yuehaibing@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23 13:05:38 +02:00
Chih-Kang Chang
fe94bac626 mac80211: Fix ieee80211_amsdu_aggregate frag_tail bug
In ieee80211_amsdu_aggregate() set a pointer frag_tail point to the
end of skb_shinfo(head)->frag_list, and use it to bind other skb in
the end of this function. But when execute ieee80211_amsdu_aggregate()
->ieee80211_amsdu_realloc_pad()->pskb_expand_head(), the address of
skb_shinfo(head)->frag_list will be changed. However, the
ieee80211_amsdu_aggregate() not update frag_tail after call
pskb_expand_head(). That will cause the second skb can't bind to the
head skb appropriately.So we update the address of frag_tail to fix it.

Fixes: 6e0456b545 ("mac80211: add A-MSDU tx support")
Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20210830073240.12736-1-pkshih@realtek.com
[reword comment]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23 13:04:35 +02:00
Felix Fietkau
98d46b021f Revert "mac80211: do not use low data rates for data frames with no ack flag"
This reverts commit d333322361 ("mac80211: do not use low data rates for
data frames with no ack flag").

Returning false early in rate_control_send_low breaks sending broadcast
packets, since rate control will not select a rate for it.

Before re-introducing a fixed version of this patch, we should probably also
make some changes to rate control to be more conservative in selecting rates
for no-ack packets and also prevent using probing rates on them, since we won't
get any feedback.

Fixes: d333322361 ("mac80211: do not use low data rates for data frames with no ack flag")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210906083559.9109-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23 12:59:29 +02:00
Geert Uytterhoeven
58e2cf5d79 init: Revert accidental changes to print irqs_disabled()
Commit f8ade8dddb ("xsurf100: drop include of lib8390.c") accidentally
changed init/main.c.  Revert that part.

Fixes: f8ade8dddb ("xsurf100: drop include of lib8390.c")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-22 13:02:30 -07:00
Konrad Rzeszutek Wilk
4057525736 MAINTAINERS: Update Xen-[PCI,SWIOTLB,Block] maintainership
Konrad's new job role is putting a serious cramp on him
being a responsive maintainer and as such he is handing off
the reins to Juergen, Roger, and Stefano.

Thank you!

Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-22 12:49:48 -07:00
Konrad Rzeszutek Wilk
2e36a964ad MAINTAINERS: Update SWIOTLB maintainership
Konrad's new job role is putting a serious cramp on him
being a responsive maintainer and as such he is handing off
the reins to Christoph Hellwig.

Thank you!

Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-22 12:49:48 -07:00
Dinh Nguyen
c4aa1eeb09 MAINTAINERS: update entry for NIOS2
Ley Foon has left Intel and will no longer be able to maintain NIOS2.
Update the MAINTAINER's entry to Dinh Nguyen.

Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-22 12:37:46 -07:00
Linus Torvalds
9bedf10b57 spi: Fix modalias issues
As reported by Russell King the change to use OF style modaliases for DT
 enumerated broke at least the spi-nor driver, the patch here reverts
 that change to fix the regression.  Sadly this will mean that anything
 that started loading since the change to OF modaliases will run into
 issues, there doesn't seem to be any approach which doesn't cause some
 problems and thi seems like the least bad approach - gory details are in
 the commit log for the change.  I'm currently working through the SPI
 drivers to add ID tables and missing IDs to tables which should address
 things from the other end, this seems more straightforward and robust
 than any other options.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmFLdIEACgkQJNaLcl1U
 h9DRKgf/ZwDCBYGJuF25LRKxjGIOzKpkA+FGwBFvyQfvtp72HWc19c4UPnG5yQro
 ZfzBNCX4nOFL7z1My2M0MRbPYJMQI5PGMu8di856y/+UHmS1PrQXRdq+b2F2UyXo
 8qnhePU4sN+iP9RaQfCxBIistK0kNALvdWELlIVHBeMznVX3/touXNCzZKPSeW3/
 uXK7Y9vinJ8jvSkCpc0XMAU1DcM0ldZzVYQ8v8bjblcHMHK/RYLpEbzklLWaFC+/
 kp7oVUGTjcxT2IZ/t3IkfmOdUbnHLZhRlDkflmdesjS1adJR5KXj+fx+hFzkPZZd
 ptwmajovs2OlM952KMXHa4Z1PcxOaA==
 =dmjx
 -----END PGP SIGNATURE-----

Merge tag 'spi-fix-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi modalias fix from Mark Brown:
 "Fix modalias issues

  As reported by Russell King the change to use OF style modaliases for
  DT enumerated broke at least the spi-nor driver, the patch here
  reverts that change to fix the regression.

  Sadly this will mean that anything that started loading since the
  change to OF modaliases will run into issues, there doesn't seem to be
  any approach which doesn't cause some problems and thi seems like the
  least bad approach - gory details are in the commit log for the
  change.

  I'm currently working through the SPI drivers to add ID tables and
  missing IDs to tables which should address things from the other end,
  this seems more straightforward and robust than any other options"

* tag 'spi-fix-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: Revert modalias changes
2021-09-22 11:58:24 -07:00
Linus Torvalds
cf1d2c3e7e Critical bug fixes:
- Fix crash in NLM TEST procedure
 - NFSv4.1+ backchannel not restored after PATH_DOWN
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmFLUKcACgkQM2qzM29m
 f5fCAhAAp66o6n49/fxOLWo+MftFlT1EY8NtFjyTh1x/o4R9S74qxTy3RC3GzRvk
 oGOnkFvuiToyjcoeyb9yumYxO00Qf75hrTJvsXqnsbrLZOAKVuITn9MkQXBOXjCi
 GDxQSRFg8ihz0vG4YbE/brnZR1fIMr7KSzXLwdXOs8mKvro7JmiiB87JOGhw9yon
 W9+bFcnN2TynYsqmtHu987LvaIUE79dFfhrfj6bIobNQ25oqJoG5e1/M48/1MJol
 DFPiWoErJ/S1c0lA8rbjIvtzgbXs84U88EXmFUVsxSXhepGui3Uh/cA49vu46icH
 vze8fwHs6q3qzF7gE6jbslrrdQ/H6AZ6arhe27h4cVxdh0AouDuBat2xLY2I4TP3
 DckfLbEsOqTJhfzqYnk+8ckOaBMpkfyDqG6SodIKglPoknNCtCp0/7NuYF0yMLe5
 I6pO7JDgz7ySrbpm27ZMOpdwkLqqA1i8V9MPvimUsKTYJqlVBsc2RsdldQhunNbd
 50InJarWQ+japkEl3WK3aJ5rTluiIWjcePT7wA76wP3PnZmcjweOiQMc8uuLlzPw
 tOLRlHdpdZzeM3hGuI6KKsg8ZRbDB7L8YiaLkSwxl2qwJwDSB0xo7/WWwVzkyfdf
 zdQ2cR9z70I2Bgxq/1lAPB8tXq+SEvu1qCYDFSTo3I9c0Y2frs4=
 =L0c1
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "Critical bug fixes:

   - Fix crash in NLM TEST procedure

   - NFSv4.1+ backchannel not restored after PATH_DOWN"

* tag 'nfsd-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN
  NLM: Fix svcxdr_encode_owner()
2021-09-22 09:21:02 -07:00
Linus Torvalds
bee42512c4 platform-drivers-x86 for v5.15-2
Highlights:
  -  amd-pmc fix for some suspend/resume issues
  -  intel-hid fix to avoid false-positive SW_TABLET_MODE=1 reporting
  -  some build error/warning fixes
  -  various DMI quirk additions
 
 The following is an automated git shortlog grouped by driver:
 
 amd-pmc:
  -  Increase the response register timeout
 
 dell:
  -  fix DELL_WMI_PRIVACY dependencies & build error
 
 gigabyte-wmi:
  -  add support for B550I Aorus Pro AX
 
 lg-laptop:
  -  Correctly handle dmi_get_system_info() returning NULL
 
 platform/x86/intel:
  -  hid: Add DMI switches allow list
  -  punit_ipc: Drop wrong use of ACPI_PTR()
 
 touchscreen_dmi:
  -  Update info for the Chuwi Hi10 Plus (CWI527) tablet
  -  Add info for the Chuwi HiBook (CWI514) tablet
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmFJ5SwUHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9zuPwf/ZGt6pXKBZLPQqZ0MpjUgBgUtjk70
 ISEplVXx7A3zBJGRJCHBVVT7VYnx7rnRyq/jIukOxTHG6EfbQS88+oDXePSoNzNp
 xkrY17xPBWjny1yYWcmkRMVYdyRBmF6jKEC3oLVW/ilot5+BYG4m9QNgQbSKmUnQ
 eKGZ8O4r0COBeUJVmcPa4Gcd9LuLQMMEE2yE6Hi1la4fo2civ+QzSTdbLw28QhAd
 QlD79pw744lhvBdXqzNZzyHAUhbbBxWIl79YJZnwScPN2EEqht3BIqehtd+uRm3l
 OCvR7x14xmEfTm96rjrGwQFxEKPR3HWtwqJmb7W0ZeLJ4bsxdxYb3wHfug==
 =5l9q
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
 "The first round of bug-fixes for platform-drivers-x86 for 5.15,
  highlights:

   - amd-pmc fix for some suspend/resume issues

   - intel-hid fix to avoid false-positive SW_TABLET_MODE=1 reporting

   - some build error/warning fixes

   - various DMI quirk additions"

* tag 'platform-drivers-x86-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: gigabyte-wmi: add support for B550I Aorus Pro AX
  platform/x86/intel: hid: Add DMI switches allow list
  platform/x86: dell: fix DELL_WMI_PRIVACY dependencies & build error
  platform/x86: amd-pmc: Increase the response register timeout
  platform/x86: touchscreen_dmi: Update info for the Chuwi Hi10 Plus (CWI527) tablet
  platform/x86: touchscreen_dmi: Add info for the Chuwi HiBook (CWI514) tablet
  lg-laptop: Correctly handle dmi_get_system_info() returning NULL
  platform/x86/intel: punit_ipc: Drop wrong use of ACPI_PTR()
2021-09-22 09:16:18 -07:00
Jiri Slaby
8f1b7ba55c MAINTAINERS: ARM/VT8500, remove defunct e-mail
linux@prisktech.co.nz is defunct:

  4.1.2 <linux@prisktech.co.nz>: Recipient address rejected: Domain not found

Remove it from MAINTAINERS and mark the ARM/VT8500 entry orphan.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-22 08:46:30 -07:00
Paolo Abeni
977d293e23 mptcp: ensure tx skbs always have the MPTCP ext
Due to signed/unsigned comparison, the expression:

	info->size_goal - skb->len > 0

evaluates to true when the size goal is smaller than the
skb size. That results in lack of tx cache refill, so that
the skb allocated by the core TCP code lacks the required
MPTCP skb extensions.

Due to the above, syzbot is able to trigger the following WARN_ON():

WARNING: CPU: 1 PID: 810 at net/mptcp/protocol.c:1366 mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
Modules linked in:
CPU: 1 PID: 810 Comm: syz-executor.4 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
Code: ff 4c 8b 74 24 50 48 8b 5c 24 58 e9 0f fb ff ff e8 13 44 8b f8 4c 89 e7 45 31 ed e8 98 57 2e fe e9 81 f4 ff ff e8 fe 43 8b f8 <0f> 0b 41 bd ea ff ff ff e9 6f f4 ff ff 4c 89 e7 e8 b9 8e d2 f8 e9
RSP: 0018:ffffc9000531f6a0 EFLAGS: 00010216
RAX: 000000000000697f RBX: 0000000000000000 RCX: ffffc90012107000
RDX: 0000000000040000 RSI: ffffffff88eac9e2 RDI: 0000000000000003
RBP: ffff888078b15780 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff88eac017 R11: 0000000000000000 R12: ffff88801de0a280
R13: 0000000000006b58 R14: ffff888066278280 R15: ffff88803c2fe9c0
FS:  00007fd9f866e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007faebcb2f718 CR3: 00000000267cb000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __mptcp_push_pending+0x1fb/0x6b0 net/mptcp/protocol.c:1547
 mptcp_release_cb+0xfe/0x210 net/mptcp/protocol.c:3003
 release_sock+0xb4/0x1b0 net/core/sock.c:3206
 sk_stream_wait_memory+0x604/0xed0 net/core/stream.c:145
 mptcp_sendmsg+0xc39/0x1bc0 net/mptcp/protocol.c:1749
 inet6_sendmsg+0x99/0xe0 net/ipv6/af_inet6.c:643
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 sock_write_iter+0x2a0/0x3e0 net/socket.c:1057
 call_write_iter include/linux/fs.h:2163 [inline]
 new_sync_write+0x40b/0x640 fs/read_write.c:507
 vfs_write+0x7cf/0xae0 fs/read_write.c:594
 ksys_write+0x1ee/0x250 fs/read_write.c:647
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665f9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fd9f866e188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000056c038 RCX: 00000000004665f9
RDX: 00000000000e7b78 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056c038
R13: 0000000000a9fb1f R14: 00007fd9f866e300 R15: 0000000000022000

Fix the issue rewriting the relevant expression to avoid
sign-related problems - note: size_goal is always >= 0.

Additionally, ensure that the skb in the tx cache always carries
the relevant extension.

Reported-and-tested-by: syzbot+263a248eec3e875baa7b@syzkaller.appspotmail.com
Fixes: 1094c6fe72 ("mptcp: fix possible divide by zero")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-22 14:39:41 +01:00
Shai Malin
1ea7812326 qed: rdma - don't wait for resources under hw error recovery flow
If the HW device is during recovery, the HW resources will never return,
hence we shouldn't wait for the CID (HW context ID) bitmaps to clear.
This fix speeds up the error recovery flow.

Fixes: 64515dc899 ("qed: Add infrastructure for error detection and recovery")
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-22 14:38:34 +01:00
Jakub Kicinski
b52d3161c2 Merge branch 's390-qeth-fixes-2021-09-21'
Julian Wiedmann says:

====================
s390/qeth: fixes 2021-09-21

This brings two fixes for deadlocks when a device is removed while it
has certain types of async work pending. And one additional fix for a
missing NULL check in an error case.
====================

Link: https://lore.kernel.org/r/20210921145217.1584654-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-21 20:02:28 -07:00
Alexandra Winter
d2b59bd4b0 s390/qeth: fix deadlock during failing recovery
Commit 0b9902c1fc ("s390/qeth: fix deadlock during recovery") removed
taking discipline_mutex inside qeth_do_reset(), fixing potential
deadlocks. An error path was missed though, that still takes
discipline_mutex and thus has the original deadlock potential.

Intermittent deadlocks were seen when a qeth channel path is configured
offline, causing a race between qeth_do_reset and ccwgroup_remove.
Call qeth_set_offline() directly in the qeth_do_reset() error case and
then a new variant of ccwgroup_set_offline(), without taking
discipline_mutex.

Fixes: b41b554c1e ("s390/qeth: fix locking for discipline setup / removal")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-21 20:02:24 -07:00
Alexandra Winter
ee909d0b1d s390/qeth: Fix deadlock in remove_discipline
Problem: qeth_close_dev_handler is a worker that tries to acquire
card->discipline_mutex via drv->set_offline() in ccwgroup_set_offline().
Since commit b41b554c1e
("s390/qeth: fix locking for discipline setup / removal")
qeth_remove_discipline() is called under card->discipline_mutex and
cancels the work and waits for it to finish.

STOPLAN reception with reason code IPA_RC_VEPA_TO_VEB_TRANSITION is the
only situation that schedules close_dev_work. In that situation scheduling
qeth recovery will also result in an offline interface, when resetting the
isolation mode fails, if the external switch is still set to VEB.
And since commit 0b9902c1fc ("s390/qeth: fix deadlock during recovery")
qeth recovery does not aquire card->discipline_mutex anymore.

So we accept the longer pathlength of qeth_schedule_recovery in this
error situation and re-use the existing function.

As a side-benefit this changes the hwtrap to behave like during recovery
instead of like during a user-triggered set_offline.

Fixes: b41b554c1e ("s390/qeth: fix locking for discipline setup / removal")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-21 20:02:24 -07:00
Julian Wiedmann
248f064af2 s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
When qeth_set_online() calls qeth_clear_working_pool_list() to roll
back after an error exit from qeth_hardsetup_card(), we are at risk of
accessing card->qdio.in_q before it was allocated by
qeth_alloc_qdio_queues() via qeth_mpc_initialize().

qeth_clear_working_pool_list() then dereferences NULL, and by writing to
queue->bufs[i].pool_entry scribbles all over the CPU's lowcore.
Resulting in a crash when those lowcore areas are used next (eg. on
the next machine-check interrupt).

Such a scenario would typically happen when the device is first set
online and its queues aren't allocated yet. An early IO error or certain
misconfigs (eg. mismatched transport mode, bad portno) then cause us to
error out from qeth_hardsetup_card() with card->qdio.in_q still being
NULL.

Fix it by checking the pointer for NULL before accessing it.

Note that we also have (rare) paths inside qeth_mpc_initialize() where
a configuration change can cause us to free the existing queues,
expecting that subsequent code will allocate them again. If we then
error out before that re-allocation happens, the same bug occurs.

Fixes: eff73e16ee ("s390/qeth: tolerate pre-filled RX buffer")
Reported-by: Stefan Raspl <raspl@linux.ibm.com>
Root-caused-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-21 20:02:23 -07:00
Mark Brown
96c8395e21
spi: Revert modalias changes
During the v5.13 cycle we updated the SPI subsystem to generate OF style
modaliases for SPI devices, replacing the old Linux style modalises we
used to generate based on spi_device_id which are the DT style name with
the vendor removed.  Unfortunately this means that we start only
reporting OF style modalises and not the old ones and there is nothing
that ensures that drivers list every possible OF compatible string in
their OF ID table.  The result is that there are systems which have been
relying on loading modules based on the old style that are now broken,
as found by Russell King with spi-nor on Macchiatobin.

spi-nor is a particularly problematic case for this, it only lists a
single generic DT compatible jedec,spi-nor in the driver but supports a
huge raft of device specific compatibles, with a large set of part
numbers many of which are offered by multiple vendors.  Russell's
searches of upstream device trees has turned up examples with vendor
names written in non-standard ways too.  To make matters worse up until
8ff16cf77c ("Documentation: devicetree: m25p80: add "nor-jedec"
binding") the generic compatible was not part of the binding so there
are device trees out there written to that binding version which don't
list it all.  The sheer number of parts supported together with our
previous approach of ignoring the vendor ID makes robustly fixing this
by adding compatibles to the spi-nor driver seem problematic, the
current DT binding document does not list all the parts supported by the
driver at the minute (further patches will fix this).

I've also investigated supporting both formats of modalias
simultaneously but that doesn't seem possible, especially without
breaking our userspace ABI which is obviously not viable.

Instead revert the relevant changes for now:

e09f2ab8ee ("spi: update modalias_show after of_device_uevent_modalias support")
3ce6c9e261 ("spi: add of_device_uevent_modalias support")

This will unfortunately mean that any system which had started having
modules autoload based on the OF compatibles for drivers that list
things there but not in the spi_device_ids will now not have those
modules load which is itself a regression.  Since it affects a narrower
time window and the particularly problematic spi-nor driver may be
critical to system boot on smaller systems this seems the best of a
series of bad options.  I will start an audit of SPI drivers to identify
and fix cases where things won't autoload using spi_device_id, this is
not great but seems to be the best way forward that anyone has been able
to identify.

Thanks to Russell for both his report and the additional diagnostic and
analysis work he has done here, the detailed research above was his
work.

Fixes: e09f2ab8ee ("spi: update modalias_show after of_device_uevent_modalias support")
Fixes: 3ce6c9e261 ("spi: add of_device_uevent_modalias support")
Reported-by: Russell King (Oracle) <linux@armlinux.org.uk>
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Andreas Schwab <schwab@suse.de>
Cc: Marco Felsch <m.felsch@pengutronix.de>
2021-09-21 20:22:48 +01:00
Linus Torvalds
92477dd1fa s390 eBPF JIT miscompilation issues fixes.
These issues can be used by an unprivileged local user to circumvent the verifier and gain root privileges.
 
 v4.1+
 s390/bpf: Fix optimizing out zero-extensions
 s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant
 v5.5+
 s390/bpf: Fix branch shortening during codegen pass
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmFDDAoACgkQjYWKoQLX
 FBiK/ggAi9LxNSrxNI4u83PY/ubZ+13XIkB59BtvShHnzSSw/e0Sdbr7dnedljaF
 qWv9MUF9s1P88DRlXw/gHIckvIH77uvT28i5Omg6JfUnBLB0LaVC4QiwhmfbqBbV
 psew9zV1z0/2YFvagCsc/XdF4QdlyLm3GKO/bEAeFXG/jDigkzBVB3clffIv7zm7
 mD4uXgELil8PNv9tliRonUwoGG5P+0tF7KviTxi3aU7NLXtZxJ1Qt0N+HjRaDpwR
 kxvSaukn61yWafUWlgbmGL8HGqhFqCk4SQzHSAZ8m87dl6KjxDiQUjvccn5xQPF2
 M7JQU7INFApvK2QZFgW66jnTeqvUcQ==
 =Qmcn
 -----END PGP SIGNATURE-----

Merge tag 's390-5.15-ebpf-jit-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 eBPF fixes from Vasily Gorbik:
 "Johan Almbladh has implemented a number of new testcases for eBPF [1],
  which uncovered three miscompilation issues in the s390 eBPF JIT"

Link: https://lore.kernel.org/bpf/20210902185229.1840281-1-johan.almbladh@anyfinetworks.com/ [1]

* tag 's390-5.15-ebpf-jit-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/bpf: Fix optimizing out zero-extensions
  s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant
  s390/bpf: Fix branch shortening during codegen pass
2021-09-21 09:36:11 -07:00
Linus Torvalds
d5f6545934 qnx4: work around gcc false positive warning bug
In commit b7213ffa0e ("qnx4: avoid stringop-overread errors") I tried
to teach gcc about how the directory entry structure can be two
different things depending on a status flag.  It made the code clearer,
and it seemed to make gcc happy.

However, Arnd points to a gcc bug, where despite using two different
members of a union, gcc then gets confused, and uses the size of one of
the members to decide if a string overrun happens.  And not necessarily
the rigth one.

End result: with some configurations, gcc-11 will still complain about
the source buffer size being overread:

  fs/qnx4/dir.c: In function 'qnx4_readdir':
  fs/qnx4/dir.c:76:32: error: 'strnlen' specified bound [16, 48] exceeds source size 1 [-Werror=stringop-overread]
     76 |                         size = strnlen(name, size);
        |                                ^~~~~~~~~~~~~~~~~~~
  fs/qnx4/dir.c:26:22: note: source object declared here
     26 |                 char de_name;
        |                      ^~~~~~~

because gcc will get confused about which union member entry is actually
getting accessed, even when the source code is very clear about it.  Gcc
internally will have combined two "redundant" pointers (pointing to
different union elements that are at the same offset), and takes the
size checking from one or the other - not necessarily the right one.

This is clearly a gcc bug, but we can work around it fairly easily.  The
biggest thing here is the big honking comment about why we do what we
do.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578#c6
Reported-and-tested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-21 08:36:48 -07:00
Tobias Jakobi
6f6aab1caf platform/x86: gigabyte-wmi: add support for B550I Aorus Pro AX
Tested with a AMD Ryzen 7 5800X.

Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Acked-by: Thomas Weißschuh <thomas@weissschuh.net>
Link: https://lore.kernel.org/r/20210921100702.3838-1-tjakobi@math.uni-bielefeld.de
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-09-21 15:49:23 +02:00
José Expósito
b201cb0ebe platform/x86/intel: hid: Add DMI switches allow list
Some devices, even non convertible ones, can send incorrect
SW_TABLET_MODE reports.

Add an allow list and accept such reports only from devices in it.

Bug reported for Dell XPS 17 9710 on:
https://gitlab.freedesktop.org/libinput/libinput/-/issues/662

Reported-by: Tobias Gurtzick <magic@wizardtales.com>
Suggested-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Tobias Gurtzick <magic@wizardtales.com>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://lore.kernel.org/r/20210920160312.9787-1-jose.exposito89@gmail.com
[hdegoede@redhat.com: Check dmi_switches_auto_add_allow_list only once]
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-09-21 15:49:16 +02:00
Randy Dunlap
5b72dafaca platform/x86: dell: fix DELL_WMI_PRIVACY dependencies & build error
When DELL_WMI=y, DELL_WMI_PRIVACY=y, and LEDS_TRIGGER_AUDIO=m, there
is a linker error since the LEDS trigger code is built as a loadable
module. This happens because DELL_WMI_PRIVACY is a bool that depends
on a tristate (LEDS_TRIGGER_AUDIO=m), which can be dangerous.

ld: drivers/platform/x86/dell/dell-wmi-privacy.o: in function `dell_privacy_wmi_probe':
dell-wmi-privacy.c:(.text+0x3df): undefined reference to `ledtrig_audio_get'

Fixes: 8af9fa37b8 ("platform/x86: dell-privacy: Add support for Dell hardware privacy")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Perry Yuan <Perry.Yuan@dell.com>
Cc: Dell.Client.Kernel@dell.com
Cc: platform-driver-x86@vger.kernel.org
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mark Gross <mgross@linux.intel.com>
Link: https://lore.kernel.org/r/20210918044829.19222-1-rdunlap@infradead.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-09-21 15:49:09 +02:00
David S. Miller
b3f98404bd Merge branch 'dsa-devres'
Vladimir Oltean says:

====================
Fix mdiobus users with devres

Commit ac3a68d566 ("net: phy: don't abuse devres in
devm_mdiobus_register()") by Bartosz Golaszewski has introduced two
classes of potential bugs by making the devres callback of
devm_mdiobus_alloc stop calling mdiobus_unregister.

The exact buggy circumstances are presented in the individual commit
messages. I have searched the tree for other occurrences, but at the
moment:

- for issue (a) I have no concrete proof that other buses except SPI and
  I2C suffer from it, and the only SPI or I2C device drivers that call
  of_mdiobus_alloc are the DSA drivers that leave a NULL
  ds->slave_mii_bus and a non-NULL ds->ops->phy_read, aka ksz9477,
  ksz8795, lan9303_i2c, vsc73xx-spi.

- for issue (b), all drivers which call of_mdiobus_alloc either use
  of_mdiobus_register too, or call mdiobus_unregister sometime within
  the ->remove path.

Although at this point I've seen enough strangeness caused by this
"device_del during ->shutdown" that I'm just going to copy the SPI and
I2C subsystem maintainers to this patch series, to get their feedback
whether they've had reports about things like this before. I don't think
other buses behave in this way, it forces SPI and I2C devices to have to
protect themselves from a really strange set of issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21 13:52:16 +01:00
Vladimir Oltean
74b6d7d133 net: dsa: realtek: register the MDIO bus under devres
The Linux device model permits both the ->shutdown and ->remove driver
methods to get called during a shutdown procedure. Example: a DSA switch
which sits on an SPI bus, and the SPI bus driver calls this on its
->shutdown method:

spi_unregister_controller
-> device_for_each_child(&ctlr->dev, NULL, __unregister);
   -> spi_unregister_device(to_spi_device(dev));
      -> device_del(&spi->dev);

So this is a simple pattern which can theoretically appear on any bus,
although the only other buses on which I've been able to find it are
I2C:

i2c_del_adapter
-> device_for_each_child(&adap->dev, NULL, __unregister_client);
   -> i2c_unregister_device(client);
      -> device_unregister(&client->dev);

The implication of this pattern is that devices on these buses can be
unregistered after having been shut down. The drivers for these devices
might choose to return early either from ->remove or ->shutdown if the
other callback has already run once, and they might choose that the
->shutdown method should only perform a subset of the teardown done by
->remove (to avoid unnecessary delays when rebooting).

So in other words, the device driver may choose on ->remove to not
do anything (therefore to not unregister an MDIO bus it has registered
on ->probe), because this ->remove is actually triggered by the
device_shutdown path, and its ->shutdown method has already run and done
the minimally required cleanup.

This used to be fine until the blamed commit, but now, the following
BUG_ON triggers:

void mdiobus_free(struct mii_bus *bus)
{
	/* For compatibility with error handling in drivers. */
	if (bus->state == MDIOBUS_ALLOCATED) {
		kfree(bus);
		return;
	}

	BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
	bus->state = MDIOBUS_RELEASED;

	put_device(&bus->dev);
}

In other words, there is an attempt to free an MDIO bus which was not
unregistered. The attempt to free it comes from the devres release
callbacks of the SPI device, which are executed after the device is
unregistered.

I'm not saying that the fact that MDIO buses allocated using devres
would automatically get unregistered wasn't strange. I'm just saying
that the commit didn't care about auditing existing call paths in the
kernel, and now, the following code sequences are potentially buggy:

(a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
    located on a bus that unregisters its children on shutdown. After
    the blamed patch, either both the alloc and the register should use
    devres, or none should.

(b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
    mdiobus_unregister at all in the remove path. After the blamed
    patch, nobody unregisters the MDIO bus anymore, so this is even more
    buggy than the previous case which needs a specific bus
    configuration to be seen, this one is an unconditional bug.

In this case, the Realtek drivers fall under category (b). To solve it,
we can register the MDIO bus under devres too, which restores the
previous behavior.

Fixes: ac3a68d566 ("net: phy: don't abuse devres in devm_mdiobus_register()")
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21 13:52:16 +01:00
Vladimir Oltean
5135e96a3d net: dsa: don't allocate the slave_mii_bus using devres
The Linux device model permits both the ->shutdown and ->remove driver
methods to get called during a shutdown procedure. Example: a DSA switch
which sits on an SPI bus, and the SPI bus driver calls this on its
->shutdown method:

spi_unregister_controller
-> device_for_each_child(&ctlr->dev, NULL, __unregister);
   -> spi_unregister_device(to_spi_device(dev));
      -> device_del(&spi->dev);

So this is a simple pattern which can theoretically appear on any bus,
although the only other buses on which I've been able to find it are
I2C:

i2c_del_adapter
-> device_for_each_child(&adap->dev, NULL, __unregister_client);
   -> i2c_unregister_device(client);
      -> device_unregister(&client->dev);

The implication of this pattern is that devices on these buses can be
unregistered after having been shut down. The drivers for these devices
might choose to return early either from ->remove or ->shutdown if the
other callback has already run once, and they might choose that the
->shutdown method should only perform a subset of the teardown done by
->remove (to avoid unnecessary delays when rebooting).

So in other words, the device driver may choose on ->remove to not
do anything (therefore to not unregister an MDIO bus it has registered
on ->probe), because this ->remove is actually triggered by the
device_shutdown path, and its ->shutdown method has already run and done
the minimally required cleanup.

This used to be fine until the blamed commit, but now, the following
BUG_ON triggers:

void mdiobus_free(struct mii_bus *bus)
{
	/* For compatibility with error handling in drivers. */
	if (bus->state == MDIOBUS_ALLOCATED) {
		kfree(bus);
		return;
	}

	BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
	bus->state = MDIOBUS_RELEASED;

	put_device(&bus->dev);
}

In other words, there is an attempt to free an MDIO bus which was not
unregistered. The attempt to free it comes from the devres release
callbacks of the SPI device, which are executed after the device is
unregistered.

I'm not saying that the fact that MDIO buses allocated using devres
would automatically get unregistered wasn't strange. I'm just saying
that the commit didn't care about auditing existing call paths in the
kernel, and now, the following code sequences are potentially buggy:

(a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
    located on a bus that unregisters its children on shutdown. After
    the blamed patch, either both the alloc and the register should use
    devres, or none should.

(b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
    mdiobus_unregister at all in the remove path. After the blamed
    patch, nobody unregisters the MDIO bus anymore, so this is even more
    buggy than the previous case which needs a specific bus
    configuration to be seen, this one is an unconditional bug.

In this case, DSA falls into category (a), it tries to be helpful and
registers an MDIO bus on behalf of the switch, which might be on such a
bus. I've no idea why it does it under devres.

It does this on probe:

	if (!ds->slave_mii_bus && ds->ops->phy_read)
		alloc and register mdio bus

and this on remove:

	if (ds->slave_mii_bus && ds->ops->phy_read)
		unregister mdio bus

I _could_ imagine using devres because the condition used on remove is
different than the condition used on probe. So strictly speaking, DSA
cannot determine whether the ds->slave_mii_bus it sees on remove is the
ds->slave_mii_bus that _it_ has allocated on probe. Using devres would
have solved that problem. But nonetheless, the existing code already
proceeds to unregister the MDIO bus, even though it might be
unregistering an MDIO bus it has never registered. So I can only guess
that no driver that implements ds->ops->phy_read also allocates and
registers ds->slave_mii_bus itself.

So in that case, if unregistering is fine, freeing must be fine too.

Stop using devres and free the MDIO bus manually. This will make devres
stop attempting to free a still registered MDIO bus on ->shutdown.

Fixes: ac3a68d566 ("net: phy: don't abuse devres in devm_mdiobus_register()")
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21 13:52:16 +01:00
Masanari Iida
3e95cfa24e Doc: networking: Fox a typo in ice.rst
This patch fixes a spelling typo in ice.rst

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21 11:01:25 +01:00
Vladimir Oltean
e5845aa0ea net: dsa: fix dsa_tree_setup error path
Since the blamed commit, dsa_tree_teardown_switches() was split into two
smaller functions, dsa_tree_teardown_switches and dsa_tree_teardown_ports.

However, the error path of dsa_tree_setup stopped calling dsa_tree_teardown_ports.

Fixes: a57d8c217a ("net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21 10:59:39 +01:00
David S. Miller
431db53c73 Merge branch 'smc-fixes'
Karsten Graul says:

====================
net/smc: fixes 2021-09-20

Please apply the following patches for smc to netdev's net tree.

The first patch adds a missing error check, and the second patch
fixes a possible leak of a lock in a worker.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21 10:54:17 +01:00
Karsten Graul
a18cee4791 net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
The abort_work is scheduled when a connection was detected to be
out-of-sync after a link failure. The work calls smc_conn_kill(),
which calls smc_close_active_abort() and that might end up calling
smc_close_cancel_work().
smc_close_cancel_work() cancels any pending close_work and tx_work but
needs to release the sock_lock before and acquires the sock_lock again
afterwards. So when the sock_lock was NOT acquired before then it may
be held after the abort_work completes. Thats why the sock_lock is
acquired before the call to smc_conn_kill() in __smc_lgr_terminate(),
but this is missing in smc_conn_abort_work().

Fix that by acquiring the sock_lock first and release it after the
call to smc_conn_kill().

Fixes: b286a0651e ("net/smc: handle incoming CDC validation message")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21 10:54:16 +01:00
Karsten Graul
6c90731980 net/smc: add missing error check in smc_clc_prfx_set()
Coverity stumbled over a missing error check in smc_clc_prfx_set():

*** CID 1475954:  Error handling issues  (CHECKED_RETURN)
/net/smc/smc_clc.c: 233 in smc_clc_prfx_set()
>>>     CID 1475954:  Error handling issues  (CHECKED_RETURN)
>>>     Calling "kernel_getsockname" without checking return value (as is done elsewhere 8 out of 10 times).
233     	kernel_getsockname(clcsock, (struct sockaddr *)&addrs);

Add the return code check in smc_clc_prfx_set().

Fixes: c246d942ea ("net/smc: restructure netinfo for CLC proposal msgs")
Reported-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21 10:54:16 +01:00
Eric Dumazet
e9edc188fc netfilter: conntrack: serialize hash resizes and cleanups
Syzbot was able to trigger the following warning [1]

No repro found by syzbot yet but I was able to trigger similar issue
by having 2 scripts running in parallel, changing conntrack hash sizes,
and:

for j in `seq 1 1000` ; do unshare -n /bin/true >/dev/null ; done

It would take more than 5 minutes for net_namespace structures
to be cleaned up.

This is because nf_ct_iterate_cleanup() has to restart everytime
a resize happened.

By adding a mutex, we can serialize hash resizes and cleanups
and also make get_next_corpse() faster by skipping over empty
buckets.

Even without resizes in the picture, this patch considerably
speeds up network namespace dismantles.

[1]
INFO: task syz-executor.0:8312 can't die for more than 144 seconds.
task:syz-executor.0  state:R  running task     stack:25672 pid: 8312 ppid:  6573 flags:0x00004006
Call Trace:
 context_switch kernel/sched/core.c:4955 [inline]
 __schedule+0x940/0x26f0 kernel/sched/core.c:6236
 preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6408
 preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35
 __local_bh_enable_ip+0x109/0x120 kernel/softirq.c:390
 local_bh_enable include/linux/bottom_half.h:32 [inline]
 get_next_corpse net/netfilter/nf_conntrack_core.c:2252 [inline]
 nf_ct_iterate_cleanup+0x15a/0x450 net/netfilter/nf_conntrack_core.c:2275
 nf_conntrack_cleanup_net_list+0x14c/0x4f0 net/netfilter/nf_conntrack_core.c:2469
 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:171
 setup_net+0x639/0xa30 net/core/net_namespace.c:349
 copy_net_ns+0x319/0x760 net/core/net_namespace.c:470
 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
 unshare_nsproxy_namespaces+0xc1/0x1f0 kernel/nsproxy.c:226
 ksys_unshare+0x445/0x920 kernel/fork.c:3128
 __do_sys_unshare kernel/fork.c:3202 [inline]
 __se_sys_unshare kernel/fork.c:3200 [inline]
 __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3200
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f63da68e739
RSP: 002b:00007f63d7c05188 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007f63da792f80 RCX: 00007f63da68e739
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000000
RBP: 00007f63da6e8cc4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f63da792f80
R13: 00007fff50b75d3f R14: 00007f63d7c05300 R15: 0000000000022000

Showing all locks held in the system:
1 lock held by khungtaskd/27:
 #0: ffffffff8b980020 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
2 locks held by kworker/u4:2/153:
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline]
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2268
 #1: ffffc9000140fdb0 ((kfence_timer).work){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2272
1 lock held by systemd-udevd/2970:
1 lock held by in:imklog/6258:
 #0: ffff88807f970ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:990
3 locks held by kworker/1:6/8158:
1 lock held by syz-executor.0/8312:
2 locks held by kworker/u4:13/9320:
1 lock held by syz-executor.5/10178:
1 lock held by syz-executor.4/10217:

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-21 03:46:56 +02:00
Florian Westphal
b53deef054 netfilter: log: work around missing softdep backend module
iptables/nftables has two types of log modules:

1. backend, e.g. nf_log_syslog, which implement the functionality
2. frontend, e.g. xt_LOG or nft_log, which call the functionality
   provided by backend based on nf_tables or xtables rule set.

Problem is that the request_module() call to load the backed in
nf_logger_find_get() might happen with nftables transaction mutex held
in case the call path is via nf_tables/nft_compat.

This can cause deadlocks (see 'Fixes' tags for details).

The chosen solution as to let modprobe deal with this by adding 'pre: '
soft dep tag to xt_LOG (to load the syslog backend) and xt_NFLOG (to
load nflog backend).

Eric reports that this breaks on systems with older modprobe that
doesn't support softdeps.

Another, similar issue occurs when someone either insmods xt_(NF)LOG
directly or unloads the backend module (possible if no log frontend
is in use): because the frontend module is already loaded, modprobe is
not invoked again so the softdep isn't evaluated.

Add a workaround: If nf_logger_find_get() returns -ENOENT and call
is not via nft_compat, load the backend explicitly and try again.

Else, let nft_compat ask for deferred request_module via nf_tables
infra.

Softdeps are kept in-place, so with newer modprobe the dependencies
are resolved from userspace.

Fixes: cefa31a9d4 ("netfilter: nft_log: perform module load from nf_tables")
Fixes: a38b5b56d6 ("netfilter: nf_log: add module softdeps")
Reported-and-tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-21 03:46:56 +02:00
Florian Westphal
cc8072153a netfilter: iptable_raw: drop bogus net_init annotation
This is a leftover from the times when this function was wired up via
pernet_operations.  Now its called when userspace asks for the table.

With CONFIG_NET_NS=n, iptable_raw_table_init memory has been discarded
already and we get a kernel crash.

Other tables are fine, __net_init annotation was removed already.

Fixes: fdacd57c79 ("netfilter: x_tables: never register tables by default")
Reported-by: youling 257 <youling257@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-21 03:46:56 +02:00
Florian Westphal
7970a19b71 netfilter: nf_nat_masquerade: defer conntrack walk to work queue
The ipv4 and device notifiers are called with RTNL mutex held.
The table walk can take some time, better not block other RTNL users.

'ip a' has been reported to block for up to 20 seconds when conntrack table
has many entries and device down events are frequent (e.g., PPP).

Reported-and-tested-by: Martin Zaharinov <micron10@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-21 03:46:56 +02:00