Pull cgroup fixes from Tejun Heo:
"Two cgroup fixes:
- fix a NULL deref when trying to poll PSI in the root cgroup
- fix confusing controller parsing corner case when mounting cgroup
v1 hierarchies
And doc / maintainer file updates"
* 'for-5.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: update PSI file description in docs
cgroup: fix psi monitor for root cgroup
MAINTAINERS: Update my email address
MAINTAINERS: Remove stale URLs for cpuset
cgroup-v1: add disabled controller check in cgroup1_parse_param()
Merge fixes from Andrew Morton:
"6 patches.
Subsystems affected by this patch series: mm/pagemap, scripts,
MAINTAINERS, and h8300"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
h8300: fix PREEMPTION build, TI_PRE_COUNT undefined
MAINTAINERS: add Andrey Konovalov to KASAN reviewers
MAINTAINERS: update Andrey Konovalov's email address
MAINTAINERS: update KASAN file list
scripts/recordmcount.pl: support big endian for ARCH sh
m68k: make __pfn_to_phys() and __phys_to_pfn() available for !MMU
Pull i2c fix from Wolfram Sang:
"One more I2C driver bugfix"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: stm32f7: fix configuration of the digital filter
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmAmlkAACgkQxWXV+ddt
WDuwNxAAiBAhEwPllzyU86p4RMMip5pa24zu11HkTya65yGk6EFuj4zTlx/L5Fn6
JOjxwlPqaTItER1PYJ5HRdIy1Y2E4eWEiDLolvmvDCPZrfKRKhBU1MZbgXwDbp+Z
pwaJGIm5ZaXDGyuFge3bKA48BERfqxRBO3qIOZ0tzgsUFLlZ2d9EdDc99093/J6k
QzIijXQjFnvnB2MNawN1b/KQ63xqXLo2hemKcKIFCxJHm9eaet/qwGHl5iuR5ScY
bOGCWvLSkCXceartDur3msOZXur09YLyfeYmE9dj1FN3aNu97sW8VivWRrs3aglK
if51iYrrjKSnDr4SOK28S5UYdgeStb/qWWtosdcMsQVBo0t7iCnGT2psGaQCkdfG
FChqbs2uXlbJrojlelV6xbaU3S2D2MtSz5mF+I2G5MpQbj1jkhYE9ZTUQeibcd7o
l+edn/VJvVK4X0NAX8pIWJ4nFY1HqUTyfn28IQ7ymBhyyUloIoazvSkBuSWy6iy0
9aPpohOKjCw8Y3MbgcIfIEJhdK+aIKF8ZPh52+zcXQzf1OtSryVarLHsNXWm9vJ8
tHsRHCzrbLFdAXZccT6YlerzPs4+PVf44UknDbFCg7sLcG04NIGGrMXOtTHwgEZL
BEywTjAMlMDjrEXouxYAPNPnEg/NlvQGZYRvBnxrtZE4G2fxJ7o=
=7w6G
-----END PGP SIGNATURE-----
Merge tag 'for-5.11-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"A regression fix caused by a refactoring in 5.11.
A corrupted superblock wouldn't be detected by checksum verification
due to wrongly placed initialization of the checksum length, thus
making memcmp always work"
* tag 'for-5.11-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: initialize fs_info::csum_size earlier in open_ctree
Fix a build error for undefined 'TI_PRE_COUNT' by adding it to
asm-offsets.c.
h8300-linux-ld: arch/h8300/kernel/entry.o: in function `resume_kernel': (.text+0x29a): undefined reference to `TI_PRE_COUNT'
Link: https://lkml.kernel.org/r/20210212021650.22740-1-rdunlap@infradead.org
Fixes: df2078b8da ("h8300: Low level entry")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel test robot reported the following issue:
CC [M] drivers/soc/litex/litex_soc_ctrl.o
sh4-linux-objcopy: Unable to change endianness of input file(s)
sh4-linux-ld: cannot find drivers/soc/litex/.tmp_gl_litex_soc_ctrl.o: No such file or directory
sh4-linux-objcopy: 'drivers/soc/litex/.tmp_mx_litex_soc_ctrl.o': No such file
The problem is that the format of input file is elf32-shbig-linux, but
sh4-linux-objcopy wants to output a file which format is elf32-sh-linux:
$ sh4-linux-objdump -d drivers/soc/litex/litex_soc_ctrl.o | grep format
drivers/soc/litex/litex_soc_ctrl.o: file format elf32-shbig-linux
Link: https://lkml.kernel.org/r/20210210150435.2171567-1-rong.a.chen@intel.com
Link: https://lore.kernel.org/linux-mm/202101261118.GbbYSlHu-lkp@intel.com
Signed-off-by: Rong Chen <rong.a.chen@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent changes that obsoleted DISCONTIGMEM on m68k switched the MMU
variant to use generic definitions of __pfn_to_phys() and __phys_to_pfn(),
but missed the !MMU variant which caused a build failure:
drivers/media/common/videobuf2/videobuf2-dma-contig.c: In function 'vb2_dc_get_userptr':
drivers/media/common/videobuf2/videobuf2-dma-contig.c:509:5: error: implicit declaration of function '__pfn_to_phys' [-Werror=implicit-function-declaration]
509 | __pfn_to_phys(nums[0]), size, buf->dma_dir, 0);
| ^~~~~~~~~~~~~
cc1: some warnings being treated as errors
Enable __pfn_to_phys() and __phys_to_pfn() on !MMU builds.
Link: https://lkml.kernel.org/r/20210211232202.GS299309@linux.ibm.com
Fixes: 4bfc848e09 ("m68k/mm: enable use of generic memory_model.h for !DISCONTIGMEM")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Be nice and prune these upfront, in case the ring is being shared and
one of the tasks is going away. This is a bit more important now that
we account the allocations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We have three different ones, put it in a helper for easy calling. This
is in preparation for doing it outside of ring freeing as well. With
that in mind, also ensure that we do the proper locking for safe calling
from a context where the ring it still live.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
No changes in this patch, just allows a caller to pass in a targeted
task that we must match for freeing requests in the cache.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Robert Hancock says:
====================
Xilinx axienet updates
Updates to the Xilinx AXI Ethernet driver to add support for an additional
ethtool operation, and to support dynamic switching between 1000BaseX and
SGMII interface modes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer versions of the Xilinx AXI Ethernet core (specifically version 7.2 or
later) allow the core to be configured with a PHY interface mode of "Both",
allowing either 1000BaseX or SGMII modes to be selected at runtime. Add
support for this in the driver to allow better support for applications
which can use both fiber and copper SFP modules.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document the new xlnx,switch-x-sgmii attribute which is used to indicate
that the Ethernet core supports dynamic switching between 1000BaseX and
SGMII.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hook up the nway_reset ethtool operation to the corresponding phylink
function so that "ethtool -r" can be supported.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitrii Banshchikov says:
====================
This patchset adds support of pointers to type with known size among
global function arguments.
The motivation is to overcome the limit on the maximum number of allowed
arguments and avoid tricky and unoptimal ways of passing arguments.
A referenced type may contain pointers but access via such pointers
cannot be veirified currently.
v2 -> v3
- Fix reg ID generation
- Fix commit description
- Fix typo
- Fix tests
v1 -> v2:
- Allow pointer to any type with known size rather than struct only
- Allow pointer in global functions only
- Add more tests
- Fix wrapping and v1 comments
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
test_global_func9 - check valid pointer's scenarios
test_global_func10 - check that a smaller type cannot be passed as a
larger one
test_global_func11 - check that CTX pointer cannot be passed
test_global_func12 - check access to a null pointer
test_global_func13 - check access to an arbitrary pointer value
test_global_func14 - check that an opaque pointer cannot be passed
test_global_func15 - check that a variable has an unknown value after
it was passed to a global function by pointer
test_global_func16 - check access to uninitialized stack memory
test_global_func_args - check read and write operations through a pointer
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-5-me@ubique.spb.ru
Add an ability to pass a pointer to a type with known size in arguments
of a global function. Such pointers may be used to overcome the limit on
the maximum number of arguments, avoid expensive and tricky workarounds
and to have multiple output arguments.
A referenced type may contain pointers but indirect access through them
isn't supported.
The implementation consists of two parts. If a global function has an
argument that is a pointer to a type with known size then:
1) In btf_check_func_arg_match(): check that the corresponding
register points to NULL or to a valid memory region that is large enough
to contain the expected argument's type.
2) In btf_prepare_func_args(): set the corresponding register type to
PTR_TO_MEM_OR_NULL and its size to the size of the expected type.
Only global functions are supported because allowance of pointers for
static functions might break validation. Consider the following
scenario. A static function has a pointer argument. A caller passes
pointer to its stack memory. Because the callee can change referenced
memory verifier cannot longer assume any particular slot type of the
caller's stack memory hence the slot type is changed to SLOT_MISC. If
there is an operation that relies on slot type other than SLOT_MISC then
verifier won't be able to infer safety of the operation.
When verifier sees a static function that has a pointer argument
different from PTR_TO_CTX then it skips arguments check and continues
with "inline" validation with more information available. The operation
that relies on the particular slot type now succeeds.
Because global functions were not allowed to have pointer arguments
different from PTR_TO_CTX it's not possible to break existing and valid
code.
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-4-me@ubique.spb.ru
Extract conversion from a register's nullable type to a type with a
value. The helper will be used in mark_ptr_not_null_reg().
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-3-me@ubique.spb.ru
Using "reg" for an array of bpf_reg_state and "reg[i + 1]" for an
individual bpf_reg_state is error-prone and verbose. Use "regs" for the
former and "reg" for the latter as other code nearby does.
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-2-me@ubique.spb.ru
This driver is set up to use a clock mapping in the device tree if it is
present, but still work without one for backward compatibility. However,
if getting the clock returns -EPROBE_DEFER, then we need to abort and
return that error from our driver initialization so that the probe can
be retried later after the clock is set up.
Move clock initialization to earlier in the process so we do not waste as
much effort if the clock is not yet available. Switch to use
devm_clk_get_optional and abort initialization on any error reported.
Also enable the clock regardless of whether the controller is using an MDIO
bus, as the clock is required in any case.
Fixes: 09a0354cad ("net: axienet: Use clock framework to get device clock rate")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
tcp: mem pressure vs SO_RCVLOWAT
First patch fixes an issue for applications using SO_RCVLOWAT
to reduce context switches.
Second patch is a cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Both tcp_data_ready() and tcp_stream_is_readable() share the same logic.
Add tcp_epollin_ready() helper to avoid duplication.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While commit 24adbc1676 ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
fixed an issue vs too small sk_rcvbuf for given sk_rcvlowat constraint,
it missed to address issue caused by memory pressure.
1) If we are under memory pressure and socket receive queue is empty.
First incoming packet is allowed to be queued, after commit
76dfa60820 ("tcp: allow one skb to be received per socket under memory pressure")
But we do not send EPOLLIN yet, in case tcp_data_ready() sees sk_rcvlowat
is bigger than skb length.
2) Then, when next packet comes, it is dropped, and we directly
call sk->sk_data_ready().
3) If application is using poll(), tcp_poll() will then use
tcp_stream_is_readable() and decide the socket receive queue is
not yet filled, so nothing will happen.
Even when sender retransmits packets, phases 2) & 3) repeat
and flow is effectively frozen, until memory pressure is off.
Fix is to consider tcp_under_memory_pressure() to take care
of global memory pressure or memcg pressure.
Fixes: 24adbc1676 ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Arjun Roy <arjunroy@google.com>
Suggested-by: Wei Wang <weiwan@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2021-02-12
This series contains updates to i40e, ice, and ixgbe drivers.
Maciej does cleanups on the following drivers.
For i40e, removes redundant check for XDP prog, cleans up no longer
relevant information, and removes an unused function argument.
For ice, removes local variable use, instead returning values directly.
Moves skb pointer from buffer to ring and removes an unneeded check for
xdp_prog in zero copy path. Also removes a redundant MTU check when
changing it.
For i40e, ice, and ixgbe, stores the rx_offset in the Rx ring as
the value is constant so there's no need for continual calls.
Bjorn folds a decrement into a while statement.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The supported indirect subcrq entries on Power8 is 16. Power9
supports 128. Redefined this value to 16 to minimize the driver from
having to reset when migrating between Power9 and Power8. In our rx/tx
performance testing, we found no performance difference between 16 and
128 at this time.
Fixes: f019fb6392 ("ibmvnic: Introduce indirect subordinate Command Response Queue buffer")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault says:
====================
selftests: tc: Test tc-flower's MPLS features
A couple of patches for exercising the MPLS filters of tc-flower.
Patch 1 tests basic MPLS matching features: those that only work on the
first label stack entry (that is, the mpls_label, mpls_tc, mpls_bos and
mpls_ttl options).
Patch 2 tests the more generic "mpls" and "lse" options, which allow
matching MPLS fields beyond the first stack entry.
In both patches, special care is taken to skip these new tests for
incompatible versions of tc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tests in tc_flower.sh for generic matching on MPLS Label Stack
Entries. The label, tc, bos and ttl fields are tested for the first
and second labels. For each field, the minimal and maximal values are
tested (the former at depth 1 and the later at depth 2).
There are also tests for matching the presence of a label stack entry
at a given depth.
In order to reduce the amount of code, all "lse" subcommands are tested
in match_mpls_lse_test(). Action "continue" is used, so that test
packets are evaluated by all filters. Then, we can verify if each
filter matched the expected number of packets.
Some versions of tc-flower produced invalid json output when dumping
MPLS filters with depth > 1. Skip the test if tc isn't recent enough.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tests in tc_flower.sh for mpls_label, mpls_tc, mpls_bos and
mpls_ttl. For each keyword, test the minimal and maximal values.
Selectively skip these new mpls tests for tc versions that don't
support them.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
Cleanup in brport flags switchdev offload for DSA
The initial goal of this series was to have better support for
standalone ports mode on the DSA drivers like ocelot/felix and sja1105.
This turned out to require some API adjustments in both directions:
to the information presented to and by the switchdev notifier, and to
the API presented to the switch drivers by the DSA layer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The chip can configure unicast flooding, broadcast flooding and learning.
Learning is per port, while flooding is per {ingress, egress} port pair
and we need to configure the same value for all possible ingress ports
towards the requested one.
While multicast flooding is not officially supported, we can hack it by
using a feature of the second generation (P/Q/R/S) devices, which is that
FDB entries are maskable, and multicast addresses always have an odd
first octet. So by putting a match-all for 00:01:00:00:00:00 addr and
00:01:00:00:00:00 mask at the end of the FDB, we make sure that it is
always checked last, and does not take precedence in front of any other
MDB. So it behaves effectively as an unknown multicast entry.
For the first generation switches, this feature is not available, so
unknown multicast will always be treated the same as unknown unicast.
So the only thing we can do is request the user to offload the settings
for these 2 flags in tandem, i.e.
ip link set swp2 type bridge_slave flood off
Error: sja1105: This chip cannot configure multicast flooding independently of unicast.
ip link set swp2 type bridge_slave flood off mcast_flood off
ip link set swp2 type bridge_slave mcast_flood on
Error: sja1105: This chip cannot configure multicast flooding independently of unicast.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should not be unconditionally enabling address learning, since doing
that is actively detrimential when a port is standalone and not offloading
a bridge. Namely, if a port in the switch is standalone and others are
offloading the bridge, then we could enter a situation where we learn an
address towards the standalone port, but the bridged ports could not
forward the packet there, because the CPU is the only path between the
standalone and the bridged ports. The solution of course is to not
enable address learning unless the bridge asks for it.
We need to set up the initial port flags for no learning and flooding
everything, and also when the port joins and leaves the bridge.
The flood configuration was already configured ok for standalone mode
in ocelot_init, we just need to disable learning in ocelot_init_port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation of offloading the bridge port flags which have
independent settings for unknown multicast and for broadcast, we should
also start reserving one destination Port Group ID for the flooding of
broadcast packets, to allow configuring it individually.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ocelot_init sets up PGID_MC to include the CPU port module, and that is
fine, but the ocelot-8021q tagger removes the CPU port module from the
unknown multicast replicator. So after a transition from the default
ocelot tagger towards ocelot-8021q and then again towards ocelot,
multicast flooding towards the CPU port module will be disabled.
Fixes: e21268efbe ("net: dsa: felix: perform switch setup for tag_8021q")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be
expressed by the bridge through switchdev, and not all of them can be
emulated by DSA mid-layer API at the same time.
One possible configuration is when the bridge offloads the port flags
using a mask that has a single bit set - therefore only one feature
should change. However, DSA currently groups together unicast and
multicast flooding in the .port_egress_floods method, which limits our
options when we try to add support for turning off broadcast flooding:
do we extend .port_egress_floods with a third parameter which b53 and
mv88e6xxx will ignore? But that means that the DSA layer, which
currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will
see that .port_egress_floods is implemented, and will report that all 3
types of flooding are supported - not necessarily true.
Another configuration is when the user specifies more than one flag at
the same time, in the same netlink message. If we were to create one
individual function per offloadable bridge port flag, we would limit the
expressiveness of the switch driver of refusing certain combinations of
flag values. For example, a switch may not have an explicit knob for
flooding of unknown multicast, just for flooding in general. In that
case, the only correct thing to do is to allow changes to BR_FLOOD and
BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having
a separate .port_set_unicast_flood and .port_set_multicast_flood would
not allow the driver to possibly reject that.
Also, DSA doesn't consider it necessary to inform the driver that a
SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
just calls .port_egress_floods for the CPU port. When we'll add support
for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
problem because the flood settings will need to be held statefully in
the DSA middle layer, otherwise changing the mrouter port attribute will
impact the flooding attribute. And that's _assuming_ that the underlying
hardware doesn't have anything else to do when a multicast router
attaches to a port than flood unknown traffic to it. If it does, there
will need to be a dedicated .port_set_mrouter anyway.
So we need to let the DSA drivers see the exact form that the bridge
passes this switchdev attribute in, otherwise we are standing in the
way. Therefore we also need to use this form of language when
communicating to the driver that it needs to configure its initial
(before bridge join) and final (after bridge leave) port flags.
The b53 and mv88e6xxx drivers are converted to the passthrough API and
their implementation of .port_egress_floods is split into two: a
function that configures unicast flooding and another for multicast.
The mv88e6xxx implementation is quite hairy, and it turns out that
the implementations of unknown unicast flooding are actually the same
for 6185 and for 6352:
behind the confusing names actually lie two individual bits:
NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2)
NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3)
so there was no reason to entangle them in the first place.
Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of
PORT_CTL0, which has the exact same bit index. I have left the
implementations separate though, for the only reason that the names are
different enough to confuse me, since I am not able to double-check with
a user manual. The multicast flooding setting for 6185 is in a different
register than for 6352 though.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This switchdev attribute offers a counterproductive API for a driver
writer, because although br_switchdev_set_port_flag gets passed a
"flags" and a "mask", those are passed piecemeal to the driver, so while
the PRE_BRIDGE_FLAGS listener knows what changed because it has the
"mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
value. But certain drivers can offload only certain combinations of
settings, like for example they cannot change unicast flooding
independently of multicast flooding - they must be both on or both off.
The way the information is passed to switchdev makes drivers not
expressive enough, and unable to reject this request ahead of time, in
the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
the deferred BRIDGE_FLAGS attribute, where the rejection is currently
ignored.
This patch also changes drivers to make use of the "mask" field for edge
detection when possible.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For a DSA switch port operating in standalone mode, address learning
doesn't make much sense since that is a bridge function. In fact,
address learning even breaks setups such as this one:
+---------------------------------------------+
| |
| +-------------------+ |
| | br0 | send receive |
| +--------+-+--------+ +--------+ +--------+ |
| | | | | | | | | |
| | swp0 | | swp1 | | swp2 | | swp3 | |
| | | | | | | | | |
+-+--------+-+--------+-+--------+-+--------+-+
| ^ | ^
| | | |
| +-----------+ |
| |
+--------------------------------+
because if the switch has a single FDB (can offload a single bridge)
then source address learning on swp3 can "steal" the source MAC address
of swp2 from br0's FDB, because learning frames coming from swp2 will be
done twice: first on the swp1 ingress port, second on the swp3 ingress
port. So the hardware FDB will become out of sync with the software
bridge, and when swp2 tries to send one more packet towards swp1, the
ASIC will attempt to short-circuit the forwarding path and send it
directly to swp3 (since that's the last port it learned that address on),
which it obviously can't, because swp3 operates in standalone mode.
So DSA drivers operating in standalone mode should still configure a
list of bridge port flags even when they are standalone. Currently DSA
attempts to call dsa_port_bridge_flags with 0, which disables egress
flooding of unknown unicast and multicast, something which doesn't make
much sense. For the switches that implement .port_egress_floods - b53
and mv88e6xxx, it probably doesn't matter too much either, since they
can possibly inject traffic from the CPU into a standalone port,
regardless of MAC DA, even if egress flooding is turned off for that
port, but certainly not all DSA switches can do that - sja1105, for
example, can't. So it makes sense to use a better common default there,
such as "flood everything".
It should also be noted that what DSA calls "dsa_port_bridge_flags()"
is a degenerate name for just calling .port_egress_floods(), since
nothing else is implemented - not learning, in particular. But disabling
address learning, something that this driver is also coding up for, will
be supported by individual drivers once .port_egress_floods is replaced
with a more generic .port_bridge_flags.
Previous attempts to code up this logic have been in the common bridge
layer, but as pointed out by Ido Schimmel, there are corner cases that
are missed when doing that:
https://patchwork.kernel.org/project/netdevbpf/patch/20210209151936.97382-5-olteanv@gmail.com/
So, at least for now, let's leave DSA in charge of setting port flags
before and after the bridge join and leave.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the netlink interface, propagate errors through extack rather than
simply printing them to the console. For the sysfs interface, we still
print to the console, but at least that's one layer higher than in
switchdev, which also allows us to silently ignore the offloading of
flags if that is ever needed in the future.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If for example this command:
ip link set swp0 type bridge_slave flood off mcast_flood off learning off
succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at
BR_LEARNING, there would be no attempt to revert the partial state in
any way. Arguably, if the user changes more than one flag through the
same netlink command, this one _should_ be all or nothing, which means
it should be passed through switchdev as all or nothing.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a struct switchdev_attr is notified through switchdev, there is no
way to report informational messages, unlike for struct switchdev_obj.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the following command:
# tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
$tcflags dst_ip 192.0.2.2 ip_ttl 63 action drop
doesn't drop all IPv4 packets that match the configured TTL / destination
address. In particular, if "fragment offset" or "more fragments" have non
zero value in the IPv4 header, setting of FLOW_DISSECTOR_KEY_IP is simply
ignored. Fix this dissecting IPv4 TTL and TOS before fragment info; while
at it, add a selftest for tc flower's match on 'ip_ttl' that verifies the
correct behavior.
Fixes: 518d8a2e9b ("net/flow_dissector: add support for dissection of misc ip header fields")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder says:
====================
net: ipa: some more cleanup
Version 3 of this series uses dev_err_probe() in the second patch,
as suggested by Heiner Kallweit.
Version 2 was sent to ensure the series was based on current
net-next/master, and added copyright updates to files touched.
The original introduction is below.
This is another fairly innocuous set of cleanup patches.
The first was motivated by a bug found that would affect IPA v4.5.
It maintain a new GSI address pointer; one is the "raw" (original
mapped) address, and the other will have been adjusted if necessary
for use on newer platforms.
The second just quiets some unnecessary noise during early probe.
The third fixes some errors that show up when IPA_VALIDATION is
enabled.
The last two just create helper functions to improve readability.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a simple helper function that indicates whether a channel has
been initialized. This abstacts/hides the details of how this is
determined.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a new function to abstract the knowledge of whether hashed
routing and filter tables are supported for a given IPA instance.
IPA v4.2 is the only one that doesn't support hashed tables (now
and for the foreseeable future), but the name of the helper function
is better for explaining what's going on.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ipa_cmd_register_write_valid() we verify that values we will
supply to a REGISTER_WRITE IPA immediate command will fit in
the fields that need to hold them. This patch fixes some issues
in that function and ipa_cmd_register_write_offset_valid().
The dev_err() call in ipa_cmd_register_write_offset_valid() has
some printf format errors:
- The name of the register (corresponding to the string format
specifier) was not supplied.
- The IPA base offset and offset need to be supplied separately to
match the other format specifiers.
Also make the ~0 constant used there to compute the maximum
supported offset value explicitly unsigned.
There are two other issues in ipa_cmd_register_write_valid():
- There's no need to check the hash flush register for platforms
(like IPA v4.2) that do not support hashed tables
- The highest possible endpoint number, whose status register
offset is computed, is COUNT - 1, not COUNT.
Fix these problems, and add some additional commentary.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When initializing the IPA core clock and interconnects, it's
possible we'll get an EPROBE_DEFER error. This isn't really an
error, it's just means we need to be re-probed later.
Use dev_err_probe() to report the error rather than dev_err().
This avoids polluting the log with these "error" messages.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch actually fixes a bug, though it doesn't affect the two
platforms supported currently. The fix implements GSI memory
pointers a bit differently.
For IPA version 4.5 and above, the address space for almost all GSI
registers is adjusted downward by a fixed amount. This is currently
handled by adjusting the I/O virtual address pointer after it has
been mapped. The bug is that the pointer is not "de-adjusted" as it
should be when it's unmapped.
This patch fixes that error, but it does so by maintaining one "raw"
pointer for the mapped memory range. This is assigned when the
memory is mapped and used to unmap the memory. This pointer is also
used to access the two registers that do *not* sit in the "adjusted"
memory space.
Rather than adjusting *that* pointer, we maintain a separate pointer
that's an adjusted copy of the "raw" pointer, and that is used for
most GSI register accesses.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>