Armin reported that after referenced commit his RTL8105e is dead when
resuming from suspend and machine runs on battery. This patch has been
confirmed to fix the issue.
Fixes: e80bd76fbf ("r8169: work around power-saving bug on some chip versions")
Reported-by: Armin Wolf <W_Armin@gmx.de>
Tested-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
test_global_func4 fails on s390 as reported by Yauheni in [1].
The immediate problem is that the zext code includes the instruction,
whose result needs to be zero-extended, into the zero-extension
patchlet, and if this instruction happens to be a branch, then its
delta is not adjusted. As a result, the verifier rejects the program
later.
However, according to [2], as far as the verifier's algorithm is
concerned and as specified by the insn_no_def() function, branching
insns do not define anything. This includes call insns, even though
one might argue that they define %r0.
This means that the real problem is that zero extension kicks in at
all. This happens because clear_caller_saved_regs() sets BPF_REG_0's
subreg_def after global function calls. This can be fixed in many
ways; this patch mimics what helper function call handling already
does.
[1] https://lore.kernel.org/bpf/20200903140542.156624-1-yauheni.kaliuta@redhat.com/
[2] https://lore.kernel.org/bpf/CAADnVQ+2RPKcftZw8d+B1UwB35cpBhpF5u3OocNh90D9pETPwg@mail.gmail.com/
Fixes: 51c39bb1d5 ("bpf: Introduce function-by-function verification")
Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210212040408.90109-1-iii@linux.ibm.com
With MTU less than 1500B on all ports, the driver uses per CPU pool mode.
If one of the ports set to jumbo frame MTU size, all ports move
to shared pools mode.
Here, buffer manager TX Flow Control reconfigured on all ports.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1KB is enough for loopback port, so 2KB can be distributed
between other ports.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sja1105 driver has a limitation, extensively described under
Documentation/networking/dsa/sja1105.rst and
Documentation/networking/devlink/sja1105.rst, which says that when the
ports are under a bridge with vlan_filtering=1, traffic to and from
the network stack is not possible, unless the driver-specific
best_effort_vlan_filtering devlink parameter is enabled.
For users, this creates a 'wtf' moment. They need to go to the
documentation and find about the existence of this property, then maybe
install devlink and set it to true.
Having best_effort_vlan_filtering enabled by the kernel by default
delays that 'wtf' moment (maybe up to the point that it never even
happens). The user doesn't need to care that the driver supports
addressing the ports individually by retagging VLAN IDs until he/she
needs to use more than 32 VLAN IDs (since there can be at most 32
retagging rules). Only then do they need to think whether they need the
full VLAN table, at the expense of no individual port addressing, or
not.
But the odds that an sja1105 user will need more than 32 VLANs
terminated by the CPU is probably low. And, if we were to follow the
principle that more advanced use cases should require more advanced
preparation steps, then it makes more sense for ping to 'just work'
while CPU termination of > 32 VLAN IDs to require a bit more forethought
and possibly a driver-specific devlink param.
So we should be able to safely change the default here, and make this
driver act just a little bit more sanely out of the box.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The usage of in_interrupt() in non-core code is phased out. Ideally the
information of the calling context should be passed by the callers or the
functions be split as appropriate.
The attempt to consolidate the code by passing an arguemnt or by
distangling it failed due lack of knowledge about this driver and because
the call chains are hard to follow.
As a stop gap use netif_rx_any_context() which invokes the correct code path
depending on context and confines the in_interrupt() usage to core code.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
My prior cleanup missed that tcp_data_ready() has to look at SOCK_DONE.
Otherwise, an application using SO_RCVLOWAT will not get EPOLLIN event
if a FIN is received in the middle of expected payload.
The reason SOCK_DONE is not examined in tcp_epollin_ready()
is that tcp_poll() catches the FIN because tcp_fin()
is also setting RCV_SHUTDOWN into sk->sk_shutdown
Fixes: 05dc72aba3 ("tcp: factorize logic into tcp_epollin_ready()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Wei Wang <weiwan@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
Fix buggy brport flags offload for SJA1105 DSA
While testing the "Software fallback for bridging in DSA" on sja1105, I
discovered that I managed to introduce two bugs in a single patch
submitted recently to net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The prototype of br_vlan_filter_toggle was updated to include a netlink
extack, but the stub definition wasn't, which results in a build error
when CONFIG_BRIDGE_VLAN_FILTERING=n.
Fixes: 9e781401cb ("net: bridge: propagate extack through store_bridge_parm")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switchdev_port_attr_set function prototype was updated only for the
case where CONFIG_SWITCHDEV=y|m, leaving a prototype mismatch with the
stub definition for the disabled case. This results in a build error, so
update that function too.
Fixes: dcbdf1350e ("net: bridge: propagate extack through switchdev_port_attr_set")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev->name is determined only after calling register_hdlc_device(),
however ,it is used by printk before the name is fully determined.
[ 4.565137] hdlc%d: detected at e8000000, irq 11
Instead of printing out a %d, print hdlc directly
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch is confused by the fact that a 32-bit BIT(port) macro is passed
as argument to the ocelot_ifh_set_dest function and warns:
ocelot_xmit() warn: should '(((1))) << (dp->index)' be a 64 bit type?
seville_xmit() warn: should '(((1))) << (dp->index)' be a 64 bit type?
The destination port mask is copied into a 12-bit field of the packet,
starting at bit offset 67 and ending at 56.
So this DSA tagging protocol supports at most 12 bits, which is clearly
less than 32. Attempting to send to a port number > 12 will cause the
packing() call to truncate way before there will be 32-bit truncation
due to type promotion of the BIT(port) argument towards u64.
Therefore, smatch's fears that BIT(port) will do the wrong thing and
cause unexpected truncation for "port" values >= 32 are unfounded.
Nonetheless, let's silence the warning by explicitly passing an u64
value to ocelot_ifh_set_dest, such that the compiler does not need to do
a questionable type promotion.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The documentation for the PHY update [1] states:
Loop 4 times with index i
If PHY Revision >= 3
Copy table[i] to coef[i]
Otherwise
Set coef[i] to 0
the copy of the table to coef is currently implemented the wrong way
around, table is being updated from uninitialized values in coeff.
Fix this by swapping the assignment around.
[1] https://bcm-v4.sipsolutions.net/802.11/PHY/N/RestoreCal/
Fixes: 2f258b74d1 ("b43: N-PHY: implement restoring general configuration")
Addresses-Coverity: ("Uninitialized scalar variable")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Max imm data size in cxgb4 is not similar to the max imm data size
in the chtls. This caused an mismatch in output of is_ofld_imm() of
cxgb4 and chtls. So fixed this by keeping the max wreq size of imm data
same in both chtls and cxgb4 as MAX_IMM_OFLD_TX_DATA_WR_LEN.
As cxgb4's max imm. data value for ofld packets is changed to
MAX_IMM_OFLD_TX_DATA_WR_LEN. Using the same in cxgbit also.
Fixes: 36bedb3f2e ("crypto: chtls - Inline TLS record Tx")
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
idt77252 is broken and wont load on amd64 systems
modprobe idt77252 shows the following
idt77252_init: skb->cb is too small (48 < 56)
Add packed attribute to struct idt77252_skb_prv and struct atm_skb_data
so that the total size can be <= sizeof(skb->cb)
Also convert runtime size check to buildtime size check in
idt77252_init()
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Armin reported that after referenced commit his RTL8105e is dead when
resuming from suspend and machine runs on battery. This patch has been
confirmed to fix the issue.
Fixes: e80bd76fbf ("r8169: work around power-saving bug on some chip versions")
Reported-by: Armin Wolf <W_Armin@gmx.de>
Tested-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If this is attempted by an io-wq kthread, then return -EOPNOTSUPP as we
don't currently support that. Once we can get task_pid_ptr() doing the
right thing, then this can go away again.
Use PF_IO_WORKER for this to speciically target the io_uring workers.
Modify the /proc/self/ check to use PF_IO_WORKER as well.
Cc: stable@vger.kernel.org
Fixes: 8d4c3e76e3 ("proc: don't allow async path resolution of /proc/self components")
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* powercap:
powercap: intel_rapl: Use topology interface in rapl_init_domains()
powercap: intel_rapl: Use topology interface in rapl_add_package()
powercap/intel_rapl: add support for AlderLake Mobile
powercap/drivers/dtpm: Fix size of object being allocated
powercap/drivers/dtpm: Fix an IS_ERR() vs NULL check
powercap/drivers/dtpm: Fix some missing unlock bugs
powercap/drivers/dtpm: Fix a double shift bug
powercap/drivers/dtpm: Fix __udivdi3 and __aeabi_uldivmod unresolved symbols
powercap/drivers/dtpm: Add CPU energy model based support
powercap/drivers/dtpm: Add API for dynamic thermal power management
Documentation/powercap/dtpm: Add documentation for dtpm
units: Add Watt units
* pm-misc:
PM: Kconfig: remove unneeded "default n" options
PM: EM: update Kconfig description and drop "default n" option
A userspace daemon like firewalld might need to monitor for netlink
updates to detect its ruleset removal by the (global) flush ruleset
command to ensure ruleset persistency. This adds extra complexity from
userspace and, for some little time, the firewall policy is not in
place.
This patch adds the NFT_TABLE_F_OWNER flag which allows a userspace
program to own the table that creates in exclusivity.
Tables that are owned...
- can only be updated and removed by the owner, non-owners hit EPERM if
they try to update it or remove it.
- are destroyed when the owner closes the netlink socket or the process
is gone (implicit netlink socket closure).
- are skipped by the global flush ruleset command.
- are listed in the global ruleset.
The userspace process that sets on the NFT_TABLE_F_OWNER flag need to
leave open the netlink socket.
A new NFTA_TABLE_OWNER netlink attribute specifies the netlink port ID
to identify the owner from userspace.
This patch also updates error reporting when an unknown table flag is
specified to change it from EINVAL to EOPNOTSUPP given that EINVAL is
usually reserved to report for malformed netlink messages to userspace.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* acpi-messages:
ACPI: OSL: Clean up printing messages
ACPI: OSL: Rework acpi_check_resource_conflict()
ACPI: thermal: Clean up printing messages
ACPI: video: Clean up printing messages
ACPI: button: Clean up printing messages
ACPI: battery: Clean up printing messages
ACPI: AC: Clean up printing messages
ACPI: bus: Drop ACPI_BUS_COMPONENT which is not used any more
ACPI: utils: Clean up printing messages
ACPI: scan: Clean up printing messages
ACPI: bus: Clean up printing messages
ACPI: PM: Clean up printing messages
ACPI: power: Clean up printing messages
* pm-devfreq:
PM / devfreq: rk3399_dmc: Remove unneeded semicolon
PM / devfreq: Replace devfreq->dev.parent as dev in devfreq_add_device
PM / devfreq: Correct spelling in a comment
* pm-tools:
cpupower: Add cpuid cap flag for MSR_AMD_HWCR support
cpupower: Remove family arg to decode_pstates()
cpupower: Condense pstate enabled bit checks in decode_pstates()
cpupower: Update family checks when decoding HW pstates
cpupower: Remove unused pscur variable.
cpupower: Add CPUPOWER_CAP_AMD_HW_PSTATE cpuid caps flag
cpupower: Correct macro name for CPB caps flag
cpupower: Update msr_pstate union struct naming
cpupower: add Makefile dependencies for install targets
* pm-opp: (37 commits)
PM / devfreq: Add required OPPs support to passive governor
PM / devfreq: Cache OPP table reference in devfreq
OPP: Add function to look up required OPP's for a given OPP
opp: Replace ENOTSUPP with EOPNOTSUPP
opp: Fix "foo * bar" should be "foo *bar"
opp: Don't ignore clk_get() errors other than -ENOENT
opp: Update bandwidth requirements based on scaling up/down
opp: Allow lazy-linking of required-opps
opp: Remove dev_pm_opp_set_bw()
devfreq: tegra30: Migrate to dev_pm_opp_set_opp()
drm: msm: Migrate to dev_pm_opp_set_opp()
cpufreq: qcom: Migrate to dev_pm_opp_set_opp()
opp: Implement dev_pm_opp_set_opp()
opp: Update parameters of _set_opp_custom()
opp: Allow _generic_set_opp_clk_only() to work for non-freq devices
opp: Allow _generic_set_opp_regulator() to work for non-freq devices
opp: Allow _set_opp() to work for non-freq devices
opp: Split _set_opp() out of dev_pm_opp_set_rate()
opp: Keep track of currently programmed OPP
opp: No need to check clk for errors
...
* pm-sleep:
PM: sleep: Constify static struct attribute_group
PM: sleep: Use dev_printk() when possible
PM: sleep: No need to check PF_WQ_WORKER in thaw_kernel_threads()
* pm-core:
PM: runtime: Fix typos and grammar
PM: runtime: Fix resposible -> responsible in runtime.c
* pm-domains:
PM: domains: Simplify the calculation of variables
PM: domains: Add "performance" column to debug summary
PM: domains: Make of_genpd_add_subdomain() return -EPROBE_DEFER
PM: domains: Make set_performance_state() callback optional
PM: domains: use device's next wakeup to determine domain idle state
PM: domains: inform PM domain of a device's next wakeup
* pm-clk:
PM: clk: make PM clock layer compatible with clocks that must sleep
* pm-cpuidle:
MAINTAINERS: cpuidle: exynos: include header in file pattern
intel_idle: remove definition of DEBUG
* pm-cpufreq:
cpufreq: Remove unused flag CPUFREQ_PM_NO_WARN
cpufreq: Remove CPUFREQ_STICKY flag
cpufreq: intel_pstate: Remove repeated word
cpufreq: remove tango driver
cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove()
cpufreq: brcmstb-avs-cpufreq: Free resources in error path
cpufreq: qcom-hw: enable boost support
cpufreq: tegra20: Use resource-managed API
cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available
cpufreq: intel_pstate: Rename two functions
cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
cpufreq: intel_pstate: Always read hwp_cap_cached with READ_ONCE()
- New driver for the MIPS-based Realtek RTL838x/RTL839x SoC
- Conversion of the sun6i-r support code to a hierarchical setup
- Fix wake-up interrupts for the ls-extirq driver
- Fix MSI allocation for the loongson-pch-msi driver
- Add compatible strings for new Qualcomm SoCs
- Tidy up a few Kconfig entries (IMX, CSKY)
- Spelling phyksiz
- Remove the sirfsoc and tango drivers
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmApGDMPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDcucP/RXS6lW9uTfTUudLIib/YTHNZ7uTuVZrfX+K
4OEAPZKkRDlsSuWvr5i98da0DM0yw0B06c62oNuw7sex6JXsO/2FYISDskUZnlYZ
+ApdQGHhVL+kAOHubepYrZLXf7hPX6N/kZotnq1GU0VY1JhxBkVj6sVFJKEnOwf/
sUwDO2+XvmjgrdXRyZgjsyz71SXQ8OosAIsLAI9UHIB/mHi9R3oItEJD7xANeunp
IsNelUAnqBPN7/OuyeThh2SGxVXVBtja9mRd+AN7HrDmocxnYnsgMQDR/cEO2aTw
xdeKQbEnViUK0JXkofIPwkck58ceUkAl8LLqe0tHPfQgkoPVE9diBrqrP4df7999
NIhPYe2kjZ3+DDNK0zAkvKHiHXC7VPl2qGoqqIUX10x1M+BNxfMSYXgjM4PNK7p7
Ey0w43hnvNKQp7y7ubf+BMStXEOPd4pwyt1P713mSbxeX9sSwGwffI26o58tVpHA
zmoaL5MULWQPFCnznN3yaoBf57hNrkCmXR5lZLwD30VN8m5pJshduvO+rE+6LUgI
DMQPWpCNToKDd0xxnrnMuf/XjOunNSrX0wt9xtuwmpw5gwlpJp2m9TyZ6Xa5M3rl
PYwnVr9gWwGSMw9+G0Ky+C2s+SBzRrPDLfkDOODaGflP/t1b0S8K+1vuIqUHH8fz
2DBWttJp
=hhRV
-----END PGP SIGNATURE-----
Merge tag 'irqchip-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier
- New driver for the MIPS-based Realtek RTL838x/RTL839x SoC
- Conversion of the sun6i-r support code to a hierarchical setup
- Fix wake-up interrupts for the ls-extirq driver
- Fix MSI allocation for the loongson-pch-msi driver
- Add compatible strings for new Qualcomm SoCs
- Tidy up a few Kconfig entries (IMX, CSKY)
- Spelling phyksiz
- Remove the sirfsoc and tango drivers
Link: https://lore.kernel.org/r/20210214124015.3333457-1-maz@kernel.org
There is a specific API to treat raw data as GUID, i.e. export_guid()
and import_guid(). Use them instead of guid_copy() with explicit casting.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove NULL checks before vfree() to fix these warnings:
./drivers/lightnvm/pblk-gc.c:27:2-7: WARNING: NULL check before some
freeing functions is not needed.
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stefan Chulski says:
====================
net: mvpp2: Minor non functional driver code improvements
The patch series contains minor code improvements and did not change any functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
GENCONF_CTRL0_PORTX naming improved.
Non functional change.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use PTR_ERR_OR_ZERO instead of IS_ERR and PTR_ERR.
Non functional change.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use >= MVPP22 instead of != MVPP21.
Non functional change.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PPv2.1 contain 0 in Version ID register, priv->hw_version check
can be removed.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
Propagate extack for switchdev VLANs from DSA
This series moves the restriction messages printed by the DSA core, and
by some individual device drivers, into the netlink extended ack
structure, to be communicated to user space where possible, or still
printed to the kernel log from the bridge layer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some drivers can't dynamically change the VLAN filtering option, or
impose some restrictions, it would be nice to propagate this info
through netlink instead of printing it to a kernel log that might never
be read. Also netlink extack includes the module that emitted the
message, which means that it's easier to figure out which ones are
driver-generated errors as opposed to command misuse.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow drivers to communicate their restrictions to user space directly,
instead of printing to the kernel log. Where the conversion would have
been lossy and things like VLAN ID could no longer be conveyed (due to
the lack of support for printf format specifier in netlink extack), I
chose to keep the messages in full form to the kernel log only, and
leave it up to individual driver maintainers to move more messages to
extack.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The benefit is the ability to propagate errors from switchdev drivers
for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL attributes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bridge sysfs interface stores parameters for the STP, VLAN,
multicast etc subsystems using a predefined function prototype.
Sometimes the underlying function being called supports a netlink
extended ack message, and we ignore it.
Let's expand the store_bridge_parm function prototype to include the
extack, and just print it to console, but at least propagate it where
applicable. Where not applicable, create a shim function in the
br_sysfs_br.c file that discards the extra function argument.
This patch allows us to propagate the extack argument to
br_vlan_set_default_pvid, br_vlan_set_proto and br_vlan_filter_toggle,
and from there, further up in br_changelink from br_netlink.c.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function is identical with br_vlan_filter_toggle.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
PTP for DSA tag_ocelot_8021q
Changes in v2:
Add stub definition for ocelot_port_inject_frame when switch driver is
not compiled in.
This is part two of the errata workaround begun here:
https://patchwork.kernel.org/project/netdevbpf/cover/20210129010009.3959398-1-olteanv@gmail.com/
Now that we have basic traffic support when we operate the Ocelot DSA
switches without an NPI port, it would be nice to regain some of the
features lost due to the lack of the NPI port functionality. An
important one is PTP timestamping, which is intimately tied to the DSA
frame header added by the NPI port: on TX, we put a "timestamp request
ID" in the Injection Frame Header, while on RX, the Extraction Frame
Header contains a partial 32-bit PTP timestamp. Get rid of the NPI port
and replace it with a VLAN-based tagger, and you lose PTP, right?
Well, not quite, this is what this patch series is about. The NPI port
is basically a regular Ethernet port configured to service the packets
in and out of the switch's CPU port module (which has other non-DSA I/O
mechanisms too, such as register-based MMIO and DMA). If we disable the
NPI port, we can in theory still access the packets delivered to the CPU
port module by doing exactly what the ocelot switchdev driver does:
extracting Ethernet packets through registers (yes, it is as icky as it
sounds).
However, there's a catch. The Felix switch was integrated into NXP
LS1028A with the idea in mind that it will operate as DSA, i.e. using
the CPU port module connected to the NPI port, not having I/O over
register-based MMIO which is painfully slow and CPU intensive. So
register-based packet I/O not supposed to work - those registers aren't
even documented in the hardware reference manual for Felix. However
they kinda do, with the exception of the fact that an RX interrupt was
really not wired to the CPU cores - so we don't know when the CPU port
module receives a new packet. But we can hack even around that, by
replicating every packet that goes to the CPU port module and making it
also go to a plain internal Ethernet port. Then drop the Ethernet packet
and read the other copy of it from the CPU port module, this time
annotated with the much-wanted RX timestamp.
This is all fine and it works, but it does raise some questions about
what DSA even is anymore, if we start having switches that inject some
of their packets over Ethernet and some through registers, where do we
draw the line. In principle I believe these concerns are founded, but at
the same time, the way that the Felix driver uses register MMIO based
packet I/O is fundamentally the same as any other DSA driver capable of
PTP makes use of a side-channel for timestamps like a FIFO (just that
this one is a lot more complicated, and comes with the entire actual
packet, not just the timestamp).
Nonetheless, I tried to keep the extra pressure added by this ERR
workaround upon the DSA subsystem as small as possible, so some of the
patches are just a revisit of some of Andrew's complaints w.r.t. the
fact that tag_ocelot already violates any driver <-> tagger boundary,
and as a consequence, is not able to be used on testbeds such as
dsa_loop (which it now can). So now, the tag_ocelot and tag_ocelot_8021q
drivers should be dsa_loop-clean, and have the ERR workarounds as
self-contained as possible, using all the designated features for PTP
timestamping and nothing more.
Comments appreciated.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For TX timestamping, we use the felix_txtstamp method which is common
with the regular (non-8021q) ocelot tagger. This method says that skb
deferral is needed, prepares a timestamp request ID, and puts a clone of
the skb in a queue waiting for the timestamp IRQ.
felix_txtstamp is called by dsa_skb_tx_timestamp() just before the
tagger's xmit method. In the tagger xmit, we divert the packets
classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based
injection registers, and we declare them as dead towards dsa_slave_xmit.
If not PTP, we proceed with normal tag_8021q stuff.
Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is
matched to the TX timestamp retrieved from the switch's FIFO based on
the timestamp request ID, and the clone is delivered to the stack.
On RX, thanks to the VCAP IS2 rule that redirects the frames with an
EtherType for 1588 towards two destinations:
- the CPU port module (for MMIO based extraction) and
- if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port
the relevant data path processing starts in the ptp_classify_raw BPF
classifier installed by DSA in the RX data path (post tagger, which is
completely unaware that it saw a PTP packet).
This time we can't reuse the same implementation of .port_rxtstamp that
also works with the default ocelot tagger. That is because felix_rxtstamp
is given an skb with a freshly stripped DSA header, and it says "I don't
need deferral for its RX timestamp, it's right in it, let me show you";
and it just points to the header right behind skb->data, from where it
unpacks the timestamp and annotates the skb with it.
The same thing cannot happen with tag_ocelot_8021q, because for one
thing, the skb did not have an extraction frame header in the first
place, but a VLAN tag with no timestamp information. So the code paths
in felix_rxtstamp for the regular and 8021q tagger are completely
independent. With tag_8021q, the timestamp must come from the packet's
duplicate delivered to the CPU port module, but there is potentially
complex logic to be handled [ and prone to reordering ] if we were to
just start reading packets from the CPU port module, and try to match
them to the one we received over Ethernet and which needs an RX
timestamp. So we do something simple: we tell DSA "give me some time to
think" (we request skb deferral by returning false from .port_rxtstamp)
and we just drop the frame we got over Ethernet with no attempt to match
it to anything - we just treat it as a notification that there's data to
be processed from the CPU port module's queues. Then we proceed to read
the packets from those, one by one, which we deliver up the stack,
timestamped, using netif_rx - the same function that any driver would
use anyway if it needed RX timestamp deferral. So the assumption is that
we'll come across the PTP packet that triggered the CPU extraction
notification eventually, but we don't know when exactly. Thanks to the
VCAP IS2 trap/redirect rule and the exclusion of the CPU port module
from the flooding replicators, only PTP frames should be present in the
CPU port module's RX queues anyway.
There is just one conflict between the VCAP IS2 trapping rule and the
semantics of the BPF classifier. Namely, ptp_classify_raw() deems
general messages as non-timestampable, but still, those are trapped to
the CPU port module since they have an EtherType of ETH_P_1588. So, if
the "no XTR IRQ" workaround is in place, we need to run another BPF
classifier on the frames extracted over MMIO, to avoid duplicates being
sent to the stack (once over Ethernet, once over MMIO). It doesn't look
like it's possible to install VCAP IS2 rules based on keys extracted
from the 1588 frame headers.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the tag_8021q tagger is software-defined, it has no means by
itself for retrieving hardware timestamps of PTP event messages.
Because we do want to support PTP on ocelot even with tag_8021q, we need
to use the CPU port module for that. The RX timestamp is present in the
Extraction Frame Header. And because we can't use NPI mode which redirects
the CPU queues to an "external CPU" (meaning the ARM CPU running Linux),
then we need to poll the CPU port module through the MMIO registers to
retrieve TX and RX timestamps.
Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC
without wiring the extraction IRQ line to the ARM GIC. So, if we want to
be notified of any PTP packets received on the CPU port module, we have
a problem.
There is a possible workaround, which is to use the Ethernet CPU port as
a notification channel that packets are available on the CPU port module
as well. When a PTP packet is received by the DSA tagger (without timestamp,
of course), we go to the CPU extraction queues, poll for it there, then
we drop the original Ethernet packet and masquerade the packet retrieved
over MMIO (plus the timestamp) as the original when we inject it up the
stack.
Create a quirk in struct felix is selected by the Felix driver (but not
by Seville, since that doesn't support PTP at all). We want to do this
such that the workaround is minimally invasive for future switches that
don't require this workaround.
The only traffic for which we need timestamps is PTP traffic, so add a
redirection rule to the CPU port module for this. Currently we only have
the need for PTP over L2, so redirection rules for UDP ports 319 and 320
are TBD for now.
Note that for the workaround of matching of PTP-over-Ethernet-port with
PTP-over-MMIO queues to work properly, both channels need to be
absolutely lossless. There are two parts to achieving that:
- We keep flow control enabled on the tag_8021q CPU port
- We put the DSA master interface in promiscuous mode, so it will never
drop a PTP frame (for the profiles we are interested in, these are
sent to the multicast MAC addresses of 01-80-c2-00-00-0e and
01-1b-19-00-00-00).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the felix DSA driver will need to poll the CPU port module for
extracted frames as well, let's create some common functions that read
an Extraction Frame Header, and then an skb, from a CPU extraction
group.
We abuse the struct ocelot_ops :: port_to_netdev function a little bit,
in order to retrieve the DSA port net_device or the ocelot switchdev
net_device based on the source port information from the Extraction
Frame Header, but it's all in the benefit of code simplification -
netdev_alloc_skb needs it. Originally, the port_to_netdev method was
intended for parsing act->dev from tc flower offload code.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot tagger is a hot mess currently, it relies on memory
initialized by the attached driver for basic frame transmission.
This is against all that DSA tagging protocols stand for, which is that
the transmission and reception of a DSA-tagged frame, the data path,
should be independent from the switch control path, because the tag
protocol is in principle hot-pluggable and reusable across switches
(even if in practice it wasn't until very recently). But if another
driver like dsa_loop wants to make use of tag_ocelot, it couldn't.
This was done to have common code between Felix and Ocelot, which have
one bit difference in the frame header format. Quoting from commit
67c2404922 ("net: dsa: felix: create a template for the DSA tags on
xmit"):
Other alternatives have been analyzed, such as:
- Create a separate tag_seville.c: too much code duplication for just 1
bit field difference.
- Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like
tag_brcm.c, which would have a separate .xmit function. Again, too
much code duplication for just 1 bit field difference.
- Allocate the template from the init function of the tag_ocelot.c
module, instead of from the driver: couldn't figure out a method of
accessing the correct port template corresponding to the correct
tagger in the .xmit function.
The really interesting part is that Seville should have had its own
tagging protocol defined - it is not compatible on the wire with Ocelot,
even for that single bit. In principle, a packet generated by
DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain
way, but when booted on NXP T1040 it would look differently. The reverse
is also true: a packet generated by a Seville switch would be
interpreted incorrectly by Wireshark if it was told it was generated by
an Ocelot switch.
Actually things are a bit more nuanced. If we concentrate only on the
DSA tag, what I said above is true, but Ocelot/Seville also support an
optional DSA tag prefix, which can be short or long, and it is possible
to distinguish the two taggers based on an integer constant put in that
prefix. Nonetheless, creating a separate tagger is still justified,
since the tag prefix is optional, and without it, there is again no way
to distinguish.
Claiming backwards binary compatibility is a bit more tough, since I've
already changed the format of tag_ocelot once, in commit 5124197ce5
("net: dsa: tag_ocelot: use a short prefix on both ingress and egress").
Therefore I am not very concerned with treating this as a bugfix and
backporting it to stable kernels (which would be another mess due to the
fact that there would be lots of conflicts with the other DSA_TAG_PROTO*
definitions). It's just simpler to say that the string values of the
taggers have ABI value starting with kernel 5.12, which will be when the
changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging
goes live.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is one place where we cannot avoid accessing driver data, and that
is 2-step PTP TX timestamping, since the switch wants us to provide a
timestamp request ID through the injection header, which naturally must
come from a sequence number kept by the driver (it is generated by the
.port_txtstamp method prior to the tagger's xmit).
However, since other drivers like dsa_loop do not claim PTP support
anyway, the DSA_SKB_CB(skb)->clone will always be NULL anyway, so if we
move all PTP-related dereferences of struct ocelot and struct ocelot_port
into a separate function, we can effectively ensure that this is dead
code when the ocelot tagger is attached to non-ocelot switches, and the
stateful portion of the tagger is more self-contained.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>