The function allows flushing all the existing VLAN entries on a port. It
is invoked when a port is destroyed and when it is unlinked from a LAG.
In the latter case, when moving to the new default VLAN, there will not
be a need to destroy the default VLAN entry.
Therefore, add an argument that allows to control whether the default
port VLAN should be destroyed or not. Currently it is always set to
'true'.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the driver does not set the port's PVID when initializing a
new port. This is because the driver is using VID 1 as PVID which is the
firmware default.
Subsequent patches are going to change the PVID the driver is setting
when initializing a new port.
Prepare for that by explicitly setting the port's PVID.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent patches are going to replace the current default VID (1) with
VLAN_N_VID - 1 (4095).
Prepare for this conversion by replacing the hard-coded '1' with a
define.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous patch added the ability to offload a VXLAN tunnel used for L3
VNI when it is present in the VLAN-aware bridge before the corresponding
VLAN interface is configured. This patch adds a test case to verify
that.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In symmetric routing, the only two members in the VLAN corresponding to
the L3 VNI are the router port and the VXLAN tunnel.
In case the VXLAN device is already enslaved to the bridge and only
later the VLAN interface is configured, the tunnel will not be
offloaded.
The reason for this is that when the router interface (RIF)
corresponding to the VLAN interface is configured, it calls the core
fid_get() API which does not check if NVE should be enabled on the FID.
Instead, call into the bridge code which will check if NVE should be
enabled on the FID.
This effectively means that the same code path is used to retrieve a FID
when either a local port or a router port joins the FID.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have no explicit signal when a UDP stream has terminated, peers just
stop sending.
For suspected stream connections a timeout of two minutes is sane to keep
NAT mapping alive a while longer.
It matches tcp conntracks 'timewait' default timeout value.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently DNS resolvers that send both A and AAAA queries from same source port
can trigger stream mode prematurely, which results in non-early-evictable conntrack entry
for three minutes, even though DNS requests are done in a few milliseconds.
Add a two second grace period where we continue to use the ordinary
30-second default timeout. Its enough for DNS request/response traffic,
even if two request/reply packets are involved.
ASSURED is still set, else conntrack (and thus a possible
NAT mapping ...) gets zapped too in case conntrack table runs full.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-12-20
This series contains updates to e100, igb, ixgbe, i40e and ice drivers.
I replaced spinlocks for mutex locks to reduce the latency on CPU0 for
igb when updating the statistics. This work was based off a patch
provided by Jan Jablonsky, which was against an older version of the igb
driver.
Jesus adjusts the receive packet buffer size from 32K to 30K when
running in QAV mode, to stay within 60K for total packet buffer size for
igb.
Vinicius adds igb kernel documentation regarding the CBS algorithm and
its implementation in the i210 family of NICs.
YueHaibing from Huawei fixed the e100 driver that was potentially
passing a NULL pointer, so use the kernel macro IS_ERR_OR_NULL()
instead.
Konstantin Khorenko fixes i40e where we were not setting up the
neigh_priv_len in our net_device, which caused the driver to read beyond
the neighbor entry allocated memory.
Miroslav Lichvar extends the PTP gettime() to read the system clock by
adding support for PTP_SYS_OFFSET_EXTENDED ioctl in i40e.
Young Xiao fixed the ice driver to only enable NAPI on q_vectors that
actually have transmit and receive rings.
Kai-Heng Feng fixes an igb issue that when placed in suspend mode, the
NIC does not wake up when a cable is plugged in. This was due to the
driver not setting PME during runtime suspend.
Stephen Douthit enables the ixgbe driver allow DSA devices to use the
MII interface to talk to switches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull i2c fixes from Wolfram Sang:
"I2C has a MAINTAINERS update for you, so people will be immediately
pointed to the right person for this previously orphaned driver.
And one of Arnd's build warning fixes for a new driver added this
cycle"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: nvidia-gpu: mark resume function as __maybe_unused
MAINTAINERS: add entry for i2c-axxia driver
John Fastabend says:
====================
Set of bpf fixes and improvements to make sockmap with kTLS usable
with "real" applications. This set came as the fallout of pulling
kTLS+sockmap into Cilium[1] and running in container environment.
Roughly broken into three parts,
Patches 1-3: resolve/improve handling of size field in sk_msg_md
Patch 4: it became difficult to use this in Cilium when the
SK_PASS verdict was not correctly handle. So handle
the case correctly.
Patch 5-8: Set of issues found while running OpenSSL TX kTLS
enabled applications. This resolves the most obvious
issues and gets applications using kTLS TX up and
running with sock{map|has}.
Other than the "sk_msg, zap ingress queue on psock down" (PATCH 6/8)
which can potentially cause a WARNING the issues fixed in this
series do not cause kernel side warnings, BUG, etc. but instead
cause stalls and other odd behavior in the user space applications
when using kTLS with BPF policies applied.
Primarily tested with 'curl' compiled with latest openssl and
also 'openssl s_client/s_server' containers using Cilium network
plugin with docker/k8s. Some basic testing with httpd was also
enabled. Cilium CI tests will be added shortly to cover these
cases as well. We also have 'wrk' and other test and benchmarking
tools we can run now.
We have two more sets of patches currently under testing that
will be sent shortly to address a few more issues. First the
OpenSSL RX kTLS side breaks when both sk_msg and sk_skb_verdict
programs are used with kTLS, the sk_skb_verdict programs are
not enforced. Second skmsg needs to call into tcp stack to
send to indicate consumed data.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The existing code did not expect users would initialize the TLS ULP
without subsequently calling the TLS TX enabling socket option.
If the application tries to send data after the TLS ULP enable op
but before the TLS TX enable op the BPF sk_msg verdict program is
skipped. This patch resolves this by converting the ipv4 sock ops
to be calculated at init time the same way ipv6 ops are done. This
pulls in any changes to the sock ops structure that have been made
after the socket was created including the changes from adding the
socket to a sock{map|hash}.
This was discovered by running OpenSSL master branch which calls
the TLS ULP setsockopt early in TLS handshake but only enables
the TLS TX path once the handshake has completed. As a result the
datapath missed the initial handshake messages.
Fixes: 02c558b2d5 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
A sockmap program that redirects through a kTLS ULP enabled socket
will not work correctly because the ULP layer is skipped. This
fixes the behavior to call through the ULP layer on redirect to
ensure any operations required on the data stream at the ULP layer
continue to be applied.
To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid
calling the BPF layer on a redirected message. This is
required to avoid calling the BPF layer multiple times (possibly
recursively) which is not the current/expected behavior without
ULPs. In the future we may add a redirect flag if users _do_
want the policy applied again but this would need to work for both
ULP and non-ULP sockets and be opt-in to avoid breaking existing
programs.
Also to avoid polluting the flag space with an internal flag we
reuse the flag space overlapping MSG_SENDPAGE_NOPOLICY with
MSG_WAITFORONE. Here WAITFORONE is specific to recv path and
SENDPAGE_NOPOLICY is only used for sendpage hooks. The last thing
to verify is user space API is masked correctly to ensure the flag
can not be set by user. (Note this needs to be true regardless
because we have internal flags already in-use that user space
should not be able to set). But for completeness we have two UAPI
paths into sendpage, sendfile and splice.
In the sendfile case the function do_sendfile() zero's flags,
./fs/read_write.c:
static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
size_t count, loff_t max)
{
...
fl = 0;
#if 0
/*
* We need to debate whether we can enable this or not. The
* man page documents EAGAIN return for the output at least,
* and the application is arguably buggy if it doesn't expect
* EAGAIN on a non-blocking file descriptor.
*/
if (in.file->f_flags & O_NONBLOCK)
fl = SPLICE_F_NONBLOCK;
#endif
file_start_write(out.file);
retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
}
In the splice case the pipe_to_sendpage "actor" is used which
masks flags with SPLICE_F_MORE.
./fs/splice.c:
static int pipe_to_sendpage(struct pipe_inode_info *pipe,
struct pipe_buffer *buf, struct splice_desc *sd)
{
...
more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
...
}
Confirming what we expect that internal flags are in fact internal
to socket side.
Fixes: d3b18ad31f ("tls: add bpf support to sk_msg handling")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In addition to releasing any cork'ed data on a psock when the psock
is removed we should also release any skb's in the ingress work queue.
Otherwise the skb's eventually get free'd but late in the tear
down process so we see the WARNING due to non-zero sk_forward_alloc.
void sk_stream_kill_queues(struct sock *sk)
{
...
WARN_ON(sk->sk_forward_alloc);
...
}
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When a skb verdict program is in-use and either another BPF program
redirects to that socket or the new SK_PASS support is used the
data_ready callback does not wake up application. Instead because
the stream parser/verdict is using the sk data_ready callback we wake
up the stream parser/verdict block.
Fix this by adding a helper to check if the stream parser block is
enabled on the sk and if so call the saved pointer which is the
upper layers wake up function.
This fixes application stalls observed when an application is waiting
for data in a blocking read().
Fixes: d829e9c411 ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add SK_PASS verdict support to SK_SKB_VERDICT programs. Now that
support for redirects exists we can implement SK_PASS as a redirect
to the same socket. This simplifies the BPF programs and avoids an
extra map lookup on RX path for simple visibility cases.
Further, reduces user (BPF programmer in this context) confusion
when their program drops skb due to lack of support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Enforce comment on structure layout dependency with a BUILD_BUG_ON
to ensure the condition is maintained.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The check for max offset in sk_msg_is_valid_access uses sizeof()
which is incorrect because it would allow accessing possibly
past the end of the struct in the padded case. Further, it doesn't
preclude accessing any padding that may be added in the middle of
a struct. All told this makes it fragile to rely on.
To fix this explicitly check offsets with fields using the
bpf_ctx_range() and bpf_ctx_range_till() macros.
For reference the current structure layout looks as follows (reported
by pahole)
struct sk_msg_md {
union {
void * data; /* 8 */
}; /* 0 8 */
union {
void * data_end; /* 8 */
}; /* 8 8 */
__u32 family; /* 16 4 */
__u32 remote_ip4; /* 20 4 */
__u32 local_ip4; /* 24 4 */
__u32 remote_ip6[4]; /* 28 16 */
__u32 local_ip6[4]; /* 44 16 */
__u32 remote_port; /* 60 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
__u32 local_port; /* 64 4 */
__u32 size; /* 68 4 */
/* size: 72, cachelines: 2, members: 10 */
/* last cacheline: 8 bytes */
};
So there should be no padding at the moment but fixing this now
prevents future errors.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently, the test to ensure reads past the end of the sk_msg_md
data structure fail is incorrectly expecting success. Fix this
typo and use correct expected error.
Fixes: 945a47d87c ("bpf: sk_msg, add tests for size field")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The frame_size passed to build_skb must be aligned, else it is
possible that the embedded struct skb_shared_info gets unaligned.
For correctness make sure that xdpf->headroom in included in the
alignment. No upstream drivers can hit this, as all XDP drivers provide
an aligned headroom. This was discovered when playing with implementing
XDP support for mvneta, which have a 2 bytes DSA header, and this
Marvell ARM64 platform didn't like doing atomic operations on an
unaligned skb_shinfo(skb)->dataref addresses.
Fixes: 1c601d829a ("bpf: cpumap xdp_buff to skb conversion and allocation")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
- Kconfig dependency fixes for our new auth feature
- Fix for selecting the right compressor when creating a fs
- Bugfix for a bug in UBIFS's O_TMPFILE implementation
- Refcounting fixes for UBI
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlwb+MAWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wSIjD/9zyAQUaMXJzI/ct2SXXikyTgRi
hmLbUhuIEZD/c3tDQMgb3XqL7yYkOt/QnfeMXpi/Fy9wAdKcOU9QCcL6TPfR9B+B
v22VdDcsU5pp2Hcinm2Yz6v2OyYwchb2YShfMkiZSf0CKAhcsKZhaeqhug/Zq+w5
xZ6fMPvu8HhinBavq9OgSpPx892vlF1ypXHNeQNxwslz/gWKso47jtmXpz4LDX7+
s4+p+lagVJ0MHQ3fVhBvEplWr9zPKHvX8uGohADLsDp5bAqRyGNxBwqgGm6J6H7B
UHuqUjUf19hUZEauwKugmPTM3Y0TkZjV472qsp9omx8w/XozVqfb4YgMBBE3Qwuy
ZS7GcaGxf54WJbOJFQv7pS3DUmfOjQBnU7YAR7phMBDeOUfxRXWkuzIB+RkED+0d
7JjTGWQItdbTfXl2QWfSr/1rj45LNKmVPMAI2pKUBnC9dD9pktCcB90vTcrg2vjE
3U9GbqOVORZvHh2bcR0Y8uDUBOp4v7ybCc+OOTNVWuWQpckp79E/7dkVLbpoI12R
xEgMEaCSXlwPUxQ/FrxKBWhOZprPXNMo4xw547SDBxMq0IKUniA+YoUxtN3FxhQ8
OiLkfcQ+5bove9qYfARaRC/hk5lpgMWebAGou/CkfTIPJsw3ZdlmL6HH6ag5i6fS
fBHsVBdtv0ADSLXylg==
=nZML
-----END PGP SIGNATURE-----
Merge tag 'upstream-4.20-rc7' of git://git.infradead.org/linux-ubifs
Pull UBI/UBIFS fixes from Richard Weinberger:
- Kconfig dependency fixes for our new auth feature
- Fix for selecting the right compressor when creating a fs
- Bugfix for a bug in UBIFS's O_TMPFILE implementation
- Refcounting fixes for UBI
* tag 'upstream-4.20-rc7' of git://git.infradead.org/linux-ubifs:
ubifs: Handle re-linking of inodes correctly while recovery
ubi: Do not drop UBI device reference before using
ubi: Put MTD device after it is not used
ubifs: Fix default compression selection in ubifs
ubifs: Fix memory leak on error condition
ubifs: auth: Add CONFIG_KEYS dependency
ubifs: CONFIG_UBIFS_FS_AUTHENTICATION should depend on UBIFS_FS
ubifs: replay: Fix high stack usage
In __ghes_panic() clear the block status in the APEI generic
error status block for that generic hardware error source before
calling panic() to prevent a second panic() in the crash kernel
for exactly the same fatal error.
Otherwise ghes_probe(), running in the crash kernel, would see
an unhandled error in the APEI generic error status block and
panic again, thereby precluding any crash dump.
Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Signed-off-by: David Arcari <darcari@redhat.com>
Tested-by: Tyler Baicar <baicar.tyler@gmail.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use the mii_bus callbacks to address the entire clause 22/45 address
space. Enables userspace to poke switch registers instead of a single
PHY address.
The ixgbe firmware may be polling PHYs in a way that is not protected by
the mii_bus lock. This isn't new behavior, but as Andrew Lunn pointed
out there are more addresses available for conflicts.
Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Most dsa devices expect a 'struct mii_bus' pointer to talk to switches
via the MII interface.
While this works for dsa devices, it will not work safely with Linux
PHYs in all configurations since the firmware of the ixgbe device may
be polling some PHY addresses in the background.
Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
I210 ethernet card doesn't wakeup when a cable gets plugged. It's
because its PME is not set.
Since commit 42eca23021 ("PCI: Don't touch card regs after runtime
suspend D3"), if the PCI state is saved, pci_pm_runtime_suspend() stops
calling pci_finish_runtime_suspend(), which enables the PCI PME.
To fix the issue, let's not to save PCI states when it's runtime
suspend, to let the PCI subsystem enables PME.
Fixes: 42eca23021 ("PCI: Don't touch card regs after runtime suspend D3")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If ice driver has q_vectors w/ active NAPI that has no rings,
then this will result in a divide by zero error. To correct it
I am updating the driver code so that we only support NAPI on
q_vectors that have 1 or more rings allocated to them.
See commit 13a8cd191a ("i40e: Do not enable NAPI on q_vectors
that have no rings") for detail.
Signed-off-by: Young Xiao <YangX92@hotmail.com>
Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Out of bound read reported by KASan.
i40iw_net_event() reads unconditionally 16 bytes from
neigh->primary_key while the memory allocated for
"neighbour" struct is evaluated in neigh_alloc() as
tbl->entry_size + dev->neigh_priv_len
where "dev" is a net_device.
But the driver does not setup dev->neigh_priv_len and
we read beyond the neigh entry allocated memory,
so the patch in the next mail fixes this.
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix a static code checker warning:
drivers/net/ethernet/intel/e100.c:1349
e100_load_ucode_wait() warn: passing zero to 'PTR_ERR'
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.
Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some pointers to the definition of the CBS algorithm, and some
notes about the limits of its implementation in the i210 family of
controllers.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Section 4.5.9 of the datasheet says that the total size of all packet
buffers combined (TxPB 0 + 1 + 2 + 3 + RxPB + BMC2OS + OS2BMC) must not
exceed 60KB. Today we are configuring a total of 62KB, so reduce the
RxPB from 32KB to 30KB in order to respect that.
The choice of changing RxPBSIZE here is mainly because it seems more
correct to give more priority to the transmit packet buffers over the
receiver ones when running in Qav mode. Also, the BMC2OS and OS2BMC
sizes are already too short.
Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is based off of the work and suggestion of Jan Jablonsky
<jan.jablonsky@thalesgroup.com>.
The Watchdog workqueue in igb driver is scheduled every 2s for each
network interface. That includes updating a statistics protected by
spinlock. Function igb_update_stats in this case will be protected
against preemption. According to number of a statistics registers
(cca 60), processing this function might cause additional cpu load
on CPU0.
In case of statistics spinlock may be replaced with mutex, which
reduce latency on CPU0.
CC: Bernhard Kaindl <bernhard.kaindl@thalesgroup.com>
CC: Jan Jablonsky <jan.jablonsky@thalesgroup.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mapping the delay slot emulation page as both writeable & executable
presents a security risk, in that if an exploit can write to & jump into
the page then it can be used as an easy way to execute arbitrary code.
Prevent this by mapping the page read-only for userland, and using
access_process_vm() with the FOLL_FORCE flag to write to it from
mips_dsemul().
This will likely be less efficient due to copy_to_user_page() performing
cache maintenance on a whole page, rather than a single line as in the
previous use of flush_cache_sigtramp(). However this delay slot
emulation code ought not to be running in any performance critical paths
anyway so this isn't really a problem, and we can probably do better in
copy_to_user_page() anyway in future.
A major advantage of this approach is that the fix is small & simple to
backport to stable kernels.
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 432c6bacbd ("MIPS: Use per-mm page to execute branch delay slot instructions")
Cc: stable@vger.kernel.org # v4.8+
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Rich Felker <dalias@libc.org>
Cc: David Daney <david.daney@cavium.com>
In commit 4f83d5ea64 ("security: integrity: make ima_main explicitly
non-modular") I'd removed <linux/module.h> after assuming that the
function is_module_sig_enforced() was an LSM function and not a core
kernel module function.
Unfortunately the typical .config selections used in build testing
provide an implicit <linux/module.h> presence, and so normal/typical
build testing did not immediately reveal my incorrect assumption.
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: linux-ima-devel@lists.sourceforge.net
Cc: linux-security-module@vger.kernel.org
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
- Implement BPF based syscall filtering in 'perf trace', using BPF maps and
the augmented_raw_syscalls.c BPF proggie (Arnaldo Carvalho de Melo)
- Allow specifying in .perfconfig a set of events use in 'perf trace' in
addition to any other specified from the command line. This initially
will be used to always use the augmented_raw_syscalls.o precompiled
BPF program for getting pointer contents. (Arnaldo Carvalho de Melo)
- Allow fine grained control about how the syscall output should be
formatted. This will be used to allow producing the same output produced
by the 'strace' tool, to then use in regression tests comparing the
output of 'perf trace' with the one produced from 'strace' (Arnaldo Carvalho de Melo)
- Beautify the renameat2 olddirfd, newdirfd and flags arguments (Arnaldo Carvalho de Melo)
- Beautify arch_prctl 'code' syscall arg (Arnaldo Carvalho de Melo)
- Beautify fadvise64 'advice' syscall arg (Arnaldo Carvalho de Melo)
- Relax checks on perf-PID.map ownership, resulting in symbols in
executable anonymous maps setup by JITs in things like node.js to
be resolved in a 'perf top' session run by root without the need
for --force to be used (Arnaldo Carvalho de Melo)
- Update asm-generic/unistd.h copy (Arnaldo Carvalho de Melo)
- Do not use the first and last symbols when setting up address filters in
auxtrace, this fails when we don't have a symbol table, filter the entire
area based on the dso size. (Adrian Hunter)
- Do not use kernel headers to build libsubcmd, we shouldn't use
anything from outside tools/, fixes the build with the Android NDK (Arnaldo Carvalho de Melo)
- Add several prototypes for systems lacking those, such as open_memstream(),
sigqueue(), fixing warnings building with Android's bionic libc that were
preventing the use of -Werror there (Arnaldo Carvalho de Melo)
- Use LDFLAGS in the libtraceevent build commands, allowing developers
to override its values (Jiri Olsa)
- Link libperf-jvmti.so with LDFLAGS variable, allowing distro
packages to propagate its settings when building this library (Jiri Olsa)
- cs-etm (ARM CoreSight) fixes: (Leo Yan)
- Correct packets swapping in cs_etm__flush()
- Avoid stale branch samples when flush packet
- Remove unused 'trace_on' in cs_etm_decoder
- Refactor enumeration cs_etm_sample_type
- Rename CS_ETM_TRACE_ON to CS_ETM_DISCONTINUITY
- Treat NO_SYNC element as trace discontinuity
- Treat EO_TRACE element as trace discontinuity
- Generate branch sample for exception packet
- Use shebangs in the 'perf test' shell scripts, making them identifiable as
shell scripts (Michael Petlan)
- Avoid segfaults caused by negated options in 'perf stat' (Michael Petlan)
- Fix processing of dereferenced args in bprintk events in libtracevent (Steven Rostedt)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXBluNgAKCRCyPKLppCJ+
J9bDAP9aHQ/33R1W0FTa9MFUosp2g8Qki1eFlNyC1mfTBk66sgD/TynpNQLadlmS
nqUg3kG5oZK+npwDCyZ0RfAVvOyFXQQ=
=hF7Q
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-4.21-20181218' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- Implement BPF based syscall filtering in 'perf trace', using BPF maps and
the augmented_raw_syscalls.c BPF proggie (Arnaldo Carvalho de Melo)
- Allow specifying in .perfconfig a set of events use in 'perf trace' in
addition to any other specified from the command line. This initially
will be used to always use the augmented_raw_syscalls.o precompiled
BPF program for getting pointer contents. (Arnaldo Carvalho de Melo)
- Allow fine grained control about how the syscall output should be
formatted. This will be used to allow producing the same output produced
by the 'strace' tool, to then use in regression tests comparing the
output of 'perf trace' with the one produced from 'strace' (Arnaldo Carvalho de Melo)
- Beautify the renameat2 olddirfd, newdirfd and flags arguments (Arnaldo Carvalho de Melo)
- Beautify arch_prctl 'code' syscall arg (Arnaldo Carvalho de Melo)
- Beautify fadvise64 'advice' syscall arg (Arnaldo Carvalho de Melo)
- Relax checks on perf-PID.map ownership, resulting in symbols in
executable anonymous maps setup by JITs in things like node.js to
be resolved in a 'perf top' session run by root without the need
for --force to be used (Arnaldo Carvalho de Melo)
- Update asm-generic/unistd.h copy (Arnaldo Carvalho de Melo)
- Do not use the first and last symbols when setting up address filters in
auxtrace, this fails when we don't have a symbol table, filter the entire
area based on the dso size. (Adrian Hunter)
- Do not use kernel headers to build libsubcmd, we shouldn't use
anything from outside tools/, fixes the build with the Android NDK (Arnaldo Carvalho de Melo)
- Add several prototypes for systems lacking those, such as open_memstream(),
sigqueue(), fixing warnings building with Android's bionic libc that were
preventing the use of -Werror there (Arnaldo Carvalho de Melo)
- Use LDFLAGS in the libtraceevent build commands, allowing developers
to override its values (Jiri Olsa)
- Link libperf-jvmti.so with LDFLAGS variable, allowing distro
packages to propagate its settings when building this library (Jiri Olsa)
- cs-etm (ARM CoreSight) fixes: (Leo Yan)
- Correct packets swapping in cs_etm__flush()
- Avoid stale branch samples when flush packet
- Remove unused 'trace_on' in cs_etm_decoder
- Refactor enumeration cs_etm_sample_type
- Rename CS_ETM_TRACE_ON to CS_ETM_DISCONTINUITY
- Treat NO_SYNC element as trace discontinuity
- Treat EO_TRACE element as trace discontinuity
- Generate branch sample for exception packet
- Use shebangs in the 'perf test' shell scripts, making them identifiable as
shell scripts (Michael Petlan)
- Avoid segfaults caused by negated options in 'perf stat' (Michael Petlan)
- Fix processing of dereferenced args in bprintk events in libtracevent (Steven Rostedt)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ath.git patches for 4.21. Major changes:
ath10k
* add amsdu support for QCA6174 monitor mode
* report tx rate using the new ieee80211_tx_rate_update() API
* wcn3990 support is not experimental anymore
Add wmi configuration cmd to configure base band(BB) power amplifier(PA)
off timing values in hardware. The default PA off timings were fine tuned
to make proper DFS radar detection in QCA reference design. If ODM uses
different PA in their design, then the same default PA off timing values
cannot be used, it requires different settling time to detect radar pulses
very sooner and avoid radar detection problems. In that case it provides
provision to select proper PA off timing values based on the PA hardware used.
The PA component is part of FEM hardware and new device tree entry
"ext-fem-name" is used to indentify the FEM hardware. And this wmi configuration
cmd is enabled via wmi service flag "WMI_SERVICE_BB_TIMING_CONFIG_SUPPORT".
Other way is to apply these values through calibration data, but recalibration
of all boards out there might not be feasible.
This change tested on firmware ver 10.2.4-1.0-00042 in QCA988X chipset.
Signed-off-by: Bhagavathi Perumal S <bperumal@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
This adds new dt entry ext-fem-name, it is used by ath10k driver
to select correct timing parameters and configure it in target wifi hardware.
The Front End Module(FEM) normally includes tx power amplifier(PA) and
rx low noise amplifier(LNA). The default timing parameters like tx end to
PA off timing values were fine tuned for internal FEM used in reference
design. And these timing values can not be same if ODM modifies hardware
design with different external FEM. This DT entry helps to choose correct
timing values in driver if different external FEM hardware used.
Signed-off-by: Bhagavathi Perumal S <bperumal@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
In qcom,ath10k documentation, ath10k is used as node name in the example of
pci based device. Normally, node name should be class of device and not the
model name, so fix it to node name "wifi". And remove the property device_type
pci since only pci bridges should have this property.
Signed-off-by: Bhagavathi Perumal S <bperumal@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Memory of tx_stats was allocated when a STA was added. But it's not freed
if the STA failed to be added to driver. This issue could be seen in MDK3
attack case when STA number reached the limit.
Tested: QCA9984 with firmware ver 10.4-3.9.0.1-00005
Signed-off-by: Zhi Chen <zhichen@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
There was a race condition in SMP that an ath10k_peer was created but its
member sta was null. Following are procedures of ath10k_peer creation and
member sta access in peer statistics path.
1. Peer creation:
ath10k_peer_create()
=>ath10k_wmi_peer_create()
=>ath10k_wait_for_peer_created()
...
# another kernel path, RX from firmware
ath10k_htt_t2h_msg_handler()
=>ath10k_peer_map_event()
=>wake_up()
# ar->peer_map[id] = peer //add peer to map
#wake up original path from waiting
...
# peer->sta = sta //sta assignment
2. RX path of statistics
ath10k_htt_t2h_msg_handler()
=>ath10k_update_per_peer_tx_stats()
=>ath10k_htt_fetch_peer_stats()
# peer->sta //sta accessing
Any access of peer->sta after peer was added to peer_map but before sta was
assigned could cause a null pointer issue. And because these two steps are
asynchronous, no proper lock can protect them. So both peer and sta need to
be checked before access.
Tested: QCA9984 with firmware ver 10.4-3.9.0.1-00005
Signed-off-by: Zhi Chen <zhichen@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
WCN3990 wifi module can optionally make use of the IOMMU.
Add binding documentation for phandle to the IOMMU and
the stream id of wifi iommu block.
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Add missing optional properties in WCN3990 wifi node.
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
The "survey" pointer is the address of an array element. We know that
it can't be NULL so this check can be removed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
During driver load below warn logs are printed in the console.
Since driver may not implement all wmi events sent by fw and
all of them are non-fatal, move this log to debug level to
remove un-necessary warn message on console.
[ 361.887230] ath10k_snoc a000000.wifi: Unknown eventid: 16393
[ 361.907037] ath10k_snoc a000000.wifi: Unknown eventid: 237569
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
The devm_memremap() function doesn't return NULLs, it returns error
pointers.
Fixes: ba94c753cc ("ath10k: add QMI message handshake for wcn3990 client")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
All the necessary patches to make wifi running (over SNOC)
are merged and tested on SDM845/QCS404 platform with WCN3990
wifi module, hence remove work in progress debug from snoc
driver and Kconfig.
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Some hardwares variants (QCA99x0) are limiting msdu deaggregation with
some threshold value(default limit in QCA99x0 is 64 msdus), it was introduced to
avoid excessive MSDU-deaggregation in error cases. When number of sub frames
exceeds the limit, target hardware will send all msdus starting from present
msdu in RAW format as a single msdu packet and it will be indicated with
error status bit "RX_MSDU_END_INFO0_MSDU_LIMIT_ERR" set in rx descriptor.
This msdu frame is a partial raw MSDU and does't have first msdu and ieee80211
header. It caused below warning message.
[ 320.151332] ------------[ cut here ]------------
[ 320.155006] WARNING: CPU: 0 PID: 3 at drivers/net/wireless/ath/ath10k/htt_rx.c:1188
In our issue case, MSDU limit error happened due to FCS error and generated
this warning message.
This fixes the warning by handling the MSDU limit error. If msdu limit error
happens, driver adds first MSDU's ieee80211 header and sets A-MSDU present bit
in QOS header so that upper layer processes this frame if it is valid or drop it
if FCS error set. And removed the warning message, hence partial msdus without
first msdu is expected in msdu limit error cases.
Tested on QCA9984, Firmware 10.4-3.6-00104
Signed-off-by: Bhagavathi Perumal S <bperumal@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>