Use the actual return value instead of always -1 if register_kretprobe()
failed.
E.g. without this patch:
# insmod samples/kprobes/kretprobe_example.ko func=no_such_func
insmod: ERROR: could not insert module samples/kprobes/kretprobe_example.ko: Operation not permitted
With this patch:
# insmod samples/kprobes/kretprobe_example.ko func=no_such_func
insmod: ERROR: could not insert module samples/kprobes/kretprobe_example.ko: Unknown symbol in module
Link: https://lkml.kernel.org/r/1635213091-24387-2-git-send-email-yangtiezhu@loongson.cn
Fixes: 804defea1c ("Kprobes: move kprobe examples to samples/")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined
__BYTE_ORDER for consistency.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-5-iii@linux.ibm.com
In order to better track where in the kernel the dma-buf code is used,
put the symbols in the namespace DMA_BUF and modify all users of the
symbols to properly import the namespace to not break the build at the
same time.
Now the output of modinfo shows the use of these symbols, making it
easier to watch for users over time:
$ modinfo drivers/misc/fastrpc.ko | grep import
import_ns: DMA_BUF
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: dri-devel@lists.freedesktop.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20211010124628.17691-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When compiling bpf samples, the following warning appears:
readelf: Error: Missing knowledge of 32-bit reloc types used in DWARF
sections of machine number 247
readelf: Warning: unable to apply unsupported reloc type 10 to section
.debug_info
readelf: Warning: unable to apply unsupported reloc type 1 to section
.debug_info
readelf: Warning: unable to apply unsupported reloc type 10 to section
.debug_info
Same problem was mentioned in commit 2f0921262b ("selftests/bpf:
suppress readelf stderr when probing for BTF support"), let's use
readelf that supports btf.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20211021123913.48833-1-pulehui@huawei.com
Adding simple module that uses multi direct interface:
register_ftrace_direct_multi
unregister_ftrace_direct_multi
The init function registers trampoline for 2 functions,
and exit function unregisters them.
Link: https://lkml.kernel.org/r/20211008091336.33616-9-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The coccinelle check report:
"./samples/bpf/xdp_redirect_cpu_user.c:397:32-38:
ERROR: application of sizeof to pointer"
Using the "strlen" to fix it.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211012111649.983253-1-davidcomponentone@gmail.com
Add s390 support for ftrace direct call samples, which also enables
ftrace direct call selftests within ftrace selftests.
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20211012133802.2460757-5-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add HAVE_SAMPLE_FTRACE_DIRECT config option which can be selected by
architectures which have support for ftrace direct call samples.
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20211012133802.2460757-4-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
In samples/bpf/Makefile, libbpf has a FORCE dependency that force it to
be rebuilt. I read this as a way to keep the library up-to-date, given
that we do not have, in samples/bpf, a list of the source files for
libbpf itself. However, a better approach would be to use the
"$(wildcard ...)" function from make, and to have libbpf depend on all
the .c and .h files in its directory. This is what samples/bpf/Makefile
does for bpftool, and also what the BPF selftests' Makefile does for
libbpf.
Let's update the Makefile to avoid rebuilding libbpf all the time (and
bpftool on top of it).
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211007194438.34443-11-quentin@isovalent.com
API headers from libbpf should not be accessed directly from the source
directory. Instead, they should be exported with "make install_headers".
Make sure that samples/bpf/Makefile installs the headers properly when
building.
The object compiled from and exported by libbpf are now placed into a
subdirectory of sample/bpf/ instead of remaining in tools/lib/bpf/. We
attempt to remove this directory on "make clean". However, the "clean"
target re-enters the samples/bpf/ directory from the root of the
repository ("$(MAKE) -C ../../ M=$(CURDIR) clean"), in such a way that
$(srctree) and $(src) are not defined, making it impossible to use
$(LIBBPF_OUTPUT) and $(LIBBPF_DESTDIR) in the recipe. So we only attempt
to clean $(CURDIR)/libbpf, which is the default value.
Add a dependency on libbpf's headers for the $(TRACE_HELPERS).
We also change the output directory for bpftool, to place the generated
objects under samples/bpf/bpftool/ instead of building in bpftool's
directory directly. Doing so, we make sure bpftool reuses the libbpf
library previously compiled and installed.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211007194438.34443-10-quentin@isovalent.com
Update samples/bpf/.gitignore to ignore files generated when building
the samples. Add:
- vmlinux.h
- the generated skeleton files (*.skel.h)
- the samples/bpf/libbpf/ and .../bpftool/ directories, in preparation
of a future commit which introduces a local output directory for
building libbpf and bpftool.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211007194438.34443-9-quentin@isovalent.com
Replace deprecated bpf_{map,program}__next APIs with newly added
bpf_object__next_{map,program} APIs, so that no compilation warnings
emit.
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211003165844.4054931-3-hengqi.chen@gmail.com
Recent-ish versions of make do no longer consider number signs ("#") as
comment symbols when they are inserted inside of a macro reference or in
a function invocation. In such cases, the symbols should not be escaped.
There are a few occurrences of "\#" in libbpf's and samples' Makefiles.
In the former, the backslash is harmless, because grep associates no
particular meaning to the escaped symbol and reads it as a regular "#".
In samples' Makefile, recent versions of make will pass the backslash
down to the compiler, making the probe fail all the time and resulting
in the display of a warning about "make headers_install" being required,
even after headers have been installed.
A similar issue has been addressed at some other locations by commit
9564a8cf42 ("Kbuild: fix # escaping in .cmd files for future Make").
Let's address it for libbpf's and samples' Makefiles in the same
fashion, by using a "$(pound)" variable (pulled from
tools/scripts/Makefile.include for libbpf, or re-defined for the
samples).
Reference for the change in make:
https://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57
Fixes: 2f38304127 ("libbpf: Make libbpf_version.h non-auto-generated")
Fixes: 07c3bbdb1a ("samples: bpf: print a warning about headers_install")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211006111049.20708-1-quentin@isovalent.com
Reuse the logic in vfio_noiommu_group_alloc to allocate a fake
single-device iommu group for mediated devices by factoring out a common
function, and replacing the noiommu boolean field in struct vfio_group
with an enum to distinguish the three different kinds of groups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-8-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
libbpf and bpftool have been dual-licensed to facilitate inclusion in software
that is not compatible with GPL2-only (ie: Apache2), but the samples are still
GPL2-only.
Given these files are samples, they get naturally copied around. For example,
it is the case for samples/bpf/bpf_insn.h which was copied into the systemd
tree: https://github.com/systemd/systemd/blob/main/src/shared/linux/bpf_insn.h
Some more context on systemd's needs specifically:
Most of systemd is (L)GPL2-or-later, which means there is no perceived
incompatibility with Apache2 software and can thus be linked with
OpenSSL 3.0. But given this GPL2-only header is included this is currently
not possible. Dual-licensing this header solves this problem for us as we
are scoping a move to OpenSSL 3.0, see:
https://lists.freedesktop.org/archives/systemd-devel/2021-September/046882.html
Dual-license this header as GPL-2.0-only OR BSD-2-Clause to follow the same
licensing used by libbpf and bpftool:
1bc38b8ff6 ("libbpf: relicense libbpf as LGPL-2.1 OR BSD-2-Clause")
907b223651 ("tools: bpftool: dual license all files")
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Daniel Mack <daniel@zonque.org>
Acked-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Chenbo Feng <fengc@google.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Brendan Jackman <jackmanb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210923000540.47344-1-luca.boccassi@gmail.com
Generate vmlinux.h only from the in-tree vmlinux, and remove enum
declarations that would cause a build failure in case of version
mismatches.
There are now two options when building the samples:
1. Compile the kernel to use in-tree vmlinux for vmlinux.h
2. Override VMLINUX_BTF for samples using something like this:
make VMLINUX_BTF=/sys/kernel/btf/vmlinux -C samples/bpf
This change was tested with relative builds, e.g. cases like:
* make O=build -C samples/bpf
* make KBUILD_OUTPUT=build -C samples/bpf
* make -C samples/bpf
* cd samples/bpf && make
When a suitable VMLINUX_BTF is not found, the following message is
printed:
/home/kkd/src/linux/samples/bpf/Makefile:333: *** Cannot find a vmlinux
for VMLINUX_BTF at any of " ./vmlinux", build the kernel or set
VMLINUX_BTF variable. Stop.
Fixes: 384b6b3bbf (samples: bpf: Add vmlinux.h generation support)
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210928054608.1799021-1-memxor@gmail.com
The ARP table that is dumped when the xdp_router_ipv4 process is launched
has the IP address & MAC address in non-readable network byte order format,
also the alignment is off when printing the table.
Address HwAddress
160000e0 1600005e0001
ff96a8c0 ffffffffffff
faffffef faff7f5e0001
196a8c0 9607871293ea
fb0000e0 fb00005e0001
0 0
196a8c0 9607871293ea
ffff11ac ffffffffffff
faffffef faff7f5e0001
fb0000e0 fb00005e0001
160000e0 1600005e0001
160000e0 1600005e0001
faffffef faff7f5e0001
fb0000e0 fb00005e0001
40011ac 40011ac4202
Fix this by converting the "Address" field from network byte order Hex into
dotted decimal notation IPv4 format and "HwAddress" field from network byte
order Hex into Colon separated Hex format. Also fix the aligntment of the
fields in the ARP table.
Address HwAddress
224.0.0.22 01:00:5e:00:00:16
192.168.150.255 ff:ff:ff:ff:ff:ff
239.255.255.250 01:00:5e:7f:ff:fa
192.168.150.1 ea:93:12:87:07:96
224.0.0.251 01:00:5e:00:00:fb
0.0.0.0 00:00:00:00:00:00
192.168.150.1 ea:93:12:87:07:96
172.17.255.255 ff:ff:ff:ff:ff:ff
239.255.255.250 01:00:5e:7f:ff:fa
224.0.0.251 01:00:5e:00:00:fb
224.0.0.22 01:00:5e:00:00:16
224.0.0.22 01:00:5e:00:00:16
239.255.255.250 01:00:5e:7f:ff:fa
224.0.0.251 01:00:5e:00:00:fb
172.17.0.4 02:42:ac:11:00:04
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210919080305.173588-2-gokulkumar792@gmail.com
The route table that is dumped when the xdp_router_ipv4 process is launched
has the "Gateway" field in non-readable network byte order format, also the
alignment is off when printing the table.
Destination Gateway Genmask Metric Iface
0.0.0.0 196a8c0 0 0 enp7s0
0.0.0.0 196a8c0 0 0 wlp6s0
169.254.0.0 196a8c0 16 0 enp7s0
172.17.0.0 0 16 0 docker0
192.168.150.0 0 24 0 enp7s0
192.168.150.0 0 24 0 wlp6s0
Fix this by converting the "Gateway" field from network byte order Hex into
dotted decimal notation IPv4 format and "Genmask" from CIDR notation into
dotted decimal notation IPv4 format. Also fix the aligntment of the fields
in the route table.
Destination Gateway Genmask Metric Iface
0.0.0.0 192.168.150.1 0.0.0.0 0 enp7s0
0.0.0.0 192.168.150.1 0.0.0.0 0 wlp6s0
169.254.0.0 192.168.150.1 255.255.0.0 0 enp7s0
172.17.0.0 0.0.0.0 255.255.0.0 0 docker0
192.168.150.0 0.0.0.0 255.255.255.0 0 enp7s0
192.168.150.0 0.0.0.0 255.255.255.0 0 wlp6s0
Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210919080305.173588-1-gokulkumar792@gmail.com
Remove blank lines that are not necessary, fixing the checkpatch script
reports. While at it, add a blank line after the switch default block,
similar to the other parts of the codebase.
Reviewed-by: George-Aurelian Popescu <popegeo@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20210827154930.40608-8-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the typos in the words spelling as per the checkpatch script
reports.
Reviewed-by: George-Aurelian Popescu <popegeo@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
Link: https://lore.kernel.org/r/20210827154930.40608-7-andraprs@amazon.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes for kgdb/kdb this cycle are dominated by a change from
Sumit that removes as small (256K) private heap from kdb. This is
change I've hoped for ever since I discovered how few users of this
heap remained in the kernel, so many thanks to Sumit for hunting
these down. Other change is an incremental step towards SPDX headers.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAmE2OuUACgkQfOMlXTn3
iKEcTQ/7BiMPnSzPeh7srNIG1a7/ig8CUtu+qMsrU50BUvCBzhas3+fGbzlZHV4p
GwCGTuRDWhVJjntpCkLYroSSF0h/AhN68pxqUx6EJUWEYMGuuMBuVtAdXYMSYbse
MW+Jw6Tec2fWPyw6lyWd0vwOKLzhf/pC8axbmGESzgdmZrsWeA4XEeF5XRHsHyJf
pL4GvBXN/mzH2TSQNQmSgda5VLnK3m6B66tlT17mywG8QTKwID8I4LXs9doNwqeW
6fkwJswAsCa+nO1GjMsRcA8HoUd7mtqLCYbjZzFVQdFJ6kF9eBKSPvgbaqr6oNvv
t6VBfm908+fM4/4yEQSuwVPT32IPe7I/Bk+aUqnSpPiXU8v75u0lIfB929O/DS07
X+NjqpXN7v6mEpLAFgWhqPyzTODw40OhpOi1uf9jzPB/OSNm9y1zs21FJQSx6KGa
AahMHrRoeZwlxNdD/2RBf9pQCSBWF5R1DyIye2iQX+e/jrvTXHMCK33tpcbcE9Cl
bMVqnnJ4Pnmul6hAAf9WlJduIrACV1Oq1iQrtXWE3wdFZdQuvqKvwHUK/Zko+O0f
cdW6neyW7Slo+8q2qXuEE0Bp+Ae61oZTrGIdfj29ZdXwp0+sBzcA5CFO7MKPJllV
yEq41Bxc3cgJPcsFMZWSaHcEAI5fq1iBmquVdCA0/vYbP9dsO1U=
=iMIF
-----END PGP SIGNATURE-----
Merge tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"Changes for kgdb/kdb this cycle are dominated by a change from Sumit
that removes as small (256K) private heap from kdb. This is change
I've hoped for ever since I discovered how few users of this heap
remained in the kernel, so many thanks to Sumit for hunting these
down.
The other change is an incremental step towards SPDX headers"
* tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kernel: debug: Convert to SPDX identifier
kdb: Rename members of struct kdbtab_t
kdb: Simplify kdb_defcmd macro logic
kdb: Get rid of redundant kdb_register_flags()
kdb: Rename struct defcmd_set to struct kdb_macro
kdb: Get rid of custom debug heap allocator
- Fix dma-valid return WAITED implementation (Anthony Yznaga)
- SPDX license cleanups (Cai Huoqing)
- Split vfio-pci-core from vfio-pci and enhance PCI driver matching
to support future vendor provided vfio-pci variants (Yishai Hadas,
Max Gurtovoy, Jason Gunthorpe)
- Replace duplicated reflck with core support for managing first
open, last close, and device sets (Jason Gunthorpe, Max Gurtovoy,
Yishai Hadas)
- Fix non-modular mdev support and don't nag about request callback
support (Christoph Hellwig)
- Add semaphore to protect instruction intercept handler and replace
open-coded locks in vfio-ap driver (Tony Krowiak)
- Convert vfio-ap to vfio_register_group_dev() API (Jason Gunthorpe)
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmEvwWkbHGFsZXgud2ls
bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsi+1UP/3CRizghroINVYR+cJ99
Tjz7lB/wlzxmRfX+SL4NAVe1SSB2VeCgU4B0PF6kywELLS8OhCO3HXYXVsz244fW
Gk5UIns86+TFTrfCOMpwYBV0P86zuaa1ZnvCnkhMK1i2pTZ+oX8hUH1Yj5clHuU+
YgC7JfEuTIAX73q2bC/llLvNE9ke1QCoDX3+HAH87ttqutnRWcnnq56PTEqwe+EW
eMA+glB1UG6JAqXxoJET4155arNOny1/ZMprfBr3YXZTiXDF/lSzuMyUtbp526Sf
hsvlnqtE6TCdfKbog0Lxckl+8E9NCq8jzFBKiZhbccrQv3vVaoP6dOsPWcT35Kp1
IjzMLiHIbl4wXOL+Xap/biz3LCM5BMdT/OhW5LUC007zggK71ndRvb9F8ptW83Bv
0Uh9DNv7YIQ0su3JHZEsJ3qPFXQXceP199UiADOGSeV8U1Qig3YKsHUDMuALfFvN
t+NleeJ4qCWao+W4VCfyDfKurVnMj/cThXiDEWEeq5gMOO+6YKBIFWJVKFxUYDbf
MgGdg0nQTUECuXKXxLD4c1HAWH9xi207OnLvhW1Icywp20MsYqOWt0vhg+PRdMBT
DK6STxP18aQxCaOuQN9Vf81LjhXNTeg+xt3mMyViOZPcKfX6/wAC9qLt4MucJDdw
FBfOz2UL2F56dhAYT+1vHoUM
=nzK7
-----END PGP SIGNATURE-----
Merge tag 'vfio-v5.15-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Fix dma-valid return WAITED implementation (Anthony Yznaga)
- SPDX license cleanups (Cai Huoqing)
- Split vfio-pci-core from vfio-pci and enhance PCI driver matching to
support future vendor provided vfio-pci variants (Yishai Hadas, Max
Gurtovoy, Jason Gunthorpe)
- Replace duplicated reflck with core support for managing first open,
last close, and device sets (Jason Gunthorpe, Max Gurtovoy, Yishai
Hadas)
- Fix non-modular mdev support and don't nag about request callback
support (Christoph Hellwig)
- Add semaphore to protect instruction intercept handler and replace
open-coded locks in vfio-ap driver (Tony Krowiak)
- Convert vfio-ap to vfio_register_group_dev() API (Jason Gunthorpe)
* tag 'vfio-v5.15-rc1' of git://github.com/awilliam/linux-vfio: (37 commits)
vfio/pci: Introduce vfio_pci_core.ko
vfio: Use kconfig if XX/endif blocks instead of repeating 'depends on'
vfio: Use select for eventfd
PCI / VFIO: Add 'override_only' support for VFIO PCI sub system
PCI: Add 'override_only' field to struct pci_device_id
vfio/pci: Move module parameters to vfio_pci.c
vfio/pci: Move igd initialization to vfio_pci.c
vfio/pci: Split the pci_driver code out of vfio_pci_core.c
vfio/pci: Include vfio header in vfio_pci_core.h
vfio/pci: Rename ops functions to fit core namings
vfio/pci: Rename vfio_pci_device to vfio_pci_core_device
vfio/pci: Rename vfio_pci_private.h to vfio_pci_core.h
vfio/pci: Rename vfio_pci.c to vfio_pci_core.c
vfio/ap_ops: Convert to use vfio_register_group_dev()
s390/vfio-ap: replace open coded locks for VFIO_GROUP_NOTIFY_SET_KVM notification
s390/vfio-ap: r/w lock for PQAP interception handler function pointer
vfio/type1: Fix vfio_find_dma_valid return
vfio-pci/zdev: Remove repeated verbose license text
vfio: platform: reset: Convert to SPDX identifier
vfio: Remove struct vfio_device_ops open/release
...
Here is the big set of char/misc driver changes for 5.15-rc1.
Lots of different driver subsystems are being updated in here, notably:
- mhi subsystem update
- fpga subsystem update
- coresight/hwtracing subsystem update
- interconnect subsystem update
- nvmem subsystem update
- parport drivers update
- phy subsystem update
- soundwire subsystem update
and there are some other char/misc drivers being updated as well:
- binder driver additions
- new misc drivers
- lkdtm driver updates
- mei driver updates
- sram driver updates
- other minor driver updates.
Note, there are no habanna labs driver updates in this pull request,
that will probably come later before -rc1 is out in a different request.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYS+Kyw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymlpACg0JM+hSeo8T5GtwZksZ1QXXQfh8sAoK6Dt6xF
e62OQuuMFT0Un0qOflZk
=emH+
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver updates from Greg KH:
"Here is the big set of char/misc driver changes for 5.15-rc1.
Lots of different driver subsystems are being updated in here,
notably:
- mhi subsystem update
- fpga subsystem update
- coresight/hwtracing subsystem update
- interconnect subsystem update
- nvmem subsystem update
- parport drivers update
- phy subsystem update
- soundwire subsystem update
and there are some other char/misc drivers being updated as well:
- binder driver additions
- new misc drivers
- lkdtm driver updates
- mei driver updates
- sram driver updates
- other minor driver updates.
Note, there are no habanalabs driver updates in this pull request,
that will probably come later before -rc1 is out in a different
request.
All of these have been in linux-next for a while with no reported
problems"
* tag 'char-misc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (169 commits)
Revert "bus: mhi: Add inbound buffers allocation flag"
misc/pvpanic: fix set driver data
VMCI: fix NULL pointer dereference when unmapping queue pair
char: mware: fix returnvar.cocci warnings
parport: remove non-zero check on count
soundwire: cadence: do not extend reset delay
soundwire: intel: conditionally exit clock stop mode on system suspend
soundwire: intel: skip suspend/resume/wake when link was not started
soundwire: intel: fix potential race condition during power down
phy: qcom-qmp: Add support for SM6115 UFS phy
dt-bindings: phy: qcom,qmp: Add SM6115 UFS PHY bindings
phy: qmp: Provide unique clock names for DP clocks
lkdtm: remove IDE_CORE_CP crashpoint
lkdtm: replace SCSI_DISPATCH_CMD with SCSI_QUEUE_RQ
coresight: Replace deprecated CPU-hotplug functions.
Documentation: coresight: Add documentation for CoreSight config
coresight: syscfg: Add initial configfs support
coresight: config: Add preloaded configurations
coresight: etm4x: Add complex configuration handlers to etmv4
coresight: etm-perf: Update to activate selected configuration
...
- Enable memcg accounting for various networking objects.
BPF:
- Introduce bpf timers.
- Add perf link and opaque bpf_cookie which the program can read
out again, to be used in libbpf-based USDT library.
- Add bpf_task_pt_regs() helper to access user space pt_regs
in kprobes, to help user space stack unwinding.
- Add support for UNIX sockets for BPF sockmap.
- Extend BPF iterator support for UNIX domain sockets.
- Allow BPF TCP congestion control progs and bpf iterators to call
bpf_setsockopt(), e.g. to switch to another congestion control
algorithm.
Protocols:
- Support IOAM Pre-allocated Trace with IPv6.
- Support Management Component Transport Protocol.
- bridge: multicast: add vlan support.
- netfilter: add hooks for the SRv6 lightweight tunnel driver.
- tcp:
- enable mid-stream window clamping (by user space or BPF)
- allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
- more accurate DSACK processing for RACK-TLP
- mptcp:
- add full mesh path manager option
- add partial support for MP_FAIL
- improve use of backup subflows
- optimize option processing
- af_unix: add OOB notification support.
- ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by
the router.
- mac80211: Target Wake Time support in AP mode.
- can: j1939: extend UAPI to notify about RX status.
Driver APIs:
- Add page frag support in page pool API.
- Many improvements to the DSA (distributed switch) APIs.
- ethtool: extend IRQ coalesce uAPI with timer reset modes.
- devlink: control which auxiliary devices are created.
- Support CAN PHYs via the generic PHY subsystem.
- Proper cross-chip support for tag_8021q.
- Allow TX forwarding for the software bridge data path to be
offloaded to capable devices.
Drivers:
- veth: more flexible channels number configuration.
- openvswitch: introduce per-cpu upcall dispatch.
- Add internet mix (IMIX) mode to pktgen.
- Transparently handle XDP operations in the bonding driver.
- Add LiteETH network driver.
- Renesas (ravb):
- support Gigabit Ethernet IP
- NXP Ethernet switch (sja1105)
- fast aging support
- support for "H" switch topologies
- traffic termination for ports under VLAN-aware bridge
- Intel 1G Ethernet
- support getcrosststamp() with PCIe PTM (Precision Time
Measurement) for better time sync
- support Credit-Based Shaper (CBS) offload, enabling HW traffic
prioritization and bandwidth reservation
- Broadcom Ethernet (bnxt)
- support pulse-per-second output
- support larger Rx rings
- Mellanox Ethernet (mlx5)
- support ethtool RSS contexts and MQPRIO channel mode
- support LAG offload with bridging
- support devlink rate limit API
- support packet sampling on tunnels
- Huawei Ethernet (hns3):
- basic devlink support
- add extended IRQ coalescing support
- report extended link state
- Netronome Ethernet (nfp):
- add conntrack offload support
- Broadcom WiFi (brcmfmac):
- add WPA3 Personal with FT to supported cipher suites
- support 43752 SDIO device
- Intel WiFi (iwlwifi):
- support scanning hidden 6GHz networks
- support for a new hardware family (Bz)
- Xen pv driver:
- harden netfront against malicious backends
- Qualcomm mobile
- ipa: refactor power management and enable automatic suspend
- mhi: move MBIM to WWAN subsystem interfaces
Refactor:
- Ambient BPF run context and cgroup storage cleanup.
- Compat rework for ndo_ioctl.
Old code removal:
- prism54 remove the obsoleted driver, deprecated by the p54 driver.
- wan: remove sbni/granch driver.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEukBYACgkQMUZtbf5S
IrsyHA//TO8dw18NYts4n9LmlJT2naJ7yBUUSSXK/M+DtW0MQ9nnHhqzPm5uJdRl
IgQTNJrW3dYzRwgqaWZqEwO1t5/FI+f87ND1Nsekg7x9tF66a6ov5WxU26TwwSba
U+si/inQ/4chuQ+LxMQobqCDxaLE46I2dIoRl+YfndJ24DRzYSwAEYIPPbSdfyU+
+/l+3s4GaxO4k/hLciPAiOniyxLoUNiGUTNh+2yqRBXelSRJRKVnl+V22ANFrxRW
nTEiplfVKhlPU1e4iLuRtaxDDiePHhw9I3j/lMHhfeFU2P/gKJIvz4QpGV0CAZg2
1VvDU32WEx1GQLXJbKm0KwoNRUq1QSjOyyFti+BO7ugGaYAR4gKhShOqlSYLzUtB
tbtzQhSNLWOGqgmSJOztZb5kFDm2EdRSll5/lP2uyFlPkIsIp0QbscJVzNTnS74b
Xz15ZOw41Z4TfWPEMWgfrx6Zkm7pPWkly+7WfUkPcHa1gftNz6tzXXxSXcXIBPdi
yQ5JCzzxrM5573YHuk5YedwZpn6PiAt4A/muFGk9C6aXP60TQAOS/ppaUzZdnk4D
NfOk9mj06WEULjYjPcKEuT3GGWE6kmjb8Pu0QZWKOchv7vr6oZly1EkVZqYlXELP
AfhcrFeuufie8mqm0jdb4LnYaAnqyLzlb1J4Zxh9F+/IX7G3yoc=
=JDGD
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Enable memcg accounting for various networking objects.
BPF:
- Introduce bpf timers.
- Add perf link and opaque bpf_cookie which the program can read out
again, to be used in libbpf-based USDT library.
- Add bpf_task_pt_regs() helper to access user space pt_regs in
kprobes, to help user space stack unwinding.
- Add support for UNIX sockets for BPF sockmap.
- Extend BPF iterator support for UNIX domain sockets.
- Allow BPF TCP congestion control progs and bpf iterators to call
bpf_setsockopt(), e.g. to switch to another congestion control
algorithm.
Protocols:
- Support IOAM Pre-allocated Trace with IPv6.
- Support Management Component Transport Protocol.
- bridge: multicast: add vlan support.
- netfilter: add hooks for the SRv6 lightweight tunnel driver.
- tcp:
- enable mid-stream window clamping (by user space or BPF)
- allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
- more accurate DSACK processing for RACK-TLP
- mptcp:
- add full mesh path manager option
- add partial support for MP_FAIL
- improve use of backup subflows
- optimize option processing
- af_unix: add OOB notification support.
- ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by the
router.
- mac80211: Target Wake Time support in AP mode.
- can: j1939: extend UAPI to notify about RX status.
Driver APIs:
- Add page frag support in page pool API.
- Many improvements to the DSA (distributed switch) APIs.
- ethtool: extend IRQ coalesce uAPI with timer reset modes.
- devlink: control which auxiliary devices are created.
- Support CAN PHYs via the generic PHY subsystem.
- Proper cross-chip support for tag_8021q.
- Allow TX forwarding for the software bridge data path to be
offloaded to capable devices.
Drivers:
- veth: more flexible channels number configuration.
- openvswitch: introduce per-cpu upcall dispatch.
- Add internet mix (IMIX) mode to pktgen.
- Transparently handle XDP operations in the bonding driver.
- Add LiteETH network driver.
- Renesas (ravb):
- support Gigabit Ethernet IP
- NXP Ethernet switch (sja1105):
- fast aging support
- support for "H" switch topologies
- traffic termination for ports under VLAN-aware bridge
- Intel 1G Ethernet
- support getcrosststamp() with PCIe PTM (Precision Time
Measurement) for better time sync
- support Credit-Based Shaper (CBS) offload, enabling HW traffic
prioritization and bandwidth reservation
- Broadcom Ethernet (bnxt)
- support pulse-per-second output
- support larger Rx rings
- Mellanox Ethernet (mlx5)
- support ethtool RSS contexts and MQPRIO channel mode
- support LAG offload with bridging
- support devlink rate limit API
- support packet sampling on tunnels
- Huawei Ethernet (hns3):
- basic devlink support
- add extended IRQ coalescing support
- report extended link state
- Netronome Ethernet (nfp):
- add conntrack offload support
- Broadcom WiFi (brcmfmac):
- add WPA3 Personal with FT to supported cipher suites
- support 43752 SDIO device
- Intel WiFi (iwlwifi):
- support scanning hidden 6GHz networks
- support for a new hardware family (Bz)
- Xen pv driver:
- harden netfront against malicious backends
- Qualcomm mobile
- ipa: refactor power management and enable automatic suspend
- mhi: move MBIM to WWAN subsystem interfaces
Refactor:
- Ambient BPF run context and cgroup storage cleanup.
- Compat rework for ndo_ioctl.
Old code removal:
- prism54 remove the obsoleted driver, deprecated by the p54 driver.
- wan: remove sbni/granch driver"
* tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1715 commits)
net: Add depends on OF_NET for LiteX's LiteETH
ipv6: seg6: remove duplicated include
net: hns3: remove unnecessary spaces
net: hns3: add some required spaces
net: hns3: clean up a type mismatch warning
net: hns3: refine function hns3_set_default_feature()
ipv6: remove duplicated 'net/lwtunnel.h' include
net: w5100: check return value after calling platform_get_resource()
net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx()
net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource()
net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()
fou: remove sparse errors
ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
octeontx2-af: Set proper errorcode for IPv4 checksum errors
octeontx2-af: Fix static code analyzer reported issues
octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfg
octeontx2-af: Fix loop in free and unmap counter
af_unix: fix potential NULL deref in unix_dgram_connect()
dpaa2-eth: Replace strlcpy with strscpy
octeontx2-af: Use NDC TX for transmit packet data
...
Daniel Borkmann says:
====================
bpf-next 2021-08-31
We've added 116 non-merge commits during the last 17 day(s) which contain
a total of 126 files changed, 6813 insertions(+), 4027 deletions(-).
The main changes are:
1) Add opaque bpf_cookie to perf link which the program can read out again,
to be used in libbpf-based USDT library, from Andrii Nakryiko.
2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu.
3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang.
4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch
to another congestion control algorithm during init, from Martin KaFai Lau.
5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima.
6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta.
7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT}
progs, from Xu Liu and Stanislav Fomichev.
8) Support for __weak typed ksyms in libbpf, from Hao Luo.
9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky.
10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov.
11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann.
12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian,
Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, Ilya Leoshkevich, others.
13) Another big batch to revamp XDP samples in order to give them consistent look
and feel, from Kumar Kartikeya Dwivedi.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits)
MAINTAINERS: Remove self from powerpc BPF JIT
selftests/bpf: Fix potential unreleased lock
samples: bpf: Fix uninitialized variable in xdp_redirect_cpu
selftests/bpf: Reduce more flakyness in sockmap_listen
bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS
bpf: selftests: Add dctcp fallback test
bpf: selftests: Add connect_to_fd_opts to network_helpers
bpf: selftests: Add sk_state to bpf_tcp_helpers.h
bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
selftests: xsk: Preface options with opt
selftests: xsk: Make enums lower case
selftests: xsk: Generate packets from specification
selftests: xsk: Generate packet directly in umem
selftests: xsk: Simplify cleanup of ifobjects
selftests: xsk: Decrease sending speed
selftests: xsk: Validate tx stats on tx thread
selftests: xsk: Simplify packet validation in xsk tests
selftests: xsk: Rename worker_* functions that are not thread entry points
selftests: xsk: Disassociate umem size with packets sent
selftests: xsk: Remove end-of-test packet
...
====================
Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While at it, also improve help output when CPU number is greater than
possible.
Fixes: e531a220cc ("samples: bpf: Convert xdp_redirect_cpu to XDP samples helper")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210826120910.454081-1-memxor@gmail.com
All pktgen samples can send indefinitely num messages per thread by
setting the count option to 0(-n 0). If running sample with setting
count 0 and press Ctrl-C to stop this program, the program prints the
result of the execution so far. Currently, the samples besides
sample{3...5} don't work properly. Because Ctrl-C stops the script, not
just pktgen.
This is results of samples:
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 0
Running... ctrl^C to stop
^CDevice: eth0@0
Result: OK: 569657(c569538+d118) usec, 84650 (60byte,0frags)
148597pps 71Mb/sec (71326560bps) errors: 0
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -n 0
Running... ctrl^C to stop
^C
In order to solve this, this commit adds trap SIGINT. Also, this commit
changes control_c function to print_result to maintain consistency with
other samples.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, most pktgen samples print the execution result when the
program is terminated normally. However, sample03 doesn't work
appropriately.
This is results of samples:
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 1
Running... ctrl^C to stop
Device: eth0@0
Result: OK: 19(c5+d13) usec, 1 (60byte,0frags)
51762pps 24Mb/sec (24845760bps) errors: 0
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample03_burst_single_flow.sh -n 1
Running... ctrl^C to stop
The reason why it doesn't print the execution result when the program is
terminated usually is that sample03 doesn't call the function which
prints the result, unlike other samples.
So, this commit solves this issue by calling the function before
termination. Also, this commit changes control_c function to
print_result to maintain consistency with other samples.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper. Also adapt to change of type of mac address map, so that
no resizing is required.
Add a new flag for sample mask that skips priting the
from_device->to_device heading for each line, as xdp_redirect_map_multi
may have two devices but the flow of data may be bidirectional, so the
output would be confusing.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-23-memxor@gmail.com
One of the notable changes is using a BPF_MAP_TYPE_HASH instead of array
map to store mac addresses of devices, as the resizing behavior was
based on max_ifindex, which unecessarily maximized the capacity of map
beyond what was needed.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-22-memxor@gmail.com
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
Since get_mac_addr is already provided by XDP samples helper, we drop
it. Also convert to XDP samples helper similar to prior samples to
minimize duplication of code.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-21-memxor@gmail.com
Also update it to use consistent SEC("xdp") and SEC("xdp_devmap")
naming, and use global variable instead of BPF map for copying the mac
address.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-20-memxor@gmail.com
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
Similar to xdp_monitor, xdp_redirect_cpu was quite featureful except a
few minor omissions (e.g. redirect errno reporting). All of these have
been moved to XDP samples helper, hence drop the unneeded code and
convert to usage of helpers provided by it.
One of the important changes here is dropping of mprog-disable option,
as we make that the default. Also, we support built-in programs for some
common actions on the packet when it reaches kthread (pass, drop,
redirect to device). If the user still needs to install a custom
program, they can still supply a BPF object, however the program should
be suitably tagged with SEC("xdp_cpumap") annotation so that the
expected attach type is correct when updating our cpumap map element.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-19-memxor@gmail.com
Similar to xdp_monitor_kern, a lot of these BPF programs have been
reimplemented properly consolidating missing features from other XDP
samples. Hence, drop the unneeded code and rename to .bpf.c suffix.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-18-memxor@gmail.com
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
One important note:
The XDP samples helper handles ownership of installed XDP programs on
devices, including responding to SIGINT and SIGTERM, so drop the code
here and use the helpers we provide going forward for all xdp_redirect*
conversions.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-17-memxor@gmail.com
We moved swap_src_dst_mac to xdp_sample.bpf.h to be shared with other
potential users, so drop it while moving code to the new file.
Also, consistently use SEC("xdp") naming instead.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-16-memxor@gmail.com
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
A lot of the code in xdp_monitor and xdp_redirect_cpu has been moved to
the xdp_sample_user.o helper, so we remove the duplicate functions here
that are no longer needed.
Thanks to BPF skeleton, we no longer depend on order of tracepoints to
uninstall them on startup. Instead, the sample mask is used to install
the needed tracepoints.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-15-memxor@gmail.com
We already moved all the functionality it provided in XDP samples helper
userspace and kernel BPF object, so just delete the unneeded code.
We also add generation of BPF skeleton and compilation using clang
-target bpf for files ending with .bpf.c suffix (to denote that they use
vmlinux.h).
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-14-memxor@gmail.com
Also, take this opportunity to depend on in-tree bpftool, so that we can
use static linking support in subsequent commits for XDP samples BPF
helper object.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-13-memxor@gmail.com
This adds support for retrieval and printing for devmap_xmit total and
mutli mode tracepoint. For multi mode, we keep a hash map entry for each
redirection stream, such that we can dynamically add and remove entries
on output.
The from_match and to_match will be set by individual samples when
setting up the XDP program on these devices.
The multi mode tracepoint is also handy for xdp_redirect_map_multi,
where up to 32 devices can be specified.
Also add samples_init_pre_load macro to finally set up the resized maps
and mmap them in place for low overhead stats retrieval.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-12-memxor@gmail.com
This adds support for the devmap_xmit tracepoint, and its multi device
variant that can be used to obtain streams for each individual
net_device to net_device redirection. This is useful for decomposing
total xmit stats in xdp_monitor.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-11-memxor@gmail.com
This consolidates retrieval and printing into the XDP sample helper. For
the kthread stats, it expands xdp_stats separately with its own per-CPU
stats. For cpumap enqueue, we display FROM->TO stats also with its
per-CPU stats.
The help out explains in detail the various aspects of the output.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-10-memxor@gmail.com
These are invoked in two places, when the XDP frame or SKB (for generic
XDP) enqueued to the ptr_ring (cpumap_enqueue) and when kthread processes
the frame after invoking the CPUMAP program for it (returning stats for
the batch).
We use cpumap_map_id to filter on the map_id as a way to avoid printing
incorrect stats for parallel sessions of xdp_redirect_cpu.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-9-memxor@gmail.com
This would allow us to store stats for each XDP action, including their
per-CPU counts. Consolidating this here allows all redirect samples to
detect xdp_exception events.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-7-memxor@gmail.com
This implements per-errno reporting (for the ones we explicitly
recognize), adds some help output, and implements the stats retrieval
and printing functions.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-6-memxor@gmail.com
This adds the shared BPF file that will be used going forward for
sharing tracepoint programs among XDP redirect samples.
Since vmlinux.h conflicts with tools/include for READ_ONCE/WRITE_ONCE
and ARRAY_SIZE, they are copied in to xdp_sample.bpf.h along with other
helpers that will be required.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-5-memxor@gmail.com
This file implements some common helpers to consolidate differences in
features and functionality between the various XDP samples and give them
a consistent look, feel, and reporting capabilities.
This commit only adds support for receive statistics, which does not
rely on any tracepoint, but on the XDP program installed on the device
by each XDP redirect sample.
Some of the key features are:
* A concise output format accompanied by helpful text explaining its
fields.
* An elaborate output format building upon the concise one, and folding
out details in case of errors and staying out of view otherwise.
* Printing driver names for devices redirecting packets.
* Getting mac address for interface.
* Printing summarized total statistics for the entire session.
* Ability to dynamically switch between concise and verbose mode, using
SIGQUIT (Ctrl + \).
In later patches, the support will be extended for each tracepoint with
its own custom output in concise and verbose mode.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-4-memxor@gmail.com
cookie_uid_helper_example.c: In function ‘main’:
cookie_uid_helper_example.c:178:69: warning: ‘ -j ACCEPT’ directive
writing 10 bytes into a region of size between 8 and 58
[-Wformat-overflow=]
178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
| ^~~~~~~~~~
/home/kkd/src/linux/samples/bpf/cookie_uid_helper_example.c:178:9: note:
‘sprintf’ output between 53 and 103 bytes into a destination of size 100
178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
179 | file);
| ~~~~~
Fix by using snprintf and a sufficiently sized buffer.
tracex4_user.c:35:15: warning: ‘write’ reading 12 bytes from a region of
size 11 [-Wstringop-overread]
35 | key = write(1, "\e[1;1H\e[2J", 12); /* clear screen */
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use size as 11.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-2-memxor@gmail.com
There's a few cases that a string that is to be recorded in a trace event,
does not have a terminating 'nul' character, and instead, the tracepoint
passes in the length of the string to record.
Add two helper macros to the trace event code that lets this work easier,
than tricks with "%.*s" logic.
__string_len() which is similar to __string() for declaration, but takes a
length argument.
__assign_str_len() which is similar to __assign_str() for assiging the
string, but it too takes a length argument.
Note, the TRACE_EVENT() macro will allocate the location on the ring
buffer to 'len + 1', that will be used to store the string into. It is a
requirement that the 'len' used for this is a most the length of the
string being recorded.
This string can still use __get_str() just like strings created with
__string() can use to retrieve the string.
Link: https://lore.kernel.org/linux-nfs/20210513105018.7539996a@gandalf.local.home/
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Currently, "sample04" and "sample05" are not working properly when
running with an IPv6 option("-6"). The commit 0f06a6787e ("samples:
Add an IPv6 "-6" option to the pktgen scripts") has omitted the addition
of this option at "sample04" and "sample05".
In order to support IPv6 option, this commit adds logic related to IPv6
option.
Fixes: 0f06a6787e ("samples: Add an IPv6 "-6" option to the pktgen scripts")
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All pktgen samples can use the environment variable instead of option
parameters(eg. $DEV is able to use instead of '-i' option).
This is results of running sample as root and user:
// running as root
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -v -n 1
Running... ctrl^C to stop
// running as normal user
$ DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -v -n 1
[...]
ERROR: Please specify output device
This results show the sample doesn't work properly when the sample runs
as normal user. Because the sample is restarted by the function
(root_check_run_with_sudo) to run with sudo. In this process, the
environment variable of normal user doesn't propagate to sudo.
It can be solved by using "-E"(--preserve-env) option of "sudo", which
preserve normal user's existing environment variables. So this commit
adds "-E" option in the function (root_check_run_with_sudo).
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mbochs_close() iterates over global device state and frees it. Currently
this is done every time a device FD is closed, but if multiple device FDs
are open this could corrupt other still active FDs.
Change this to use close_device() so it only runs on the last close.
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/11-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The core code no longer requires these ops to be defined, so delete these
empty functions and leave the op as NULL. mtty's functions only log a
pointless message, delete that entirely.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This pairs with vfio_init_group_dev() and allows undoing any state that is
stored in the vfio_device unrelated to registration. Add appropriately
placed calls to all the drivers.
The following patch will use this to add pre-registration state for the
device set.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Convert mbochs to use an atomic scheme for this like mtty was changed
into. The atomic fixes various race conditions with probing. Add the
missing error unwind. Also add the missing kfree of mdev_state->pages.
Fixes: 681c1615f8 ("vfio/mbochs: Convert to use vfio_register_group_dev()")
Reported-by: Cornelia Huck <cohuck@redhat.com>
Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/2-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The patch to move the get/put to core and the patch to convert the samples
to use vfio_device crossed in a way that this was missed. When both
patches are together the samples do not need their own get/put.
Fixes: 437e41368c ("vfio/mdpy: Convert to use vfio_register_group_dev()")
Fixes: 681c1615f8 ("vfio/mbochs: Convert to use vfio_register_group_dev()")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
A codeblock for handling nested vlan trips newbies into thinking it as
duplicate code. Explicitly add a comment to clarify.
Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210809070046.32142-1-falakreyaz@gmail.com
There is a forward declaration of ip_fast_csum() just before its
implementation, remove the unneeded forward declaration.
While at it mark the implementation as static inline.
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/bpf/20210806122855.26115-3-simon.horman@corigine.com
The xdpsock sample application is a useful base for experiment's around
AF_XDP sockets. Compiling the sample outside of the kernel tree is made
harder then it has to be as the sample includes two headers and that are
not installed by 'make install_header' nor are usually part of
distributions kernel headers.
The first header asm/barrier.h is not used and can just be dropped.
The second linux/compiler.h are only needed for the decorator __force
and are only used in ip_fast_csum(), csum_fold() and
csum_tcpudp_nofold(). These functions are copied verbatim from
include/asm-generic/checksum.h and lib/checksum.c. While it's fine to
copy and use these functions in the sample application the decorator
brings no value and can be dropped together with the include.
With this change it's trivial to compile the xdpsock sample outside the
kernel tree from xdpsock_user.c and xdpsock.h.
$ gcc -o xdpsock xdpsock_user.c -lbpf -lpthread
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/bpf/20210806122855.26115-2-simon.horman@corigine.com
Commit ce4dade7f1 ("samples/bpf: xdp_redirect_cpu: Load a eBPF program
on cpumap") added the following option, but missed adding it to optstring:
- mprog-disable: disable loading XDP program on cpumap entries
Fix it and add the missing option character.
Fixes: ce4dade7f1 ("samples/bpf: xdp_redirect_cpu: Load a eBPF program on cpumap")
Signed-off-by: Matthew Cover <matthew.cover@stackpath.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210731005632.13228-1-matthew.cover@stackpath.com
The original mei driver communication was strictly write command and
receive response flow, the completion of write was determined when
response was ready using select(). This paradigm is a long time not
true. There can be write without a response and an unsolicited read.
The driver is capable of handling those.
Adjust also the sample code and remove select() on read() from the
write flow. Add select to the read flow to showcase how to do the
read with a timeout.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20210801072532.8668-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are recently added xdp samples (xdp_redirect_map_multi and
xdpsock_ctrl_proc) which are not managed by .gitignore.
This commit adds these files to .gitignore.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210727041056.23455-2-claudiajkang@gmail.com
The current behavior of 'tracex7' doesn't consist with other bpf samples
tracex{1..6}. Other samples do not require any argument to run with, but
tracex7 should be run with btrfs device argument. (it should be executed
with test_override_return.sh)
Currently, tracex7 doesn't have any description about how to run this
program and raises an unexpected error. And this result might be
confusing since users might not have a hunch about how to run this
program.
// Current behavior
# ./tracex7
sh: 1: Syntax error: word unexpected (expecting ")")
// Fixed behavior
# ./tracex7
ERROR: Run with the btrfs device argument!
In order to fix this error, this commit adds logic to report a message
and exit when running this program with a missing argument.
Additionally in test_override_return.sh, there is a problem with
multiple directory(tmpmnt) creation. So in this commit adds a line with
removing the directory with every execution.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210727041056.23455-1-claudiajkang@gmail.com
Remove redundant prefix "cmd_" from name of members in struct kdbtab_t
for better readibility.
Suggested-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20210712134620.276667-5-sumit.garg@linaro.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Commit e4f291b3f7 ("kdb: Simplify kdb commands registration")
allowed registration of pre-allocated kdb commands with pointer to
struct kdbtab_t. Lets switch other users as well to register pre-
allocated kdb commands via:
- Changing prototype for kdb_register() to pass a pointer to struct
kdbtab_t instead.
- Embed kdbtab_t structure in kdb_macro_t rather than individual params.
With these changes kdb_register_flags() becomes redundant and hence
removed. Also, since we have switched all users to register
pre-allocated commands, "is_dynamic" flag in struct kdbtab_t becomes
redundant and hence removed as well.
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20210712134620.276667-3-sumit.garg@linaro.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-07-15
The following pull-request contains BPF updates for your *net-next* tree.
We've added 45 non-merge commits during the last 15 day(s) which contain
a total of 52 files changed, 3122 insertions(+), 384 deletions(-).
The main changes are:
1) Introduce bpf timers, from Alexei.
2) Add sockmap support for unix datagram socket, from Cong.
3) Fix potential memleak and UAF in the verifier, from He.
4) Add bpf_get_func_ip helper, from Jiri.
5) Improvements to generic XDP mode, from Kumar.
6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets
in subflow receive buffers longer than necessary, delaying
MPTCP-level ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state
- netfilter: conntrack: do not mark RST in the reply direction coming
after SYN packet for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDu3mMACgkQMUZtbf5S
Irsjxg//UwcPJMYFmXV+fGkEsWYe1Kf29FcUDEeANFtbltfAcIfZ0GoTbSDRnrVb
HcYAKcm4XRx5bWWdQrQsQq/yiLbnS/rSLc7VRB+uRHWRKl3eYcaUB2rnCXsxrjGw
wQJgOmztDCJS4BIky24iQpF/8lg7p/Gj2Ih532gh93XiYo612FrEJKkYb2/OQfYX
GkbnZ0kL2Y1SV+bhy6aT5azvhHKM4/3eA4fHeJ2p8e2gOZ5ni0vpX0xEzdzKOCd0
vwR/Wu3h/+2QuFYVcSsVguuM++JXACG8MAS/Tof78dtNM4a3kQxzqeh5Bv6IkfTu
rokENLq4pjNRy+nBAOeQZj8Jd0K0kkf/PN9WMdGQtplMoFhjjV25R6PeRrV9wwPo
peozIz2MuQo7Kfof1D+44h2foyLfdC28/Z0CvRbDpr5EHOfYynvBbrnhzIGdQp6V
xgftKTOdgz2Djgg8HiblZund1FA44OYerddVAASrIsnSFnIz1VLVQIsfV+GLBwwc
FawrIZ6WfIjzRSrDGOvDsbAQI47T/1jbaPJeK6XgjWkQmjEd6UtRWRZLYCxemQEw
4HP3sWC96BOehuD8ylipVE1oFqrxCiOB/fZxezXqjo8dSX3NLdak4cCHTHoW5SuZ
eEAxQRaBliKd+P7hoy9cZ57CAu3zUa8kijfM5QRlCAHF+zSxaPs=
=QFnb
-----END PGP SIGNATURE-----
Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski.
"Including fixes from bpf and netfilter.
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets in
subflow receive buffers longer than necessary, delaying MPTCP-level
ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack:
- do not renew entry stuck in tcp SYN_SENT state
- do not mark RST in the reply direction coming after SYN packet
for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison"
* tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
sfc: add logs explaining XDP_TX/REDIRECT is not available
sfc: ensure correct number of XDP queues
sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
net: fddi: fix UAF in fza_probe
net: dsa: sja1105: fix address learning getting disabled on the CPU port
net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
net: Use nlmsg_unicast() instead of netlink_unicast()
octeontx2-pf: Fix uninitialized boolean variable pps
ipv6: allocate enough headroom in ip6_finish_output2()
net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
net: bridge: multicast: fix MRD advertisement router port marking race
net: bridge: multicast: fix PIM hello router port marking race
net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
dsa: fix for_each_child.cocci warnings
virtio_net: check virtqueue_add_sgs() return value
mptcp: properly account bulk freed memory
selftests: mptcp: fix case multiple subflows limited by server
mptcp: avoid processing packet if a subflow reset
mptcp: fix syncookie process if mptcp can not_accept new subflow
...
Experience from production shows queue size of 192 is too small, as
this caused packet drops during cpumap-enqueue on RX-CPU. This can be
diagnosed with xdp_monitor sample program.
This bpftrace program was used to diagnose the problem in more detail:
bpftrace -e '
tracepoint:xdp:xdp_cpumap_kthread { @deq_bulk = lhist(args->processed,0,10,1); @drop_net = lhist(args->drops,0,10,1) }
tracepoint:xdp:xdp_cpumap_enqueue { @enq_bulk = lhist(args->processed,0,10,1); @enq_drops = lhist(args->drops,0,10,1); }'
Watch out for the @enq_drops counter. The @drop_net counter can happen
when netstack gets invalid packets, so don't despair it can be
natural, and that counter will likely disappear in newer kernels as it
was a source of confusion (look at netstat info for reason of the
netstack @drop_net counters).
The production system was configured with CPU power-saving C6 state.
Learn more in this blogpost[1].
And wakeup latency in usec for the states are:
# grep -H . /sys/devices/system/cpu/cpu0/cpuidle/*/latency
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:2
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:10
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:133
Deepest state take 133 usec to wakeup from (133/10^6). The link speed
is 25Gbit/s ((25*10^9/8) in bytes/sec). How many bytes can arrive with
in 133 usec at this speed: (25*10^9/8)*(133/10^6) = 415625 bytes. With
MTU size packets this is 275 packets, and with minimum Ethernet (incl
intergap overhead) 84 bytes it is 4948 packets. Clearly default queue
size is too small.
Setting default cpumap queue to 2048 as worst-case (small packet) at
10Gbit/s is 1979 packets with 133 usec wakeup time, +64 packet before
kthread wakeup call (due to xdp_do_flush) worst-case 2043 packets.
Thus, if a packet burst on RX-CPU will enqueue packets to a remote
cpumap CPU that is in deep-sleep state it can overrun the cpumap queue.
The production system was also configured to avoid deep-sleep via:
tuned-adm profile network-latency
[1] https://jeremyeder.com/2013/08/30/oh-did-you-expect-the-cpu/
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/162523477604.786243.13372630844944530891.stgit@firesoul
Execute the following command and exit, then execute it again, the following
error will be reported:
$ sudo ./samples/bpf/xdpsock -i ens4f2 -M
^C
$ sudo ./samples/bpf/xdpsock -i ens4f2 -M
libbpf: elf: skipping unrecognized data section(16) .eh_frame
libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame
libbpf: Kernel error message: XDP program already attached
ERROR: link set xdp fd failed
Commit c9d27c9e8d ("samples: bpf: Do not unload prog within xdpsock") removed
the unloading prog code because of the presence of bpf_link. This is fine if
XDP_SHARED_UMEM is disabled, but if it is enabled, unloading the prog is still
needed.
Fixes: c9d27c9e8d ("samples: bpf: Do not unload prog within xdpsock")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210628091815.2373487-1-wanghai38@huawei.com
The samples/bpf Makefile currently compiles BPF files in a way that will
produce an .eh_frame section, which will in turn confuse libbpf and produce
errors when loading BPF programs, like:
libbpf: elf: skipping unrecognized data section(32) .eh_frame
libbpf: elf: skipping relo section(33) .rel.eh_frame for section(32) .eh_frame
Fix this by instruction Clang not to produce this section, as it's useless
for BPF anyway.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210705103841.180260-1-toke@redhat.com
Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener
to another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance
(for pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require
jump labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast address
allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks
in NAPI context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP
(our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm 60GHz WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDb+fUACgkQMUZtbf5S
Irs2Jg//aqN0Q8CgIvYCVhPxQw1tY7pTAbgyqgBZ01vwjyvtIOgJiWzSfFEU84mX
M8fcpFX5eTKrOyJ9S6UFfQ/JG114n3hjAxFFT4Hxk2gC1Tg0vHuFQTDHcUl28bUE
mTm61e1YpdorILnv2k5JVQ/wu0vs5QKDrjcYcrcPnh+j93wvnPOgAfDBV95nZzjS
OTt4q2fR8GzLcSYWWsclMbDNkzyTG50RW/0Yd6aGjr5QGvXfrMeXfUJNz533PMf/
w5lNyjRKv+x9mdTZJzU0+msNUrZgUdRz7W8Ey8lD3hJZRE+D6/uU7FtsE8Mi3+uc
HWxeZUyzA3YF1MfVl/eesbxyPT7S/OkLzk4O5B35FbqP0YltaP+bOjq1/nM3ce1/
io9Dx9pIl/2JANUgRCAtLi8Z2dkvRoqTaBxZ/nPudCCljFwDwl6joTMJ7Ow22i5Y
5aIkcXFmZq4LbJDiHvbTlqT7yiuaEvu2UK/23bSIg/K3nF4eAmkY9Y1EgiMf60OF
78Ttw0wk2tUegwaS5MZnCniKBKDyl9gM2F6rbZ/IxQRR2LTXFc1B6gC+ynUxgXfh
Ub8O++6qGYGYZ0XvQH4pzco79p3qQWBTK5beIp2eu6BOAjBVIXq4AibUfoQLACsu
hX7jMPYd0kc3WFgUnKgQP8EnjFSwbf4XiaE7fIXvWBY8hzCw2h4=
=LvtX
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener to
another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance (for
pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require jump
labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast
address allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks in NAPI
context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and
NXP (our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support"
* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
tcp: change ICSK_CA_PRIV_SIZE definition
tcp_yeah: check struct yeah size at compile time
gve: DQO: Fix off by one in gve_rx_dqo()
stmmac: intel: set PCI_D3hot in suspend
stmmac: intel: Enable PHY WOL option in EHL
net: stmmac: option to enable PHY WOL with PMT enabled
net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
net: use netdev_info in ndo_dflt_fdb_{add,del}
ptp: Set lookup cookie when creating a PTP PPS source.
net: sock: add trace for socket errors
net: sock: introduce sk_error_report
net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
net: dsa: include fdb entries pointing to bridge in the host fdb list
net: dsa: include bridge addresses which are local in the host fdb list
net: dsa: sync static FDB entries on foreign interfaces to hardware
net: dsa: install the host MDB and FDB entries in the master's RX filter
net: dsa: reference count the FDB addresses at the cross-chip notifier level
net: dsa: introduce a separate cross-chip notifier type for host FDBs
net: dsa: reference count the MDB entries at the cross-chip notifier level
...
- Some kernel-doc cleanups. That script is still regex onslaught from
hell, but it has gotten a little better.
- Improvements to the checkpatch docs, which are also used by the tool
itself.
- A major update to the pathname lookup documentation.
- Elimination of :doc: markup, since our automarkup magic can create
references from filenames without all the extra noise.
- The flurry of Chinese translation activity continues.
Plus, of course, the usual collection of updates, typo fixes, and warning
fixes.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmDZ6pQPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Y9W0IAIpzBZDVsDQ7s5cIjbxEh9Oeh1uRmwuObnQh
xsM5oLuAUSMczf5JX8cdyutWJfdoEF5WHjfbt1otfys+kW9m7z0b1K4xw684Y390
sPk3eYVYLiUAZ4/LVdC47BpAzzgJ5U9iC6+FjOATAYsY40EwruxyZWjmY+SaDOU5
dQPjbpRuNQTFjYE6nZIW0o6jyunrfFaJTS6g2bdDoBDOGKyNOSKEw4XZ442cJ3km
uXoMfSJGslQj6qbGY0YhNeaNQm0ErcQw2K4lS3K4gc7Lht32Fbi1lhaqnTIkgI5f
Rh3X37pb90Ya88uWxldVB2bXUrA+PZA/cJqwNTrgw+niBQl6sKU=
=KDcM
-----END PGP SIGNATURE-----
Merge tag 'docs-5.14' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"This was a reasonably active cycle for documentation; this includes:
- Some kernel-doc cleanups. That script is still regex onslaught from
hell, but it has gotten a little better.
- Improvements to the checkpatch docs, which are also used by the
tool itself.
- A major update to the pathname lookup documentation.
- Elimination of :doc: markup, since our automarkup magic can create
references from filenames without all the extra noise.
- The flurry of Chinese translation activity continues.
Plus, of course, the usual collection of updates, typo fixes, and
warning fixes"
* tag 'docs-5.14' of git://git.lwn.net/linux: (115 commits)
docs: path-lookup: use bare function() rather than literals
docs: path-lookup: update symlink description
docs: path-lookup: update get_link() ->follow_link description
docs: path-lookup: update WALK_GET, WALK_PUT desc
docs: path-lookup: no get_link()
docs: path-lookup: update i_op->put_link and cookie description
docs: path-lookup: i_op->follow_link replaced with i_op->get_link
docs: path-lookup: Add macro name to symlink limit description
docs: path-lookup: remove filename_mountpoint
docs: path-lookup: update do_last() part
docs: path-lookup: update path_mountpoint() part
docs: path-lookup: update path_to_nameidata() part
docs: path-lookup: update follow_managed() part
docs: Makefile: Use CONFIG_SHELL not SHELL
docs: Take a little noise out of the build process
docs: x86: avoid using ReST :doc:`foo` markup
docs: virt: kvm: s390-pv-boot.rst: avoid using ReST :doc:`foo` markup
docs: userspace-api: landlock.rst: avoid using ReST :doc:`foo` markup
docs: trace: ftrace.rst: avoid using ReST :doc:`foo` markup
docs: trace: coresight: coresight.rst: avoid using ReST :doc:`foo` markup
...
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-06-28
The following pull-request contains BPF updates for your *net-next* tree.
We've added 37 non-merge commits during the last 12 day(s) which contain
a total of 56 files changed, 394 insertions(+), 380 deletions(-).
The main changes are:
1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney.
2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski.
3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev.
4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria.
5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards.
6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa.
7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi.
8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim.
9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young.
10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer.
11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski.
12) Fix up bpfilter startup log-level to info level, from Gary Lin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The sample mtty mdev driver doesn't actually enforce the number of
device instances it claims are available. Implement this properly.
Link: https://lore.kernel.org/r/162465624894.3338367.12935940647049917981.stgit@omen
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
- Platform PMU driver updates:
- x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers
- Fix RDPMC support
- Fix [extended-]PEBS-via-PT support
- Fix Sapphire Rapids event constraints
- Fix :ppp support on Sapphire Rapids
- Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
- Other heterogenous-PMU fixes
- Kprobes:
- Remove the unused and misguided kprobe::fault_handler callbacks.
- Warn about kprobes taking a page fault.
- Fix the 'nmissed' stat counter.
- Misc cleanups and fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZaxMRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hPgw//f9SnGzFoP1uR5TBqM8j/QHulMewew/iD
dM5lh2emdmqHWYPBeRxUHgag38K2Golr3Y+NxLA3R+RMx+OZQe8Mz/wYvPQcBvsV
k1HHImU3GRMn4GM7GwxH3vPIottDUx3mNS2J6pzlw3kwRUVqrxUdj/0/pSY/4eJ7
ZT4uq4yLV83Jd3qioU7o7e/u6MrdNIIcAXRpVDdE9Mm1+kWXSVN7/h3Vsiz4tj5E
iS+UXEtSc1a2mnmekv63pYkJHHNUb6guD8jgI/wrm1KIFGjDRifM+3TV6R/kB96/
TfD2LhCcTShfSp8KI191pgV7/NQbB/PmLdSYmff3rTBiii4cqXuCygJCHInZ09z0
4fTSSqM6aHg7kfTQyOCp+DUQ+9vNVXWo8mxt9c6B8xA0GyCI3zhjQ4UIiSUWRpjs
Be5ZyF0kNNuPxYrKFnGnBf8+51DURpCz3sDdYRuK4KNkj1+4ZvJo/KzGTMUUIE4B
IDQG6wDP5Kb388eRDtKrG5X7IXg+L5F/kezin60j0QF5MwDgxirT217teN8H1lNn
YgWMjRK8Tw0flUJsbCxa51/nl93UtByB+fIRIc88MSeLxcI6/ORW+TxBBEqkYm5Z
6BLFtmHSuAqAXUuyZXSGLcW7XLJvIaDoHgvbDn6l4g7FMWHqPOIq6nJQY3L8ben2
e+fQrGh4noI=
=20Vc
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Platform PMU driver updates:
- x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers
- Fix RDPMC support
- Fix [extended-]PEBS-via-PT support
- Fix Sapphire Rapids event constraints
- Fix :ppp support on Sapphire Rapids
- Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
- Other heterogenous-PMU fixes
- Kprobes:
- Remove the unused and misguided kprobe::fault_handler callbacks.
- Warn about kprobes taking a page fault.
- Fix the 'nmissed' stat counter.
- Misc cleanups and fixes.
* tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix task context PMU for Hetero
perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids
perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids
perf/x86/intel: Fix fixed counter check warning for some Alder Lake
perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS
perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
kprobes: Do not increment probe miss count in the fault handler
x86,kprobes: WARN if kprobes tries to handle a fault
kprobes: Remove kprobe::fault_handler
uprobes: Update uprobe_write_opcode() kernel-doc comment
perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint
perf/core: Fix DocBook warnings
perf/core: Make local function perf_pmu_snapshot_aux() static
perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX
perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR
perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure
perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of()
Dan points out that an error case left things on this list. It is also
missing locking in available_instances_show().
Further study shows the list isn't needed at all, just store the total
ports in use in an atomic and delete the whole thing.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 09177ac919 ("vfio/mtty: Convert to use vfio_register_group_dev()")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v1-0bc56b362ca7+62-mtty_used_ports_jgg@nvidia.com
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In the case where the call to vfio_register_group_dev fails the error
return path kfree's mdev_state but not mdev_state->vconfig. Fix this
by kfree'ing mdev_state->vconfig before returning.
Addresses-Coverity: ("Resource leak")
Fixes: 437e41368c ("vfio/mdpy: Convert to use vfio_register_group_dev()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210622183710.28954-1-colin.king@canonical.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This is straightforward conversion, the mdev_state is actually serving as
the vfio_device and we can replace all the mdev_get_drvdata()'s and the
wonky dead code with a simple container_of().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210617142218.1877096-11-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This is straightforward conversion, the mdev_state is actually serving as
the vfio_device and we can replace all the mdev_get_drvdata()'s and the
wonky dead code with a simple container_of().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210617142218.1877096-10-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This is straightforward conversion, the mdev_state is actually serving as
the vfio_device and we can replace all the mdev_get_drvdata()'s and the
wonky dead code with a simple container_of()
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20210617142218.1877096-9-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
For some reason the vfio_mdev shim mdev_driver has its own module and
kconfig. As the next patch requires access to it from mdev.ko merge the
two modules together and remove VFIO_MDEV_DEVICE.
A later patch deletes this driver entirely.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20210617142218.1877096-7-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh
scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
If bpf_map_update_elem() failed, main() should return a negative error.
Fixes: 832622e6bd ("xdp: sample program for new bpf_redirect helper")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210616042534.315097-1-wanghai38@huawei.com
A Segmentation fault error is caused when the following command
is executed.
$ sudo ./samples/bpf/xdp_redirect lo
Segmentation fault
This command is missing a device <IFNAME|IFINDEX> as an argument, resulting
in out-of-bounds access from argv.
If the number of devices for the xdp_redirect parameter is not 2,
we should report an error and exit.
Fixes: 24251c2647 ("samples/bpf: add option for native and skb mode for redirect apps")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210616042324.314832-1-wanghai38@huawei.com
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-06-17
The following pull-request contains BPF updates for your *net-next* tree.
We've added 50 non-merge commits during the last 25 day(s) which contain
a total of 148 files changed, 4779 insertions(+), 1248 deletions(-).
The main changes are:
1) BPF infrastructure to migrate TCP child sockets from a listener to another
in the same reuseport group/map, from Kuniyuki Iwashima.
2) Add a provably sound, faster and more precise algorithm for tnum_mul() as
noted in https://arxiv.org/abs/2105.05398, from Harishankar Vishwanathan.
3) Streamline error reporting changes in libbpf as planned out in the
'libbpf: the road to v1.0' effort, from Andrii Nakryiko.
4) Add broadcast support to xdp_redirect_map(), from Hangbin Liu.
5) Extends bpf_map_lookup_and_delete_elem() functionality to 4 more map
types, that is, {LRU_,PERCPU_,LRU_PERCPU_,}HASH, from Denis Salopek.
6) Support new LLVM relocations in libbpf to make them more linker friendly,
also add a doc to describe the BPF backend relocations, from Yonghong Song.
7) Silence long standing KUBSAN complaints on register-based shifts in
interpreter, from Daniel Borkmann and Eric Biggers.
8) Add dummy PT_REGS macros in libbpf to fail BPF program compilation when
target arch cannot be determined, from Lorenz Bauer.
9) Extend AF_XDP to support large umems with 1M+ pages, from Magnus Karlsson.
10) Fix two minor libbpf tc BPF API issues, from Kumar Kartikeya Dwivedi.
11) Move libbpf BPF_SEQ_PRINTF/BPF_SNPRINTF macros that can be used by BPF
programs to bpf_helpers.h header, from Florent Revest.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
xdp_sample_pkts usage() is missing the introduction of the
"-S" option, this patch adds it.
Fixes: d50ecc46d1 ("samples/bpf: Attach XDP programs in driver mode by default")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210615135724.29528-1-wanghai38@huawei.com
xdp_fwd usage() is missing the introduction of the "-S"
and "-F" options, this patch adds it.
Fixes: d50ecc46d1 ("samples/bpf: Attach XDP programs in driver mode by default")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210615135554.29158-1-wanghai38@huawei.com
The reason for kprobe::fault_handler(), as given by their comment:
* We come here because instructions in the pre/post
* handler caused the page_fault, this could happen
* if handler tries to access user space by
* copy_from_user(), get_user() etc. Let the
* user-specified handler try to fix it first.
Is just plain bad. Those other handlers are ran from non-preemptible
context and had better use _nofault() functions. Also, there is no
upstream usage of this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210525073213.561116662@infradead.org
This is a sample for xdp redirect broadcast. In the sample we could forward
all packets between given interfaces. There is also an option -X that could
enable 2nd xdp_prog on egress interface.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-4-liuhangbin@gmail.com
The opening comment mark '/**' is used for highlighting the beginning of
kernel-doc comments.
The header for samples/bpf/ibumad_kern.c follows this syntax, but
the content inside does not comply with kernel-doc.
This line was probably not meant for kernel-doc parsing, but is parsed
due to the presence of kernel-doc like comment syntax(i.e, '/**'), which
causes unexpected warnings from kernel-doc:
warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* ibumad BPF sample kernel side
Provide a simple fix by replacing this occurrence with general comment
format, i.e. '/*', to prevent kernel-doc from parsing it.
Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/bpf/20210523151408.22280-1-yashsri421@gmail.com
Fix to return a negative error code from the framebuffer_alloc() error
handling case instead of 0, also release regions in some error handing
cases.
Fixes: cacade1946 ("sample: vfio mdev display - guest driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Message-Id: <20210520133641.1421378-1-weiyongjun1@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>