To avoid confusion caused by the increasing fragmentation of the BPF
Loader program, this commit would like to change to the libbpf loader
instead of using the bpf_load.
Thanks to libbpf's bpf_link interface, managing the tracepoint BPF
program is much easier. bpf_program__attach_tracepoint manages the
enable of tracepoint event and attach of BPF programs to it with a
single interface bpf_link, so there is no need to manage event_fd and
prog_fd separately.
This commit refactors xdp_monitor with using this libbpf API, and the
bpf_load is removed and migrated to libbpf.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201010181734.1109-2-danieltimlee@gmail.com
Add an option to count the number of interrupts generated per second and
total number of interrupts during the lifetime of the application for a
given interface. This information is extracted from /proc/interrupts. Since
there is no naming convention across drivers, the user must provide the
string which is specific to their interface in the /proc/interrupts file on
the command line.
Usage:
./xdpsock ... -I <irq_str>
eg. for queue 0 of i40e device eth0:
./xdpsock ... -I i40e-eth0-TxRx-0
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201002133612.31536-3-ciara.loftus@intel.com
Categorise and record syscalls issued in the xdpsock sample app. The
categories recorded are:
rx_empty_polls: polls when the rx ring is empty
fill_fail_polls: polls when failed to get addr from fill ring
copy_tx_sendtos: sendtos issued for tx when copy mode enabled
tx_wakeup_sendtos: sendtos issued when tx ring needs waking up
opt_polls: polls issued since the '-p' flag is set
Print the stats using '-a' on the xdpsock command line.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201002133612.31536-2-ciara.loftus@intel.com
New statistics will be added in future commits. In preparation for this,
let's split out the existing statistics into their own struct.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201002133612.31536-1-ciara.loftus@intel.com
Compiling samples/bpf hits an error related to fallthrough marking.
...
CC samples/bpf/hbm.o
samples/bpf/hbm.c: In function ‘main’:
samples/bpf/hbm.c:486:4: error: ‘fallthrough’ undeclared (first use in this function)
fallthrough;
^~~~~~~~~~~
The "fallthrough" is not defined under tools/include directory.
Rather, it is "__fallthrough" is defined in linux/compiler.h.
Including "linux/compiler.h" and using "__fallthrough" fixed the issue.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201006043427.1891805-1-yhs@fb.com
With latest llvm trunk, bpf programs under samples/bpf
directory, if using CORE, may experience the following
errors:
LLVM ERROR: Cannot select: intrinsic %llvm.preserve.struct.access.index
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0. Program arguments: llc -march=bpf -filetype=obj -o samples/bpf/test_probe_write_user_kern.o
1. Running pass 'Function Pass Manager' on module '<stdin>'.
2. Running pass 'BPF DAG->DAG Pattern Instruction Selection' on function '@bpf_prog1'
#0 0x000000000183c26c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int)
(/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x183c26c)
...
#7 0x00000000017c375e (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x17c375e)
#8 0x00000000016a75c5 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*)
(/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x16a75c5)
#9 0x00000000016ab4f8 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*,
unsigned int) (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x16ab4f8)
...
Aborted (core dumped) | llc -march=bpf -filetype=obj -o samples/bpf/test_probe_write_user_kern.o
The reason is due to llvm change https://reviews.llvm.org/D87153
where the CORE relocation global generation is moved from the beginning
of target dependent optimization (llc) to the beginning
of target independent optimization (opt).
Since samples/bpf programs did not use vmlinux.h and its clang compilation
uses native architecture, we need to adjust arch triple at opt level
to do CORE relocation global generation properly. Otherwise, the above
error will appear.
This patch fixed the issue by introduce opt and llvm-dis to compilation chain,
which will do proper CORE relocation global generation as well as O2 level
optimization. Tested with llvm10, llvm11 and trunk/llvm12.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20201006043427.1891742-1-yhs@fb.com
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-09-23
The following pull-request contains BPF updates for your *net-next* tree.
We've added 95 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 4211 insertions(+), 2040 deletions(-).
The main changes are:
1) Full multi function support in libbpf, from Andrii.
2) Refactoring of function argument checks, from Lorenz.
3) Make bpf_tail_call compatible with functions (subprograms), from Maciej.
4) Program metadata support, from YiFei.
5) bpf iterator optimizations, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a quiet option (-Q) that disables the statistics print outs of
xdpsock. This is good to have when measuring 0% loss rate performance
as it will be quite terrible if the application uses printfs.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1599726666-8431-4-git-send-email-magnus.karlsson@gmail.com
Fix a possible deadlock in the l2fwd application in xdpsock that can
occur when there is no space in the Tx ring. There are two ways to get
the kernel to consume entries in the Tx ring: calling sendto() to make
it send packets and freeing entries from the completion ring, as the
kernel will not send a packet if there is no space for it to add a
completion entry in the completion ring. The Tx loop in l2fwd only
used to call sendto(). This patches adds cleaning the completion ring
in that loop.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1599726666-8431-3-git-send-email-magnus.karlsson@gmail.com
Fix the sending of a single packet (or small burst) in xdpsock when
executing in copy mode. Currently, the l2fwd application in xdpsock
only transmits the packets after a batch of them has been received,
which might be confusing if you only send one packet and expect that
it is returned pronto. Fix this by calling sendto() more often and add
a comment in the code that states that this can be optimized if
needed.
Reported-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1599726666-8431-2-git-send-email-magnus.karlsson@gmail.com
We got slightly different patches removing a double word
in a comment in net/ipv4/raw.c - picked the version from net.
Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached
values instead of VNIC login response buffer (following what
commit 507ebe6444 ("ibmvnic: Fix use-after-free of VNIC login
response buffer") did).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
From commit 5210958420 ("libbpf: Deprecate notion of BPF program
"title" in favor of "section name""), the term title has been replaced
with section name in libbpf.
Since the bpf_program__title() has been deprecated, this commit
switches this function to bpf_program__section_name(). Due to
this commit, the compilation warning issue has also been resolved.
Fixes: 5210958420 ("libbpf: Deprecate notion of BPF program "title" in favor of "section name"")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200904063434.24963-1-danieltimlee@gmail.com
The txpush program in the xdpsock sample application is supposed
to send out all packets in the umem in a round-robin fashion.
The problem is that it only cycled through the first BATCH_SIZE
worth of packets. Fixed this so that it cycles through all buffers
in the umem as intended.
Fixes: 248c7f9c0e ("samples/bpf: convert xdpsock to use libbpf for AF_XDP access")
Signed-off-by: Weqaar Janjua <weqaar.a.janjua@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20200828161717.42705-1-weqaar.a.janjua@intel.com
Optimize the throughput performance of the l2fwd sub-app in the
xdpsock sample application by removing a duplicate syscall and
increasing the size of the fill ring.
The latter needs some further explanation. We recommend that you set
the fill ring size >= HW RX ring size + AF_XDP RX ring size. Make sure
you fill up the fill ring with buffers at regular intervals, and you
will with this setting avoid allocation failures in the driver. These
are usually quite expensive since drivers have not been written to
assume that allocation failures are common. For regular sockets,
kernel allocated memory is used that only runs out in OOM situations
that should be rare.
These two performance optimizations together lead to a 6% percent
improvement for the l2fwd app on my machine.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1598619065-1944-1-git-send-email-magnus.karlsson@intel.com
This sample code illustrates the packet forwarding between multiple
AF_XDP sockets in multi-threading environment. All the threads and
sockets are sharing a common buffer pool, with each socket having
its own private buffer cache. The sockets are created with the
xsk_socket__create_shared() function, which allows multiple AF_XDP
sockets to share the same UMEM object.
Example 1: Single thread handling two sockets. Packets received
from socket A (on top of interface IFA, queue QA) are forwarded
to socket B (on top of interface IFB, queue QB) and vice-versa.
The thread is affinitized to CPU core C:
./xsk_fwd -i IFA -q QA -i IFB -q QB -c C
Example 2: Two threads, each handling two sockets. Packets from
socket A are sent to socket B (by thread X), packets
from socket B are sent to socket A (by thread X); packets from
socket C are sent to socket D (by thread Y), packets from socket
D are sent to socket C (by thread Y). The two threads are bound
to CPU cores CX and CY:
./xdp_fwd -i IFA -q QA -i IFB -q QB -i IFC -q QC -i IFD -q QD -c CX -c CY
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-15-git-send-email-magnus.karlsson@intel.com
For the problem of increasing fragmentation of the bpf loader programs,
instead of using bpf_loader.o, which is used in samples/bpf, this
commit refactors the existing tracepoint tracing programs with libbbpf
bpf loader.
- Adding a tracepoint event and attaching a bpf program to it was done
through bpf_program_attach().
- Instead of using the existing BPF MAP definition, MAP definition
has been refactored with the new BTF-defined MAP format.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200823085334.9413-4-danieltimlee@gmail.com
For the problem of increasing fragmentation of the bpf loader programs,
instead of using bpf_loader.o, which is used in samples/bpf, this
commit refactors the existing kprobe tracing programs with libbbpf
bpf loader.
- For kprobe events pointing to system calls, the SYSCALL() macro in
trace_common.h was used.
- Adding a kprobe event and attaching a bpf program to it was done
through bpf_program_attach().
- Instead of using the existing BPF MAP definition, MAP definition
has been refactored with the new BTF-defined MAP format.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200823085334.9413-3-danieltimlee@gmail.com
Since commit cc7f641d63 ("samples: bpf: Refactor BPF map performance
test with libbpf") has ommited the removal of bpf_load.o from Makefile,
this commit removes the bpf_load.o rule for targets where bpf_load.o is
not used.
Fixes: cc7f641d63 ("samples: bpf: Refactor BPF map performance test with libbpf")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200823085334.9413-2-danieltimlee@gmail.com
>From commit f1394b7988 ("block: mark blk_account_io_completion
static") symbol blk_account_io_completion() has been marked as static,
which makes it no longer possible to attach kprobe to this event.
Currently, there are broken samples due to this reason.
As a solution to this, attach kprobe events to blk_account_io_done()
to modify them to perform the same behavior as before.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200818051641.21724-1-danieltimlee@gmail.com
- run the checker (e.g. sparse) after the compiler
- remove unneeded cc-option tests for old compiler flags
- fix tar-pkg to install dtbs
- introduce ccflags-remove-y and asflags-remove-y syntax
- allow to trace functions in sub-directories of lib/
- introduce hostprogs-always-y and userprogs-always-y syntax
- various Makefile cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl8wJXEVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGMGEP/0jDq/WafbfPN0aU83EqEWLt/sKg
bluzmf/6HGx3XVRnuAzsHNNqysUx77WJiDsU/jbC/zdH8Iox3Sc1diE2sELLNAfY
iJmQ8NBPggyU74aYG3OJdpDjz8T9EX/nVaYrjyFlbuXElM+Qvo8Z4Fz6NpWqKWlA
gU+yGxEPPdX6MLHcSPSIu1hGWx7UT4fgfx3zDFTI2qvbQgQjKtzyTjAH5Cm3o87h
rfomvHSSoAUg+Fh1LediRh1tJlkdVO+w7c+LNwCswmdBtkZuxecj1bQGUTS8GaLl
CCWOKYfWp0KsVf1veXNNNaX/ecbp+Y34WErFq3V9Fdq5RmVlp+FPSGMyjDMRiQ/p
LGvzbJLPpG586MnK8of0dOj6Es6tVPuq6WH2HuvsyTGcZJDpFTTxRcK3HDkE8ig6
ZtuM3owB/Mep8IzwY2yWQiDrc7TX5Fz8S4hzGPU1zG9cfj4VT6TBqHGAy1Eql/0l
txj6vJpnbQSdXiIX8MIU3yH35Y7eW3JYWgspTZH5Woj1S/wAWwuG93Fuuxq6mQIJ
q6LSkMavtOfuCjOA9vJBZewpKXRU6yo0CzWNL/5EZ6z/r/I+DGtfb/qka8oYUDjX
9H0cecL37AQxDHRPTxCZDQF0TpYiFJ6bmnMftK9NKNuIdvsk9DF7UBa3EdUNIj38
yKS3rI7Lw55xWuY3
=bkNQ
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- run the checker (e.g. sparse) after the compiler
- remove unneeded cc-option tests for old compiler flags
- fix tar-pkg to install dtbs
- introduce ccflags-remove-y and asflags-remove-y syntax
- allow to trace functions in sub-directories of lib/
- introduce hostprogs-always-y and userprogs-always-y syntax
- various Makefile cleanups
* tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: stop filtering out $(GCC_PLUGINS_CFLAGS) from cc-option base
kbuild: include scripts/Makefile.* only when relevant CONFIG is enabled
kbuild: introduce hostprogs-always-y and userprogs-always-y
kbuild: sort hostprogs before passing it to ifneq
kbuild: move host .so build rules to scripts/gcc-plugins/Makefile
kbuild: Replace HTTP links with HTTPS ones
kbuild: trace functions in subdirectories of lib/
kbuild: introduce ccflags-remove-y and asflags-remove-y
kbuild: do not export LDFLAGS_vmlinux
kbuild: always create directories of targets
powerpc/boot: add DTB to 'targets'
kbuild: buildtar: add dtbs support
kbuild: remove cc-option test of -ffreestanding
kbuild: remove cc-option test of -fno-stack-protector
Revert "kbuild: Create directory for target DTB"
kbuild: run the checker after the compiler
To build host programs, you need to add the program names to 'hostprogs'
to use the necessary build rule, but it is not enough to build them
because there is no dependency.
There are two types of host programs: built as the prerequisite of
another (e.g. gen_crc32table in lib/Makefile), or always built when
Kbuild visits the Makefile (e.g. genksyms in scripts/genksyms/Makefile).
The latter is typical in Makefiles under scripts/, which contains host
programs globally used during the kernel build. To build them, you need
to add them to both 'hostprogs' and 'always-y'.
This commit adds hostprogs-always-y as a shorthand.
The same applies to user programs. net/bpfilter/Makefile builds
bpfilter_umh on demand, hence always-y is unneeded. In contrast,
programs under samples/ are added to both 'userprogs' and 'always-y'
so they are always built when Kbuild visits the Makefiles.
userprogs-always-y works as a shorthand.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Pull networking updates from David Miller:
1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.
2) Support UDP segmentation in code TSO code, from Eric Dumazet.
3) Allow flashing different flash images in cxgb4 driver, from Vishal
Kulkarni.
4) Add drop frames counter and flow status to tc flower offloading,
from Po Liu.
5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.
6) Various new indirect call avoidance, from Eric Dumazet and Brian
Vazquez.
7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
Yonghong Song.
8) Support querying and setting hardware address of a port function via
devlink, use this in mlx5, from Parav Pandit.
9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.
10) Switch qca8k driver over to phylink, from Jonathan McDowell.
11) In bpftool, show list of processes holding BPF FD references to
maps, programs, links, and btf objects. From Andrii Nakryiko.
12) Several conversions over to generic power management, from Vaibhav
Gupta.
13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
Yakunin.
14) Various https url conversions, from Alexander A. Klimov.
15) Timestamping and PHC support for mscc PHY driver, from Antoine
Tenart.
16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.
17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.
18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.
19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
drivers. From Luc Van Oostenryck.
20) XDP support for xen-netfront, from Denis Kirjanov.
21) Support receive buffer autotuning in MPTCP, from Florian Westphal.
22) Support EF100 chip in sfc driver, from Edward Cree.
23) Add XDP support to mvpp2 driver, from Matteo Croce.
24) Support MPTCP in sock_diag, from Paolo Abeni.
25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
infrastructure, from Jakub Kicinski.
26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.
27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.
28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.
29) Refactor a lot of networking socket option handling code in order to
avoid set_fs() calls, from Christoph Hellwig.
30) Add rfc4884 support to icmp code, from Willem de Bruijn.
31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.
32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.
33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.
34) Support TCP syncookies in MPTCP, from Flowian Westphal.
35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
Brivio.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
net: thunderx: initialize VF's mailbox mutex before first usage
usb: hso: remove bogus check for EINPROGRESS
usb: hso: no complaint about kmalloc failure
hso: fix bailout in error case of probe
ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
selftests/net: relax cpu affinity requirement in msg_zerocopy test
mptcp: be careful on subflow creation
selftests: rtnetlink: make kci_test_encap() return sub-test result
selftests: rtnetlink: correct the final return value for the test
net: dsa: sja1105: use detected device id instead of DT one on mismatch
tipc: set ub->ifindex for local ipv6 address
ipv6: add ipv6_dev_find()
net: openvswitch: silence suspicious RCU usage warning
Revert "vxlan: fix tos value before xmit"
ptp: only allow phase values lower than 1 period
farsync: switch from 'pci_' to 'dma_' API
wan: wanxl: switch from 'pci_' to 'dma_' API
hv_netvsc: do not use VF device if link is down
dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
net: macb: Properly handle phylink on at91sam9x
...
while to come. Changes include:
- Some new Chinese translations
- Progress on the battle against double words words and non-HTTPS URLs
- Some block-mq documentation
- More RST conversions from Mauro. At this point, that task is
essentially complete, so we shouldn't see this kind of churn again for a
while. Unless we decide to switch to asciidoc or something...:)
- Lots of typo fixes, warning fixes, and more.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR
cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY
BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO
7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH
9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY
0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU=
=JQLZ
-----END PGP SIGNATURE-----
Merge tag 'docs-5.9' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It's been a busy cycle for documentation - hopefully the busiest for a
while to come. Changes include:
- Some new Chinese translations
- Progress on the battle against double words words and non-HTTPS
URLs
- Some block-mq documentation
- More RST conversions from Mauro. At this point, that task is
essentially complete, so we shouldn't see this kind of churn again
for a while. Unless we decide to switch to asciidoc or
something...:)
- Lots of typo fixes, warning fixes, and more"
* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
scripts/kernel-doc: optionally treat warnings as errors
docs: ia64: correct typo
mailmap: add entry for <alobakin@marvell.com>
doc/zh_CN: add cpu-load Chinese version
Documentation/admin-guide: tainted-kernels: fix spelling mistake
MAINTAINERS: adjust kprobes.rst entry to new location
devices.txt: document rfkill allocation
PCI: correct flag name
docs: filesystems: vfs: correct flag name
docs: filesystems: vfs: correct sync_mode flag names
docs: path-lookup: markup fixes for emphasis
docs: path-lookup: more markup fixes
docs: path-lookup: fix HTML entity mojibake
CREDITS: Replace HTTP links with HTTPS ones
docs: process: Add an example for creating a fixes tag
doc/zh_CN: add Chinese translation prefer section
doc/zh_CN: add clearing-warn-once Chinese version
doc/zh_CN: add admin-guide index
doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
futex: MAINTAINERS: Re-add selftests directory
...
A handful of samples and selftests fail to build on s390, because
after commit 0ebeea8ca8 ("bpf: Restrict bpf_probe_read{, str}()
only to archs where they work") bpf_probe_read is not available
anymore.
Fix by using bpf_probe_read_kernel.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720114806.88823-1-iii@linux.ibm.com
Extend xdp_redirect_cpu_{usr,kern}.c adding the possibility to load
a XDP program on cpumap entries. The following options have been added:
- mprog-name: cpumap entry program name
- mprog-filename: cpumap entry program filename
- redirect-device: output interface if the cpumap program performs a
XDP_REDIRECT to an egress interface
- redirect-map: bpf map used to perform XDP_REDIRECT to an egress
interface
- mprog-disable: disable loading XDP program on cpumap entries
Add xdp_pass, xdp_drop, xdp_redirect stats accounting
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/aa5a9a281b9dac425620fdabe82670ffb6bbdb92.1594734381.git.lorenzo@kernel.org
Do not update xdp_redirect_cpu maps running while option loop but
defer it after all available options have been parsed. This is a
preliminary patch to pass the program name we want to attach to the
map entries as a user option
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/95dc46286fd2c609042948e04bb7ae1f5b425538.1594734381.git.lorenzo@kernel.org
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-07-13
The following pull-request contains BPF updates for your *net-next* tree.
We've added 36 non-merge commits during the last 7 day(s) which contain
a total of 62 files changed, 2242 insertions(+), 468 deletions(-).
The main changes are:
1) Avoid trace_printk warning banner by switching bpf_trace_printk to use
its own tracing event, from Alan.
2) Better libbpf support on older kernels, from Andrii.
3) Additional AF_XDP stats, from Ciara.
4) build time resolution of BTF IDs, from Jiri.
5) BPF_CGROUP_INET_SOCK_RELEASE hook, from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce the --extra-stats (or simply -x) flag to the xdpsock application
which prints additional statistics alongside the regular rx and tx
counters. The new statistics printed report error conditions eg. rx ring
full, invalid descriptors, etc.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200708072835.4427-3-ciara.loftus@intel.com
The `BPF_LOG_BUF_SIZE`'s value is `UINT32_MAX >> 8`, so define an array
with it on stack caused an overflow.
Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200710092035.28919-1-ethercflow@gmail.com
Previously, in order to set the numa_node attribute at the time of map
creation using "libbpf", it was necessary to call bpf_create_map_node()
directly (bpf_load approach), instead of calling bpf_object_load()
that handles everything on its own, including map creation. And because
of this problem, this sample had problems with refactoring from bpf_load
to libbbpf.
However, by commit 1bdb6c9a1c ("libbpf: Add a bunch of attribute
getters/setters for map definitions") added the numa_node attribute and
allowed it to be set in the map.
By using libbpf instead of bpf_load, the inner map definition has
been explicitly declared with BTF-defined format. Also, the element of
ARRAY_OF_MAPS was also statically specified using the BTF format. And
for this reason some logic in fixup_map() was not needed and changed
or removed.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200707184855.30968-4-danieltimlee@gmail.com
From commit 646f02ffdd ("libbpf: Add BTF-defined map-in-map
support"), a way to define internal map in BTF-defined map has been
added.
Instead of using previous 'inner_map_idx' definition, the structure to
be used for the inner map can be directly defined using array directive.
__array(values, struct inner_map)
This commit refactors map in map test program with libbpf by explicitly
defining inner map with BTF-defined format.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200707184855.30968-3-danieltimlee@gmail.com
Currently, BPF programs with kprobe/sys_connect does not work properly.
Commit 34745aed51 ("samples/bpf: fix kprobe attachment issue on x64")
This commit modifies the bpf_load behavior of kprobe events in the x64
architecture. If the current kprobe event target starts with "sys_*",
add the prefix "__x64_" to the front of the event.
Appending "__x64_" prefix with kprobe/sys_* event was appropriate as a
solution to most of the problems caused by the commit below.
commit d5a00528b5 ("syscalls/core, syscalls/x86: Rename struct
pt_regs-based sys_*() to __x64_sys_*()")
However, there is a problem with the sys_connect kprobe event that does
not work properly. For __sys_connect event, parameters can be fetched
normally, but for __x64_sys_connect, parameters cannot be fetched.
ffffffff818d3520 <__x64_sys_connect>:
ffffffff818d3520: e8 fb df 32 00 callq 0xffffffff81c01520
<__fentry__>
ffffffff818d3525: 48 8b 57 60 movq 96(%rdi), %rdx
ffffffff818d3529: 48 8b 77 68 movq 104(%rdi), %rsi
ffffffff818d352d: 48 8b 7f 70 movq 112(%rdi), %rdi
ffffffff818d3531: e8 1a ff ff ff callq 0xffffffff818d3450
<__sys_connect>
ffffffff818d3536: 48 98 cltq
ffffffff818d3538: c3 retq
ffffffff818d3539: 0f 1f 80 00 00 00 00 nopl (%rax)
As the assembly code for __x64_sys_connect shows, parameters should be
fetched and set into rdi, rsi, rdx registers prior to calling
__sys_connect.
Because of this problem, this commit fixes the sys_connect event by
first getting the value of the rdi register and then the value of the
rdi, rsi, and rdx register through an offset based on that value.
Fixes: 34745aed51 ("samples/bpf: fix kprobe attachment issue on x64")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200707184855.30968-2-danieltimlee@gmail.com
Something changed recently to uncover this warning:
samples/vfs/test-statx.c:24:15: warning: `struct foo' declared inside parameter list will not be visible outside of this definition or declaration
24 | #define statx foo
| ^~~
Which is due the use of "struct statx" (here, "struct foo") in a function
prototype argument list before it has been defined:
int
# 56 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h"
foo
# 56 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h" 3 4
(int __dirfd, const char *__restrict __path, int __flags,
unsigned int __mask, struct
# 57 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h"
foo
# 57 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h" 3 4
*__restrict __buf)
__attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (2, 5)));
Add explicit struct before #include to avoid warning.
Fixes: f1b5618e01 ("vfs: Add a sample program for the new mount API")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/202006282213.C516EA6@keescook
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Don't insert ESP trailer twice in IPSEC code, from Huy Nguyen.
2) The default crypto algorithm selection in Kconfig for IPSEC is out
of touch with modern reality, fix this up. From Eric Biggers.
3) bpftool is missing an entry for BPF_MAP_TYPE_RINGBUF, from Andrii
Nakryiko.
4) Missing init of ->frame_sz in xdp_convert_zc_to_xdp_frame(), from
Hangbin Liu.
5) Adjust packet alignment handling in ax88179_178a driver to match
what the hardware actually does. From Jeremy Kerr.
6) register_netdevice can leak in the case one of the notifiers fail,
from Yang Yingliang.
7) Use after free in ip_tunnel_lookup(), from Taehee Yoo.
8) VLAN checks in sja1105 DSA driver need adjustments, from Vladimir
Oltean.
9) tg3 driver can sleep forever when we get enough EEH errors, fix from
David Christensen.
10) Missing {READ,WRITE}_ONCE() annotations in various Intel ethernet
drivers, from Ciara Loftus.
11) Fix scanning loop break condition in of_mdiobus_register(), from
Florian Fainelli.
12) MTU limit is incorrect in ibmveth driver, from Thomas Falcon.
13) Endianness fix in mlxsw, from Ido Schimmel.
14) Use after free in smsc95xx usbnet driver, from Tuomas Tynkkynen.
15) Missing bridge mrp configuration validation, from Horatiu Vultur.
16) Fix circular netns references in wireguard, from Jason A. Donenfeld.
17) PTP initialization on recovery is not done properly in qed driver,
from Alexander Lobakin.
18) Endian conversion of L4 ports in filters of cxgb4 driver is wrong,
from Rahul Lakkireddy.
19) Don't clear bound device TX queue of socket prematurely otherwise we
get problems with ktls hw offloading, from Tariq Toukan.
20) ipset can do atomics on unaligned memory, fix from Russell King.
21) Align ethernet addresses properly in bridging code, from Thomas
Martitz.
22) Don't advertise ipv4 addresses on SCTP sockets having ipv6only set,
from Marcelo Ricardo Leitner.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (149 commits)
rds: transport module should be auto loaded when transport is set
sch_cake: fix a few style nits
sch_cake: don't call diffserv parsing code when it is not needed
sch_cake: don't try to reallocate or unshare skb unconditionally
ethtool: fix error handling in linkstate_prepare_data()
wil6210: account for napi_gro_receive never returning GRO_DROP
hns: do not cast return value of napi_gro_receive to null
socionext: account for napi_gro_receive never returning GRO_DROP
wireguard: receive: account for napi_gro_receive never returning GRO_DROP
vxlan: fix last fdb index during dump of fdb with nhid
sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket
tc-testing: avoid action cookies with odd length.
bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
net: dsa: sja1105: fix tc-gate schedule with single element
net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules
net: dsa: sja1105: unconditionally free old gating config
net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top
net: macb: free resources on failure path of at91ether_open()
net: macb: call pm_runtime_put_sync on failure path
...
- fix -gz=zlib compiler option test for CONFIG_DEBUG_INFO_COMPRESSED
- improve cc-option in scripts/Kbuild.include to clean up temp files
- improve cc-option in scripts/Kconfig.include for more reliable compile
option test
- do not copy modules.builtin by 'make install' because it would break
existing systems
- use 'userprogs' syntax for watch_queue sample
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl7vkn4VHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsG69MP/Raf/dW6GEHsUu7McIM3/HmNkjKh
6oBaD3NyY2TTjL/BtM18GmUuQTfeLG60UitPYVwvzVom/88JLQYzKmLFcwEH+KMz
7Bv/UukTXq4OjcmafG/h47BMYZTZ3No4Z+kMWzHe8HU8w+adfMh578nK5JeDz/1/
wY/xn+/OeXGcpEyjpR/rS9XCxKTYdEF6NwPknHhjGK3+byn3oqsZ3yRc+WwYumOD
UvXT2KE8krCJjTQ4kKUY3Q+jzZiKEuHEEcWI6AdHLXADpUll60DBc/5OgW75+6NA
FYOU2Ocuq+D8Q8wifMBKXjhN5ci/I8/h+aGvE2M05IbXN9BqMw6sbf/SEY0j2Saq
+p2AB4hbCzrFMtUTK2Al+bhV5tPYukQQqKpIRnhZe3NwJwX3EMtmAokbBtR/oD4i
yN28JZhosCggVV3o/9wFyWzq6fr376SSoHDogAtPOefvJVRQKHmavdnHk68ixkAk
itntVrMS2T/wB5esnAMiiCY4zdWwXd+OTceN2sxgdxxXZ+IklAbJG5IemUnys9Ts
eZ1IbIaopTKriWyOIjmlKP4dVbSqHVkHSovZFACu4PcKyvpIJtZhigFQJyXIDGho
GDsrgXRRykYiG42wu1l4zHyI9O76XXqweCJobyURN/JmQ2wUPahczEROH0rPJG5H
SEr1yjd7KaMxu5mx
=IAng
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix -gz=zlib compiler option test for CONFIG_DEBUG_INFO_COMPRESSED
- improve cc-option in scripts/Kbuild.include to clean up temp files
- improve cc-option in scripts/Kconfig.include for more reliable
compile option test
- do not copy modules.builtin by 'make install' because it would break
existing systems
- use 'userprogs' syntax for watch_queue sample
* tag 'kbuild-fixes-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
samples: watch_queue: build sample program for target architecture
Revert "Makefile: install modules.builtin even if CONFIG_MODULES=n"
scripts: Fix typo in headers_install.sh
kconfig: unify cc-option and as-option
kbuild: improve cc-option to clean up all temporary files
Makefile: Improve compressed debug info support detection
This userspace program includes UAPI headers exported to usr/include/.
'make headers' always works for the target architecture (i.e. the same
architecture as the kernel), so the sample program should be built for
the target as well. Kbuild now supports 'userprogs' for that.
I also guarded the CONFIG option by 'depends on CC_CAN_LINK' because
$(CC) may not provide libc.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
- Have recordmcount work with > 64K sections (to support LTO)
- kprobe RCU fixes
- Correct a kprobe critical section with missing mutex
- Remove redundant arch_disarm_kprobe() call
- Fix lockup when kretprobe triggers within kprobe_flush_task()
- Fix memory leak in fetch_op_data operations
- Fix sleep in atomic in ftrace trace array sample code
- Free up memory on failure in sample trace array code
- Fix incorrect reporting of function_graph fields in format file
- Fix quote within quote parsing in bootconfig
- Fix return value of bootconfig tool
- Add testcases for bootconfig tool
- Fix maybe uninitialized warning in ftrace pid file code
- Remove unused variable in tracing_iter_reset()
- Fix some typos
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXu1jrRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qoCMAP91nOccE3X+Nvc3zET3isDWnl1tWJxk
icsBgN/JwBRuTAD/dnWTHIWM2/5lTiagvyVsmINdJHP6JLr8T7dpN9tlxAQ=
=Cuo7
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Have recordmcount work with > 64K sections (to support LTO)
- kprobe RCU fixes
- Correct a kprobe critical section with missing mutex
- Remove redundant arch_disarm_kprobe() call
- Fix lockup when kretprobe triggers within kprobe_flush_task()
- Fix memory leak in fetch_op_data operations
- Fix sleep in atomic in ftrace trace array sample code
- Free up memory on failure in sample trace array code
- Fix incorrect reporting of function_graph fields in format file
- Fix quote within quote parsing in bootconfig
- Fix return value of bootconfig tool
- Add testcases for bootconfig tool
- Fix maybe uninitialized warning in ftrace pid file code
- Remove unused variable in tracing_iter_reset()
- Fix some typos
* tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix maybe-uninitialized compiler warning
tools/bootconfig: Add testcase for show-command and quotes test
tools/bootconfig: Fix to return 0 if succeeded to show the bootconfig
tools/bootconfig: Fix to use correct quotes for value
proc/bootconfig: Fix to use correct quotes for value
tracing: Remove unused event variable in tracing_iter_reset
tracing/probe: Fix memleak in fetch_op_data operations
trace: Fix typo in allocate_ftrace_ops()'s comment
tracing: Make ftrace packed events have align of 1
sample-trace-array: Remove trace_array 'sample-instance'
sample-trace-array: Fix sleeping function called from invalid context
kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
kprobes: Remove redundant arch_disarm_kprobe() call
kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
kprobes: Use non RCU traversal APIs on kprobe_tables if possible
kprobes: Suppress the suspicious RCU warning on kprobes
recordmcount: support >64k sections
There are several files that I was unable to find a proper place
for them, and 3 ones that are still in plain old text format.
Let's place those stuff behind the carpet, as we'd like to keep the
root directory clean.
We can later discuss and move those into better places.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/11bd0d75e65a874f7c276a0aeab0fe13f3376f5f.1592203650.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Alexei Starovoitov says:
====================
pull-request: bpf 2020-06-17
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 2 day(s) which contain
a total of 14 files changed, 158 insertions(+), 59 deletions(-).
The main changes are:
1) Important fix for bpf_probe_read_kernel_str() return value, from Andrii.
2) [gs]etsockopt fix for large optlen, from Stanislav.
3) devmap allocation fix, from Toke.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Memset on the pointer right after malloc can cause a NULL pointer
deference if it failed to allocate memory. A simple fix is to
replace malloc()/memset() pair with a simple call to calloc().
Fixes: 0fca931a6f ("samples/bpf: program demonstrating access to xdp_rxq_info")
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>