Mirror of https://mirrors.bfsu.edu.cn/git/linux.git (synced 2024-11-27 22:24:11 +08:00)
eff94154cc
Experience from production shows that a queue size of 192 is too small, as
this caused packet drops during cpumap-enqueue on the RX-CPU. This can be
diagnosed with the xdp_monitor sample program.

This bpftrace program was used to diagnose the problem in more detail:

 bpftrace -e '
 tracepoint:xdp:xdp_cpumap_kthread { @deq_bulk = lhist(args->processed,0,10,1); @drop_net = lhist(args->drops,0,10,1) }
 tracepoint:xdp:xdp_cpumap_enqueue { @enq_bulk = lhist(args->processed,0,10,1); @enq_drops = lhist(args->drops,0,10,1); }'

Watch out for the @enq_drops counter. The @drop_net counter can increment
when the netstack receives invalid packets, so don't despair, it can be
natural. That counter will likely disappear in newer kernels, as it was a
source of confusion (look at netstat info for the reason behind the
netstack @drop_net counters).

The production system was configured with the CPU power-saving C6 state.
Learn more in this blogpost[1].

The wakeup latencies in usec for the states are:

 # grep -H . /sys/devices/system/cpu/cpu0/cpuidle/*/latency
 /sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
 /sys/devices/system/cpu/cpu0/cpuidle/state1/latency:2
 /sys/devices/system/cpu/cpu0/cpuidle/state2/latency:10
 /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:133

The deepest state takes 133 usec to wake up from (133/10^6). The link speed
is 25Gbit/s ((25*10^9/8) in bytes/sec). How many bytes can arrive within
133 usec at this speed: (25*10^9/8)*(133/10^6) = 415625 bytes. With MTU
size packets this is 275 packets, and with minimum Ethernet frames (incl.
inter-frame gap overhead) of 84 bytes it is 4948 packets. Clearly the
default queue size is too small.

Setting the default cpumap queue to 2048, as the worst case (small packets)
at 10Gbit/s is 1979 packets with 133 usec wakeup time, +64 packets before
the kthread wakeup call (due to xdp_do_flush): worst case 2043 packets.

Thus, if a packet burst on the RX-CPU enqueues packets to a remote cpumap
CPU that is in a deep-sleep state, it can overrun the cpumap queue.

The production system was also configured to avoid deep-sleep via:

 tuned-adm profile network-latency

[1] https://jeremyeder.com/2013/08/30/oh-did-you-expect-the-cpu/

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/162523477604.786243.13372630844944530891.stgit@firesoul
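
The queue-sizing arithmetic in the commit message can be reproduced with a few lines of POSIX shell. This is only a sketch for checking the numbers: the function names `bytes_in_window` and `packets_in_window` are made up for illustration, results are rounded to the nearest packet, and the 1514-byte frame used for the MTU case is an assumption (1500-byte MTU plus a 14-byte Ethernet header), since the commit does not say which frame size it used.

```shell
#!/bin/sh
# bytes_in_window GBIT USEC
# Bytes arriving at GBIT Gbit/s during USEC microseconds.
# (GBIT * 10^9 / 8) * (USEC / 10^6) simplifies to GBIT * 125 * USEC.
bytes_in_window() {
    echo $(( $1 * 125 * $2 ))
}

# packets_in_window GBIT USEC FRAME_BYTES
# The same window expressed in frames of FRAME_BYTES, rounded to nearest.
packets_in_window() {
    bytes=$(bytes_in_window "$1" "$2")
    echo $(( (bytes + $3 / 2) / $3 ))
}

bytes_in_window 25 133          # 415625 bytes at 25Gbit/s
packets_in_window 25 133 1514   # 275 (assumed 1514-byte frames)
packets_in_window 25 133 84     # 4948 minimum-size frames
packets_in_window 10 133 84     # 1979 minimum-size frames at 10Gbit/s

# Add the 64 packets that can be queued before the kthread wakeup call.
echo $(( $(packets_in_window 10 133 84) + 64 ))   # 2043
```

Substituting the deepest `latency` value reported under `/sys/devices/system/cpu/cpu0/cpuidle/` on another machine gives the matching worst case for that system.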
||
---|---|---|
.. | ||
.gitignore | ||
asm_goto_workaround.h | ||
bpf_insn.h | ||
cookie_uid_helper_example.c | ||
cpustat_kern.c | ||
cpustat_user.c | ||
do_hbm_test.sh | ||
fds_example.c | ||
hash_func01.h | ||
hbm_edt_kern.c | ||
hbm_kern.h | ||
hbm_out_kern.c | ||
hbm.c | ||
hbm.h | ||
ibumad_kern.c | ||
ibumad_user.c | ||
lathist_kern.c | ||
lathist_user.c | ||
lwt_len_hist_kern.c | ||
lwt_len_hist_user.c | ||
lwt_len_hist.sh | ||
Makefile | ||
Makefile.target | ||
map_perf_test_kern.c | ||
map_perf_test_user.c | ||
offwaketime_kern.c | ||
offwaketime_user.c | ||
parse_ldabs.c | ||
parse_simple.c | ||
parse_varlen.c | ||
README.rst | ||
run_cookie_uid_helper_example.sh | ||
sampleip_kern.c | ||
sampleip_user.c | ||
sock_example.c | ||
sock_example.h | ||
sock_flags_kern.c | ||
sockex1_kern.c | ||
sockex1_user.c | ||
sockex2_kern.c | ||
sockex2_user.c | ||
sockex3_kern.c | ||
sockex3_user.c | ||
spintest_kern.c | ||
spintest_user.c | ||
syscall_nrs.c | ||
syscall_tp_kern.c | ||
syscall_tp_user.c | ||
task_fd_query_kern.c | ||
task_fd_query_user.c | ||
tc_l2_redirect_kern.c | ||
tc_l2_redirect_user.c | ||
tc_l2_redirect.sh | ||
tcbpf1_kern.c | ||
tcp_basertt_kern.c | ||
tcp_bpf.readme | ||
tcp_bufs_kern.c | ||
tcp_clamp_kern.c | ||
tcp_cong_kern.c | ||
tcp_dumpstats_kern.c | ||
tcp_iw_kern.c | ||
tcp_rwnd_kern.c | ||
tcp_synrto_kern.c | ||
tcp_tos_reflect_kern.c | ||
test_cgrp2_array_pin.c | ||
test_cgrp2_attach.c | ||
test_cgrp2_sock2.c | ||
test_cgrp2_sock2.sh | ||
test_cgrp2_sock.c | ||
test_cgrp2_sock.sh | ||
test_cgrp2_tc_kern.c | ||
test_cgrp2_tc.sh | ||
test_cls_bpf.sh | ||
test_current_task_under_cgroup_kern.c | ||
test_current_task_under_cgroup_user.c | ||
test_lru_dist.c | ||
test_lwt_bpf.c | ||
test_lwt_bpf.sh | ||
test_map_in_map_kern.c | ||
test_map_in_map_user.c | ||
test_overhead_kprobe_kern.c | ||
test_overhead_raw_tp_kern.c | ||
test_overhead_tp_kern.c | ||
test_overhead_user.c | ||
test_override_return.sh | ||
test_probe_write_user_kern.c | ||
test_probe_write_user_user.c | ||
trace_common.h | ||
trace_event_kern.c | ||
trace_event_user.c | ||
trace_output_kern.c | ||
trace_output_user.c | ||
tracex1_kern.c | ||
tracex1_user.c | ||
tracex2_kern.c | ||
tracex2_user.c | ||
tracex3_kern.c | ||
tracex3_user.c | ||
tracex4_kern.c | ||
tracex4_user.c | ||
tracex5_kern.c | ||
tracex5_user.c | ||
tracex6_kern.c | ||
tracex6_user.c | ||
tracex7_kern.c | ||
tracex7_user.c | ||
xdp1_kern.c | ||
xdp1_user.c | ||
xdp2_kern.c | ||
xdp2skb_meta_kern.c | ||
xdp2skb_meta.sh | ||
xdp_adjust_tail_kern.c | ||
xdp_adjust_tail_user.c | ||
xdp_fwd_kern.c | ||
xdp_fwd_user.c | ||
xdp_monitor_kern.c | ||
xdp_monitor_user.c | ||
xdp_redirect_cpu_kern.c | ||
xdp_redirect_cpu_user.c | ||
xdp_redirect_kern.c | ||
xdp_redirect_map_kern.c | ||
xdp_redirect_map_multi_kern.c | ||
xdp_redirect_map_multi_user.c | ||
xdp_redirect_map_user.c | ||
xdp_redirect_user.c | ||
xdp_router_ipv4_kern.c | ||
xdp_router_ipv4_user.c | ||
xdp_rxq_info_kern.c | ||
xdp_rxq_info_user.c | ||
xdp_sample_pkts_kern.c | ||
xdp_sample_pkts_user.c | ||
xdp_tx_iptunnel_common.h | ||
xdp_tx_iptunnel_kern.c | ||
xdp_tx_iptunnel_user.c | ||
xdpsock_ctrl_proc.c | ||
xdpsock_kern.c | ||
xdpsock_user.c | ||
xdpsock.h | ||
xsk_fwd.c |
eBPF sample programs
====================

This directory contains test stubs, a verifier test-suite and examples
for using eBPF. The examples use libbpf from tools/lib/bpf.

Build dependencies
==================

Compiling requires having installed:
 * clang >= version 3.4.0
 * llvm >= version 3.7.1

Note that LLVM's tool 'llc' must support target 'bpf'. List the version
and supported targets with the command: ``llc --version``

Clean and configuration
-----------------------

It may be necessary to clean tools, samples or kernel before trying a new
arch or after some changes (on demand)::

 make -C tools clean
 make -C samples/bpf clean
 make clean

Configure kernel, defconfig for instance::

 make defconfig

Kernel headers
--------------

There are usually dependencies on header files of the current kernel.
To avoid installing devel kernel headers system wide, as a normal
user, simply call::

 make headers_install

This will create a local "usr/include" directory in the git/build top
level directory, which the make system automatically picks up first.

Compiling
=========

For building the BPF samples, issue the below command from the kernel
top level directory::

 make M=samples/bpf

It is also possible to call make from this directory. This will just
hide the invocation of make as above.

Manually compiling LLVM with 'bpf' support
------------------------------------------

Since version 3.7.0, LLVM adds a proper LLVM backend target for the
BPF bytecode architecture.

By default llvm will build all non-experimental backends including bpf.
To generate a smaller llc binary one can use::

 -DLLVM_TARGETS_TO_BUILD="BPF"

We recommend that developers who want the fastest incremental builds
use the Ninja build system. You can find it in your system's package
manager; usually the package is ninja or ninja-build.

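
Before building LLVM by hand, it may be worth checking whether the installed 'llc' already lists the 'bpf' target. The sketch below assumes that ``llc --version`` prints one registered target per line with the target name as the first field; the helper name ``llc_lists_bpf`` is made up for illustration.

```shell
#!/bin/sh
# llc_lists_bpf: read `llc --version` output on stdin and succeed only
# if some line has "bpf" as its first whitespace-separated field.
llc_lists_bpf() {
    awk '$1 == "bpf" { found = 1 } END { exit !found }'
}

if command -v llc >/dev/null 2>&1; then
    if llc --version | llc_lists_bpf; then
        echo "llc lists the bpf target"
    else
        echo "llc does not list the bpf target"
    fi
else
    echo "llc not found in PATH"
fi
```

If the target is missing, the manual LLVM build described in this section is one way to get a suitable 'llc'.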
Quick snippet for manually compiling LLVM and clang
(build dependencies are ninja, cmake and gcc-c++)::

 $ git clone https://github.com/llvm/llvm-project.git
 $ mkdir -p llvm-project/llvm/build
 $ cd llvm-project/llvm/build
 $ cmake .. -G "Ninja" -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
            -DLLVM_ENABLE_PROJECTS="clang"    \
            -DCMAKE_BUILD_TYPE=Release        \
            -DLLVM_BUILD_RUNTIME=OFF
 $ ninja

It is also possible to point make to the newly compiled 'llc' or
'clang' command via redefining LLC or CLANG on the make command line::

 make M=samples/bpf LLC=~/git/llvm-project/llvm/build/bin/llc CLANG=~/git/llvm-project/llvm/build/bin/clang

Cross compiling samples
-----------------------

In order to cross-compile, say for arm64 targets, export the CROSS_COMPILE
and ARCH environment variables before calling make. Do this before the
clean, configuration and header install steps described above. This will
direct make to build samples for the cross target::

 export ARCH=arm64
 export CROSS_COMPILE="aarch64-linux-gnu-"

Headers can also be installed on the RFS of the target board if they need
to be kept in sync (not necessary, and it also creates a local
"usr/include" directory)::

 make INSTALL_HDR_PATH=~/some_sysroot/usr headers_install

Pointing LLC and CLANG is not necessary if LLVM is installed on the HOST
and has the appropriate arm64 arch among its targets (usually it has
several arches). Build samples::

 make M=samples/bpf

Or build samples with SYSROOT if some header or library is absent from the
toolchain, say libelf, providing the path to a file system containing the
headers and libs, which can be the RFS of the target board::

 make M=samples/bpf SYSROOT=~/some_sysroot
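
The ordering requirement above (export ARCH and CROSS_COMPILE first, then clean, configure, install headers and build) can be sketched as a small dry-run helper. It only prints the commands already given in this README, in order, so they can be reviewed before running them from the kernel top level directory. The function name ``cross_build_steps`` and its argument order are made up for illustration.

```shell
#!/bin/sh
# cross_build_steps ARCH CROSS_COMPILE [SYSROOT]
# Print the README's cross-build commands in the order they should run.
# Nothing is executed here; the output is meant to be reviewed first.
cross_build_steps() {
    arch=$1
    cross=$2
    sysroot=${3:-}
    echo "export ARCH=$arch"
    echo "export CROSS_COMPILE=$cross"
    echo "make -C tools clean"
    echo "make -C samples/bpf clean"
    echo "make clean"
    echo "make defconfig"
    echo "make headers_install"
    if [ -n "$sysroot" ]; then
        echo "make M=samples/bpf SYSROOT=$sysroot"
    else
        echo "make M=samples/bpf"
    fi
}

# The arm64 example from this section, without a SYSROOT.
cross_build_steps arm64 aarch64-linux-gnu-
```

Passing a third argument, for example ``~/some_sysroot``, switches the final step to the SYSROOT form of the build command.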