linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-17 01:04:19 +08:00

Author	SHA1	Message	Date
Martin KaFai Lau	263d0b3533	bpf: Add ene-to-end test for bpf_sk_storage_* helpers This patch rides on an existing BPF_PROG_TYPE_CGROUP_SKB test (test_sock_fields.c) to do a TCP end-to-end test on the new bpf_sk_storage_* helpers. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Martin KaFai Lau	51a0e301a5	bpf: Add BPF_MAP_TYPE_SK_STORAGE test to test_maps This patch adds BPF_MAP_TYPE_SK_STORAGE test to test_maps. The src file is rather long, so it is put into another dir map_tests/ and compile like the current prog_tests/ does. Other existing tests in test_maps can also be re-factored into map_tests/ in the future. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Martin KaFai Lau	7a9bb9762d	bpf: Add verifier tests for the bpf_sk_storage This patch adds verifier tests for the bpf_sk_storage: 1. ARG_PTR_TO_MAP_VALUE_OR_NULL 2. Map and helper compatibility (e.g. disallow bpf_map_loookup_elem) It also takes this chance to remove the unused struct btf_raw_data and uses the BTF encoding macros from "test_btf.h". Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Martin KaFai Lau	3f4d4c7410	bpf: Refactor BTF encoding macro to test_btf.h Refactor common BTF encoding macros for other tests to use. The libbpf may reuse some of them in the future which requires some more thoughts before publishing as a libbpf API. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Martin KaFai Lau	a19f89f366	bpf: Support BPF_MAP_TYPE_SK_STORAGE in bpf map probing This patch supports probing for the new BPF_MAP_TYPE_SK_STORAGE. BPF_MAP_TYPE_SK_STORAGE enforces BTF usage, so the new probe requires to create and load a BTF also. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Martin KaFai Lau	948d930e3d	bpf: Sync bpf.h to tools This patch sync the bpf.h to tools/. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Matt Mullins	e950e84336	selftests: bpf: test writable buffers in raw tps This tests that: * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it uses either: * a variable offset to the tracepoint buffer, or * an offset beyond the size of the tracepoint buffer * a tracer can modify the buffer provided when attached to a writable tracepoint in bpf_prog_test_run Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-26 19:04:19 -07:00
Matt Mullins	4635b0ae4d	tools: sync bpf.h This adds BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, and fixes up the error: enumeration value ‘BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE’ not handled in switch [-Werror=switch-enum] build errors it would otherwise cause in libbpf. Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-26 19:04:19 -07:00
David S. Miller	ad759c9069	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2019-04-25 The following pull-request contains BPF updates for your net tree. The main changes are: 1) the bpf verifier fix to properly mark registers in all stack frames, from Paul. 2) preempt_enable_no_resched->preempt_enable fix, from Peter. 3) other misc fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 01:54:42 -04:00
Jiri Pirko	e05b2d141f	netdevsim: move netdev creation/destruction to dev probe Remove the existing way to create netdevsim over rtnetlink and move the netdev creation/destruction to dev probe, so for every probed port, a netdevsim-netdev instance is created. Adjust selftests to work with new interface. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 01:52:03 -04:00
Jiri Pirko	ab1d0cc004	netdevsim: change debugfs tree topology With the model where dev is represented by devlink and ports are represented by devlink ports, make debugfs file names independent on netdev names. Change the topology to the one illustrated by the following example: $ ls /sys/kernel/debug/netdevsim/ netdevsim1 $ ls /sys/kernel/debug/netdevsim/netdevsim1/ bpf_bind_accept bpf_bind_verifier_delay bpf_bound_progs ports $ ls /sys/kernel/debug/netdevsim/netdevsim1/ports/ 0 1 $ ls /sys/kernel/debug/netdevsim/netdevsim1/ports/0/ bpf_map_accept bpf_offloaded_id bpf_tc_accept bpf_tc_non_bound_accept bpf_xdpdrv_accept bpf_xdpoffload_accept dev ipsec $ ls /sys/kernel/debug/netdevsim/netdevsim1/ports/0/dev -l lrwxrwxrwx 1 root root 0 Apr 13 15:58 /sys/kernel/debug/netdevsim/netdevsim1/ports/0/dev -> ../../../netdevsim1 Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 01:52:02 -04:00
Jiri Pirko	d514f41e79	netdevsim: merge sdev into dev As previously introduce dev which is mapped 1:1 to a bus device covers the purpose of the original shared device, merge the sdev code into dev. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 01:52:02 -04:00
Andrii Nakryiko	8ed1875bf3	bpftool: fix indendation in bash-completion/bpftool Fix misaligned default case branch for `prog dump` sub-command. Reported-by: Quentin Monnet <quentin.monnet@netronome.com> Cc: Yonghong Song <yhs@fb.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-25 21:45:14 -07:00
Andrii Nakryiko	4a714feefd	bpftool: add bash completions for btf command Add full support for btf command in bash-completion script. Cc: Quentin Monnet <quentin.monnet@netronome.com> Cc: Yonghong Song <yhs@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-25 21:45:14 -07:00
Andrii Nakryiko	ca253339af	bpftool/docs: add btf sub-command documentation Document usage and sample output format for `btf dump` sub-command. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-25 21:45:14 -07:00
Andrii Nakryiko	c93cc69004	bpftool: add ability to dump BTF types Add new `btf dump` sub-command to bpftool. It allows to dump human-readable low-level BTF types representation of BTF types. BTF can be retrieved from few different sources: - from BTF object by ID; - from PROG, if it has associated BTF; - from MAP, if it has associated BTF data; it's possible to narrow down types to either key type, value type, both, or all BTF types; - from ELF file (.BTF section). Output format mostly follows BPF verifier log format with few notable exceptions: - all the type/field/param/etc names are enclosed in single quotes to allow easier grepping and to stand out a little bit more; - FUNC_PROTO output follows STRUCT/UNION/ENUM format of having one line per each argument; this is more uniform and allows easy grepping, as opposed to succinct, but inconvenient format that BPF verifier log is using. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-25 21:45:14 -07:00
David S. Miller	8b44836583	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Two easy cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-25 23:52:29 -04:00
Benjamin Poirier	77d764263d	bpftool: Fix errno variable usage The test meant to use the saved value of errno. Given the current code, it makes no practical difference however. Fixes: `bf598a8f0f` ("bpftool: Improve handling of ENOENT on map dumps") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-25 17:30:14 -07:00
Paul Chaignon	6dd7f14080	selftests/bpf: test cases for pkt/null checks in subprogs The first test case, for pointer null checks, is equivalent to the following pseudo-code. It checks that the verifier does not complain on line 6 and recognizes that ptr isn't null. 1: ptr = bpf_map_lookup_elem(map, &key); 2: ret = subprog(ptr) { 3: return ptr != NULL; 4: } 5: if (ret) 6: value = ptr; The second test case, for packet bound checks, is equivalent to the following pseudo-code. It checks that the verifier does not complain on line 7 and recognizes that the packet is at least 1 byte long. 1: pkt_end = ctx.pkt_end; 2: ptr = ctx.pkt + 8; 3: ret = subprog(ptr, pkt_end) { 4: return ptr <= pkt_end; 5: } 6: if (ret) 7: value = (u8 *)ctx.pkt; Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-25 17:20:06 -07:00
Matteo Croce	39391377f8	libbpf: add binary to gitignore Some binaries are generated when building libbpf from tools/lib/bpf/, namely libbpf.so.0.0.2 and libbpf.so.0. Add them to the local .gitignore. Signed-off-by: Matteo Croce <mcroce@redhat.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-25 17:20:06 -07:00
Alban Crequy	8694d8c1f8	tools: bpftool: fix infinite loop in map create "bpftool map create" has an infinite loop on "while (argc)". The error case is missing. Symptoms: when forgetting to type the keyword 'type' in front of 'hash': $ sudo bpftool map create /sys/fs/bpf/dir/foobar hash key 8 value 8 entries 128 (infinite loop, taking all the CPU) ^C After the patch: $ sudo bpftool map create /sys/fs/bpf/dir/foobar hash key 8 value 8 entries 128 Error: unknown arg hash Fixes: `0b592b5a01` ("tools: bpftool: add map create command") Signed-off-by: Alban Crequy <alban@kinvolk.io> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-25 17:20:06 -07:00
Kees Cook	4ee0776760	selftests/seccomp: Prepare for exclusive seccomp flags Some seccomp flags will become exclusive, so the selftest needs to be adjusted to mask those out and test them individually for the "all flags" tests. Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Tycho Andersen <tycho@tycho.ws> Acked-by: James Morris <jamorris@linux.microsoft.com>	2019-04-25 15:55:48 -07:00
Stanislav Fomichev	7f0c57fec8	bpftool: show flow_dissector attachment status Right now there is no way to query whether BPF flow_dissector program is attached to a network namespace or not. In previous commit, I added support for querying that info, show it when doing `bpftool net`: $ bpftool prog loadall ./bpf_flow.o \ /sys/fs/bpf/flow type flow_dissector \ pinmaps /sys/fs/bpf/flow $ bpftool prog 3: flow_dissector name _dissect tag 8c9e917b513dd5cc gpl loaded_at 2019-04-23T16:14:48-0700 uid 0 xlated 656B jited 461B memlock 4096B map_ids 1,2 btf_id 1 ... $ bpftool net -j [{"xdp":[],"tc":[],"flow_dissector":[]}] $ bpftool prog attach pinned \ /sys/fs/bpf/flow/flow_dissector flow_dissector $ bpftool net -j [{"xdp":[],"tc":[],"flow_dissector":["id":3]}] Doesn't show up in a different net namespace: $ ip netns add test $ ip netns exec test bpftool net -j [{"xdp":[],"tc":[],"flow_dissector":[]}] Non-json output: $ bpftool net xdp: tc: flow_dissector: id 3 v2: * initialization order (Jakub Kicinski) * clear errno for batch mode (Quentin Monnet) Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-25 23:49:06 +02:00
Daniel T. Lee	32e621e554	libbpf: fix samples/bpf build failure due to undefined UINT32_MAX Currently, building bpf samples will cause the following error. ./tools/lib/bpf/bpf.h:132:27: error: 'UINT32_MAX' undeclared here (not in a function) .. #define BPF_LOG_BUF_SIZE (UINT32_MAX >> 8) /* verifier maximum in kernels <= 5.1 */ ^ ./samples/bpf/bpf_load.h:31:25: note: in expansion of macro 'BPF_LOG_BUF_SIZE' extern char bpf_log_buf[BPF_LOG_BUF_SIZE]; ^~~~~~~~~~~~~~~~ Due to commit `4519efa6f8` ("libbpf: fix BPF_LOG_BUF_SIZE off-by-one error") hard-coded size of BPF_LOG_BUF_SIZE has been replaced with UINT32_MAX which is defined in <stdint.h> header. Even with this change, bpf selftests are running fine since these are built with clang and it includes header(-idirafter) from clang/6.0.0/include. (it has <stdint.h>) clang -I. -I./include/uapi -I../../../include/uapi -idirafter /usr/local/include -idirafter /usr/include \ -idirafter /usr/lib/llvm-6.0/lib/clang/6.0.0/include -idirafter /usr/include/x86_64-linux-gnu \ -Wno-compare-distinct-pointer-types -O2 -target bpf -emit-llvm -c progs/test_sysctl_prog.c -o - \| \ llc -march=bpf -mcpu=generic -filetype=obj -o /linux/tools/testing/selftests/bpf/test_sysctl_prog.o But bpf samples are compiled with GCC, and it only searches and includes headers declared at the target file. As '#include <stdint.h>' hasn't been declared in tools/lib/bpf/bpf.h, it causes build failure of bpf samples. gcc -Wp,-MD,./samples/bpf/.sockex3_user.o.d -Wall -Wmissing-prototypes -Wstrict-prototypes \ -O2 -fomit-frame-pointer -std=gnu89 -I./usr/include -I./tools/lib/ -I./tools/testing/selftests/bpf/ \ -I./tools/ lib/ -I./tools/include -I./tools/perf -c -o ./samples/bpf/sockex3_user.o ./samples/bpf/sockex3_user.c; This commit add declaration of '#include <stdint.h>' to tools/lib/bpf/bpf.h to fix this problem. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-25 23:34:10 +02:00
Daniel Borkmann	4f8827d2b6	bpf, libbpf: fix segfault in bpf_object__init_maps' pr_debug statement Ran into it while testing; in bpf_object__init_maps() data can be NULL in the case where no map section is present. Therefore we simply cannot access data->d_size before NULL test. Move the pr_debug() where it's safe to access. Fixes: `d859900c4c` ("bpf, libbpf: support global data/bss/rodata sections") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-25 13:47:29 -07:00
Daniel Borkmann	8837fe5dd0	bpf, libbpf: handle old kernels more graceful wrt global data sections Andrii reported a corner case where e.g. global static data is present in the BPF ELF file in form of .data/.bss/.rodata section, but without any relocations to it. Such programs could be loaded before commit `d859900c4c` ("bpf, libbpf: support global data/bss/rodata sections"), whereas afterwards if kernel lacks support then loading would fail. Add a probing mechanism which skips setting up libbpf internal maps in case of missing kernel support. In presence of relocation entries, we abort the load attempt. Fixes: `d859900c4c` ("bpf, libbpf: support global data/bss/rodata sections") Reported-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-25 13:47:29 -07:00
Mathieu Poirier	82500a810e	coresight: etm4x: Add kernel configuration for CONTEXTID Set the proper bit in the configuration register when contextID tracing has been requested by user space. That way PE_CONTEXT elements are generated by the tracers when a process is installed on a CPU. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Tested-by: Leo Yan <leo.yan@linaro.org> Tested-by: Robert Walker <robert.walker@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-04-25 22:00:16 +02:00
Kees Cook	5821ba9695	selftests: Add test plan API to kselftest.h and adjust callers The test plan for TAP needs to be declared immediately after the header. This adds the test plan API to kselftest.h and updates all callers to declare their expected test counts. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-25 13:15:46 -06:00
Kees Cook	f41c322f17	selftests: Remove KSFT_TAP_LEVEL Since sub-testing can now be detected by indentation level, this removes KSFT_TAP_LEVEL so that subtests report their TAP header for later parsing. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-25 13:15:26 -06:00
Kees Cook	5c069b6ded	selftests: Move test output to diagnostic lines This changes the selftest output so that each test's output is prefixed with "# " as a TAP "diagnostic line". This creates a bit of a kernel-specific TAP dialect where the diagnostics precede the results. The TAP spec isn't entirely clear about this, though, so I think it's the correct solution so as to keep interactive runs making sense. If the output _followed_ the result line in the spec-suggested YAML form, each test would dump all of its output at once instead of as it went, making debugging harder. This does, however, solve the recursive TAP output problem, as sub-tests will simply be prefixed by "# ". Parsing sub-tests becomes a simple problem of just removing the first two characters of a given top-level test's diagnostic output, and parsing the results. Note that the shell construct needed to both get an exit code from the first command in a pipe and still filter the pipe (to add the "# " prefix) uses a POSIX solution rather than the bash "pipefail" option which is not supported by dash. Since some test environments may have a very minimal set of utilities available, the new prefixing code will fall back to doing line-at-a-time prefixing if perl and/or stdbuf are not available. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-25 13:15:19 -06:00
Kees Cook	fd63b2eae5	selftests: Distinguish between missing and non-executable If a test was missing (e.g. wrong architecture, etc), the test runner would incorrectly claim the test was non-executable. This adds an existence check to report correctly. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-25 13:15:10 -06:00
Kees Cook	b0df366bbd	selftests: Add plan line and fix result line syntax The TAP version 13 spec requires a "plan" line, which has been missing. Since we always know how many tests we're going to run, emit the count on the plan line. This also fixes the result lines to remove the "1.." prefix which is against spec, and to mark skips with the correct "# SKIP" suffix. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-25 13:14:45 -06:00
Kees Cook	bf66078235	selftests: Extract logic for multiple test runs This moves the logic for running multiple tests into a single "run_many" function of runner.sh. Both "run_tests" and "emit_tests" are modified to use it. Summary handling is now controlled by the "per_test_logging" shell flag. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-25 13:14:38 -06:00
Kees Cook	d4e59a536f	selftests: Use runner.sh for emit targets This reuses the new runner.sh for the emit targets instead of manually running each test via run_kselftest.sh. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-25 13:14:30 -06:00
Kees Cook	42d46e57ec	selftests: Extract single-test shell logic from lib.mk In order to improve the reusability of the kselftest test running logic, this extracts the single-test logic from lib.mk into kselftest/runner.sh which lib.mk can call directly. No changes in output. As part of the change, this moves the "summary" Makefile logic around to set a new "logfile" output. This will be used again in the future "emit_tests" target as well. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-25 13:14:13 -06:00
Linus Torvalds	cd8dead0c3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "Just the usual assortment of small'ish fixes: 1) Conntrack timeout is sometimes not initialized properly, from Alexander Potapenko. 2) Add a reasonable range limit to tcp_min_rtt_wlen to avoid undefined behavior. From ZhangXiaoxu. 3) des1 field of descriptor in stmmac driver is initialized with the wrong variable. From Yue Haibing. 4) Increase mlxsw pci sw reset timeout a little bit more, from Ido Schimmel. 5) Match IOT2000 stmmac devices more accurately, from Su Bao Cheng. 6) Fallback refcount fix in TLS code, from Jakub Kicinski. 7) Fix max MTU check when using XDP in mlx5, from Maxim Mikityanskiy. 8) Fix recursive locking in team driver, from Hangbin Liu. 9) Fix tls_set_device_offload_Rx() deadlock, from Jakub Kicinski. 10) Don't use napi_alloc_frag() outside of softiq context of socionext driver, from Ilias Apalodimas. 11) MAC address increment overflow in ncsi, from Tao Ren. 12) Fix a regression in 8K/1M pool switching of RDS, from Zhu Yanjun. 13) ipv4_link_failure has to validate the headers that are actually there because RAW sockets can pass in arbitrary garbage, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits) ipv4: add sanity checks in ipv4_link_failure() net/rose: fix unbound loop in rose_loopback_timer() rxrpc: fix race condition in rxrpc_input_packet() net: rds: exchange of 8K and 1M pool net: vrf: Fix operation not supported when set vrf mac net/ncsi: handle overflow when incrementing mac address net: socionext: replace napi_alloc_frag with the netdev variant on init net: atheros: fix spelling mistake "underun" -> "underrun" spi: ST ST95HF NFC: declare missing of table spi: Micrel eth switch: declare missing of table net: stmmac: move stmmac_check_ether_addr() to driver probe netfilter: fix nf_l4proto_log_invalid to log invalid packets netfilter: never get/set skb->tstamp netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK dt-bindings: add an explanation for internal phy-mode net/tls: don't leak IV and record seq when offload fails net/tls: avoid potential deadlock in tls_set_device_offload_rx() selftests/net: correct the return value for run_afpackettests team: fix possible recursive locking when add slaves ...	2019-04-24 16:18:59 -07:00
Willem de Bruijn	f6ad6accaa	selftests/bpf: expand test_tc_tunnel with SIT encap So far, all BPF tc tunnel testcases encapsulate in the same network protocol. Add an encap testcase that requires updating skb->protocol. The 6in4 tunnel encapsulates an IPv6 packet inside an IPv4 tunnel. Verify that bpf_skb_net_grow correctly updates skb->protocol to select the right protocol handler in __netif_receive_skb_core. The BPF program should also manually update the link layer header to encode the right network protocol. Changes v1->v2 - improve documentation of non-obvious logic Signed-off-by: Willem de Bruijn <willemb@google.com> Tested-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-24 01:32:26 +02:00
Stanislav Fomichev	02ee065836	bpf/flow_dissector: don't adjust nhoff by ETH_HLEN in BPF_PROG_TEST_RUN Now that we use skb-less flow dissector let's return true nhoff and thoff. We used to adjust them by ETH_HLEN because that's how it was done in the skb case. For VLAN tests that looks confusing: nhoff is pointing to vlan parts :-\ Warning, this is an API change for BPF_PROG_TEST_RUN! Feel free to drop if you think that it's too late at this point to fix it. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-23 18:36:35 +02:00
Stanislav Fomichev	fe993c6468	selftests/bpf: properly return error from bpf_flow_load Right now we incorrectly return 'ret' which is always zero at that point. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-23 18:36:34 +02:00
Stanislav Fomichev	0905beec9f	selftests/bpf: run flow dissector tests in skb-less mode Export last_dissection map from flow dissector and use a known place in tun driver to trigger BPF flow dissection. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-23 18:36:34 +02:00
Stanislav Fomichev	c9cb2c1e11	selftests/bpf: add flow dissector bpf_skb_load_bytes helper test When flow dissector is called without skb, we want to make sure bpf_skb_load_bytes invocations return error. Add small test which tries to read single byte from a packet. bpf_skb_load_bytes should always fail under BPF_PROG_TEST_RUN because it was converted to the skb-less mode. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-23 18:36:34 +02:00
David S. Miller	2843ba2ec7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-04-22 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) allow stack/queue helpers from more bpf program types, from Alban. 2) allow parallel verification of root bpf programs, from Alexei. 3) introduce bpf sysctl hook for trusted root cases, from Andrey. 4) recognize var/datasec in btf deduplication, from Andrii. 5) cpumap performance optimizations, from Jesper. 6) verifier prep for alu32 optimization, from Jiong. 7) libbpf xsk cleanup, from Magnus. 8) other various fixes and cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 21:35:55 -07:00
David S. Miller	acced9d2b4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree: 1) Add a selftest for icmp packet too big errors with conntrack, from Florian Westphal. 2) Validate inner header in ICMP error message does not lie to us in conntrack, also from Florian. 3) Initialize ct->timeout to calm down KASAN, from Alexander Potapenko. 4) Skip ICMP error messages from tunnels in IPVS, from Julian Anastasov. 5) Use a hash to expose conntrack and expectation ID, from Florian Westphal. 6) Prevent shift wrap in nft_chain_parse_hook(), from Dan Carpenter. 7) Fix broken ICMP ID randomization with NAT, also from Florian. 8) Remove WARN_ON in ebtables compat that is reached via syzkaller, from Florian Westphal. 9) Fix broken timestamps since `fb420d5d91` ("tcp/fq: move back to CLOCK_MONOTONIC"), from Florian. 10) Fix logging of invalid packets in conntrack, from Andrei Vagin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-22 21:23:55 -07:00
Shuah Khan	d917fb876f	selftests: build and run gpio when output directory is the src dir Build and run gpio when output directory is the src dir. gpio has dependency on tools/gpio and builds tools/gpio objects in the src directory in all cases making the src repo dirty even when object relocation is specified. This fixes the following commands from generating gpio objects in the source repository: make O=dir kselftest export KBUILD_OUTPUT=dir; make kselftest make O=dir -C tools/testing/selftests expoert KBUILD_OUTPUT=dir; make -C tools/testing/selftests The following commands still build gpio objects in the source repo (gpio Makefile needs to fixed): make O=dir kselftest TARGETS="gpio" export KBUILD_OUTPUT=dir; make kselftest TARGETS="gpio" make O=dir -C tools/testing/selftests TARGETS="gpio" expoert KBUILD_OUTPUT=dir; make -C tools/testing/selftests TARGETS="gpio" Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-22 17:02:26 -06:00
Vishal Verma	92f6f2d7f5	tools/testing/nvdimm: add watermarks for dax_pmem* modules Add nfit_test 'watermarks' for the dax_pmem, dax_pmem_core, and dax_pmem_compat modules. This causes the nfit_test module to fail loading in case any of these modules are also not overridden with the ldconfig wrapped modules. Without this, nfit_test would sometimes fail creation of device-dax namespaces on the nfit_test_bus with an unhelpful error log such as: dax_pmem dax5.0: could not reserve metadata dax_pmem: probe of dax5.0 failed with error -16 Which was caused due to the unwrapped version of devm_request_mem_region() being called. Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-04-22 15:56:28 -07:00
Jens Axboe	5c61ee2cd5	Linux 5.1-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAly8rGYeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGmZMH/1IRB0E1Qmzz8yzw wj79UuRGYPqxDDSWW+wNc8sU4Ic7iYirn9APHAztCdQqsjmzU/OVLfSa3JhdBe5w THo7pbGKBqEDcWnKfNk/21jXFNLZ1vr9BoQv2DGU2MMhHAyo/NZbalo2YVtpQPmM OCRth5n+LzvH7rGrX7RYgWu24G9l3NMfgtaDAXBNXesCGFAjVRrdkU5CBAaabvtU 4GWh/nnutndOOLdByL3x+VZ3H3fIBnbNjcIGCglvvqzk7h3hrfGEl4UCULldTxcM IFsfMUhSw1ENy7F6DHGbKIG90cdCJcrQ8J/ziEzjj/KLGALluutfFhVvr6YCM2J6 2RgU8CY= =CfY1 -----END PGP SIGNATURE----- Merge tag 'v5.1-rc6' into for-5.2/block Pull in v5.1-rc6 to resolve two conflicts. One is in BFQ, in just a comment, and is trivial. The other one is a conflict due to a later fix in the bio multi-page work, and needs a bit more care. * tag 'v5.1-rc6': (770 commits) Linux 5.1-rc6 block: make sure that bvec length can't be overflow block: kill all_q_node in request_queue x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping mm/kmemleak.c: fix unused-function warning init: initialize jump labels before command line option parsing kernel/watchdog_hld.c: hard lockup message should end with a newline kcov: improve CONFIG_ARCH_HAS_KCOV help text mm: fix inactive list balancing between NUMA nodes and cgroups mm/hotplug: treat CMA pages as unmovable proc: fixup proc-pid-vm test proc: fix map_files test on F29 mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock mm: swapoff: shmem_unuse() stop eviction without igrab() mm: swapoff: take notice of completion sooner mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES mm: swapoff: shmem_find_swap_entries() filter out other types slab: store tagged freelist for off-slab slabmgmt ... Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-22 09:47:36 -06:00
Shuah Khan	bc81c1c796	media: selftests: media_dev_allocator api test Add a new test for Media Device Allocator API. Media Device Allocator API to allows multiple drivers share a media device. This API solves a very common use-case for media devices where one physical device (an USB stick) provides both audio and video. When such media device exposes a standard USB Audio class, a proprietary Video class, two or more independent drivers will share a single physical USB bridge. In such cases, it is necessary to coordinate access to the shared resource. Using this API, drivers can allocate a media device with the shared struct device as the key. Once the media device is allocated by a driver, other drivers can get a reference to it. The media device is released when all the references are released. This test does a series of unbind/bind tests to make sure media device is released correctly when it is no longer is use and when the last driver releases the reference. Signed-off-by: Shuah Khan <shuah@kernel.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>	2019-04-22 11:23:14 -04:00
Po-Hsu Lin	8c03557c3f	selftests/net: correct the return value for run_afpackettests The run_afpackettests will be marked as passed regardless the return value of those sub-tests in the script: -------------------- running psock_tpacket test -------------------- [FAIL] selftests: run_afpackettests [PASS] Fix this by changing the return value for each tests. Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-20 20:29:02 -07:00
Linus Torvalds	b25c69b9d5	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: - various tooling fixes - kretprobe fixes - kprobes annotation fixes - kprobes error checking fix - fix the default events for AMD Family 17h CPUs - PEBS fix - AUX record fix - address filtering fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Avoid kretprobe recursion bug kprobes: Mark ftrace mcount handler functions nokprobe x86/kprobes: Verify stack frame on kretprobe perf/x86/amd: Add event map for AMD Family 17h perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf() perf tools: Fix map reference counting perf evlist: Fix side band thread draining perf tools: Check maps for bpf programs perf bpf: Return NULL when RB tree lookup fails in perf_env__find_bpf_prog_info() tools include uapi: Sync sound/asound.h copy perf top: Always sample time to satisfy needs of use of ordered queuing perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user) tools lib traceevent: Fix missing equality check for strcmp perf stat: Disable DIR_FORMAT feature for 'perf stat record' perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_view perf header: Fix lock/unlock imbalances when processing BPF/BTF info perf/x86: Fix incorrect PEBS_REGS perf/ring_buffer: Fix AUX record suppression perf/core: Fix the address filtering fix kprobes: Fix error check when reusing optimized probes	2019-04-20 10:05:02 -07:00
McCabe, Robert J	4519efa6f8	libbpf: fix BPF_LOG_BUF_SIZE off-by-one error The BPF_PROG_LOAD condition for kernel version <= 5.1 is log->len_total > UINT_MAX >> 8 /* (16 * 1024 * 1024) - 1 */ Signed-off-by: McCabe, Robert J <robert.mccabe@rockwellcollins.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-19 17:05:48 -07:00
Kees Cook	a147faa96f	selftests/ipc: Fix msgque compiler warnings This fixes the various compiler warnings when building the msgque selftest. The primary change is using sys/msg.h instead of linux/msg.h directly to gain the API declarations. Fixes: `3a665531a3` ("selftests: IPC message queue copy feature test") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-19 17:18:00 -06:00
Po-Hsu Lin	dff6d2ae56	selftests/efivarfs: clean up test files from test_create*() Test files created by test_create() and test_create_empty() tests will stay in the $efivarfs_mount directory until the system was rebooted. When the tester tries to run this efivarfs test again on the same system, the immutable characteristics in that directory will cause some "Operation not permitted" noises, and a false-positve test result as the file was created in previous run. -------------------- running test_create -------------------- ./efivarfs.sh: line 59: /sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27: Operation not permitted [PASS] -------------------- running test_create_empty -------------------- ./efivarfs.sh: line 78: /sys/firmware/efi/efivars/test_create_empty-210be57c-9849-4fc7-a635-e6382d1aec27: Operation not permitted [PASS] -------------------- Create a file_cleanup() to remove those test files in the end of each test to solve this issue. For the test_create_read, we can move the clean up task to the end of the test to ensure the system is clean. Also, use this function to replace the existing file removal code. Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-19 17:17:19 -06:00
Shuah Khan	8ce72dc325	selftests: fix headers_install circular dependency "make kselftest" fails with "Circular Makefile.o <- prepare dependency dropped." error, when lib.mk invokes "make headers_install". Make level 0: Main make calls selftests run_tests target ... Make level n: selftests lib.mk invokes main make's headers_install The secondary level make inherits builtin-rules which will use the rule to generate Makefile.o and runs into "Circular Makefile.o <- prepare dependency dropped." error, and kselftest compile fails. Invoke headers_install target with --no-builtin-rules to avoid circular error. In addition, lib.mk installs headers in the default HDR_PATH, even when build relocation is requested with O= or export KBUILD_OUTPUT. Fix the problem by passing in INSTALL_HDR_PATH. The headers are installed under the specified output "dir/usr". Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-19 17:15:27 -06:00
Po-Hsu Lin	30c04d796b	selftests/net: correct the return value for run_netsocktests The run_netsocktests will be marked as passed regardless the actual test result from the ./socket: selftests: net: run_netsocktests ======================================== -------------------- running socket test -------------------- [FAIL] ok 1..6 selftests: net: run_netsocktests [PASS] This is because the test script itself has been successfully executed. Fix this by exit 1 when the test failed. Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-19 14:39:51 -07:00
Roman Gushchin	5313bfe425	kselftests: cgroup: add freezer controller self-tests This patch implements 9 tests for the freezer controller for cgroup v2: 1) a simple test, which aims to freeze and unfreeze a cgroup with 100 processes 2) a more complicated tree test, which creates a hierarchy of cgroups, puts some processes in some cgroups, and tries to freeze and unfreeze different parts of the subtree 3) a forkbomb test: the test aims to freeze a forkbomb running in a cgroup, kill all tasks in the cgroup and remove the cgroup without the unfreezing. 4) rmdir test: the test creates two nested cgroups, freezes the parent one, checks that the child can be successfully removed, and a new child can be created 5) migration tests: the test checks migration of a task between frozen cgroups: from a frozen to a running, from a running to a frozen, and from a frozen to a frozen. 6) ptrace test: the test checks that it's possible to attach to a process in a frozen cgroup, get some information and detach, and the cgroup will remain frozen. 7) stopped test: the test checks that it's possible to freeze a cgroup with a stopped task 8) ptraced test: the test checks that it's possible to freeze a cgroup with a ptraced task 9) vfork test: the test checks that it's possible to freeze a cgroup with a parent process waiting for the child process in vfork() Expected output: $ ./test_freezer ok 1 test_cgfreezer_simple ok 2 test_cgfreezer_tree ok 3 test_cgfreezer_forkbomb ok 4 test_cgrreezer_rmdir ok 5 test_cgfreezer_migrate ok 6 test_cgfreezer_ptrace ok 7 test_cgfreezer_stopped ok 8 test_cgfreezer_ptraced ok 9 test_cgfreezer_vfork Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: kernel-team@fb.com Cc: linux-kselftest@vger.kernel.org	2019-04-19 11:26:49 -07:00
Roman Gushchin	ff9fb7cb51	kselftests: cgroup: don't fail on cg_kill_all() error in cg_destroy() If the cgroup destruction races with an exit() of a belonging process(es), cg_kill_all() may fail. It's not a good reason to make cg_destroy() fail and leave the cgroup in place, potentially causing next test runs to fail. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: kernel-team@fb.com Cc: linux-kselftest@vger.kernel.org	2019-04-19 11:26:49 -07:00
Alexey Dobriyan	68545aa1cd	proc: fixup proc-pid-vm test Silly sizeof(pointer) vs sizeof(uint8_t[]) bug. Link: http://lkml.kernel.org/r/20190414123009.GA12971@avx2 Fixes: `e483b02087` ("proc: test /proc/*/maps, smaps, smaps_rollup, statm") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-04-19 09:46:04 -07:00
Alexey Dobriyan	8cd40d1d41	proc: fix map_files test on F29 F29 bans mapping first 64KB even for root making test fail. Iterate from address 0 until mmap() works. Gentoo (root): openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3 mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = 0 Gentoo (non-root): openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3 mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EPERM (Operation not permitted) mmap(0x1000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = 0x1000 F29 (root): openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3 mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0x1000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0x2000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0x3000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0x4000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0x5000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0x6000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0x7000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0x8000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0x9000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0xa000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0xb000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0xc000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0xd000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0xe000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0xf000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = -1 EACCES (Permission denied) mmap(0x10000, 4096, PROT_NONE, MAP_PRIVATE\|MAP_FIXED, 3, 0) = 0x10000 Now all proc tests succeed on F29 if run as root, at last! Link: http://lkml.kernel.org/r/20190414123612.GB12971@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-04-19 09:46:04 -07:00
Martin KaFai Lau	849f257f61	bpf: Increase MAX_NR_MAPS to 17 in test_verifier.c map_fds[16] is the last one index-ed by fixup_map_array_small. Hence, the MAX_NR_MAPS should be 17 instead. Fixes: `fb2abb73e5` ("bpf, selftest: test {rd, wr}only flags and direct value access") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-18 16:10:47 -07:00
Wang YanQing	5de35e3ae9	selftests/bpf: fix compile errors due to unsync linux/in6.h and netinet/in.h I meet below compile errors: " In file included from test_tcpnotify_kern.c:12: /usr/include/netinet/in.h:101:5: error: expected identifier IPPROTO_HOPOPTS = 0, /* IPv6 Hop-by-Hop options. / ^ /usr/include/linux/in6.h:131:26: note: expanded from macro 'IPPROTO_HOPOPTS' ^ In file included from test_tcpnotify_kern.c:12: /usr/include/netinet/in.h:103:5: error: expected identifier IPPROTO_ROUTING = 43, / IPv6 routing header. / ^ /usr/include/linux/in6.h:132:26: note: expanded from macro 'IPPROTO_ROUTING' ^ In file included from test_tcpnotify_kern.c:12: /usr/include/netinet/in.h:105:5: error: expected identifier IPPROTO_FRAGMENT = 44, / IPv6 fragmentation header. / ^ /usr/include/linux/in6.h:133:26: note: expanded from macro 'IPPROTO_FRAGMENT' " The same compile errors are reported for test_tcpbpf_kern.c too. My environment: lsb_release -a: No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 16.04.6 LTS Release: 16.04 Codename: xenial dpkg -l \| grep libc-dev: ii libc-dev-bin 2.23-0ubuntu11 amd64 GNU C Library: Development binaries ii linux-libc-dev:amd64 4.4.0-145.171 amd64 Linux Kernel Headers for development. The reason is linux/in6.h and netinet/in.h aren't synchronous about how to handle the same definitions, IPPROTO_HOPOPTS, etc. This patch fixes the compile errors by moving <netinet/in.h> to before the <linux/.h>. Signed-off-by: Wang YanQing <udknight@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-18 16:08:40 -07:00
Magnus Karlsson	79b1b30e4c	libbpf: remove compile time warning from libbpf_util.h Having a helpful compile time warning in libbpf_util.h is not a good idea since all warnings are treated as errors. Change this into a comment in the code instead. Fixes: `b7e3a28019` ("libbpf: remove dependency on barrier.h in xsk.h") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-18 16:07:12 -07:00
Masayoshi Mizuma	37e1677330	ktest: introduce REBOOT_RETURN_CODE to confirm the result of REBOOT Unexpected power cycle occurs while the installation of the kernel. ssh root@Test sync ... [0 seconds] SUCCESS ssh root@Test reboot ... [1 second] FAILED! virsh destroy Test; sleep 5; virsh start Test ... [6 seconds] SUCCESS That is because REBOOT, the default is "ssh $SSH_USER@$MACHINE reboot", exits as 255 even if the reboot is successfully done, like as: ]# ssh root@Test reboot Connection to Test closed by remote host. ]# echo $? 255 ]# To avoid the unexpected power cycle, introduce a new parameter, REBOOT_RETURN_CODE to judge whether REBOOT is successfully done or not. Link: http://lkml.kernel.org/r/20190418135943.12640-1-msys.mizuma@gmail.com Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2019-04-18 11:25:13 -04:00
Ingo Molnar	94e4dcc75a	Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM commits from Paul E. McKenney: - An LKMM commit adding support for synchronize_srcu_expedited() - A couple of straggling RCU flavor consolidation updates - Documentation updates. - Miscellaneous fixes - SRCU updates - RCU CPU stall-warning updates - Torture-test updates Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-18 14:42:24 +02:00
Christian Borntraeger	13209ad039	KVM: s390: add MSA9 to cpumodel This enables stfle.155 and adds the subfunctions for KDSA. Bit 155 is added to the list of facilities that will be enabled when there is no cpu model involved as MSA9 requires no additional handling from userspace, e.g. for migration. Please note that a cpu model enabled user space can and will have the final decision on the facility bits for a guests. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Collin Walling <walling@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>	2019-04-18 10:14:11 +02:00
Yonghong Song	ba02de1aa0	selftests/bpf: fix a compilation error I hit the following compilation error with gcc 4.8.5. prog_tests/flow_dissector.c: In function ‘test_flow_dissector’: prog_tests/flow_dissector.c:155:2: error: ‘for’ loop initial declarations are only allowed in C99 mode for (int i = 0; i < ARRAY_SIZE(tests); i++) { ^ prog_tests/flow_dissector.c:155:2: note: use option -std=c99 or -std=gnu99 to compile your code Let us fix the issue by avoiding this particular c99 feature. Fixes: `a5cb33464e` ("selftests/bpf: make flow dissector tests more extensible") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-17 22:29:51 -07:00
Masayoshi Mizuma	68911069f5	ktest: Add support for meta characters in GRUB_MENU ktest fails if meta characters are in GRUB_MENU, for example GRUB_MENU = 'Fedora (test)' The failure happens because the meta characters are not escaped, so the menu doesn't match in any entries in GRUB_FILE. Use quotemeta() to escape the meta characters. Link: http://lkml.kernel.org/r/20190417235823.18176-1-msys.mizuma@gmail.com Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2019-04-17 22:36:30 -04:00
Steven Rostedt (VMware)	fca797f163	ktest: Show name and iteration on errors If a test has an error, display not only the what type of test failed, but if the test was giving a name, display that too, as well as the current iteration of the tests. Each test has an iteration number associated to it. For error messages display that iteration number along with the test type and test name. This includes the message that gets sent via email. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2019-04-17 22:30:33 -04:00
Mimi Zohar	b433a52aa2	selftests/kexec: update get_secureboot_mode The get_secureboot_mode() function unnecessarily requires both CONFIG_EFIVAR_FS and CONFIG_EFI_VARS to be enabled to determine if the system is booted in secure boot mode. On some systems the old EFI variable support is not enabled or, possibly, even implemented. This patch first checks the efivars filesystem for the SecureBoot and SetupMode flags, but falls back to using the old EFI variable support. The "secure_boot_file" and "setup_mode_file" couldn't be quoted due to globbing. This patch also removes the globbing. Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-17 15:32:44 -06:00
Mimi Zohar	726ff75f29	selftests/kexec: make kexec_load test independent of IMA being enabled Verify IMA is enabled before failing tests or emitting irrelevant messages. Suggested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Dave Young <dyoung@redhat.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-17 15:32:40 -06:00
Mimi Zohar	7cea0b9227	selftests/kexec: check kexec_load and kexec_file_load are enabled Skip the kexec_load and kexec_file_load tests, if they aren't configured in the kernel. This change adds a new requirement that ikconfig is configured in the kexec_load test. Suggested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-17 15:32:34 -06:00
Petr Vorel	a4df92adca	selftests/kexec: Add missing '=y' to config options so the file can be used as kernel config snippet. Signed-off-by: Petr Vorel <pvorel@suse.cz> [zohar@linux.ibm.com: remove CONFIG_KEXEC_VERIFY_SIG from config] Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-17 15:32:29 -06:00
Mimi Zohar	973b71c60f	selftests/kexec: kexec_file_load syscall test The kernel can be configured to verify PE signed kernel images, IMA kernel image signatures, both types of signatures, or none. This test verifies only properly signed kernel images are loaded into memory, based on the kernel configuration and runtime policies. Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-17 15:32:24 -06:00
Mimi Zohar	c660a81796	selftests/kexec: define "require_root_privileges" Many tests require root privileges. Define a common function. Suggested-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-17 15:32:19 -06:00
Mimi Zohar	6038c81526	selftests/kexec: define common logging functions Define log_info, log_pass, log_fail, and log_skip functions. Suggested-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-17 15:32:14 -06:00
Mimi Zohar	5025b0f0fa	selftests/kexec: define a set of common functions Define, update and move get_secureboot_mode() to a common file for use by other tests. Updated to check both the efivar SecureBoot-$(UUID) and SetupMode-$(UUID), based on Dave Young's review. Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-17 15:32:02 -06:00
Mimi Zohar	89eba7db8e	selftests/kexec: cleanup the kexec selftest Remove the few bashisms and use the complete option name for clarity. Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-17 15:31:55 -06:00
Mimi Zohar	c3c0e81142	selftests/kexec: move the IMA kexec_load selftest to selftests/kexec As requested move the existing kexec_load selftest and subsequent kexec tests to the selftests/kexec directory. Suggested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-17 15:10:23 -06:00
David S. Miller	6b0a7f84ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflict resolution of af_smc.c from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-17 11:26:25 -07:00
Jiri Olsa	2db7b1e0bd	perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf() We don't return NULL when we don't find the bpf_prog_info_node, fix that. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reported-by: Song Liu <songliubraving@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: `3792cb2ff4` ("perf bpf: Save BTF in a rbtree in perf_env") Link: http://lkml.kernel.org/r/20190417145539.11669-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-17 14:30:23 -03:00
Jiri Olsa	b9abbdfa88	perf tools: Fix map reference counting By calling maps__insert() we assume to get 2 references on the map, which we relese within maps__remove call. However if there's already same map name, we currently don't bump the reference and can crash, like: Program received signal SIGABRT, Aborted. 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6 #1 0x00007ffff75d0895 in abort () from /lib64/libc.so.6 #2 0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6 #3 0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6 #4 0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131 #5 refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148 #6 map__put (map=0x1224df0) at util/map.c:299 #7 0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953 #8 maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959 #9 0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65 #10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728 #11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741 #12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362 #13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243 #14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322 #15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8, ... Add the map to the list and getting the reference event if we find the map with same name. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Fixes: `1e6285699b` ("perf symbols: Fix slowness due to -ffunction-section") Link: http://lkml.kernel.org/r/20190416160127.30203-10-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-17 14:30:11 -03:00
Jiri Olsa	adc6257c4a	perf evlist: Fix side band thread draining Current perf_evlist__poll_thread() code could finish without draining the data. Adding the logic that makes sure we won't finish before the drain. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Fixes: `657ee55319` ("perf evlist: Introduce side band thread") Link: http://lkml.kernel.org/r/20190416160127.30203-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-17 14:30:11 -03:00
Song Liu	a93e0b2365	perf tools: Check maps for bpf programs As reported by Jiri Olsa in: "[BUG] perf: intel_pt won't display kernel function" https://lore.kernel.org/lkml/20190403143738.GB32001@krava Recent changes to support PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT broke --kallsyms option. This is because it broke test __map__is_kmodule. This patch fixes this by adding check for bpf program, so that these maps are not mistaken as kernel modules. Signed-off-by: Song Liu <songliubraving@fb.com> Reported-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/20190416160127.30203-8-jolsa@kernel.org Fixes: `76193a9452` ("perf, bpf: Introduce PERF_RECORD_KSYMBOL") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-17 14:30:11 -03:00
Jiri Olsa	aa52660231	perf bpf: Return NULL when RB tree lookup fails in perf_env__find_bpf_prog_info() We currently don't return NULL in case we don't find the bpf_prog_info_node, fixing that. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: `e4378f0cb9` ("perf bpf: Save bpf_prog_info in a rbtree in perf_env") Link: http://lkml.kernel.org/r/20190416134151.15282-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-17 14:30:06 -03:00
Linus Torvalds	2a3a028fc6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Handle init flow failures properly in iwlwifi driver, from Shahar S Matityahu. 2) mac80211 TXQs need to be unscheduled on powersave start, from Felix Fietkau. 3) SKB memory accounting fix in A-MDSU aggregation, from Felix Fietkau. 4) Increase RCU lock hold time in mlx5 FPGA code, from Saeed Mahameed. 5) Avoid checksum complete with XDP in mlx5, also from Saeed. 6) Fix netdev feature clobbering in ibmvnic driver, from Thomas Falcon. 7) Partial sent TLS record leak fix from Jakub Kicinski. 8) Reject zero size iova range in vhost, from Jason Wang. 9) Allow pending work to complete before clcsock release from Karsten Graul. 10) Fix XDP handling max MTU in thunderx, from Matteo Croce. 11) A lot of protocols look at the sa_family field of a sockaddr before validating it's length is large enough, from Tetsuo Handa. 12) Don't write to free'd pointer in qede ptp error path, from Colin Ian King. 13) Have to recompile IP options in ipv4_link_failure because it can be invoked from ARP, from Stephen Suryaputra. 14) Doorbell handling fixes in qed from Denis Bolotin. 15) Revert net-sysfs kobject register leak fix, it causes new problems. From Wang Hai. 16) Spectre v1 fix in ATM code, from Gustavo A. R. Silva. 17) Fix put of BROPT_VLAN_STATS_PER_PORT in bridging code, from Nikolay Aleksandrov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (111 commits) socket: fix compat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW tcp: tcp_grow_window() needs to respect tcp_space() ocelot: Clean up stats update deferred work ocelot: Don't sleep in atomic context (irqs_disabled()) net: bridge: fix netlink export of vlan_stats_per_port option qed: fix spelling mistake "faspath" -> "fastpath" tipc: set sysctl_tipc_rmem and named_timeout right range tipc: fix link established but not in session net: Fix missing meta data in skb with vlan packet net: atm: Fix potential Spectre v1 vulnerabilities net/core: work around section mismatch warning for ptp_classifier net: bridge: fix per-port af_packet sockets bnx2x: fix spelling mistake "dicline" -> "decline" route: Avoid crash from dereferencing NULL rt->from MAINTAINERS: normalize Woojung Huh's email address bonding: fix event handling for stacked bonds Revert "net-sysfs: Fix memory leak in netdev_register_kobject" rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check qed: Fix the DORQ's attentions handling qed: Fix missing DORQ attentions ...	2019-04-17 09:57:45 -07:00
Magnus Karlsson	2c5935f1b2	libbpf: optimize barrier for XDP socket rings The full memory barrier in the XDP socket rings on the consumer side between the load of the data and the store of the consumer ring is there to protect the store from being executed before the load of the data. If this was allowed to happen, the producer might overwrite the data field with a new entry before the consumer got the chance to read it. On x86, stores are guaranteed not to be reordered with older loads, so it does not need a full memory barrier here. A compile time barrier would be enough. This patch introdcues a new primitive in libbpf_util.h that implements a new barrier type (libbpf_smp_rwmb) hindering stores to be reordered with older loads. It is then used in the XDP socket ring access code in libbpf to improve performance. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-16 20:13:10 -07:00
Magnus Karlsson	b7e3a28019	libbpf: remove dependency on barrier.h in xsk.h The use of smp_rmb() and smp_wmb() creates a Linux header dependency on barrier.h that is unnecessary in most parts. This patch implements the two small defines that are needed from barrier.h. As a bonus, the new implementations are faster than the default ones as they default to sfence and lfence for x86, while we only need a compiler barrier in our case. Just as it is when the same ring access code is compiled in the kernel. Fixes: `1cad078842` ("libbpf: add support for using AF_XDP sockets") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-16 20:13:10 -07:00
Magnus Karlsson	a06d729646	libbpf: remove likely/unlikely in xsk.h This patch removes the use of likely and unlikely in xsk.h since they create a dependency on Linux headers as reported by several users. There have also been reports that the use of these decreases performance as the compiler puts the code on two different cache lines instead of on a single one. All in all, I think we are better off without them. Fixes: `1cad078842` ("libbpf: add support for using AF_XDP sockets") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-16 20:13:10 -07:00
Magnus Karlsson	d5e63fdd44	libbpf: fix XDP socket ring buffer memory ordering The ring buffer code of XDP sockets is missing a memory barrier on the consumer side between the load of the data and the write that signals that it is ok for the producer to put new data into the buffer. On architectures that does not guarantee that stores are not reordered with older loads, the producer might put data into the ring before the consumer had the chance to read it. As IA does guarantee this ordering, it would only need a compiler barrier here, but there are no primitives in barrier.h for this specific case (hinder writes to be ordered before older reads) so I had to add a smp_mb() here which will translate into a run-time synch operation on IA. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-16 20:13:10 -07:00
Prashant Bhole	d1b7725dfe	tools/bpftool: show btf_id in map listing Let's print btf id of map similar to the way we are printing it for programs. Sample output: user@test# bpftool map -f 61: lpm_trie flags 0x1 key 20B value 8B max_entries 1 memlock 4096B 133: array name test_btf_id flags 0x0 key 4B value 4B max_entries 4 memlock 4096B pinned /sys/fs/bpf/test100 btf_id 174 170: array name test_btf_id flags 0x0 key 4B value 4B max_entries 4 memlock 4096B btf_id 240 Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-16 19:48:26 -07:00
Prashant Bhole	d459b59ee0	tools/bpftool: re-organize newline printing for map listing Let's move the final newline printing in show_map_close_plain() at the end of the function because it looks correct and consistent with prog.c. Also let's do related changes for the line which prints pinned file name. Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-16 19:48:26 -07:00
Andrey Ignatov	f25377ee4f	bpftool: Support sysctl hook Add support for recently added BPF_PROG_TYPE_CGROUP_SYSCTL program type and BPF_CGROUP_SYSCTL attach type. Example of bpftool output with sysctl program from selftests: # bpftool p load ./test_sysctl_prog.o /mnt/bpf/sysctl_prog type cgroup/sysctl # bpftool p l 9: cgroup_sysctl name sysctl_tcp_mem tag 0dd05f81a8d0d52e gpl loaded_at 2019-04-16T12:57:27-0700 uid 0 xlated 1008B jited 623B memlock 4096B # bpftool c a /mnt/cgroup2/bla sysctl id 9 # bpftool c t CgroupPath ID AttachType AttachFlags Name /mnt/cgroup2/bla 9 sysctl sysctl_tcp_mem # bpftool c d /mnt/cgroup2/bla sysctl id 9 # bpftool c t CgroupPath ID AttachType AttachFlags Name Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-16 19:45:47 -07:00
Andrii Nakryiko	e1d1dc4653	libbpf: fix printf formatter for ptrdiff_t argument Using %ld for printing out value of ptrdiff_t type is not portable between 32-bit and 64-bit archs. This is causing compilation errors for libbpf on 32-bit platform (discovered as part of an effort to integrate libbpf into systemd ([0])). Proper formatter is %td, which is used in this patch. v2->v1: - add Reported-by - provide more context on how this issue was discovered [0] https://github.com/systemd/systemd/pull/12151 Reported-by: Evgeny Vereshchagin <evvers@ya.ru> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@fb.com> Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-16 19:44:19 -07:00
Peter Oskolkov	809041e765	selftests: bpf: add VRF test cases to lwt_ip_encap test. This patch adds tests validating that VRF and BPF-LWT encap work together well, as requested by David Ahern. Signed-off-by: Peter Oskolkov <posk@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-16 19:19:51 -07:00
Kees Cook	a745f7af3c	selftests/harness: Add 30 second timeout per test In order to keep tests from hanging forever, this adds an alarm signal to each test run. This assumes an individual test doesn't take longer than 30 seconds. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-16 17:04:34 -06:00
Kees Cook	9dd3fcb0ab	selftests/seccomp: Handle namespace failures gracefully When running without USERNS or PIDNS the seccomp test would hang since it was waiting forever for the child to trigger the user notification since it seems the glibc() abort handler makes a call to getpid(), which would trap again. This changes the getpid filter to getppid, and makes sure ASSERTs execute to stop from spawning the listener. Reported-by: Shuah Khan <shuah@kernel.org> Fixes: `6a21cc50f0` ("seccomp: add a return code to trap to userspace") Cc: stable@vger.kernel.org # > 5.0 Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Tycho Andersen <tycho@tycho.ws> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>	2019-04-16 17:04:08 -06:00
Linus Torvalds	b5de3c5026	* Fix for a memory leak introduced during the merge window * Fixes for nested VMX with ept=0 * Fixes for AMD (APIC virtualization, NMI injection) * Fixes for Hyper-V under KVM and KVM under Hyper-V * Fixes for 32-bit SMM and tests for SMM virtualization * More array_index_nospec peppering -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJctdrUAAoJEL/70l94x66Deq8H/0OEIBBuDt53nPEHXufNSV1S uzIVvwJoL6786URWZfWZ99Z/NTTA1rn9Vr/leLPkSidpDpw7IuK28KZtEMP2rdRE Sb8eN2g4SoQ51ZDSIMUzjcx9VGNqkH8CWXc2yhDtTUSD21S3S1kidZ0O0YbmetkJ OwF1EDx4m7JO6EUHaJhIfdTUb9ItRC1Vfo7hpOuRVxPx2USv5+CLbexpteKogMcI 5WDaXFIRwUWW6Z8Bwyi7yA9gELKcXTTXlz9T/A7iKeqxRMLBazVKnH8h7Lfd0M0A wR4AI+tE30MuHT7WLh1VOAKZk6TDabq9FJrva3JlDq+T+WOjgUzYALLKEd4Vv4o= =zsT5 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "5.1 keeps its reputation as a big bugfix release for KVM x86. - Fix for a memory leak introduced during the merge window - Fixes for nested VMX with ept=0 - Fixes for AMD (APIC virtualization, NMI injection) - Fixes for Hyper-V under KVM and KVM under Hyper-V - Fixes for 32-bit SMM and tests for SMM virtualization - More array_index_nospec peppering" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing KVM: fix spectrev1 gadgets KVM: x86: fix warning Using plain integer as NULL pointer selftests: kvm: add a selftest for SMM selftests: kvm: fix for compilers that do not support -no-pie selftests: kvm/evmcs_test: complete I/O before migrating guest state KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU KVM: x86: clear SMM flags before loading state while leaving SMM KVM: x86: Open code kvm_set_hflags KVM: x86: Load SMRAM in a single shot when leaving SMM KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU KVM: x86: Raise #GP when guest vCPU do not support PMU x86/kvm: move kvm_load/put_guest_xcr0 into atomic context KVM: x86: svm: make sure NMI is injected after nmi_singlestep svm/avic: Fix invalidate logical APIC id entry Revert "svm: Fix AVIC incomplete IPI emulation" kvm: mmu: Fix overflow on kvm mmu page limit calculation KVM: nVMX: always use early vmcs check when EPT is disabled KVM: nVMX: allow tests to use bad virtual-APIC page address ...	2019-04-16 08:52:00 -07:00
Arnaldo Carvalho de Melo	30e4c57496	tools include uapi: Sync sound/asound.h copy Picking the changes from: Fixes: `b5bdbb6ccd` ("ALSA: uapi: #include <time.h> in asound.h") Which entails no changes in the tooling side. To silence this perf tools build warning: Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h' diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Mentz <danielmentz@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: https://lkml.kernel.org/n/tip-15o4twfkbn6nny9aus90dyzx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-16 12:36:20 -03:00
Jiri Olsa	1e6db2ee86	perf top: Always sample time to satisfy needs of use of ordered queuing Bastian reported broken 'perf top -p PID' command, it won't display any data. The problem is that for -p option we monitor single thread, so we don't enable time in samples, because it's not needed. However since commit `16c66bc167` we use ordered queues to stash data plus later commits added logic for dropping samples in case there's big load and we don't keep up. All this needs timestamp for sample. Enabling it unconditionally for perf top. Reported-by: Bastian Beischer <bastian.beischer@rwth-aachen.de> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bastian beischer <bastian.beischer@rwth-aachen.de> Fixes: `16c66bc167` ("perf top: Add processing thread") Link: http://lkml.kernel.org/r/20190415125333.27160-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-16 12:36:20 -03:00
Mao Han	3a5b64f05d	perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user) On 32-bits platform with more than 32 registers, the 64 bits mask is truncate to the lower 32 bits and the return value of hweight_long will always smaller than 32. When kernel outputs more than 32 registers, but the user perf program only counts 32, there will be a data mismatch result to overflow check fail. Signed-off-by: Mao Han <han_mao@c-sky.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Fixes: `6a21c0b5c2` ("perf tools: Add core support for sampling intr machine state regs") Fixes: `d03f217054` ("perf tools: Expand perf_event__synthesize_sample()") Fixes: `0f6a30150c` ("perf tools: Support user regs and stack in sample parsing") Link: http://lkml.kernel.org/r/29ad7947dc8fd1ff0abd2093a72cc27a2446be9f.1554883878.git.han_mao@c-sky.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-16 11:27:53 -03:00
Rikard Falkeborn	f32c2877bc	tools lib traceevent: Fix missing equality check for strcmp There was a missing comparison with 0 when checking if type is "s64" or "u64". Therefore, the body of the if-statement was entered if "type" was "u64" or not "s64", which made the first strcmp() redundant since if type is "u64", it's not "s64". If type is "s64", the body of the if-statement is not entered but since the remainder of the function consists of if-statements which will not be entered if type is "s64", we will just return "val", which is correct, albeit at the cost of a few more calls to strcmp(), i.e., it will behave just as if the if-statement was entered. If type is neither "s64" or "u64", the body of the if-statement will be entered incorrectly and "val" returned. This means that any type that is checked after "s64" and "u64" is handled the same way as "s64" and "u64", i.e., the limiting of "val" to fit in for example "s8" is never reached. This was introduced in the kernel tree when the sources were copied from trace-cmd in commit `f7d82350e5` ("tools/events: Add files to create libtraceevent.a"), and in the trace-cmd repo in 1cdbae6035cei ("Implement typecasting in parser") when the function was introduced, i.e., it has always behaved the wrong way. Detected by cppcheck. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com> Fixes: `f7d82350e5` ("tools/events: Add files to create libtraceevent.a") Link: http://lkml.kernel.org/r/20190409091529.2686-1-rikard.falkeborn@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-16 11:27:36 -03:00
Jiri Olsa	8002a63f9a	perf stat: Disable DIR_FORMAT feature for 'perf stat record' Arnaldo reported assertion in perf stat record: assertion failed at util/header.c:875 There's no support for this in the 'perf state record' command, disable the feature for that case. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: `258031c017` ("perf header: Add DIR_FORMAT feature to describe directory data") Link: http://lkml.kernel.org/r/20190409100156.20303-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-16 11:27:18 -03:00
Adrian Hunter	6e4b1cac30	perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_view Fix following error using calls_view: Query failed: ambiguous column name: parent_id Unable to execute statement Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Fixes: `8ce9a7251d` ("perf scripts python: export-to-sqlite.py: Export calls parent_id") Link: http://lkml.kernel.org/r/20190409062557.26138-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-16 11:27:05 -03:00
Gustavo A. R. Silva	14c9b31a92	perf header: Fix lock/unlock imbalances when processing BPF/BTF info Fix lock/unlock imbalances by refactoring the code a bit and adding calls to up_write() before return. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Addresses-Coverity-ID: 1444315 ("Missing unlock") Addresses-Coverity-ID: 1444316 ("Missing unlock") Fixes: `a70a112317` ("perf bpf: Save BTF information as headers to perf.data") Fixes: `606f972b13` ("perf bpf: Save bpf_prog_info information as headers to perf.data") Link: http://lkml.kernel.org/r/20190408173355.GA10501@embeddedor [ Simplified the exit path to have just one up_write() + return ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-16 11:26:43 -03:00
Paul E. McKenney	91df49e187	Merge LKMM and RCU commits	2019-04-16 07:08:07 -07:00
Vitaly Kuznetsov	79904c9de0	selftests: kvm: add a selftest for SMM Add a simple test for SMM, based on VMX. The test implements its own sync between the guest and the host as using our ucall library seems to be too cumbersome: SMI handler is happening in real-address mode. This patch also fixes KVM_SET_NESTED_STATE to happen after KVM_SET_VCPU_EVENTS, in fact it places it last. This is because KVM needs to know whether the processor is in SMM or not. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-04-16 15:38:06 +02:00
Paolo Bonzini	c2390f16fc	selftests: kvm: fix for compilers that do not support -no-pie -no-pie was added to GCC at the same time as their configuration option --enable-default-pie. Compilers that were built before do not have -no-pie, but they also do not need it. Detect the option at build time. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-04-16 15:38:05 +02:00
Paolo Bonzini	c68c21ca92	selftests: kvm/evmcs_test: complete I/O before migrating guest state Starting state migration after an IO exit without first completing IO may result in test failures. We already have two tests that need this (this patch in fact fixes evmcs_test, similar to what was fixed for state_test in commit `0f73bbc851`, "KVM: selftests: complete IO before migrating guest state", 2019-03-13) and a third is coming. So, move the code to vcpu_save_state, and while at it do not access register state until after I/O is complete. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-04-16 15:37:39 +02:00
Ingo Molnar	496156e364	Merge branch 'linus' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-16 12:02:43 +02:00
Stanislav Fomichev	a5cb33464e	selftests/bpf: make flow dissector tests more extensible Rewrite selftest to iterate over an array with input packet and expected flow_keys. This should make it easier to extend this test with additional cases without too much boilerplate. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 10:21:12 +02:00
Alexei Starovoitov	08de198c95	selftests/bpf: two scale tests Add two tests to check that sequence of 1024 jumps is verifiable. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 10:18:15 +02:00
Benjamin Poirier	3da6e7e408	bpftool: Improve handling of ENOSPC on reuseport_array map dumps avoids outputting a series of value: No space left on device The value itself is not wrong but bpf_fd_reuseport_array_lookup_elem() can only return it if the map was created with value_size = 8. There's nothing bpftool can do about it. Instead of repeating this error for every key in the map, print an explanatory warning and a specialized error. example before: key: 00 00 00 00 value: No space left on device key: 01 00 00 00 value: No space left on device key: 02 00 00 00 value: No space left on device Found 0 elements example after: Warning: cannot read values from reuseport_sockarray map with value_size != 8 key: 00 00 00 00 value: <cannot read> key: 01 00 00 00 value: <cannot read> key: 02 00 00 00 value: <cannot read> Found 0 elements Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 10:16:33 +02:00
Benjamin Poirier	0478c3bf81	bpftool: Use print_entry_error() in case of ENOENT when dumping Commit `bf598a8f0f` ("bpftool: Improve handling of ENOENT on map dumps") used print_entry_plain() in case of ENOENT. However, that commit introduces dead code. Per-cpu maps are zero-filled. When reading them, it's all or nothing. There will never be a case where some cpus have an entry and others don't. The truth is that ENOENT is an error case. Use print_entry_error() to output the desired message. That function's "value" parameter is also renamed to indicate that we never use it for an actual map value. The output format is unchanged. Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 10:16:33 +02:00
Quentin Monnet	25df480def	tools: bpftool: add a note on program statistics in man page Linux kernel now supports statistics for BPF programs, and bpftool is able to dump them. However, these statistics are not enabled by default, and administrators may not know how to access them. Add a paragraph in bpftool documentation, under the description of the "bpftool prog show" command, to explain that such statistics are available and that their collection is controlled via a dedicated sysctl knob. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 10:13:37 +02:00
Quentin Monnet	88b3eed805	tools: bpftool: fix short option name for printing version in man pages Manual pages would tell that option "-v" (lower case) would print the version number for bpftool. This is wrong: the short name of the option is "-V" (upper case). Fix the documentation accordingly. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 10:13:32 +02:00
Quentin Monnet	9a487883bd	tools: bpftool: fix man page documentation for "pinmaps" keyword The "pinmaps" keyword is present in the man page, in the verbose description of the "bpftool prog load" command. However, it is missing from the summary of available commands at the beginning of the file. Add it there as well. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 10:13:27 +02:00
Quentin Monnet	39c9f10639	tools: bpftool: reset errno for "bpftool cgroup tree" When trying to dump the tree of all cgroups under a given root node, bpftool attempts to query programs of all available attach types. Some of those attach types do not support queries, therefore several of the calls are actually expected to fail. Those calls set errno to EINVAL, which has no consequence for dumping the rest of the tree. It does have consequences however if errno is inspected at a later time. For example, bpftool batch mode relies on errno to determine whether a command has succeeded, and whether it should carry on with the next command. Setting errno to EINVAL when everything worked as expected would therefore make such command fail: # echo 'cgroup tree \n net show' \| \ bpftool batch file - To improve this, reset errno when its value is EINVAL after attempting to show programs for all existing attach types in do_show_tree_fn(). Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 10:13:21 +02:00
Quentin Monnet	031ebc1aac	tools: bpftool: remove blank line after btf_id when listing programs Commit `569b0c7773` ("tools/bpftool: show btf id in program information") made bpftool print an empty line after each program entry when listing the BPF programs loaded on the system (plain output). This is especially confusing when some programs have an associated BTF id, and others don't. Let's remove the blank line. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 10:13:12 +02:00
Alan Maguire	bfb35c27c6	bpf: fix whitespace for ENCAP_L2 defines in bpf.h replace tab after #define with space in line with rest of definitions Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 09:54:21 +02:00
Stanislav Fomichev	bcbccad694	selftests/bpf: bring back (void *) cast to set_ipv4_csum in test_tc_tunnel It was removed in commit `166b5a7f2c` ("selftests_bpf: extend test_tc_tunnel for UDP encap") without any explanation. Otherwise I see: progs/test_tc_tunnel.c:160:17: warning: taking address of packed member 'ip' of class or structure 'v4hdr' may result in an unaligned pointer value [-Waddress-of-packed-member] set_ipv4_csum(&h_outer.ip); ^~~~~~~~~~ 1 warning generated. Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Fixes: `166b5a7f2c` ("selftests_bpf: extend test_tc_tunnel for UDP encap") Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Liu <songliubraving@fb.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 09:51:48 +02:00
Andrii Nakryiko	efb2ddc4ce	selftests/btf: add VAR and DATASEC case for dedup tests Add test case verifying that dedup happens (INTs are deduped in this case) and VAR/DATASEC types are not deduped, but have their referenced type IDs adjusted correctly. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Yonghong Song <yhs@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 09:50:20 +02:00
Andrii Nakryiko	189cf5a4a7	btf: add support for VAR and DATASEC in btf_dedup() This patch adds support for VAR and DATASEC in btf_dedup(). VAR/DATASEC are never deduplicated, but they need to be processed anyway as types they refer to might need to be remapped due to deduplication and compaction. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Yonghong Song <yhs@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-16 09:50:20 +02:00
Linus Torvalds	618d919cae	libnvdimm fixes v5.1-rc6 - Compatibility fix for nvdimm-security implementations with a default zero-key. - Miscellaneous small fixes for out-of-bound accesses, cleanup after initialization failures, and missing debug messages. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJctQ6UAAoJEB7SkWpmfYgCV4AQAL18Kv0VJjeWzMirH+Q5B9Z2 WdHzKBvOWUWx8HeUhQoTtP+QnWriXI37EmhD7S34mVJZdYXxIQJBESPpFF1IpjUi jMibrdgrPAzyXq+x6FS4gHwi8uwUwwHOYfBEPV+7UvA8Zi8AU+g1Sgl+FftY34Em ACWc8/BtMtnwr2xFZT/4brzDCyvVHTK7f9HB280Je7DU6ghjEAaRFqqFXgAAbQ+l HAOQz4GVweT/JUmu4cvABGwllTN3np4wR/ePKYdlZTVWpN02InECukiSFtgCWN4S +bKm5EMTGDprLtNDz3m1SDWPrGSkWkoEtmVZljAXepJzAcZ1qbEw4xe++Kqrgr0z YOawM0lMciTp78uiH797thYnS3fo5+Ccr0WE4lhrSC3kAZE+EfGvbyhv3T+Pz3M+ Z3hEpz+gGNMBwby0AmCLJHfwyujztNBE5hnXcsL5dC6BXKHZGZSgsUllRcZJ+xJ1 H6b5sdxmNvn7Ja0svhKJzfpP4j8v25v+KEns9VlbIejJAp62cQCmA1dHlGaC5pDc 0g9mtPbYsEZjKQ5/5grHgtre+JYmYDAIKwS4UK11ZyChqR+tmZ2Cp7XgmVab9a7T QpFLczMV/Q8NSWIFYSHvXjj1/PWtUxf81lEtA+Y3+mDznn30QctPwufPcdxeTNJs KSyFKhhKIOnasEplrLu4 =zISv -----END PGP SIGNATURE----- Merge tag 'libnvdimm-fixes-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "I debated holding this back for the v5.2 merge window due to the size of the "zero-key" changes, but affected users would benefit from having the fixes sooner. It did not make sense to change the zero-key semantic in isolation for the "secure-erase" command, but instead include it for all security commands. The short background on the need for these changes is that some NVDIMM platforms enable security with a default zero-key rather than let the OS specify the initial key. This makes the security enabling that landed in v5.0 unusable for some users. Summary: - Compatibility fix for nvdimm-security implementations with a default zero-key. - Miscellaneous small fixes for out-of-bound accesses, cleanup after initialization failures, and missing debug messages" * tag 'libnvdimm-fixes-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: tools/testing/nvdimm: Retain security state after overwrite libnvdimm/pmem: fix a possible OOB access when read and write pmem libnvdimm/security, acpi/nfit: unify zero-key for all security commands libnvdimm/security: provide fix for secure-erase to use zero-key libnvdimm/btt: Fix a kmemdup failure check libnvdimm/namespace: Fix a potential NULL pointer dereference acpi/nfit: Always dump _DSM output payload	2019-04-15 16:48:51 -07:00
Ido Schimmel	3321cff3c5	selftests: mlxsw: Test neighbour offload indication Test that neighbour entries are marked as offloaded. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-15 13:29:21 -07:00
David S. Miller	95337b9821	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for net-next: 1) Remove the broute pseudo hook, implement this from the bridge prerouting hook instead. Now broute becomes real table in ebtables, from Florian Westphal. This also includes a size reduction patch for the bridge control buffer area via squashing boolean into bitfields and a selftest. 2) Add OS passive fingerprint version matching, from Fernando Fernandez. 3) Support for gue encapsulation for IPVS, from Jacky Hu. 4) Add support for NAT to the inet family, from Florian Westphal. This includes support for masquerade, redirect and nat extensions. 5) Skip interface lookup in flowtable, use device in the dst object. 6) Add jiffies64_to_msecs() and use it, from Li RongQing. 7) Remove unused parameter in nf_tables_set_desc_parse(), from Colin Ian King. 8) Statify several functions, patches from YueHaibing and Florian Westphal. 9) Add an optimized version of nf_inet_addr_cmp(), from Li RongQing. 10) Merge route extension to core, also from Florian. 11) Use IS_ENABLED(CONFIG_NF_NAT) instead of NF_NAT_NEEDED, from Florian. 12) Merge ip/ip6 masquerade extensions, from Florian. This includes netdevice notifier unification. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-15 12:07:35 -07:00
Miroslav Benes	802c247160	selftests/livepatch: Add functions.sh to TEST_PROGS_EXTENDED Add functions.sh to TEST_PROGS_EXTENDED so that it is installed along with the rest of the selftests and they can be run. Originally-by: Shuah Khan <shuah@kernel.org> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com>	2019-04-15 10:43:21 +02:00
Florian Westphal	5bdac418f3	netfilter: nat: fix icmp id randomization Sven Auhagen reported that a 2nd ping request will fail if 'fully-random' mode is used. Reason is that if no proto information is given, min/max are both 0, so we set the icmp id to 0 instead of chosing a random value between 0 and 65535. Update test case as well to catch this, without fix this yields: [..] ERROR: cannot ping ns1 from ns2 with ip masquerade fully-random (attempt 2) ERROR: cannot ping ns1 from ns2 with ipv6 masquerade fully-random (attempt 2) ... becaus 2nd ping clashes with existing 'id 0' icmp conntrack and gets dropped. Fixes: `203f2e7820` ("netfilter: nat: remove l4proto->unique_tuple") Reported-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-04-15 07:31:50 +02:00
Linus Torvalds	4443f8e6ac	for-linus-20190412 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlyw354QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpiMyEAC4THUReCrTuv9oFRNg5uILVYIq51nP8dw7 XamC7A92jPXd6vl/QVjmvLwT34/Y2XvX0t62RBsk849CEjgGYTeF1/qI3tMkpN7c huupab3aYM/Rrv4i1KSPQu6iIto3DYqfmREaGJJ1Ikbu/CKDuUGyEo+Z4wrKUPon GWnE8QMS2fdc764eVzKKqB+GryaEiHmeD1N4NnPs+nla14ysueUvJUikkTt/Laef h7nOmz9mrqE6u1xVHNpo0TlW0oJdLfaDIL9ghwHFJXqvriTh8Tg2tEHpXI6vSTTt StnPbTA1s1uhHs4rWYl8J0UXSZnRRp0Ep8jCvqEb9CJ23uHCNyGEoy/R7q+x2quf T+ruolMXY7IIJP30ZMHar374YfajJdw7EH/565nlbLnjSBXhqjmc07kQ7mIYvpg6 JgureSdDwOOHpfrJgVq5es48ndt5HBYUBPzkvVGTgkeSJkMydkkM1qZeYEnai105 8EnUFusRUnYZtb73HBPjKS7i0BZZvZlI1oKYHabiMtajqcKyvwDP2tTmhqXYLDLY 9uloW0u2B0lddfzCb9hTYZOroNWfifo4vuSU5DHvnJoKvf4z3auDxaFD9N8fGn6S aZsRjMCpFqFd0YEnZPbsctgPg2Licrs02uPntlzBTJ0ByH20pX4OepYrvgQk3vao tOQ1jRYMKw== =cISy -----END PGP SIGNATURE----- Merge tag 'for-linus-20190412' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Set of fixes that should go into this round. This pull is larger than I'd like at this time, but there's really no specific reason for that. Some are fixes for issues that went into this merge window, others are not. Anyway, this contains: - Hardware queue limiting for virtio-blk/scsi (Dongli) - Multi-page bvec fixes for lightnvm pblk - Multi-bio dio error fix (Jason) - Remove the cache hint from the io_uring tool side, since we didn't move forward with that (me) - Make io_uring SETUP_SQPOLL root restricted (me) - Fix leak of page in error handling for pc requests (Jérôme) - Fix BFQ regression introduced in this merge window (Paolo) - Fix break logic for bio segment iteration (Ming) - Fix NVMe cancel request error handling (Ming) - NVMe pull request with two fixes (Christoph): - fix the initial CSN for nvme-fc (James) - handle log page offsets properly in the target (Keith)" * tag 'for-linus-20190412' of git://git.kernel.dk/linux-block: block: fix the return errno for direct IO nvmet: fix discover log page when offsets are used nvme-fc: correct csn initialization and increments on error block: do not leak memory in bio_copy_user_iov() lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs nvme: cancel request synchronously blk-mq: introduce blk_mq_complete_request_sync() scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids virtio-blk: limit number of hw queues by nr_cpu_ids block, bfq: fix use after free in bfq_bfqq_expire io_uring: restrict IORING_SETUP_SQPOLL to root tools/io_uring: remove IOCQE_FLAG_CACHEHIT block: don't use for-inside-for in bio_for_each_segment_all	2019-04-13 16:23:16 -07:00
Florian Westphal	becf2319f3	selftests: netfilter: check icmp pkttoobig errors are set as related When an icmp error such as pkttoobig is received, conntrack checks if the "inner" header (header of packet that did not fit link mtu) is matches an existing connection, and, if so, sets that packet as being related to the conntrack entry it found. It was recently reported that this "related" setting also works if the inner header is from another, different connection (i.e., artificial/forged icmp error). Add a test, followup patch will add additional "inner dst matches outer dst in reverse direction" check before setting related state. Link: https://www.synacktiv.com/posts/systems/icmp-reachable.html Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-04-13 14:52:57 +02:00
Linus Torvalds	54c63a7558	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: "Fix an objtool warning plus fix a u64_to_user_ptr() macro expansion bug" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Add rewind_stack_do_exit() to the noreturn list linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()	2019-04-12 20:13:13 -07:00
Jiri Pirko	38f58c9723	netdevsim: move sdev specific bpf debugfs files to sdev dir Some netdevsim bpf debugfs files are per-sdev, yet they are defined per netdevsim instance. Move them under sdev directory. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-12 16:49:54 -07:00
Andrey Ignatov	7568f4cbbe	selftests/bpf: C based test for sysctl and strtoX Add C based test for a few bpf_sysctl_* helpers and bpf_strtoul. Make sure that sysctl can be identified by name and that multiple integers can be parsed from sysctl value with bpf_strtoul. net/ipv4/tcp_mem is chosen as a testing sysctl, it contains 3 unsigned longs, they all are parsed and compared (val[0] < val[1] < val[2]). Example of output: # ./test_sysctl ... Test case: C prog: deny all writes .. [PASS] Test case: C prog: deny access by name .. [PASS] Test case: C prog: read tcp_mem .. [PASS] Summary: 39 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:59 -07:00
Andrey Ignatov	8549ddc832	selftests/bpf: Test bpf_strtol and bpf_strtoul helpers Test that bpf_strtol and bpf_strtoul helpers can be used to convert provided buffer to long or unsigned long correspondingly and return both correct result and number of consumed bytes, or proper errno. Example of output: # ./test_sysctl .. Test case: bpf_strtoul one number string .. [PASS] Test case: bpf_strtoul multi number string .. [PASS] Test case: bpf_strtoul buf_len = 0, reject .. [PASS] Test case: bpf_strtoul supported base, ok .. [PASS] Test case: bpf_strtoul unsupported base, EINVAL .. [PASS] Test case: bpf_strtoul buf with spaces only, EINVAL .. [PASS] Test case: bpf_strtoul negative number, EINVAL .. [PASS] Test case: bpf_strtol negative number, ok .. [PASS] Test case: bpf_strtol hex number, ok .. [PASS] Test case: bpf_strtol max long .. [PASS] Test case: bpf_strtol overflow, ERANGE .. [PASS] Summary: 36 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:59 -07:00
Andrey Ignatov	c2d5f12e4c	selftests/bpf: Test ARG_PTR_TO_LONG arg type Test that verifier handles new argument types properly, including uninitialized or partially initialized value, misaligned stack access, etc. Example of output: #456/p ARG_PTR_TO_LONG uninitialized OK #457/p ARG_PTR_TO_LONG half-uninitialized OK #458/p ARG_PTR_TO_LONG misaligned OK #459/p ARG_PTR_TO_LONG size < sizeof(long) OK #460/p ARG_PTR_TO_LONG initialized OK Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:59 -07:00
Andrey Ignatov	99f57973ac	selftests/bpf: Add sysctl and strtoX helpers to bpf_helpers.h Add bpf_sysctl_* and bpf_strtoX helpers to bpf_helpers.h. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:59 -07:00
Andrey Ignatov	b457e5534c	bpf: Sync bpf.h to tools/ Sync bpf_strtoX related bpf UAPI changes to tools/. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:59 -07:00
Andrey Ignatov	9a1027e525	selftests/bpf: Test file_pos field in bpf_sysctl ctx Test access to file_pos field of bpf_sysctl context, both read (incl. narrow read) and write. # ./test_sysctl ... Test case: ctx:file_pos sysctl:read read ok .. [PASS] Test case: ctx:file_pos sysctl:read read ok narrow .. [PASS] Test case: ctx:file_pos sysctl:read write ok .. [PASS] ... Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:59 -07:00
Andrey Ignatov	786047dd08	selftests/bpf: Test bpf_sysctl_{get,set}_new_value helpers Test that new value provided by user space on sysctl write can be read by bpf_sysctl_get_new_value and overridden by bpf_sysctl_set_new_value. # ./test_sysctl ... Test case: sysctl_get_new_value sysctl:read EINVAL .. [PASS] Test case: sysctl_get_new_value sysctl:write ok .. [PASS] Test case: sysctl_get_new_value sysctl:write ok long .. [PASS] Test case: sysctl_get_new_value sysctl:write E2BIG .. [PASS] Test case: sysctl_set_new_value sysctl:read EINVAL .. [PASS] Test case: sysctl_set_new_value sysctl:write ok .. [PASS] Summary: 22 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:59 -07:00
Andrey Ignatov	11ff34f74e	selftests/bpf: Test sysctl_get_current_value helper Test sysctl_get_current_value on sysctl read and write, buffers with enough space and too small buffers to get E2BIG and truncated result, etc. # ./test_sysctl ... Test case: sysctl_get_current_value sysctl:read ok, gt .. [PASS] Test case: sysctl_get_current_value sysctl:read ok, eq .. [PASS] Test case: sysctl_get_current_value sysctl:read E2BIG truncated .. [PASS] Test case: sysctl_get_current_value sysctl:read EINVAL .. [PASS] Test case: sysctl_get_current_value sysctl:write ok .. [PASS] Summary: 16 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:59 -07:00
Andrey Ignatov	6041c67f28	selftests/bpf: Test bpf_sysctl_get_name helper Test w/ and w/o BPF_F_SYSCTL_BASE_NAME, buffers with enough space and too small buffers to get E2BIG and truncated result, etc. # ./test_sysctl ... Test case: sysctl_get_name sysctl_value:base ok .. [PASS] Test case: sysctl_get_name sysctl_value:base E2BIG truncated .. [PASS] Test case: sysctl_get_name sysctl:full ok .. [PASS] Test case: sysctl_get_name sysctl:full E2BIG truncated .. [PASS] Test case: sysctl_get_name sysctl:full E2BIG truncated small .. [PASS] Summary: 11 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:58 -07:00
Andrey Ignatov	1f5fa9ab6e	selftests/bpf: Test BPF_CGROUP_SYSCTL Add unit test for BPF_PROG_TYPE_CGROUP_SYSCTL program type. Test that program can allow/deny access. Test both valid and invalid accesses to ctx->write. Example of output: # ./test_sysctl Test case: sysctl wrong attach_type .. [PASS] Test case: sysctl:read allow all .. [PASS] Test case: sysctl:read deny all .. [PASS] Test case: ctx:write sysctl:read read ok .. [PASS] Test case: ctx:write sysctl:write read ok .. [PASS] Test case: ctx:write sysctl:read write reject .. [PASS] Summary: 6 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:58 -07:00
Andrey Ignatov	7007af63da	selftests/bpf: Test sysctl section name Add unit test to verify that program and attach types are properly identified for "cgroup/sysctl" section name. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:58 -07:00
Andrey Ignatov	063cc9f06e	libbpf: Support sysctl hook Support BPF_PROG_TYPE_CGROUP_SYSCTL program in libbpf: identifying program and attach types by section name, probe. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:58 -07:00
Andrey Ignatov	196398d4c0	bpf: Sync bpf.h to tools/ Sync BPF_PROG_TYPE_CGROUP_SYSCTL related bpf UAPI changes to tools/. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-12 13:54:58 -07:00
David Ahern	56490b623a	selftests: Add debugging options to pmtu.sh pmtu.sh script runs a number of tests and dumps a summary of pass/fail. If a test fails, it is near impossible to debug why. For example: TEST: ipv6: PMTU exceptions [FAIL] There are a lot of commands run behind the scenes for this test. Which one is failing? Add a VERBOSE option to show commands that are run and any output from those commands. Add a PAUSE_ON_FAIL option to halt the script if a test fails allowing users to poke around with the setup in the failed state. In the process, rename tracing to TRACING and move declaration to top with the new variables. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-11 21:32:00 -07:00
David S. Miller	bb23581b9b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2019-04-12 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Improve BPF verifier scalability for large programs through two optimizations: i) remove verifier states that are not useful in pruning, ii) stop walking parentage chain once first LIVE_READ is seen. Combined gives approx 20x speedup. Increase limits for accepting large programs under root, and add various stress tests, from Alexei. 2) Implement global data support in BPF. This enables static global variables for .data, .rodata and .bss sections to be properly handled which allows for more natural program development. This also opens up the possibility to optimize program workflow by compiling ELFs only once and later only rewriting section data before reload, from Daniel and with test cases and libbpf refactoring from Joe. 3) Add config option to generate BTF type info for vmlinux as part of the kernel build process. DWARF debug info is converted via pahole to BTF. Latter relies on libbpf and makes use of BTF deduplication algorithm which results in 100x savings compared to DWARF data. Resulting .BTF section is typically about 2MB in size, from Andrii. 4) Add BPF verifier support for stack access with variable offset from helpers and add various test cases along with it, from Andrey. 5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header so that L2 encapsulation can be used for tc tunnels, from Alan. 6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that users can define a subset of allowed __sk_buff fields that get fed into the test program, from Stanislav. 7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up various UBSAN warnings in bpftool, from Yonghong. 8) Generate a pkg-config file for libbpf, from Luca. 9) Dump program's BTF id in bpftool, from Prashant. 10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP program, from Magnus. 11) kallsyms related fixes for the case when symbols are not present in BPF selftests and samples, from Daniel ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-11 17:00:05 -07:00
Florian Westphal	26f7fe4a5d	selftests: netfilter: add ebtables broute test case ebtables -t broute allows to redirect packets in a way that they get pushed up the stack, even if the interface is part of a bridge. In case of IP packets to non-local address, this means those IP packets are routed instead of bridged-forwarded, just as if the bridge would not have existed. Expected test output is: PASS: netns connectivity: ns1 and ns2 can reach each other PASS: ns1/ns2 connectivity with active broute rule PASS: ns1/ns2 connectivity with active broute rule and bridge forward drop Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-04-12 01:45:58 +02:00
Daniel Borkmann	6b7a21140f	tools: add smp_* barrier variants to include infrastructure Add the definition for smp_rmb(), smp_wmb(), and smp_mb() to the tools include infrastructure: this patch adds the implementation for x86-64 and arm64, and have it fall back as currently is for other archs which do not have it implemented at this point. The x86-64 one uses lock + add combination for smp_mb() with address below red zone. This is on top of `09d62154f6` ("tools, perf: add and use optimized ring_buffer_{read_head, write_tail} helpers"), which didn't touch smp_* barrier implementations. Magnus recently rightfully reported however that the latter on x86-64 still wrongly falls back to sfence, lfence and mfence respectively, thus fix that for applications under tools making use of these to avoid such ugly surprises. The main header under tools (include/asm/barrier.h) will in that case not select the fallback implementation. Reported-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-11 14:45:50 -07:00
David Ahern	a5f622984a	selftests: fib_tests: Fix 'Command line is not complete' errors A couple of tests are verifying a route has been removed. The helper expects the prefix as the first part of the expected output. When checking that a route has been deleted the prefix is empty leading to an invalid ip command: $ ip ro ls match Command line is not complete. Try option "help" Fix by moving the comparison of expected output and output to a new function that is used by both check_route and check_route6. Use the new helper for the 2 checks on route removal. Also, remove the reset of 'set -x' in route_setup which overrides the user managed setting. Fixes: `d69faad765` ("selftests: fib_tests: Add prefix route tests with metric") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-11 14:17:59 -07:00
Alan Maguire	3ec61df82b	selftests_bpf: add L2 encap to test_tc_tunnel Update test_tc_tunnel to verify adding inner L2 header encapsulation (an MPLS label or ethernet header) works. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-11 22:50:57 +02:00
Alan Maguire	1db04c300a	bpf: sync bpf.h to tools/ for BPF_F_ADJ_ROOM_ENCAP_L2 Sync include/uapi/linux/bpf.h with tools/ equivalent to add BPF_F_ADJ_ROOM_ENCAP_L2(len) macro. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-11 22:50:57 +02:00
Alan Maguire	166b5a7f2c	selftests_bpf: extend test_tc_tunnel for UDP encap commit `868d523535` ("bpf: add bpf_skb_adjust_room encap flags") introduced support to bpf_skb_adjust_room for GSO-friendly GRE and UDP encapsulation and later introduced associated test_tc_tunnel tests. Here those tests are extended to cover UDP encapsulation also. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-11 22:50:56 +02:00
Vlad Buslov	9e35552ae1	net: sched: flower: use correct ht function to prevent duplicates Implementation of function rhashtable_insert_fast() check if its internal helper function __rhashtable_insert_fast() returns non-NULL pointer and seemingly return -EEXIST in such case. However, since __rhashtable_insert_fast() is called with NULL key pointer, it never actually checks for duplicates, which means that -EEXIST is never returned to the user. Use rhashtable_lookup_insert_fast() hash table API instead. In order to verify that it works as expected and prevent the problem from happening in future, extend tc-tests with new test that verifies that no new filters with existing key can be inserted to flower classifier. Fixes: `1f17f7742e` ("net: sched: flower: insert filter to ht before offloading it to hw") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-11 11:33:06 -07:00
Kishon Vijay Abraham I	993d5fe31c	tools: PCI: Handle pcitest.sh independently from pcitest Handling pcitest.sh along with pcitest (obtained by compiling pcitest.c) results in pcitest.sh getting removed inadvertently while "make -C tools/pci clean". Fix it by handling pcitest.sh independently of pcitest. Fixes: `1ce78ce094` ("tools: PCI: Change pcitest compiling process") Reported-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Gustavo Pimentel <gustavo.pimentel@synopsys.com>	2019-04-11 10:25:38 +01:00
Kishon Vijay Abraham I	fbca0b284b	tools: PCI: Add 'h' in optstring of getopt() 'h' is a valid option character for the pcitest tool used to print the pcitest usage. Add 'h' in optstring of getopt() in order to get rid of "pcitest: invalid option -- 'h'" warning. While at that remove unncessary case '?'. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2019-04-11 10:13:00 +01:00
Stanislav Fomichev	3daf8e703e	selftests: bpf: add selftest for __sk_buff context in BPF_PROG_TEST_RUN Simple test that sets cb to {1,2,3,4,5} and priority to 6, runs bpf program that fails if cb is not what we expect and increments cb[i] and priority. When the test finishes, we check that cb is now {2,3,4,5,6} and priority is 7. We also test the sanity checks: * ctx_in is provided, but ctx_size_in is zero (same for ctx_out/ctx_size_out) * unexpected non-zero fields in __sk_buff return EINVAL Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-11 10:21:41 +02:00
Stanislav Fomichev	5e903c656b	libbpf: add support for ctx_{size, }_{in, out} in BPF_PROG_TEST_RUN Support recently introduced input/output context for test runs. We extend only bpf_prog_test_run_xattr. bpf_prog_test_run is unextendable and left as is. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-11 10:21:41 +02:00
Prashant Bhole	569b0c7773	tools/bpftool: show btf id in program information Let's add a way to know whether a program has btf context. Patch adds 'btf_id' in the output of program listing. When btf_id is present, it means program has btf context. Sample output: user@test# bpftool prog list 25: xdp name xdp_prog1 tag 539ec6ce11b52f98 gpl loaded_at 2019-04-10T11:44:20+0900 uid 0 xlated 488B not jited memlock 4096B map_ids 23 btf_id 1 Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-11 10:21:39 +02:00
Andrey Ignatov	d5adbdd77e	libbpf: Fix build with gcc-8 Reported in [1]. With gcc 8.3.0 the following error is issued: cc -Ibpf@sta -I. -I.. -I.././include -I.././include/uapi -fdiagnostics-color=always -fsanitize=address,undefined -fno-omit-frame-pointer -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -g -fPIC -g -O2 -Werror -Wall -Wno-pointer-arith -Wno-sign-compare -MD -MQ 'bpf@sta/src_libbpf.c.o' -MF 'bpf@sta/src_libbpf.c.o.d' -o 'bpf@sta/src_libbpf.c.o' -c ../src/libbpf.c ../src/libbpf.c: In function 'bpf_object__elf_collect': ../src/libbpf.c:947:18: error: 'map_def_sz' may be used uninitialized in this function [-Werror=maybe-uninitialized] if (map_def_sz <= sizeof(struct bpf_map_def)) { ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../src/libbpf.c:827:18: note: 'map_def_sz' was declared here int i, map_idx, map_def_sz, nr_syms, nr_maps = 0, nr_maps_glob = 0; ^~~~~~~~~~ According to [2] -Wmaybe-uninitialized is enabled by -Wall. Same error is generated by clang's -Wconditional-uninitialized. [1] https://github.com/libbpf/libbpf/pull/29#issuecomment-481902601 [2] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html Fixes: `d859900c4c` ("bpf, libbpf: support global data/bss/rodata sections") Reported-by: Evgeny Vereshchagin <evvers@ya.ru> Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-11 10:21:38 +02:00
Ido Schimmel	7052e24363	selftests: mlxsw: Test VRF MAC vetoing Test that it is possible to set an IP address on a VRF and that it is not vetoed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-10 11:57:08 -07:00
Martin Schwidefsky	e24e4712ef	s390/rseq: use trap4 for RSEQ_SIG Use trap4 as the guard instruction for the restartable sequence abort handler. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2019-04-10 17:48:34 +02:00
Magnus Karlsson	50bd645b3a	libbpf: fix crash in XDP socket part with new larger BPF_LOG_BUF_SIZE In commit `da11b41758` ("libbpf: teach libbpf about log_level bit 2"), the BPF_LOG_BUF_SIZE was increased to 16M. The XDP socket part of libbpf allocated the log_buf on the stack, but for the new 16M buffer size this is not going to work. Change the code so it uses a 16K buffer instead. Fixes: `da11b41758` ("libbpf: teach libbpf about log_level bit 2") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-10 09:51:50 +02:00
Yonghong Song	69a0f9ecef	bpf, bpftool: fix a few ubsan warnings The issue is reported at https://github.com/libbpf/libbpf/issues/28. Basically, per C standard, for void memcpy(void dest, const void *src, size_t n) if "dest" or "src" is NULL, regardless of whether "n" is 0 or not, the result of memcpy is undefined. clang ubsan reported three such instances in bpf.c with the following pattern: memcpy(dest, 0, 0). Although in practice, no known compiler will cause issues when copy size is 0. Let us still fix the issue to silence ubsan warnings. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-10 09:46:51 +02:00
Daniel Borkmann	c861168b7c	bpf, selftest: add test cases for BTF Var and DataSec Extend test_btf with various positive and negative tests around BTF verification of kind Var and DataSec. All passing as well: # ./test_btf [...] BTF raw test[4] (global data test #1): OK BTF raw test[5] (global data test #2): OK BTF raw test[6] (global data test #3): OK BTF raw test[7] (global data test #4, unsupported linkage): OK BTF raw test[8] (global data test #5, invalid var type): OK BTF raw test[9] (global data test #6, invalid var type (fwd type)): OK BTF raw test[10] (global data test #7, invalid var type (fwd type)): OK BTF raw test[11] (global data test #8, invalid var size): OK BTF raw test[12] (global data test #9, invalid var size): OK BTF raw test[13] (global data test #10, invalid var size): OK BTF raw test[14] (global data test #11, multiple section members): OK BTF raw test[15] (global data test #12, invalid offset): OK BTF raw test[16] (global data test #13, invalid offset): OK BTF raw test[17] (global data test #14, invalid offset): OK BTF raw test[18] (global data test #15, not var kind): OK BTF raw test[19] (global data test #16, invalid var referencing sec): OK BTF raw test[20] (global data test #17, invalid var referencing var): OK BTF raw test[21] (global data test #18, invalid var loop): OK BTF raw test[22] (global data test #19, invalid var referencing var): OK BTF raw test[23] (global data test #20, invalid ptr referencing var): OK BTF raw test[24] (global data test #21, var included in struct): OK BTF raw test[25] (global data test #22, array of var): OK [...] PASS:167 SKIP:0 FAIL:0 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-09 17:05:47 -07:00
Joe Stringer	b915ebe6d9	bpf, selftest: test global data/bss/rodata sections Add tests for libbpf relocation of static variable references into the .data, .rodata and .bss sections of the ELF, also add read-only test for .rodata. All passing: # ./test_progs [...] test_global_data:PASS:load program 0 nsec test_global_data:PASS:pass global data run 925 nsec test_global_data_number:PASS:relocate .bss reference 925 nsec test_global_data_number:PASS:relocate .data reference 925 nsec test_global_data_number:PASS:relocate .rodata reference 925 nsec test_global_data_number:PASS:relocate .bss reference 925 nsec test_global_data_number:PASS:relocate .data reference 925 nsec test_global_data_number:PASS:relocate .rodata reference 925 nsec test_global_data_number:PASS:relocate .bss reference 925 nsec test_global_data_number:PASS:relocate .bss reference 925 nsec test_global_data_number:PASS:relocate .rodata reference 925 nsec test_global_data_number:PASS:relocate .rodata reference 925 nsec test_global_data_number:PASS:relocate .rodata reference 925 nsec test_global_data_string:PASS:relocate .rodata reference 925 nsec test_global_data_string:PASS:relocate .data reference 925 nsec test_global_data_string:PASS:relocate .bss reference 925 nsec test_global_data_string:PASS:relocate .data reference 925 nsec test_global_data_string:PASS:relocate .bss reference 925 nsec test_global_data_struct:PASS:relocate .rodata reference 925 nsec test_global_data_struct:PASS:relocate .bss reference 925 nsec test_global_data_struct:PASS:relocate .rodata reference 925 nsec test_global_data_struct:PASS:relocate .data reference 925 nsec test_global_data_rdonly:PASS:test .rodata read-only map 925 nsec [...] Summary: 229 PASSED, 0 FAILED Note map helper signatures have been changed to avoid warnings when passing in const data. Joint work with Daniel Borkmann. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-09 17:05:47 -07:00
Daniel Borkmann	fb2abb73e5	bpf, selftest: test {rd, wr}only flags and direct value access Extend test_verifier with various test cases around the two kernel extensions, that is, {rd,wr}only map support as well as direct map value access. All passing, one skipped due to xskmap not present on test machine: # ./test_verifier [...] #948/p XDP pkt read, pkt_meta' <= pkt_data, bad access 1 OK #949/p XDP pkt read, pkt_meta' <= pkt_data, bad access 2 OK #950/p XDP pkt read, pkt_data <= pkt_meta', good access OK #951/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK #952/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK Summary: 1410 PASSED, 1 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-09 17:05:47 -07:00
Daniel Borkmann	817998afa0	bpf: bpftool support for dumping data/bss/rodata sections Add the ability to bpftool to handle BTF Var and DataSec kinds in order to dump them out of btf_dumper_type(). The value has a single object with the section name, which itself holds an array of variables it dumps. A single variable is an object by itself printed along with its name. From there further type information is dumped along with corresponding value information. Example output from .rodata: # ./bpftool m d i 150 [{ "value": { ".rodata": [{ "load_static_data.bar": 18446744073709551615 },{ "num2": 24 },{ "num5": 43947 },{ "num6": 171 },{ "str0": [97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,0,0,0,0,0,0 ] },{ "struct0": { "a": 42, "b": 4278120431, "c": 1229782938247303441 } },{ "struct2": { "a": 0, "b": 0, "c": 0 } } ] } } ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-09 17:05:47 -07:00
Daniel Borkmann	1713d68b3b	bpf, libbpf: add support for BTF Var and DataSec This adds libbpf support for BTF Var and DataSec kinds. Main point here is that libbpf needs to do some preparatory work before the whole BTF object can be loaded into the kernel, that is, fixing up of DataSec size taken from the ELF section size and non-static variable offset which needs to be taken from the ELF's string section. Upstream LLVM doesn't fix these up since at time of BTF emission it is too early in the compilation process thus this information isn't available yet, hence loader needs to take care of it. Note, deduplication handling has not been in the scope of this work and needs to be addressed in a future commit. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://reviews.llvm.org/D59441 Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-09 17:05:47 -07:00
Daniel Borkmann	d859900c4c	bpf, libbpf: support global data/bss/rodata sections This work adds BPF loader support for global data sections to libbpf. This allows to write BPF programs in more natural C-like way by being able to define global variables and const data. Back at LPC 2018 [0] we presented a first prototype which implemented support for global data sections by extending BPF syscall where union bpf_attr would get additional memory/size pair for each section passed during prog load in order to later add this base address into the ldimm64 instruction along with the user provided offset when accessing a variable. Consensus from LPC was that for proper upstream support, it would be more desirable to use maps instead of bpf_attr extension as this would allow for introspection of these sections as well as potential live updates of their content. This work follows this path by taking the following steps from loader side: 1) In bpf_object__elf_collect() step we pick up ".data", ".rodata", and ".bss" section information. 2) If present, in bpf_object__init_internal_map() we add maps to the obj's map array that corresponds to each of the present sections. Given section size and access properties can differ, a single entry array map is created with value size that is corresponding to the ELF section size of .data, .bss or .rodata. These internal maps are integrated into the normal map handling of libbpf such that when user traverses all obj maps, they can be differentiated from user-created ones via bpf_map__is_internal(). In later steps when we actually create these maps in the kernel via bpf_object__create_maps(), then for .data and .rodata sections their content is copied into the map through bpf_map_update_elem(). For .bss this is not necessary since array map is already zero-initialized by default. Additionally, for .rodata the map is frozen as read-only after setup, such that neither from program nor syscall side writes would be possible. 3) In bpf_program__collect_reloc() step, we record the corresponding map, insn index, and relocation type for the global data. 4) And last but not least in the actual relocation step in bpf_program__relocate(), we mark the ldimm64 instruction with src_reg = BPF_PSEUDO_MAP_VALUE where in the first imm field the map's file descriptor is stored as similarly done as in BPF_PSEUDO_MAP_FD, and in the second imm field (as ldimm64 is 2-insn wide) we store the access offset into the section. Given these maps have only single element ldimm64's off remains zero in both parts. 5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE load will then store the actual target address in order to have a 'map-lookup'-free access. That is, the actual map value base address + offset. The destination register in the verifier will then be marked as PTR_TO_MAP_VALUE, containing the fixed offset as reg->off and backing BPF map as reg->map_ptr. Meaning, it's treated as any other normal map value from verification side, only with efficient, direct value access instead of actual call to map lookup helper as in the typical case. Currently, only support for static global variables has been added, and libbpf rejects non-static global variables from loading. This can be lifted until we have proper semantics for how BPF will treat multi-object BPF loads. From BTF side, libbpf will set the value type id of the types corresponding to the ".bss", ".data" and ".rodata" names which LLVM will emit without the object name prefix. The key type will be left as zero, thus making use of the key-less BTF option in array maps. Simple example dump of program using globals vars in each section: # bpftool prog [...] 6784: sched_cls name load_static_dat tag a7e1291567277844 gpl loaded_at 2019-03-11T15:39:34+0000 uid 0 xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240 # bpftool map show id 2237 2237: array name test_glo.bss flags 0x0 key 4B value 64B max_entries 1 memlock 4096B # bpftool map show id 2235 2235: array name test_glo.data flags 0x0 key 4B value 64B max_entries 1 memlock 4096B # bpftool map show id 2236 2236: array name test_glo.rodata flags 0x80 key 4B value 96B max_entries 1 memlock 4096B # bpftool prog dump xlated id 6784 int load_static_data(struct __sk_buff * skb): ; int load_static_data(struct __sk_buff skb) 0: (b7) r6 = 0 ; test_reloc(number, 0, &num0); 1: (63) (u32 )(r10 -4) = r6 2: (bf) r2 = r10 ; int load_static_data(struct __sk_buff skb) 3: (07) r2 += -4 ; test_reloc(number, 0, &num0); 4: (18) r1 = map[id:2238] 6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area 8: (b7) r4 = 0 9: (85) call array_map_update_elem#100464 10: (b7) r1 = 1 ; test_reloc(number, 1, &num1); [...] ; test_reloc(string, 2, str2); 120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16 122: (18) r1 = map[id:2239] 124: (18) r3 = map[id:2237][0]+16 126: (b7) r4 = 0 127: (85) call array_map_update_elem#100464 128: (b7) r1 = 120 ; str1[5] = 'x'; 129: (73) (u8 )(r9 +5) = r1 ; test_reloc(string, 3, str1); 130: (b7) r1 = 3 131: (63) (u32 )(r10 -4) = r1 132: (b7) r9 = 3 133: (bf) r2 = r10 ; int load_static_data(struct __sk_buff skb) 134: (07) r2 += -4 ; test_reloc(string, 3, str1); 135: (18) r1 = map[id:2239] 137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area 139: (b7) r4 = 0 140: (85) call array_map_update_elem#100464 141: (b7) r1 = 111 ; __builtin_memcpy(&str2[2], "hello", sizeof("hello")); 142: (73) (u8 )(r8 +6) = r1 <-- further access based on .bss data 143: (b7) r1 = 108 144: (73) (u8 *)(r8 +5) = r1 [...] For Cilium use-case in particular, this enables migrating configuration constants from Cilium daemon's generated header defines into global data sections such that expensive runtime recompilations with LLVM can be avoided altogether. Instead, the ELF file becomes effectively a "template", meaning, it is compiled only once (!) and the Cilium daemon will then rewrite relevant configuration data from the ELF's .data or .rodata sections directly instead of recompiling the program. The updated ELF is then loaded into the kernel and atomically replaces the existing program in the networking datapath. More info in [0]. Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail for static variables"). [0] LPC 2018, BPF track, "ELF relocation for static data in BPF", http://vger.kernel.org/lpc-bpf2018.html#session-3 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-09 17:05:47 -07:00
Joe Stringer	f8c7a4d4dc	bpf, libbpf: refactor relocation handling Adjust the code for relocations slightly with no functional changes, so that upcoming patches that will introduce support for relocations into the .data, .rodata and .bss sections can be added independent of these changes. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-09 17:05:47 -07:00
Daniel Borkmann	c83fef6bc5	bpf: sync {btf, bpf}.h uapi header from tools infrastructure Pull in latest changes from both headers, so we can make use of them in libbpf. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-09 17:05:47 -07:00
Daniel Borkmann	d8eca5bbb2	bpf: implement lookup-free direct value access for maps This generic extension to BPF maps allows for directly loading an address residing inside a BPF map value as a single BPF ldimm64 instruction! The idea is similar to what BPF_PSEUDO_MAP_FD does today, which is a special src_reg flag for ldimm64 instruction that indicates that inside the first part of the double insns's imm field is a file descriptor which the verifier then replaces as a full 64bit address of the map into both imm parts. For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following: the first part of the double insns's imm field is again a file descriptor corresponding to the map, and the second part of the imm field is an offset into the value. The verifier will then replace both imm parts with an address that points into the BPF map value at the given value offset for maps that support this operation. Currently supported is array map with single entry. It is possible to support more than just single map element by reusing both 16bit off fields of the insns as a map index, so full array map lookup could be expressed that way. It hasn't been implemented here due to lack of concrete use case, but could easily be done so in future in a compatible way, since both off fields right now have to be 0 and would correctly denote a map index 0. The BPF_PSEUDO_MAP_VALUE is a distinct flag as otherwise with BPF_PSEUDO_MAP_FD we could not differ offset 0 between load of map pointer versus load of map's value at offset 0, and changing BPF_PSEUDO_MAP_FD's encoding into off by one to differ between regular map pointer and map value pointer would add unnecessary complexity and increases barrier for debugability thus less suitable. Using the second part of the imm field as an offset into the value does /not/ come with limitations since maximum possible value size is in u32 universe anyway. This optimization allows for efficiently retrieving an address to a map value memory area without having to issue a helper call which needs to prepare registers according to calling convention, etc, without needing the extra NULL test, and without having to add the offset in an additional instruction to the value base pointer. The verifier then treats the destination register as PTR_TO_MAP_VALUE with constant reg->off from the user passed offset from the second imm field, and guarantees that this is within bounds of the map value. Any subsequent operations are normally treated as typical map value handling without anything extra needed from verification side. The two map operations for direct value access have been added to array map for now. In future other types could be supported as well depending on the use case. The main use case for this commit is to allow for BPF loader support for global variables that reside in .data/.rodata/.bss sections such that we can directly load the address of them with minimal additional infrastructure required. Loader support has been added in subsequent commits for libbpf library. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-09 17:05:46 -07:00
Bob Moore	3278675567	ACPICA: Rename nameseg length macro/define for clarity ACPICA commit 24870bd9e73d71e2a1ff0a1e94519f8f8409e57d ACPI_NAME_SIZE changed to ACPI_NAMESEG_SIZE This clarifies that this is the length of an individual nameseg, not the length of a generic namestring/namepath. Improves understanding of the code. Link: https://github.com/acpica/acpica/commit/24870bd9 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-04-09 11:24:48 +02:00
Bob Moore	5599fb6935	ACPICA: Rename nameseg compare macro for clarity ACPICA commit 92ec0935f27e217dff0b176fca02c2ec3d782bb5 ACPI_COMPARE_NAME changed to ACPI_COMPARE_NAMESEG This clarifies (1) this is a compare on 4-byte namesegs, not a generic compare. Improves understanding of the code. Link: https://github.com/acpica/acpica/commit/92ec0935 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-04-09 10:08:28 +02:00
Bob Moore	a3ce7a8e0d	ACPICA: Rename nameseg copy macro for clarity ACPICA commit 19c18d3157945d1b8b64a826f0a8e848b7dbb127 ACPI_MOVE_NAME changed to ACPI_COPY_NAMESEG This clarifies (1) this is a copy operation, and (2) it operates on ACPI name_segs. Improves understanding of the code. Link: https://github.com/acpica/acpica/commit/19c18d31 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-04-09 10:08:28 +02:00
David S. Miller	310655b07a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2019-04-08 23:39:36 -07:00
Linus Torvalds	869e3305f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Off by one and bounds checking fixes in NFC, from Dan Carpenter. 2) There have been many weird regressions in r8169 since we turned ASPM support on, some are still not understood nor completely resolved. Let's turn this back off for now. From Heiner Kallweit. 3) Signess fixes for ethtool speed value handling, from Michael Zhivich. 4) Handle timestamps properly in macb driver, from Paul Thomas. 5) Two erspan fixes, it's the usual "skb ->data potentially reallocated and we're holding a stale protocol header pointer". From Lorenzo Bianconi. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: bnxt_en: Reset device on RX buffer errors. bnxt_en: Improve RX consumer index validity check. net: macb driver, check for SKBTX_HW_TSTAMP qlogic: qlcnic: fix use of SPEED_UNKNOWN ethtool constant broadcom: tg3: fix use of SPEED_UNKNOWN ethtool constant ethtool: avoid signed-unsigned comparison in ethtool_validate_speed() net: ip6_gre: fix possible use-after-free in ip6erspan_rcv net: ip_gre: fix possible use-after-free in erspan_rcv r8169: disable ASPM again MAINTAINERS: ieee802154: update documentation file pattern net: vrf: Fix ping failed when vrf mtu is set to 0 selftests: add a tc matchall test case nfc: nci: Potential off by one in ->pipes[] array NFC: nci: Add some bounds checking in nci_hci_cmd_received()	2019-04-08 17:10:46 -10:00
Tadeusz Struk	6da70580af	selftests/tpm2: Open tpm dev in unbuffered mode In order to have control over how many bytes are read or written the device needs to be opened in unbuffered mode. Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.morris@microsoft.com>	2019-04-08 15:58:55 -07:00
Tadeusz Struk	f1a0ba6ccc	selftests/tpm2: Extend tests to cover partial reads Three new tests added: 1. Send get random cmd, read header in 1st read, read the rest in second read - expect success 2. Send get random cmd, read only part of the response, send another get random command, read the response - expect success 3. Send get random cmd followed by another get random cmd, without reading the first response - expect the second cmd to fail with -EBUSY Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: James Morris <james.morris@microsoft.com>	2019-04-08 15:58:55 -07:00
Roman Gushchin	e14d314c7a	selftests: cgroup: fix cleanup path in test_memcg_subtree_control() Dan reported, that cleanup path in test_memcg_subtree_control() triggers a static checker warning: ./tools/testing/selftests/cgroup/test_memcontrol.c:76 \ test_memcg_subtree_control() error: uninitialized symbol 'child2'. Fix this by initializing child2 and parent2 variables and split the cleanup path into few stages. Signed-off-by: Roman Gushchin <guro@fb.com> Fixes: `84092dbcf9` ("selftests: cgroup: add memory controller self-tests") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Shuah Khan <shuah@kernel.org>	2019-04-08 16:44:22 -06:00
ZhangXiaoxu	f8a0590f0e	selftests: efivarfs: remove the test_create_read file if it was exist After the first run, the test case 'test_create_read' will always fail because the file is exist and file's attr is 'S_IMMUTABLE', open with 'O_RDWR' will always return -EPERM. Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Shuah Khan <shuah@kernel.org>	2019-04-08 16:44:21 -06:00
Mathieu Desnoyers	0a7dc82ef2	rseq/selftests: Adapt number of threads to the number of detected cpus On smaller systems, running a test with 200 threads can take a long time on machines with smaller number of CPUs. Detect the number of online cpus at test runtime, and multiply that by 6 to have 6 rseq threads per cpu preempting each other. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Shuah Khan <shuah@kernel.org>	2019-04-08 16:44:21 -06:00
Tobin C. Harding	0b0600c8c9	lib: Add test module for strscpy_pad Add a test module for the new strscpy_pad() function. Tie it into the kselftest infrastructure for lib/ tests. Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Shuah Khan <shuah@kernel.org>	2019-04-08 16:44:21 -06:00
Tobin C. Harding	eebf4dd452	kselftest: Add test module framework header kselftest runs as a userspace process. Sometimes we need to test things from kernel space. One way of doing this is by creating a test module. Currently doing so requires developers to write a bunch of boiler plate in the module if kselftest is to be used to run the tests. This means we currently have a load of duplicate code to achieve these ends. If we have a uniform method for implementing test modules then we can reduce code duplication, ensure uniformity in the test framework, ease code maintenance, and reduce the work required to create tests. This all helps to encourage developers to write and run tests. Add a C header file that can be included in test modules. This provides a single point for common test functions/macros. Implement a few macros that make up the start of the test framework. Add documentation for new kselftest header to kselftest documentation. Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Shuah Khan <shuah@kernel.org>	2019-04-08 16:44:20 -06:00
Tobin C. Harding	d346052770	kselftest: Add test runner creation script Currently if we wish to use kselftest to run tests within a kernel module we write a small script to load/unload and do error reporting. There are a bunch of these under tools/testing/selftests/lib/ that are all identical except for the test name. We can reduce code duplication and improve maintainability if we have one version of this. However kselftest requires an executable for each test. We can move all the script logic to a central script then have each individual test script call the main script. Oneliner to call kselftest_module.sh courtesy of Kees, thanks! Add test runner creation script. Convert tools/testing/selftests/lib/*.sh to use new test creation script. Testing ------- Configure kselftests for lib/ then build and boot kernel. Then run kselftests as follows: $ cd /path/to/kernel/tree $ sudo make O=$output_path -C tools/testing/selftests TARGETS="lib" run_tests and also $ cd /path/to/kernel/tree $ cd tools/testing/selftests $ sudo make O=$output_path TARGETS="lib" run_tests and also $ cd /path/to/kernel/tree $ cd tools/testing/selftests $ sudo make TARGETS="lib" run_tests Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Shuah Khan <shuah@kernel.org>	2019-04-08 16:44:11 -06:00
David Ahern	228ddb3315	selftests: fib_tests: Add tests for ipv6 gateway with ipv4 route Add tests for ipv6 gateway with ipv4 route. Tests include basic single path with ping to verify connectivity and multipath. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-08 15:22:41 -07:00
Sabyasachi Gupta	6f9e64b0ff	selftest/gpio: Remove duplicate header Remove duplicate header which are included twice. Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Shuah Khan <shuah@kernel.org>	2019-04-08 16:18:21 -06:00
Sabyasachi Gupta	cde53520e2	selftest/rseq: Remove duplicate header Remove duplicate header which is included twice Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Shuah Khan <shuah@kernel.org>	2019-04-08 16:18:21 -06:00
Sabyasachi Gupta	a04a67845c	selftest/timers: Remove duplicate header Remove duplicate header which is included twice. Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Shuah Khan <shuah@kernel.org>	2019-04-08 16:18:21 -06:00
Sabyasachi Gupta	d11a7e376a	selftest/x86/mpx-dig.c: Remove duplicate header Remove duplicate header which is included twice. Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Shuah Khan <shuah@kernel.org>	2019-04-08 16:18:21 -06:00
Florian Westphal	6978cdb129	kselftests: extend nft_nat with inet family based nat hooks With older nft versions, this will cause: [..] PASS: ipv6 ping to ns1 was ip6 NATted to ns2 /dev/stdin:4:30-31: Error: syntax error, unexpected to, expecting newline or semicolon ip daddr 10.0.1.99 dnat ip to 10.0.2.99 ^^ SKIP: inet nat tests PASS: ip IP masquerade for ns2 [..] as there is currently no way to detect if nft will be able to parse the inet format. redirect and masquerade tests need to be skipped in this case for inet too because nft userspace has overzealous family check and rejects their use in the inet family. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-04-08 23:03:04 +02:00
Jens Axboe	704236672e	tools/io_uring: remove IOCQE_FLAG_CACHEHIT This ended up not being included in the mainline version of io_uring, so drop it from the test app as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-08 10:48:50 -06:00
Dave Jiang	2170a0d53b	tools/testing/nvdimm: Retain security state after overwrite Overwrite retains the security state after completion of operation. Fix nfit_test to reflect this so that the kernel can test the behavior it is more likely to see in practice. Fixes: `926f74802c` ("tools/testing/nvdimm: Add overwrite support for nfit_test") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-04-08 09:39:32 -07:00
Tom Zanussi	4eab1cc461	selftests/ftrace: Add tracing/error_log testcase Add a testcase verifying basic tracing/error_log functionality. Link: http://lkml.kernel.org/r/bf1c0d47a24672df945331462682d96296d1ab28.1554072478.git.tom.zanussi@linux.intel.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2019-04-08 09:22:50 -04:00
Tom Zanussi	0ae8dde9d7	selftests/ftrace: Remove trigger-extended-error-support testcase Error handling has been moved to the common tracing/error_log, so this test is no longer valid. Link: http://lkml.kernel.org/r/876a98b21018814cbf46f0a3605ae0906c51d53c.1554072478.git.tom.zanussi@linux.intel.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2019-04-08 09:22:50 -04:00
Tom Zanussi	c5e4114fee	selftests/ftrace: Move kprobe/uprobe check_error() to test.d/functions The k/uprobe_sytax_errors test case defines a check_error() function used to run a command and check the position of the caret in the output. This would be useful for other ftrace facilities too, so move it to test.d/functions for use by anyone. In the process, rename it to ftrace_errlog_check() and parametrize it for general use. Link: http://lkml.kernel.org/r/9f88080a06f1755811f69081926afe7e5cb53178.1554072478.git.tom.zanussi@linux.intel.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2019-04-08 09:22:50 -04:00
Masami Hiramatsu	8ab4483eb6	selftests/ftrace: Add error_log testcase for probe errors Add error_log testcase for error logs on probe events. This tests most of error cases and checks the error position is correct. Link: http://lkml.kernel.org/r/63d695b74e0965988fa54ffa12beeb2c3475250d.1554072478.git.tom.zanussi@linux.intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> [tom.zanussi@linux.intel.com: changed >& redirection to 2>] Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2019-04-08 09:22:50 -04:00
Florian Westphal	4c145dce26	xfrm: make xfrm modes builtin after previous changes, xfrm_mode contains no function pointers anymore and all modules defining such struct contain no code except an init/exit functions to register the xfrm_mode struct with the xfrm core. Just place the xfrm modes core and remove the modules, the run-time xfrm_mode register/unregister functionality is removed. Before: text data bss dec filename 7523 200 2364 10087 net/xfrm/xfrm_input.o 40003 628 440 41071 net/xfrm/xfrm_state.o 15730338 6937080 4046908 26714326 vmlinux 7389 200 2364 9953 net/xfrm/xfrm_input.o 40574 656 440 41670 net/xfrm/xfrm_state.o 15730084 6937068 4046908 26714060 vmlinux The xfrm_mode_{transport,tunnel,beet} modules are gone. v2: replace CONFIG_INET6_XFRM_MODE_ IS_ENABLED guards with CONFIG_IPV6 ones rather than removing them. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:15:17 +02:00
Nicolas Dichtel	b959ecf8f9	selftests: add a tc matchall test case This is a follow up of the commit `0db6f8befc` ("net/sched: fix ->get helper of the matchall cls"). To test it: $ cd tools/testing/selftests/tc-testing $ ln -s ../plugin-lib/nsPlugin.py plugins/20-nsPlugin.py $ ./tdc.py -n -e 2638 Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-07 19:31:49 -07:00
Andrey Ignatov	ff466b5805	libbpf: Ignore -Wformat-nonliteral warning vsprintf() in __base_pr() uses nonliteral format string and it breaks compilation for those who provide corresponding extra CFLAGS, e.g.: https://github.com/libbpf/libbpf/issues/27 If libbpf is built with the flags from PR: libbpf.c:68:26: error: format string is not a string literal [-Werror,-Wformat-nonliteral] return vfprintf(stderr, format, args); ^~~~~~ 1 error generated. Ignore this warning since the use case in libbpf.c is legit. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-06 23:13:54 -07:00
Nikolay Aleksandrov	f1054c65bc	selftests: forwarding: test for bridge mcast traffic after report and leave This test is split in two, the first part checks if a report creates a corresponding mdb entry and if traffic is properly forwarded to it, and the second part checks if the mdb entry is deleted after a leave and if traffic is not forwarded to it. Since the mcast querier is enabled we should see standard mcast snooping bridge behaviour. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-06 18:30:04 -07:00
Christoph Hellwig	72deb455b5	block: remove CONFIG_LBDAF Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit architectures. These types are required to support block device and/or file sizes larger than 2 TiB, and have generally defaulted to on for a long time. Enabling the option only increases the i386 tinyconfig size by 145 bytes, and many data structures already always use 64-bit values for their in-core and on-disk data structures anyway, so there should not be a large change in dynamic memory usage either. Dropping this option removes a somewhat weird non-default config that has cause various bugs or compiler warnings when actually used. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-06 10:48:35 -06:00
David S. Miller	f83f715195	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor comment merge conflict in mlx5. Staging driver has a fixup due to the skb->xmit_more changes in 'net-next', but was removed in 'net'. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-05 14:14:19 -07:00
Andrey Ignatov	07f9196241	selftests/bpf: Test unbounded var_off stack access Test the case when reg->smax_value is too small/big and can overflow, and separately min and max values outside of stack bounds. Example of output: # ./test_verifier #856/p indirect variable-offset stack access, unbounded OK #857/p indirect variable-offset stack access, max out of bound OK #858/p indirect variable-offset stack access, min out of bound OK Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-05 16:50:08 +02:00
Andrey Ignatov	2c6927dbdc	selftests/bpf: Test indirect var_off stack access in unpriv mode Test that verifier rejects indirect stack access with variable offset in unprivileged mode and accepts same code in privileged mode. Since pointer arithmetics is prohibited in unprivileged mode verifier should reject the program even before it gets to helper call that uses variable offset, at the time when that variable offset is trying to be constructed. Example of output: # ./test_verifier ... #859/u indirect variable-offset stack access, priv vs unpriv OK #859/p indirect variable-offset stack access, priv vs unpriv OK Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-05 16:50:08 +02:00
Andrey Ignatov	f68a5b4464	selftests/bpf: Test indirect var_off stack access in raw mode Test that verifier rejects indirect access to uninitialized stack with variable offset. Example of output: # ./test_verifier ... #859/p indirect variable-offset stack access, uninitialized OK Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-05 16:50:07 +02:00
Josh Poimboeuf	4fa5ecda2b	objtool: Add rewind_stack_do_exit() to the noreturn list This fixes the following warning seen on GCC 7.3: arch/x86/kernel/dumpstack.o: warning: objtool: oops_end() falls through to next function show_regs() Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/3418ebf5a5a9f6ed7e80954c741c0b904b67b5dc.1554398240.git.jpoimboe@redhat.com	2019-04-05 13:30:51 +02:00
Thomas Gleixner	cabf5ebbab	perf/core improvements and fixes: perf record: Alexey Budankov: - Implement --mmap-flush=<number> option, to control a threshold for draining the mmap ring buffers and consequently the size of the write calls to the output, be it perf.data, pipe mode or soon a compressor that with bigger buffers will do a better job before dumping compressed data into a new perf.data content mode, which is in the final steps of reviewing and testing. perf trace: Arnaldo Carvalho de Melo: - Add 'string' event alias to select syscalls with string args, i.e. for testing the BPF program used to copy those strings, allow for: # perf trace -e string To select all the syscalls that have things like pathnames. - Use a PERCPU_ARRAY BPF map to copy more string bytes than what is possible using the BPF stack, just like pioneered by the sysdig tool. Feature detection: Alexey Budankov: - Implement libzstd feature check, which is a library that provides a uniform API to various compression formats, will be used in 'perf record', see note about --mmap-flush feature. perf stat: Andi Kleen: - Implement a tool specific 'duration_time' event to allow showing the "time elapsed" line in the default 'perf stat' output as one of the events that can be asked for when using --field-separator and other script consumable outputs. Intel vendor events (JSON files): Andi Kleen: - Update metrics from TMAM 3.5. - Update events: Bonnell to V4 Broadwell-DE to v7 Broadwell to v23 BroadwellX to v14 GoldmontPlus to v1.01 Goldmont to v13 Haswell to v28 HaswellX to v20 IvyBridge to v21 IvyTown to v20 JakeTown to v20 KnightsLanding to v9 SandyBridge to v16 Silvermont to v14 Skylake to v42 SkylakeX to v1.12 IBM S/390 vendor events (JSON): Thomas Richter: - Fix s390 counter long description for L1D_RO_EXCL_WRITES. tools lib traceevent: Steven Rostedt (Red Hat): - Add more debugging to see various internal ring buffer entries. Steven Rostedt (VMWare): - Handle trace_printk() "%px". - Add mono clocks to be parsed in seconds. - Removed unneeded !! and return parenthesis. Tzvetomir Stoyanov : - Implement a new API, tep_list_events_copy(). - Implement new traceevent APIs for accessing struct tep_handler fields. - Remove tep filter trivial APIs, not used anymore. - Remove call to exit() from tep_filter_add_filter_str(), library routines shouldn't kill tools using it. - Make traceevent APIs more consistent. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXKNwXgAKCRCyPKLppCJ+ JypjAPsGYcZF1YFO5053kU6yo0fkVJB/3gyGQVjlAGqXcjsRnAD/cTnhni0ocQDW hu5nRBYCnw0l4r6yfg6Y6+jXlvCZyg8= =wiIr -----END PGP SIGNATURE----- Merge tag 'perf-core-for-mingo-5.2-20190402' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo: perf record: Alexey Budankov: - Implement --mmap-flush=<number> option, to control a threshold for draining the mmap ring buffers and consequently the size of the write calls to the output, be it perf.data, pipe mode or soon a compressor that with bigger buffers will do a better job before dumping compressed data into a new perf.data content mode, which is in the final steps of reviewing and testing. perf trace: Arnaldo Carvalho de Melo: - Add 'string' event alias to select syscalls with string args, i.e. for testing the BPF program used to copy those strings, allow for: # perf trace -e string To select all the syscalls that have things like pathnames. - Use a PERCPU_ARRAY BPF map to copy more string bytes than what is possible using the BPF stack, just like pioneered by the sysdig tool. Feature detection: Alexey Budankov: - Implement libzstd feature check, which is a library that provides a uniform API to various compression formats, will be used in 'perf record', see note about --mmap-flush feature. perf stat: Andi Kleen: - Implement a tool specific 'duration_time' event to allow showing the "time elapsed" line in the default 'perf stat' output as one of the events that can be asked for when using --field-separator and other script consumable outputs. Intel vendor events (JSON files): Andi Kleen: - Update metrics from TMAM 3.5. - Update events: Bonnell to V4 Broadwell-DE to v7 Broadwell to v23 BroadwellX to v14 GoldmontPlus to v1.01 Goldmont to v13 Haswell to v28 HaswellX to v20 IvyBridge to v21 IvyTown to v20 JakeTown to v20 KnightsLanding to v9 SandyBridge to v16 Silvermont to v14 Skylake to v42 SkylakeX to v1.12 IBM S/390 vendor events (JSON): Thomas Richter: - Fix s390 counter long description for L1D_RO_EXCL_WRITES. tools lib traceevent: Steven Rostedt (Red Hat): - Add more debugging to see various internal ring buffer entries. Steven Rostedt (VMWare): - Handle trace_printk() "%px". - Add mono clocks to be parsed in seconds. - Removed unneeded !! and return parenthesis. Tzvetomir Stoyanov : - Implement a new API, tep_list_events_copy(). - Implement new traceevent APIs for accessing struct tep_handler fields. - Remove tep filter trivial APIs, not used anymore. - Remove call to exit() from tep_filter_add_filter_str(), library routines shouldn't kill tools using it. - Make traceevent APIs more consistent. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-05 13:28:15 +02:00
Linus Torvalds	0548740e53	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Several hash table refcount fixes in batman-adv, from Sven Eckelmann. 2) Use after free in bpf_evict_inode(), from Daniel Borkmann. 3) Fix mdio bus registration in ixgbe, from Ivan Vecera. 4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni. 5) ila rhashtable corruption fix from Herbert Xu. 6) Don't allow upper-devices to be added to vrf devices, from Sabrina Dubroca. 7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork. 8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype, from Alexander Lobakin. 9) Missing IDR checks in mlx5 driver, from Aditya Pakki. 10) Fix false connection termination in ktls, from Jakub Kicinski. 11) Work around some ASPM issues with r8169 by disabling rx interrupt coalescing on certain chips. From Heiner Kallweit. 12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo Abeni. 13) Fully initialize sockaddr_in structures in SCTP, from Xin Long. 14) Various BPF flow dissector fixes from Stanislav Fomichev. 15) Divide by zero in act_sample, from Davide Caratti. 16) Fix bridging multicast regression introduced by rhashtable conversion, from Nikolay Aleksandrov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) ibmvnic: Fix completion structure initialization ipv6: sit: reset ip header pointer in ipip6_rcv net: bridge: always clear mcast matching struct on reports and leaves libcxgb: fix incorrect ppmax calculation vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x sch_cake: Make sure we can write the IP header before changing DSCP bits sch_cake: Use tc_skb_protocol() helper for getting packet protocol tcp: Ensure DCTCP reacts to losses net/sched: act_sample: fix divide by zero in the traffic path net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop net: hns: Fix sparse: some warnings in HNS drivers net: hns: Fix WARNING when remove HNS driver with SMMU enabled net: hns: fix ICMP6 neighbor solicitation messages discard problem net: hns: Fix probabilistic memory overwrite when HNS driver initialized net: hns: Use NAPI_POLL_WEIGHT for hns driver net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw() flow_dissector: rst'ify documentation ipv6: Fix dangling pointer when ipv6 fragment net-gro: Fix GRO flush when receiving a GSO packet. flow_dissector: document BPF flow dissector environment ...	2019-04-04 18:07:12 -10:00
Paul E. McKenney	a5220e7d2e	tools/memory-model: Add support for synchronize_srcu_expedited() Given that synchronize_rcu_expedited() is supported, this commit adds support for synchronize_srcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Andrea Parri <andrea.parri@amarulasolutions.com>	2019-04-04 13:48:34 -07:00
David S. Miller	5ba5780117	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2019-04-04 The following pull-request contains BPF updates for your net tree. The main changes are: 1) Batch of fixes to the existing BPF flow dissector API to support calling BPF programs from the eth_get_headlen context (support for latter is planned to be added in bpf-next), from Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-04 13:30:55 -07:00
Rafael J. Wysocki	58b0cf8e24	Merge branch 'pm-tools' * pm-tools: tools/power turbostat: update version number tools/power turbostat: Warn on bad ACPI LPIT data tools/power turbostat: Add checks for failure of fgets() and fscanf() tools/power turbostat: Also read package power on AMD F17h (Zen) tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL tools/power turbostat: Do not display an error on systems without a cpufreq driver tools/power turbostat: Add Die column tools/power turbostat: Add Icelake support tools/power turbostat: Cleanup CNL-specific code tools/power turbostat: Cleanup CC3-skip code tools/power turbostat: Restore ability to execute in topology-order	2019-04-04 21:57:45 +02:00
Davide Caratti	fae2708174	net/sched: act_sample: fix divide by zero in the traffic path the control path of 'sample' action does not validate the value of 'rate' provided by the user, but then it uses it as divisor in the traffic path. Validate it in tcf_sample_init(), and return -EINVAL with a proper extack message in case that value is zero, to fix a splat with the script below: # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2 # tc -s a s action sample total acts 1 action order 0: sample rate 1/0 group 1 pipe index 2 ref 1 bind 1 installed 19 sec used 19 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 # ping 192.0.2.1 -I test0 -c1 -q divide error: 0000 [#1] SMP PTI CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample] Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01 RSP: 0018:ffffae320190ba30 EFLAGS: 00010246 RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49 RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0 RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000 R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80 R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000 FS: 00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0 Call Trace: tcf_action_exec+0x7c/0x1c0 tcf_classify+0x57/0x160 __dev_queue_xmit+0x3dc/0xd10 ip_finish_output2+0x257/0x6d0 ip_output+0x75/0x280 ip_send_skb+0x15/0x40 raw_sendmsg+0xae3/0x1410 sock_sendmsg+0x36/0x40 __sys_sendto+0x10e/0x140 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe [...] Kernel panic - not syncing: Fatal exception in interrupt Add a TDC selftest to document that 'rate' is now being validated. Reported-by: Matteo Croce <mcroce@redhat.com> Fixes: `5c5670fae4` ("net/sched: Introduce sample tc action") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Yotam Gigi <yotam.gi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-04 10:46:33 -07:00
Daniel T. Lee	e67b2c7154	samples, selftests/bpf: add NULL check for ksym_search Since, ksym_search added with verification logic for symbols existence, it could return NULL when the kernel symbols are not loaded. This commit will add NULL check logic after ksym_search. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 16:43:47 +02:00
Daniel T. Lee	0979ff7992	selftests/bpf: ksym_search won't check symbols exists Currently, ksym_search located at trace_helpers won't check symbols are existing or not. In ksym_search, when symbol is not found, it will return &syms[0](_stext). But when the kernel symbols are not loaded, it will return NULL, which is not a desired action. This commit will add verification logic whether symbols are loaded prior to the symbol search. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 16:43:46 +02:00
Alexei Starovoitov	8aa2d4b4b9	selftests/bpf: synthetic tests to push verifier limits Add a test to generate 1m ld_imm64 insns to stress the verifier. Bump the size of fill_ld_abs_vlan_push_pop test from 4k to 29k and jump_around_ld_abs from 4k to 5.5k. Larger sizes are not possible due to 16-bit offset encoding in jump instructions. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:38 +02:00
Alexei Starovoitov	e5e7a8f2d8	selftests/bpf: add few verifier scale tests Add 3 basic tests that stress verifier scalability. test_verif_scale1.c calls non-inlined jhash() function 90 times on different position in the packet. This test simulates network packet parsing. jhash function is ~140 instructions and main program is ~1200 insns. test_verif_scale2.c force inlines jhash() function 90 times. This program is ~15k instructions long. test_verif_scale3.c calls non-inlined jhash() function 90 times on But this time jhash has to process 32-bytes from the packet instead of 14-bytes in tests 1 and 2. jhash function is ~230 insns and main program is ~1200 insns. $ test_progs -s can be used to see verifier stats. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:38 +02:00
Alexei Starovoitov	da11b41758	libbpf: teach libbpf about log_level bit 2 Allow bpf_prog_load_xattr() to specify log_level for program loading. Teach libbpf to accept log_level with bit 2 set. Increase default BPF_LOG_BUF_SIZE from 256k to 16M. There is no downside to increase it to a maximum allowed by old kernels. Existing 256k limit caused ENOSPC errors and users were not able to see verifier error which is printed at the end of the verifier log. If ENOSPC is hit, double the verifier log and try again to capture the verifier error. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:38 +02:00
Stanislav Fomichev	822fe61795	net/flow_dissector: pass flow_keys->n_proto to BPF programs This is a preparation for the next commit that would prohibit access to the most fields of __sk_buff from the BPF programs. Instead of requiring BPF flow dissector programs to look into skb, pass all input data in the flow_keys. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-03 16:49:48 +02:00
Stanislav Fomichev	2c3af7d901	selftests/bpf: fix vlan handling in flow dissector program When we tail call PROG(VLAN) from parse_eth_proto we don't need to peek back to handle vlan proto because we didn't adjust nhoff/thoff yet. Use flow_keys->n_proto, that we set in parse_eth_proto instead and properly increment nhoff as well. Also, always use skb->protocol and don't look at skb->vlan_present. skb->vlan_present indicates that vlan information is stored out-of-band in skb->vlan_{tci,proto} and vlan header is already pulled from skb. That means, skb->vlan_present == true is not relevant for BPF flow dissector. Add simple test cases with VLAN tagged frames: * single vlan for ipv4 * double vlan for ipv6 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-03 16:49:48 +02:00
Peter Zijlstra	2f0f9e9ad7	objtool: Add Direction Flag validation Having DF escape is BAD(tm). Linus; you suggested this one, but since DF really is only used from ASM and the failure case is fairly obvious, do we really need this? OTOH the patch is fairly small and simple, so let's just do this to demonstrate objtool's superior awesomeness. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-03 11:02:24 +02:00
Peter Zijlstra	ea24213d80	objtool: Add UACCESS validation It is important that UACCESS regions are as small as possible; furthermore the UACCESS state is not scheduled, so doing anything that might directly call into the scheduler will cause random code to be ran with UACCESS enabled. Teach objtool too track UACCESS state and warn about any CALL made while UACCESS is enabled. This very much includes the __fentry__() and __preempt_schedule() calls. Note that exceptions _do_ save/restore the UACCESS state, and therefore they can drive preemption. This also means that all exception handlers must have an otherwise redundant UACCESS disable instruction; therefore ignore this warning for !STT_FUNC code (exception handlers are not normal functions). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-03 11:02:24 +02:00
Peter Zijlstra	54262aa283	objtool: Fix sibling call detection It turned out that we failed to detect some sibling calls; specifically those without relocation records; like: $ ./objdump-func.sh defconfig-build/mm/kasan/generic.o __asan_loadN 0000 0000000000000840 <__asan_loadN>: 0000 840: 48 8b 0c 24 mov (%rsp),%rcx 0004 844: 31 d2 xor %edx,%edx 0006 846: e9 45 fe ff ff jmpq 690 <check_memory_region> So extend the cross-function jump to also consider those that are not between known (or newly detected) parent/child functions, as sibling-cals when they jump to the start of the function. The second part of that condition is to deal with random jumps to the middle of other function, as can be found in arch/x86/lib/copy_user_64.S for example. This then (with later patches applied) makes the above recognise the sibling call: mm/kasan/generic.o: warning: objtool: __asan_loadN()+0x6: call to check_memory_region() with UACCESS enabled Also make sure to set insn->call_dest for sibling calls so we can know who we're calling. This is useful information when printing validation warnings later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-03 11:02:24 +02:00
Peter Zijlstra	764eef4b10	objtool: Rewrite alt->skip_orig Really skip the original instruction flow, instead of letting it continue with NOPs. Since the alternative code flow already continues after the original instructions, only the alt-original is skipped. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-03 11:02:24 +02:00
Peter Zijlstra	7697eee3dd	objtool: Add --backtrace support For when you want to know the path that reached your fail state: $ ./objtool check --no-fp --backtrace arch/x86/lib/usercopy_64.o arch/x86/lib/usercopy_64.o: warning: objtool: .altinstr_replacement+0x3: UACCESS disable without MEMOPs: __clear_user() arch/x86/lib/usercopy_64.o: warning: objtool: __clear_user()+0x3a: (alt) arch/x86/lib/usercopy_64.o: warning: objtool: __clear_user()+0x2e: (branch) arch/x86/lib/usercopy_64.o: warning: objtool: __clear_user()+0x18: (branch) arch/x86/lib/usercopy_64.o: warning: objtool: .altinstr_replacement+0xffffffffffffffff: (branch) arch/x86/lib/usercopy_64.o: warning: objtool: __clear_user()+0x5: (alt) arch/x86/lib/usercopy_64.o: warning: objtool: __clear_user()+0x0: <=== (func) 0000000000000000 <__clear_user>: 0: e8 00 00 00 00 callq 5 <__clear_user+0x5> 1: R_X86_64_PLT32 __fentry__-0x4 5: 90 nop 6: 90 nop 7: 90 nop 8: 48 89 f0 mov %rsi,%rax b: 48 c1 ee 03 shr $0x3,%rsi f: 83 e0 07 and $0x7,%eax 12: 48 89 f1 mov %rsi,%rcx 15: 48 85 c9 test %rcx,%rcx 18: 74 0f je 29 <__clear_user+0x29> 1a: 48 c7 07 00 00 00 00 movq $0x0,(%rdi) 21: 48 83 c7 08 add $0x8,%rdi 25: ff c9 dec %ecx 27: 75 f1 jne 1a <__clear_user+0x1a> 29: 48 89 c1 mov %rax,%rcx 2c: 85 c9 test %ecx,%ecx 2e: 74 0a je 3a <__clear_user+0x3a> 30: c6 07 00 movb $0x0,(%rdi) 33: 48 ff c7 inc %rdi 36: ff c9 dec %ecx 38: 75 f6 jne 30 <__clear_user+0x30> 3a: 90 nop 3b: 90 nop 3c: 90 nop 3d: 48 89 c8 mov %rcx,%rax 40: c3 retq Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-03 11:02:24 +02:00
Peter Zijlstra	aaf5c623b9	objtool: Rewrite add_ignores() The whole add_ignores() thing was wildly weird; rewrite it according to 'modern' ways. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-03 11:02:24 +02:00
Peter Zijlstra	09f30d83d3	objtool: Handle function aliases Function aliases result in different symbols for the same set of instructions; track a canonical symbol so there is a unique point of access. This again prepares the way for function attributes. And in particular the need for aliases comes from how KASAN uses them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-03 11:02:24 +02:00
Peter Zijlstra	a4d09dde90	objtool: Set insn->func for alternatives In preparation of function attributes, we need each instruction to have a valid link back to its function. Therefore make sure we set the function association for alternative instruction sequences; they are, after all, still part of the function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-03 11:02:24 +02:00
Peter Zijlstra	ff05ab2305	x86/nospec, objtool: Introduce ANNOTATE_IGNORE_ALTERNATIVE To facillitate other usage of ignoring alternatives; rename ANNOTATE_NOSPEC_IGNORE to ANNOTATE_IGNORE_ALTERNATIVE. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-04-03 09:39:46 +02:00
Stanislav Fomichev	7596aa3ea8	selftests: bpf: remove duplicate .flags initialization in ctx_skb.c verifier/ctx_skb.c:708:11: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides] .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Stanislav Fomichev	a918b03e8c	selftests: bpf: fix -Wformat-invalid-specifier for bpf_obj_id.c Use standard C99 %zu for sizeof, not GCC's custom %Zu: bpf_obj_id.c:76:48: warning: invalid conversion specifier 'Z' Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Stanislav Fomichev	94e8f3c712	selftests: bpf: fix -Wformat-security warning for flow_dissector_load.c flow_dissector_load.c:55:19: warning: format string is not a string literal (potentially insecure) [-Wformat-security] error(1, errno, command); ^~~~~~~ flow_dissector_load.c:55:19: note: treat the string as an argument to avoid this error(1, errno, command); ^ "%s", 1 warning generated. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Stanislav Fomichev	6b7b6995c4	selftests: bpf: tests.h should depend on .c files, not the output This makes sure we don't put headers as input files when doing compilation, because clang complains about the following: clang-9: error: cannot specify -o when generating multiple output files ../lib.mk:152: recipe for target 'xxx/tools/testing/selftests/bpf/test_verifier' failed make: * [xxx/tools/testing/selftests/bpf/test_verifier] Error 1 make: * Waiting for unfinished jobs.... clang-9: error: cannot specify -o when generating multiple output files ../lib.mk:152: recipe for target 'xxx/tools/testing/selftests/bpf/test_progs' failed Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Andi Kleen	1c3a2c864d	perf vendor events intel: Update Silvermont to v14 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:48 -03:00
Andi Kleen	c53dd58988	perf vendor events intel: Update GoldmontPlus to v1.01 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:46 -03:00
Andi Kleen	f3ef08583e	perf vendor events intel: Update Goldmont to v13 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:44 -03:00
Andi Kleen	b1580f542c	perf vendor events intel: Update Bonnell to V4 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:42 -03:00
Andi Kleen	643e72255e	perf vendor events intel: Update KnightsLanding events to v9 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:40 -03:00
Andi Kleen	efc351f1b5	perf vendor events intel: Update Haswell events to v28 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:38 -03:00
Andi Kleen	2111da70ff	perf vendor events intel: Update IvyBridge events to v21 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:35 -03:00
Andi Kleen	59da390e54	perf vendor events intel: Update SandyBridge events to v16 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:33 -03:00
Andi Kleen	e6b32be445	perf vendor events intel: Update JakeTown events to v20 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:31 -03:00
Andi Kleen	009edd9ae0	perf vendor events intel: Update IvyTown events to v20 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:29 -03:00
Andi Kleen	e313477f7e	perf vendor events intel: Update HaswellX events to v20 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:26 -03:00
Andi Kleen	9f0f4a242c	perf vendor events intel: Update BroadwellX events to v14 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:24 -03:00
Andi Kleen	19f2d40c57	perf vendor events intel: Update SkylakeX events to v1.12 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:21 -03:00
Andi Kleen	24339348b9	perf vendor events intel: Update Skylake events to v42 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:18 -03:00
Andi Kleen	d2243329ef	perf vendor events intel: Update Broadwell-DE events to v7 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:14 -03:00
Andi Kleen	8313fe2d68	perf vendor events intel: Update Broadwell events to v23 Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:23:09 -03:00
Andi Kleen	fd5500989c	perf vendor events intel: Update metrics from TMAM 3.5 Update all the Intel JSON metrics from Ahmad Yasin's TMAM 3.5 for Intel big core from Sandy Bridge to Cascade Lake. This has many improvements and new metircs - New TopDownL1_SMT group that provides a per SMT thread version of --topdown that does not require -a anymore. The drawback is increased multiplexing though since L1 TopDown does not fit into 4 generic counters anymore. - Added SMT aware versions of other metrics - Split SMT aware metrics into separate metrics to avoid unnecessary event collections - New metrics for better branch analysis: Estimated Branch_Mispredict_Costs, Instructions per taken Branch, Branch Instructions per Taken Branch, etc. - Instruction mix metrics: Instructions per load, Instructions per store, Instructions per Branch, Instructions per Call - New Cache metrics: Bandwidth to L1/L2/L3 caches. L1/L2/L3 misses per kilo instructions. memory level parallelism - New memory controller metrics: Normalized memory bandwidth in interval mode, Average memory latency, Average number of parallel read requests, - 3DXP persistent memory metrics for Cascade Lake: 3dxp read latency, 3dxp read/write bandwidth - Some other useful metrics like Instruction Level Parallelism, - Various other improvements. Not all metrics are available on all CPUs. Skylake has best coverage. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:22:22 -03:00
Alexey Budankov	470530bbb8	perf record: Implement --mmap-flush=<number> option Implement a --mmap-flush option that specifies minimal number of bytes that is extracted from mmaped kernel buffer to store into a trace. The default option value is 1 byte what means every time trace writing thread finds some new data in the mmaped buffer the data is extracted, possibly compressed and written to a trace. $ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc $ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc The option is independent from -z setting, doesn't vary with compression level and can serve two purposes. The first purpose is to increase the compression ratio of a trace data. Larger data chunks are compressed more effectively so the implemented option allows specifying data chunk size to compress. Also at some cases executing more write syscalls with smaller data size can take longer than executing less write syscalls with bigger data size due to syscall overhead so extracting bigger data chunks specified by the option value could additionally decrease runtime overhead. The second purpose is to avoid self monitoring live-lock issue in system wide (-a) profiling mode. Profiling in system wide mode with compression (-a -z) can additionally induce data into the kernel buffers along with the data from monitored processes. If performance data rate and volume from the monitored processes is high then trace streaming and compression activity in the tool is also high. High tool process activity can lead to subtle live-lock effect when compression of single new byte from some of mmaped kernel buffer leads to generation of the next single byte at some mmaped buffer. So perf tool process ends up in endless self monitoring. Implemented synch parameter is the mean to force data move independently from the specified flush threshold value. Despite the provided flush value the tool needs capability to unconditionally drain memory buffers, at least in the end of the collection. Committer testing: Running with the default value, i.e. as soon as there is something to read go on consuming, we first write the synthesized events, small chunks of about 128 bytes: # perf trace -m 2048 --call-graph dwarf -e write -- perf record <SNIP> 101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120 __libc_write (/usr/lib64/libpthread-2.28.so) ion (/home/acme/bin/perf) record__write (inlined) process_synthesized_event (/home/acme/bin/perf) perf_tool__process_synth_event (inlined) perf_event__synthesize_mmap_events (/home/acme/bin/perf) Then we move to reading the mmap buffers consuming the events put there by the kernel perf infrastructure: 107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336 __libc_write (/usr/lib64/libpthread-2.28.so) ion (/home/acme/bin/perf) record__write (inlined) record__pushfn (/home/acme/bin/perf) perf_mmap__push (/home/acme/bin/perf) record__mmap_read_evlist (inlined) record__mmap_read_all (inlined) __cmd_record (inlined) cmd_record (/home/acme/bin/perf) 12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984 <SNIP same backtrace as in the 107.561 timestamp> 12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816 <SNIP same backtrace as in the 107.561 timestamp> 12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832 <SNIP same backtrace as in the 107.561 timestamp> If we limit it to write only when more than 16MB are available for reading, it throttles that to a quarter of the --mmap-pages set for 'perf record', which by default get to 528384 bytes, found out using 'record -v': mmap flush: 132096 mmap size 528384B With that in place all the writes coming from record__mmap_read_evlist(), i.e. from the mmap buffers setup by the kernel perf infrastructure were at least 132096 bytes long. Trying with a bigger mmap size: perf trace -e write perf record -v -m 2048 --mmap-flush 16M 74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888 74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256 74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232 74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032 74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520 74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688 74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760 Was again limited to a quarter of the mmap size: mmap flush: 2098176 mmap size 8392704B A warning about that would be good to have but can be added later, something like: "max flush is a quarter of the mmap size, if wanting to bump the mmap flush further, bump the mmap size as well using -m/--mmap-pages" Also rename the 'sync' parameters to 'synch' to keep tools/perf building with older glibcs: cc1: warnings being treated as errors builtin-record.c: In function 'record__mmap_read_evlist': builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration /usr/include/unistd.h:933: warning: shadowed declaration is here builtin-record.c: In function 'record__mmap_read_all': builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration /usr/include/unistd.h:933: warning: shadowed declaration is here Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-04-01 15:18:10 -03:00

... 3 4 5 6 7 ...

17520 Commits