Commit Graph

7164 Commits

Author SHA1 Message Date
Wang Nan
7bf98369a7 bpf tools: Introduce bpf_load_program() to bpf.c
bpf_load_program() can be used to load bpf program into kernel. To make
loading faster, first try to load without logbuf. Try again with logbuf
if the first try failed.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-19-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:58 -03:00
Wang Nan
8a47a6c522 bpf tools: Relocate eBPF programs
If an eBPF program accesses a map, LLVM generates a load instruction
which loads an absolute address into a register, like this:

  ld_64   r1, <MCOperand Expr:(mymap)>
  ...
  call    2

That ld_64 instruction will be recorded in relocation section.
To enable the usage of that map, relocation must be done by replacing
the immediate value by real map file descriptor so it can be found by
eBPF map functions.

This patch to the relocation work based on information collected by
patches:

'bpf tools: Collect symbol table from SHT_SYMTAB section',
'bpf tools: Collect relocation sections from SHT_REL sections'
and
'bpf tools: Record map accessing instructions for each program'.

For each instruction which needs relocation, it inject corresponding
file descriptor to imm field. As a part of protocol, src_reg is set to
BPF_PSEUDO_MAP_FD to notify kernel this is a map loading instruction.

This is the final part of map relocation patch. The principle of map
relocation is described in commit message of 'bpf tools: Collect symbol
table from SHT_SYMTAB section'.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-18-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:58 -03:00
Wang Nan
52d3352e79 bpf tools: Create eBPF maps defined in an object file
This patch creates maps based on 'map' section in object file using
bpf_create_map(), and stores the fds into an array in 'struct
bpf_object'.

Previous patches parse ELF object file and collects required data, but
doesn't play with the kernel. They belong to the 'opening' phase. This
patch is the first patch in 'loading' phase. The 'loaded' field is
introduced in 'struct bpf_object' to avoid loading an object twice,
because the loading phase clears resources collected during the opening
which becomes useless after loading. In this patch, maps_buf is cleared.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-17-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:58 -03:00
Wang Nan
e3ed2fef22 bpf tools: Add bpf.c/h for common bpf operations
This patch introduces bpf.c and bpf.h, which hold common functions
issuing bpf syscall. The goal of these two files is to hide syscall
completely from user. Note that bpf.c and bpf.h deal with kernel
interface only. Things like structure of 'map' section in the ELF object
is not cared by of bpf.[ch].

We first introduce bpf_create_map().

Note that, since functions in bpf.[ch] are wrapper of sys_bpf, they
don't use OO style naming.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-16-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:58 -03:00
Wang Nan
340909152a bpf tools: Record map accessing instructions for each program
This patch records the indices of instructions which are needed to be
relocated. That information is saved in the 'reloc_desc' field in
'struct bpf_program'. In the loading phase (this patch takes effect in
the opening phase), the collected instructions will be replaced by map
loading instructions.

Since we are going to close the ELF file and clear all data at the end
of the 'opening' phase, the ELF information will no longer be valid in
the 'loading' phase. We have to locate the instructions before maps are
loaded, instead of directly modifying the instruction.

'struct bpf_map_def' is introduced in this patch to let us know how many
maps are defined in the object.

This is the third part of map relocation. The principle of map relocation
is described in commit message of 'bpf tools: Collect symbol table from
SHT_SYMTAB section'.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-15-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:58 -03:00
Wang Nan
b62f06e81b bpf tools: Collect relocation sections from SHT_REL sections
This patch collects relocation sections into 'struct object'.  Such
sections are used for connecting maps to bpf programs. 'reloc' field in
'struct bpf_object' is introduced for storing such information.

This patch simply store the data into 'reloc' field. Following patch
will parse them to know the exact instructions which are needed to be
relocated.

Note that the collected data will be invalid after ELF object file is
closed.

This is the second patch related to map relocation. The first one is
'bpf tools: Collect symbol table from SHT_SYMTAB section'. The
principle of map relocation is described in its commit message.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-14-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:57 -03:00
Wang Nan
a5b8bd47dc bpf tools: Collect eBPF programs from their own sections
This patch collects all programs in an object file into an array of
'struct bpf_program' for further processing. That structure is for
representing each eBPF program. 'bpf_prog' should be a better name, but
it has been used by linux/filter.h. Although it is a kernel space name,
I still prefer to call it 'bpf_program' to prevent possible confusion.

bpf_object__add_program() creates a new 'struct bpf_program' object.
It first init a variable in stack using bpf_program__init(), then if
success, enlarges obj->programs array and copy the new object in.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-13-git-send-email-wangnan0@huawei.com
[ Made bpf_object__add_program() propagate the error (-EINVAL or -ENOMEM) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:57 -03:00
Wang Nan
bec7d68cb5 bpf tools: Collect symbol table from SHT_SYMTAB section
This patch collects symbols section. This section is useful when linking
BPF maps.

What 'bpf_map_xxx()' functions actually require are map's file
descriptors (and the internal verifier converts fds into pointers to
'struct bpf_map'), which we don't know when compiling. Therefore, we
should make compiler generate a 'ldr_64 r1, <imm>' instruction, and
fill the 'imm' field with the actual file descriptor when loading in
libbpf.

BPF programs should be written in this way:

 struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_HASH,
    .key_size = sizeof(unsigned long),
    .value_size = sizeof(unsigned long),
    .max_entries = 1000000,
 };

 SEC("my_func=sys_write")
 int my_func(void *ctx)
 {
     ...
     bpf_map_update_elem(&my_map, &key, &value, BPF_ANY);
     ...
 }

Compiler should convert '&my_map' into a 'ldr_64, r1, <imm>'
instruction, where imm should be the address of 'my_map'. According to
the address, libbpf knows which map it actually referenced, and then
fills the imm field with the 'fd' of that map created by it.

However, since we never really 'link' the object file, the imm field is
only a record in relocation section. Therefore libbpf should do the
relocation:

 1. In relocation section (type == SHT_REL), positions of each such
    'ldr_64' instruction are recorded with a reference of an entry in
    symbol table (SHT_SYMTAB);

 2. From records in symbol table we can find the indics of map
    variables.

Libbpf first record SHT_SYMTAB and positions of each instruction which
required bu such operation. Then create file descriptor. Finally, after
map creation complete, replace the imm field.

This is the first patch of BPF map related stuff. It records SHT_SYMTAB
into object's efile field for further use.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-12-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:57 -03:00
Wang Nan
0b3d1efade bpf tools: Collect map definitions from 'maps' section
If maps are used by eBPF programs, corresponding object file(s) should
contain a section named 'map'. Which contains map definitions. This
patch copies the data of the whole section. Map data parsing should be
acted just before map loading.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-11-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:57 -03:00
Wang Nan
cb1e5e9619 bpf tools: Collect version and license from ELF sections
Expand bpf_obj_elf_collect() to collect license and kernel version
information in eBPF object file. eBPF object file should have a section
named 'license', which contains a string. It should also have a section
named 'version', contains a u32 LINUX_VERSION_CODE.

bpf_obj_validate() is introduced to validate object file after loaded.
Currently it only check existence of 'version' section.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-10-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:57 -03:00
Wang Nan
296036653a bpf tools: Iterate over ELF sections to collect information
bpf_obj_elf_collect() is introduced to iterate over each elf sections to
collection information in eBPF object files. This function will futher
enhanced to collect license, kernel version, programs, configs and map
information.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-9-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Wang Nan
cc4228d57c bpf tools: Check endianness and make libbpf fail early
Check endianness according to EHDR. Code is taken from
tools/perf/util/symbol-elf.c.

Libbpf doesn't magically convert missmatched endianness. Even if we swap
eBPF instructions to correct byte order, we are unable to deal with
endianness in code logical generated by LLVM.

Therefore, libbpf should simply reject missmatched ELF object, and let
LLVM to create good code.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-8-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Wang Nan
6c956392b0 bpf tools: Read eBPF object from buffer
To support dynamic compiling, this patch allows caller to pass a
in-memory buffer to libbpf by bpf_object__open_buffer(). libbpf calls
elf_memory() to open it as ELF object file.

Because __bpf_object__open() collects all required data and won't need
that buffer anymore, libbpf uses that buffer directly instead of clone a
new buffer. Caller of libbpf can free that buffer or use it do other
things after bpf_object__open_buffer() return.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-7-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Wang Nan
1a5e3fb1e9 bpf tools: Open eBPF object file and do basic validation
This patch defines basic interface of libbpf. 'struct bpf_object' will
be the handler of each object file. Its internal structure is hide to
user. eBPF object files are compiled by LLVM as ELF format. In this
patch, libelf is used to open those files, read EHDR and do basic
validation according to e_type and e_machine.

All elf related staffs are grouped together and reside in efile field of
'struct bpf_object'. bpf_object__elf_finish() is introduced to clear it.

After all eBPF programs in an object file are loaded, related ELF
information is useless. Close the object file and free those memory.

The zfree() and zclose() functions are introduced to ensure setting NULL
pointers and negative file descriptors after resources are released.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-6-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Wang Nan
b3f59d66e2 bpf tools: Allow caller to set printing function
By libbpf_set_print(), users of libbpf are allowed to register he/she
own debug, info and warning printing functions. Libbpf will use those
functions to print messages. If not provided, default info and warning
printing functions are fprintf(stderr, ...); default debug printing
is NULL.

This API is designed to be used by perf, enables it to register its own
logging functions to make all logs uniform, instead of separated
logging level control.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-5-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Wang Nan
1b76c13e4b bpf tools: Introduce 'bpf' library and add bpf feature check
This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent.

Before building, it checks the existence of libelf in Makefile, and deny
to build if not found. Instead of throwing an error if libelf not found,
the error raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.

Because libbpf requires 'kern_version' field set for 'union bpf_attr'
(bpfdep" is used for that dependency), Kernel BPF API is also checked
by intruducing a new feature check 'bpf' into tools/build/feature,
which checks the existence and version of linux/bpf.h. When building
libbpf, it searches that file from include/uapi/linux in kernel source
tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel
source tree it reside, installing of newest kernel headers is not
required, except we are trying to port these files to an old kernel.

To avoid checking that file when perf building, the newly introduced
'bpf' feature check doesn't added into FEATURE_TESTS and
FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added
into libbpf's specific.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Bcc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-07 10:16:56 -03:00
Adrian Hunter
141b2d3161 perf tools: Extend the event parser maximum error index
Extend the event parser maximum error index from 10 to 13.  That allows
PMU config terms of up to 10 characters to display un-truncated in the
error message.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-17-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:49:44 -03:00
Adrian Hunter
0efe6b6769 perf tools: Validate config term maximum value
Currently the value of a PMU config term is silently truncated if it is
too big. This is an impediment to validating the value for other
criteria later on i.e.  the user provides an invalid value that gets
truncated to a valid one.

The maximum value validation is only done for the parser where the error
is passed back to the user. In other cases the silent truncation
continues so as not to affect tools that perhaps rely on it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-16-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:49:28 -03:00
Adrian Hunter
09ff607176 perf tools: Add perf_pmu__format_bits()
Add perf_pmu__format_bits() to get the format bits for a PMU config
term.  Intel PT will use this to validate terms and to record format
bits to enable later interpreting the config from the attribute stored
in the perf.data file.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-15-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:49:01 -03:00
Adrian Hunter
8bd1b2d257 perf tools: Fix perf-with-kcore handling of arguments containing spaces
Fix the perf-with-kcore script so that it doesn't split arguments that
contain spaces.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-13-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:48:27 -03:00
Adrian Hunter
f70cfa07e3 perf auxtrace: Fix period type 'i' not working
PERF_ITRACE_PERIOD_INSTRUCTIONS is zero so it got overwritten by the
default period type.

Fix by checking if the period type was set rather than if the value was
zero when applying the default.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-12-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:47:58 -03:00
Max Filippov
74d4582f43 perf tools xtensa: Add DWARF register names
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Marc Gauthier <marc@cadence.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-xtensa@linux-xtensa.org
Link: http://lkml.kernel.org/r/1437208216-15729-9-git-send-email-jcmvbkbc@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:45:05 -03:00
Andi Kleen
40997d6cf9 perf report: Display cycles in branch sort mode
Display the cycles by default in branch sort mode.

To make enough room for the new column I removed dso_to. It is usually
redundant with dso_from.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-9-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:39:53 -03:00
Andi Kleen
a18b027efe perf top: Add branch annotation code to top
Now that we can process branch data in annotate it makes sense to
support enabling branch recording from top too. Most of the code needed
for this is already in shared code with report. But we need to add:

- The option parsing code (using shared code from the previous patch)
- Document the options
- Set up the IPC/cycles accounting state in the top session
- Call the accounting code in the hist iter callback

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-8-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:39:22 -03:00
Andi Kleen
f8f4aaead5 perf annotate: Finally display IPC and cycle accounting
Add two new columns to the annotate display and display the average
cycles and the compute IPC if available.

When the LBR was not in any branch mode the IPC computation is
automatically disabled. We still display the cycle information.

Example output (with made up numbers):

The second column is the IPC and third average cycles.

                 │    __attribute__((noinline)) f2()
                 │    {
  5.15  0.07     │       push   %rbp
  0.01  0.07     │       mov    %rsp,%rbp
                 │            c = a / b;
  9.87  0.07     │       mov    a,%eax
        0.07     │       mov    b,%ecx
        0.07     │       cltd
  4.92  0.07  123│       idiv   %ecx
 70.79  0.07     │       mov    %eax,__TMC_END__
                 │    }
  9.25  0.07     │       pop    %rbp
  0.01  0.07  123│     ← retq

v2: Fix display problems.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-7-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:37:22 -03:00
Andi Kleen
30e863bb6f perf annotate: Compute IPC and basic block cycles
Compute the IPC and the basic block cycles for the annotate display.

IPC is computed by counting the instructions, and then dividing the
accounted cycles by that count.

The actual IPC computation can only be done at annotate time, because we
need to parse the objdump output first to know the number of
instructions in the basic block.

The cycles/IPC are also put into the perf function annotation so that
the display code can show them.

Again basic block overlaps are not handled, with the longest winning,
but there are some heuristics to hide the IPC when the longest is not
the most common.

v2: Compute IPC correctly.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-6-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:36:12 -03:00
Andi Kleen
57849998e2 perf report: Add processing for cycle histograms
Call the earlier added cycle histogram infrastructure from the perf
report hist iter callback. For this we walk the branch records.

This allows to use cycle histograms when browsing perf report annotate.

v2: Rename flag

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-5-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:35:30 -03:00
Andi Kleen
d4957633bf perf report: Add infrastructure for a cycles histogram
This adds the basic infrastructure to keep track of cycle counts per
basic block for annotate. We allocate an array similar to the normal
accounting, and then account branch cycles there.

We handle two cases:

cycles per basic block with start and cycles per branch (these are later
used for either IPC or just cycles per BB)

In the start case we cannot handle overlaps, so always the longest basic
block wins.

For the cycles per branch case everything is accurately accounted.

v2: Remove unnecessary checks. Slight restructure. Move
symbol__get_annotation to another patch. Move histogram allocation.
v3: Merged with current tree

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-4-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:32:45 -03:00
Andi Kleen
98df858ed4 perf report: Add flag for non ANY branch mode
Later patches need to cheaply check that the branch mode is in ANY.  Add
a new function to check all event attrs and add a flag to the report
state, which is then initialized.

v2: Rename flag

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:31:39 -03:00
Andi Kleen
0e332f033a perf tools: Add support for cycles, weight branch_info field
cycles is a new branch_info field available on some CPUs that indicates
the time deltas between branches in the LBR.

Add a sort key and output code for the cycles to allow to display the
basic block cycles individually in perf report.

We also pass in the cycles for weight when LBRs are processed, which
allows to get global and local weight, to get an estimate of the total
cost.

And also print the cycles information for perf report -D.  I also added
printing for the previously missing LBR flags (mispredict etc.)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:29:45 -03:00
Ben Hutchings
93df8a1ed6 perf tools: Add empty Build files for architectures lacking them
perf currently fails to build on MIPS as there is no
tools/perf/arch/mips/Build file.  Adding an empty file fixes this as
there are no MIPS-specific sources to build.

It looks like the same is needed for Alpha and PA-RISC, though I
haven't been able to test those.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 5e8c0fb6a9 ("perf build: Add arch x86 objects building")
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1438704627.7315.2.camel@decadent.org.uk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:24:15 -03:00
Jiri Olsa
f80010eb23 perf stat: Move counter processing code into stat object
Moving counter processing code into stat object as
perf_stat__process_counter.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437481927-29538-8-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:08:16 -03:00
Jiri Olsa
5e5fe748be perf stat: Pass 'struct perf_stat_config' into process_counter()
Passing 'struct perf_stat_config' into process_counter(), so that we can
make process_counter() non static and use it from other places.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437481927-29538-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:07:36 -03:00
Jiri Olsa
ec0d3d1fd2 perf stat: Move 'interval' into struct perf_stat_config
Moving 'interval' into struct perf_stat_config. The point is to
centralize the base stat config so it could be used localy together with
other stat routines in other parts of perf code.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437481927-29538-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:03:04 -03:00
Jiri Olsa
5821522e94 perf stat: Move 'output' into struct perf_stat_config
Moving 'output' into struct perf_stat_config. The point is to centralize
the base stat config so it could be used localy together with other stat
routines in other parts of perf code.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437481927-29538-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:02:51 -03:00
Jiri Olsa
711a572ea8 perf stat: Move 'scale' into struct perf_stat_config
Moving 'scale' into struct perf_stat_config. The point is to centralize
the base stat config so it could be used localy together with other stat
routines in other parts of perf code.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437481927-29538-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:02:39 -03:00
Jiri Olsa
421a50f3fa perf stat: Introduce struct perf_stat_config
Moving 'aggr_mode' into new struct. The point is to centralize the base
stat config so it could be used localy together with other stat routines
in other parts of perf code.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437481927-29538-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 16:02:29 -03:00
Wang Nan
5a023b57a8 perf tools: Add missing forward declaration of struct map to probe-event.h
Commit 7b6ff0bdbf ("perf probe ppc64le:
Fixup function entry if using kallsyms lookup") adds 'struct map' into
probe-event.h but not forward declares it. This patch fixes it.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Fixes: 7b6ff0bdbf ("perf probe ppc64le: Fixup function entry if using kallsyms lookup")
Link: http://lkml.kernel.org/n/1436445342-1402-30-git-send-email-wangnan0@huawei.com
[ No need to include map.h, just forward declare 'struct map' ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 15:38:40 -03:00
Wang Nan
0af0885ef6 perf tools: Introduce veprintf
va_args alternative to eprintf().

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/n/1436445342-1402-19-git-send-email-wangnan0@huawei.com
[ split from another patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 15:30:38 -03:00
Milian Wolff
834fd46ddb perf trace: Add total time column to summary.
It is cumbersome to manually calculate the total time spent in a given
syscall by multiplying the average value with the number of calls.

Instead, we now do this directly inside perf trace.

Note that this is also done by 'strace', which even adds a column with
relative numbers - something we could do in the future.

Example:

  perf trace -s find /some/folder > /dev/null

   Summary of events:

   find (19976), 700123 events, 100.0%, 0.000 msec

     syscall            calls    total       min       avg       max      stddev
                                 (msec)    (msec)    (msec)    (msec)        (%)
     --------------- -------- --------- --------- --------- ---------     ------
     read                   4     0.006     0.001     0.002     0.003     27.42%
     write               8046     9.617     0.001     0.001     0.035      0.56%
     open               34196    40.384     0.001     0.001     0.071      0.30%
     close              68375    57.104     0.001     0.001     0.076      0.25%
     stat                   4     0.004     0.001     0.001     0.001      3.14%
     fstat              34189    27.518     0.001     0.001     0.060      0.34%
     mmap                  13     0.029     0.001     0.002     0.003     10.74%
     mprotect               6     0.018     0.002     0.003     0.005     17.04%
     munmap                 3     0.014     0.003     0.005     0.006     24.87%
     brk                   87     0.490     0.001     0.006     0.016      6.50%
     ioctl                  3     0.004     0.001     0.001     0.003     36.39%
     access                 1     0.004     0.004     0.004     0.004      0.00%
     uname                  1     0.001     0.001     0.001     0.001      0.00%
     getdents           68393   143.600     0.001     0.002     0.187      0.95%
     fchdir             68371    56.980     0.001     0.001     0.111      0.39%
     arch_prctl             1     0.001     0.001     0.001     0.001      0.00%
     openat             34184    41.737     0.001     0.001     0.102      0.41%
     newfstatat         34184    41.180     0.001     0.001     0.064      0.34%

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LPU-Reference: 1438853069-5902-1-git-send-email-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06 11:29:49 -03:00
Petri Gynther
f151f53aa4 perf tools: Fix build errors with mipsel-linux-uclibc compiler
linux/tools$ make ARCH=mips CROSS_COMPILE=mipsel-linux- perf
...
config/Makefile:256: *** No gnu/libc-version.h found, please install
glibc-dev[el].  Stop.
make[1]: *** [all] Error 2
make: *** [perf] Error 2

...
In file included from builtin-sched.c:13:0:
util/cloexec.h:8:12: error: redundant redeclaration of ‘sched_getcpu’
 [-Werror=redundant-decls]
 extern int sched_getcpu(void) __THROW;

mipsel-buildroot-linux-uclibc/sysroot/usr/include/bits/sched.h:88:12:
 note: previous declaration of ‘sched_getcpu’ was here
 extern int sched_getcpu (void) __THROW;

uclibc info:
sysroot/usr/include/bits/uClibc_config.h
__UCLIBC_MAJOR__ 0
__UCLIBC_MINOR__ 9
__UCLIBC_SUBLEVEL__ 33

sysroot/usr/include/features.h
__UCLIBC__ 1
__GLIBC__ 2
__GLIBC_MINOR__ 2

Signed-off-by: Petri Gynther <pgynther@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1438735081-24131-1-git-send-email-pgynther@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-05 16:56:16 -03:00
Milian Wolff
007d66a0bd perf trace: Write to stderr by default
Without this patch, it is cumbersome to read the trace output but
ignoring the normal, potentially verbose, output of the debuggee.  One
common example is doing something like the following:

 perf trace -s find /tmp > /dev/null

Without this patch, the trace summary will be lost. Now, it will still
be printed at the end. This behavior is also applied by strace.

Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/n/tip-tqnks6y2cnvm5f9g2dsfr7zl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-05 16:52:23 -03:00
Andi Kleen
b7a001d206 perf tools: Do not include escape sequences in color_vfprintf return
color_vprintf was including the length of the invisible escape sequences
in its return argument. Don't include them to make the return value
usable for indentation calculations.

v2: Add comment, rebase

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1438649408-20807-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-05 16:46:06 -03:00
Jiri Olsa
8011de7ab3 perf tools: Remove trail argument to color vsprintf
Seems like it's always '\n' through color_fprintf_ln, which is not used
at all, removing.. ;-)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1438649408-20807-2-git-send-email-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-05 16:44:02 -03:00
Kan Liang
c3a6a8c405 perf tools: Refine parse/config callchain functions
Pass global callchain_param into parse_callchain_record_opt and
perf_evsel__config_callgraph as parameter. So we can reuse these
functions to parse/config local param for callchain.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1438677022-34296-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-05 16:42:11 -03:00
Kan Liang
3206771239 perf tools: Per-event time support
This patchkit adds the ability to turn off time stamps per event.

One usaful case for partial time is to work with per-event callgraph to
enable "PEBS threshold > 1" (https://lkml.org/lkml/2015/5/10/196), which
can significantly reduce the sampling overhead.

The event samples with time stamps off will not be ordered.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1438677022-34296-2-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-05 12:50:52 -03:00
Arnaldo Carvalho de Melo
34221118cb perf trace: Use vfs_getname syscall arg beautifier in more syscalls
Those were covered and tested in this cset:

 access, chdir, chmod, chown, chroot, creat, getxattr,
 inotify_add_watch, lchown, lgetxattr, listxattr,
 lsetxattr, mkdir, mkdirat, mknod, rmdir, faccessat,
 newfstatat, openat, readlink, readlinkat, removexattr,
 setxattr, statfs, swapon, swapoff, truncate, unlinkat,
 utime, utimes, utimensat.

E.g.:

  # trace -e statfs,access,mkdir mkdir /tmp/bla
   0.285 (0.020 ms): mkdir/2799 access(filename: /etc/ld.so.preload, mode: R         ) = -1 ENOENT No such file or directory
   1.070 (0.032 ms): mkdir/2799 statfs(pathname: /sys/fs/selinux, buf: 0x7ffeafbdc930) = 0
   1.087 (0.013 ms): mkdir/2799 statfs(pathname: /sys/fs/selinux, buf: 0x7ffeafbdc820) = 0
   1.189 (0.014 ms): mkdir/2799 access(filename: /etc/selinux/config                 ) = 0
   1.905 (0.610 ms): mkdir/2799 mkdir(pathname: /tmp/bla, mode: 511                  ) = 0
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-wbqtnlktquun3wtpjdz3okul@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

  and an empty message aborts the commit.
2015-08-05 12:50:11 -03:00
Arnaldo Carvalho de Melo
f994592d93 perf trace: Deref sys_enter pointer args with contents from probe:vfs_getname
To work like strace and dereference syscall pointer args we need to
insert probes (or tracepoints) right after we copy those bytes from
userspace.

Since we're formatting the syscall args at raw_syscalls:sys_enter time,
we need to have a formatter that just stores the position where, later,
when we get the probe:vfs_getname, we can insert the pointer contents.

Now, if a probe:vfs_getname with this format is in place:

 # perf probe -l
  probe:vfs_getname (on getname_flags:72@/home/git/linux/fs/namei.c with pathname)

That was, in this case, put in place with:

 # perf probe 'vfs_getname=getname_flags:72 pathname=filename:string'
 Added new event:
  probe:vfs_getname    (on getname_flags:72 with pathname=filename:string)

 You can now use it in all perf tools, such as:

	perf record -e probe:vfs_getname -aR sleep 1
 #

Then 'perf trace' will notice that and do the pointer -> contents
expansion:

 # trace -e open touch /tmp/bla
  0.165 (0.010 ms): touch/17752 open(filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
  0.195 (0.011 ms): touch/17752 open(filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
  0.512 (0.012 ms): touch/17752 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
  0.582 (0.012 ms): touch/17752 open(filename: /tmp/bla, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: 438) = 3
 #

Roughly equivalent to strace's output:

 # strace -rT -e open touch /tmp/bla
  0.000000 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000039>
  0.000317 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.000102>
  0.001461 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 <0.000072>
  0.000405 open("/tmp/bla", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3 <0.000055>
  0.000641 +++ exited with 0 +++
 #

Now we need to either look for at all syscalls that are marked as
pointers and have some well known names ("filename", "pathname", etc)
and set the arg formatter to the one used for the "open" syscall in this
patch.

This implementation works for syscalls with just a string being copied
from userspace, for matching syscalls with more than one string being
copied via the same probe/trace point (vfs_getname) we need to extend
the vfs_getname probe spec to include the pointer too, but there are
some problems with that in 'perf probe' or the kernel kprobes code, need
to investigate before considering supporting multiple strings per
syscall.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-xvuwx6nuj8cf389kf9s2ue2s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-05 10:52:45 -03:00
Arnaldo Carvalho de Melo
e4d44e830a perf trace: Use a constant for the syscall formatting buffer
We were using it as a magic number, 1024, fix that.

Eventually we need to stop doing it per line, and do it per
arg, traversing the args at output time, to avoid the memmove()
calls that will be used in the next cset to replace pointers
present at raw_syscalls:sys_enter time with its contents that
appear at probe:vfs_getname time, before raw_syscalls:sys_exit
time.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-4sz3wid39egay1pp8qmbur4u@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-05 10:52:40 -03:00
Arnaldo Carvalho de Melo
08c987763a perf trace: Remember if the vfs_getname tracepoint/kprobe is in place
So that we can later decide if we will store where to expand the
pathname once we are handling vfs_getname or if we should instead
just go on and straight away print the pointer.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-ytxk5s5jpc50wahffmlxgxuw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-05 10:52:32 -03:00