Commit Graph

1233 Commits

Author SHA1 Message Date
Linus Torvalds
40c843218f perf tools changes for v5.17: 2nd batch
- Fix printing 'phys_addr' in 'perf script'.
 
 - Fix failure to add events with 'perf probe' in ppc64 due to not removing
   leading dot (ppc64 ABIv1).
 
 - Fix cpu_map__item() python binding building.
 
 - Support event alias in form foo-bar-baz, add pmu-events and parse-event tests
   for it.
 
 - No need to setup affinities when starting a workload or attaching to a pid.
 
 - Use path__join() to compose a path instead of ad-hoc snprintf() equivalent.
 
 - Override attr->sample_period for non-libpfm4 events.
 
 - Use libperf cpumap APIs instead of accessing the internal state directly.
 
 - Sync x86 arch prctl headers and files changed by the new
   set_mempolicy_home_node syscall with the kernel sources.
 
 - Remove duplicate include in cpumap.h.
 
 - Remove redundant err variable.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYeytiAAKCRCyPKLppCJ+
 JyxZAQDNopdDHXLcUaa2Yp8M97QZQ31APY4vOSkn+AneVuBxvAEAs+BPCyBKWCHA
 o45Hit7BzOtb3erHN5XVLTNW3WNBIQM=
 =RdC6
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.17-2022-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull more perf tools updates from Arnaldo Carvalho de Melo:

 - Fix printing 'phys_addr' in 'perf script'.

 - Fix failure to add events with 'perf probe' in ppc64 due to not
   removing leading dot (ppc64 ABIv1).

 - Fix cpu_map__item() python binding building.

 - Support event alias in form foo-bar-baz, add pmu-events and
   parse-event tests for it.

 - No need to setup affinities when starting a workload or attaching to
   a pid.

 - Use path__join() to compose a path instead of ad-hoc snprintf()
   equivalent.

 - Override attr->sample_period for non-libpfm4 events.

 - Use libperf cpumap APIs instead of accessing the internal state
   directly.

 - Sync x86 arch prctl headers and files changed by the new
   set_mempolicy_home_node syscall with the kernel sources.

 - Remove duplicate include in cpumap.h.

 - Remove redundant err variable.

* tag 'perf-tools-for-v5.17-2022-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tools: Remove redundant err variable
  perf test: Add parse-events test for aliases with hyphens
  perf test: Add pmu-events test for aliases with hyphens
  perf parse-events: Support event alias in form foo-bar-baz
  perf evsel: Override attr->sample_period for non-libpfm4 events
  perf cpumap: Remove duplicate include in cpumap.h
  perf cpumap: Migrate to libperf cpumap api
  perf python: Fix cpu_map__item() building
  perf script: Fix printing 'phys_addr' failure issue
  tools headers UAPI: Sync files changed by new set_mempolicy_home_node syscall
  tools headers UAPI: Sync x86 arch prctl headers with the kernel sources
  perf machine: Use path__join() to compose a path instead of snprintf(dir, '/', filename)
  perf evlist: No need to setup affinities when disabling events for pid targets
  perf evlist: No need to setup affinities when enabling events for pid targets
  perf stat: No need to setup affinities when starting a workload
  perf affinity: Allow passing a NULL arg to affinity__cleanup()
  perf probe: Fix ppc64 'perf probe add events failed' case
2022-01-23 08:14:21 +02:00
Linus Torvalds
3689f9f8b0 bitmap patches for 5.17-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
 b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
 A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
 iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
 m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
 9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
 MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
 nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
 CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
 5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
 =RKW4
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - introduce for_each_set_bitrange()

 - use find_first_*_bit() instead of find_next_*_bit() where possible

 - unify for_each_bit() macros

* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
  vsprintf: rework bitmap_list_string
  lib: bitmap: add performance test for bitmap_print_to_pagebuf
  bitmap: unify find_bit operations
  mm/percpu: micro-optimize pcpu_is_populated()
  Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
  find: micro-optimize for_each_{set,clear}_bit()
  include/linux: move for_each_bit() macros from bitops.h to find.h
  cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
  tools: sync tools/bitmap with mother linux
  all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
  cpumask: use find_first_and_bit()
  lib: add find_first_and_bit()
  arch: remove GENERIC_FIND_FIRST_BIT entirely
  include: move find.h from asm_generic to linux
  bitops: move find_bit_*_le functions from le.h to find.h
  bitops: protect find_first_{,zero}_bit properly
2022-01-23 06:20:44 +02:00
Linus Torvalds
636b5284d8 Generic:
- selftest compilation fix for non-x86
 
 - KVM: avoid warning on s390 in mark_page_dirty
 
 x86:
 - fix page write-protection bug and improve comments
 
 - use binary search to lookup the PMU event filter, add test
 
 - enable_pmu module parameter support for Intel CPUs
 
 - switch blocked_vcpu_on_cpu_lock to raw spinlock
 
 - cleanups of blocked vCPU logic
 
 - partially allow KVM_SET_CPUID{,2} after KVM_RUN (5.16 regression)
 
 - various small fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHpmT0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOstggAi1VSpT43oGslQjXNDZacHEARoYQs
 b0XpoW7HXicGSGRMWspCmiAPdJyYTsioEACttAmXUMs7brAgHb9n/vzdlcLh1ymL
 rQw2YFQlfqqB1Ki1iRhNkWlH9xOECsu28WLng6ylrx51GuT/pzWRt+V3EGUFTxIT
 ldW9HgZg2oFJIaLjg2hQVR/8EbBf0QdsAD3KV3tyvhBlXPkyeLOMcGe9onfjZ/NE
 JQeW7FtKtP4SsIFt1KrJpDPjtiwFt3bRM0gfgGw7//clvtKIqt1LYXZiq4C3b7f5
 tfYiC8lO2vnOoYcfeYEmvybbSsoS/CgSliZB32qkwoVvRMIl82YmxtDD+Q==
 =/Mak
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
 "Generic:

   - selftest compilation fix for non-x86

   - KVM: avoid warning on s390 in mark_page_dirty

 x86:

   - fix page write-protection bug and improve comments

   - use binary search to lookup the PMU event filter, add test

   - enable_pmu module parameter support for Intel CPUs

   - switch blocked_vcpu_on_cpu_lock to raw spinlock

   - cleanups of blocked vCPU logic

   - partially allow KVM_SET_CPUID{,2} after KVM_RUN (5.16 regression)

   - various small fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits)
  docs: kvm: fix WARNINGs from api.rst
  selftests: kvm/x86: Fix the warning in lib/x86_64/processor.c
  selftests: kvm/x86: Fix the warning in pmu_event_filter_test.c
  kvm: selftests: Do not indent with spaces
  kvm: selftests: sync uapi/linux/kvm.h with Linux header
  selftests: kvm: add amx_test to .gitignore
  KVM: SVM: Nullify vcpu_(un)blocking() hooks if AVIC is disabled
  KVM: SVM: Move svm_hardware_setup() and its helpers below svm_x86_ops
  KVM: SVM: Drop AVIC's intermediate avic_set_running() helper
  KVM: VMX: Don't do full kick when handling posted interrupt wakeup
  KVM: VMX: Fold fallback path into triggering posted IRQ helper
  KVM: VMX: Pass desired vector instead of bool for triggering posted IRQ
  KVM: VMX: Don't do full kick when triggering posted interrupt "fails"
  KVM: SVM: Skip AVIC and IRTE updates when loading blocking vCPU
  KVM: SVM: Use kvm_vcpu_is_blocking() in AVIC load to handle preemption
  KVM: SVM: Remove unnecessary APICv/AVIC update in vCPU unblocking path
  KVM: SVM: Don't bother checking for "running" AVIC when kicking for IPIs
  KVM: SVM: Signal AVIC doorbell iff vCPU is in guest mode
  KVM: x86: Remove defunct pre_block/post_block kvm_x86_ops hooks
  KVM: x86: Unexport LAPIC's switch_to_{hv,sw}_timer() helpers
  ...
2022-01-22 09:40:01 +02:00
Arnaldo Carvalho de Melo
6e10e21915 tools headers UAPI: Sync files changed by new set_mempolicy_home_node syscall
To pick the changes in these csets:

  21b084fdf2 ("mm/mempolicy: wire up syscall set_mempolicy_home_node")

That add support for this new syscall in tools such as 'perf trace'.

For instance, this is now possible:

  [root@five ~]# perf trace -e set_mempolicy_home_node
  ^C[root@five ~]#
  [root@five ~]# perf trace -v -e set_mempolicy_home_node
  Using CPUID AuthenticAMD-25-21-0
  event qualifier tracepoint filter: (common_pid != 253729 && common_pid != 3585) && (id == 450)
  mmap size 528384B
  ^C[root@five ~]
  [root@five ~]# perf trace -v -e set*  --max-events 5
  Using CPUID AuthenticAMD-25-21-0
  event qualifier tracepoint filter: (common_pid != 253734 && common_pid != 3585) && (id == 38 || id == 54 || id == 105 || id == 106 || id == 109 || id == 112 || id == 113 || id == 114 || id == 116 || id == 117 || id == 119 || id == 122 || id == 123 || id == 141 || id == 160 || id == 164 || id == 170 || id == 171 || id == 188 || id == 205 || id == 218 || id == 238 || id == 273 || id == 308 || id == 450)
  mmap size 528384B
       0.000 ( 0.008 ms): bash/253735 setpgid(pid: 253735 (bash), pgid: 253735 (bash))      = 0
    6849.011 ( 0.008 ms): bash/16046 setpgid(pid: 253736 (bash), pgid: 253736 (bash))       = 0
    6849.080 ( 0.005 ms): bash/253736 setpgid(pid: 253736 (bash), pgid: 253736 (bash))      = 0
    7437.718 ( 0.009 ms): gnome-shell/253737 set_robust_list(head: 0x7f34b527e920, len: 24) = 0
   13445.986 ( 0.010 ms): bash/16046 setpgid(pid: 253738 (bash), pgid: 253738 (bash))       = 0
  [root@five ~]#

That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.

  $ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -w set_mempolicy_home_node
  tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
  tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:450 	nospu	set_mempolicy_home_node		sys_set_mempolicy_home_node
  tools/perf/arch/s390/entry/syscalls/syscall.tbl:450  common	set_mempolicy_home_node	sys_set_mempolicy_home_node	sys_set_mempolicy_home_node
  tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:450	common	set_mempolicy_home_node	sys_set_mempolicy_home_node
  $

  $ grep -w set_mempolicy_home_node /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c
	[450] = "set_mempolicy_home_node",
  $

This addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
  diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
  Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl'
  diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl

Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-20 11:20:37 -03:00
Linus Torvalds
f4484d138b Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "55 patches.

  Subsystems affected by this patch series: percpu, procfs, sysctl,
  misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
  hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
  lib: remove redundant assignment to variable ret
  ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
  kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
  lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
  btrfs: use generic Kconfig option for 256kB page size limit
  arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
  configs: introduce debug.config for CI-like setup
  delayacct: track delays from memory compact
  Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
  delayacct: cleanup flags in struct task_delay_info and functions use it
  delayacct: fix incomplete disable operation when switch enable to disable
  delayacct: support swapin delay accounting for swapping without blkio
  panic: remove oops_id
  panic: use error_report_end tracepoint on warnings
  fs/adfs: remove unneeded variable make code cleaner
  FAT: use io_schedule_timeout() instead of congestion_wait()
  hfsplus: use struct_group_attr() for memcpy() region
  nilfs2: remove redundant pointer sbufs
  fs/binfmt_elf: use PT_LOAD p_align values for static PIE
  const_structs.checkpatch: add frequently used ops structs
  ...
2022-01-20 10:41:01 +02:00
Isabella Basso
fd0a146240 hash.h: remove unused define directive
Patch series "test_hash.c: refactor into KUnit", v3.

We refactored the lib/test_hash.c file into KUnit as part of the student
group LKCAMP [1] introductory hackathon for kernel development.

This test was pointed to our group by Daniel Latypov [2], so its full
conversion into a pure KUnit test was our goal in this patch series, but
we ran into many problems relating to it not being split as unit tests,
which complicated matters a bit, as the reasoning behind the original
tests is quite cryptic for those unfamiliar with hash implementations.

Some interesting developments we'd like to highlight are:

 - In patch 1/5 we noticed that there was an unused define directive
   that could be removed.

 - In patch 4/5 we noticed how stringhash and hash tests are all under
   the lib/test_hash.c file, which might cause some confusion, and we
   also broke those kernel config entries up.

Overall KUnit developments have been made in the other patches in this
series:

In patches 2/5, 3/5 and 5/5 we refactored the lib/test_hash.c file so as
to make it more compatible with the KUnit style, whilst preserving the
original idea of the maintainer who designed it (i.e.  George Spelvin),
which might be undesirable for unit tests, but we assume it is enough
for a first patch.

This patch (of 5):

Currently, there exist hash_32() and __hash_32() functions, which were
introduced in a patch [1] targeting architecture specific optimizations.
These functions can be overridden on a per-architecture basis to achieve
such optimizations.  They must set their corresponding define directive
(HAVE_ARCH_HASH_32 and HAVE_ARCH__HASH_32, respectively) so that header
files can deal with these overrides properly.

As the supported 32-bit architectures that have their own hash function
implementation (i.e.  m68k, Microblaze, H8/300, pa-risc) have only been
making use of the (more general) __hash_32() function (which only lacks
a right shift operation when compared to the hash_32() function), remove
the define directive corresponding to the arch-specific hash_32()
implementation.

[1] https://lore.kernel.org/lkml/20160525073311.5600.qmail@ns.sciencehorizons.net/

[akpm@linux-foundation.org: hash_32_generic() becomes hash_32()]

Link: https://lkml.kernel.org/r/20211208183711.390454-1-isabbasso@riseup.net
Link: https://lkml.kernel.org/r/20211208183711.390454-2-isabbasso@riseup.net
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: David Gow <davidgow@google.com>
Co-developed-by: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Signed-off-by: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Co-developed-by: Enzo Ferreira <ferreiraenzoa@gmail.com>
Signed-off-by: Enzo Ferreira <ferreiraenzoa@gmail.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:54 +02:00
Paolo Bonzini
fa68118144 kvm: selftests: sync uapi/linux/kvm.h with Linux header
KVM_CAP_XSAVE2 is out of sync due to a conflict.  Copy the whole
file while at it.

Reported-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19 13:30:29 -05:00
Paolo Bonzini
4f5a884fc2 Merge branch 'kvm-pi-raw-spinlock' into HEAD
Bring in fix for VT-d posted interrupts before further changing the code in 5.17.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19 12:14:02 -05:00
Linus Torvalds
57d17378a4 perf tools changes for v5.17: 1st batch
New features:
 
 - Add 'trace' subcommand for 'perf ftrace', setting the stage for more
   'perf ftrace' subcommands. Not using a subcommand yields the previous
   behaviour of 'perf ftrace'.
 
 - Add 'latency' subcommand to 'perf ftrace', that can use the function
   graph tracer or a BPF optimized one, via the -b/--use-bpf option.
 
   E.g.:
 
   $ sudo perf ftrace latency -a -T mutex_lock sleep 1
   #   DURATION     |      COUNT | GRAPH                          |
        0 - 1    us |       4596 | ########################       |
        1 - 2    us |       1680 | #########                      |
        2 - 4    us |       1106 | #####                          |
        4 - 8    us |        546 | ##                             |
        8 - 16   us |        562 | ###                            |
       16 - 32   us |          1 |                                |
       32 - 64   us |          0 |                                |
       64 - 128  us |          0 |                                |
      128 - 256  us |          0 |                                |
      256 - 512  us |          0 |                                |
      512 - 1024 us |          0 |                                |
        1 - 2    ms |          0 |                                |
        2 - 4    ms |          0 |                                |
        4 - 8    ms |          0 |                                |
        8 - 16   ms |          0 |                                |
       16 - 32   ms |          0 |                                |
       32 - 64   ms |          0 |                                |
       64 - 128  ms |          0 |                                |
      128 - 256  ms |          0 |                                |
      256 - 512  ms |          0 |                                |
      512 - 1024 ms |          0 |                                |
        1 - ...   s |          0 |                                |
 
   The original implementation of this command was in the bcc tool.
 
 - Support --cputype option for hybrid events in 'perf stat'.
 
 Improvements:
 
 - Call chain improvements for ARM64.
 
 - No need to do any affinity setup when profiling pids.
 
 - Reduce multiplexing with duration_time in 'perf stat' metrics.
 
 - Improve error message for uncore events, stating that some event groups are
   can only be used in system wide (-a) mode.
 
 - perf stat metric group leader fixes/improvements, including arch specific
   changes to better support Intel topdown events.
 
 - Probe non-deprecated sysfs path 1st, i.e. try /sys/devices/system/cpu/cpuN/topology/thread_siblings
   first, then the old /sys/devices/system/cpu/cpuN/topology/core_cpus.
 
 - Disable debuginfod by default in 'perf record', to avoid stalls on distros
   such as Fedora 35.
 
 - Use unbuffered output in 'perf bench' when pipe/tee'ing to a file.
 
 - Enable ignore_missing_thread in 'perf trace'
 
 Fixes:
 
 - Avoid TUI crash when navigating in the annotation of recursive functions.
 
 - Fix hex dump character output in 'perf script'.
 
 - Fix JSON indentation to 4 spaces standard in the ARM vendor event files.
 
 - Fix use after free in metric__new().
 
 - Fix IS_ERR_OR_NULL() usage in the perf BPF loader.
 
 - Fix up cross-arch register support, i.e. when printing register names take
   into account the architecture where the perf.data file was collected.
 
 - Fix SMT fallback with large core counts.
 
 - Don't lower case MetricExpr when parsing JSON files so as not to lose info
   such as the ":G" event modifier in metrics.
 
 perf test:
 
 - Add basic stress test for sigtrap handling to 'perf test'.
 
 - Fix 'perf test' failures on s/390
 
 - Enable system wide for metricgroups test in 'perf test´.
 
 - Use 3 digits for test numbering now we can have more tests.
 
 Arch specific:
 
 - Add events for Arm Neoverse N2 in the ARM JSON vendor event files
 
 - Support PERF_MEM_LVLNUM encodings in powerpc, that came from a single
   patch series, where I incorrectly merged the kernel bits, that were then
   reverted after coordination with Michael Ellerman and Stephen Rothwell.
 
 - Add ARM SPE total latency as PERF_SAMPLE_WEIGHT.
 
 - Update AMD documentation, with info on raw event encoding.
 
 - Add support for global and local variants of the "p_stage_cyc" sort key,
   applicable to perf.data files collected on powerpc.
 
 - Remove duplicate and incorrect aux size checks in the ARM CoreSight ETM code.
 
 Refactorings:
 
 - Add a perf_cpu abstraction to disambiguate CPUs and CPU map indexes, fixing
   problems along the way.
 
 - Document CPU map methods.
 
 UAPI sync:
 
 - Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
 
 - Sync UAPI files with the kernel sources: drm, msr-index, cpufeatures.
 
 Build system
 
 - Enable warnings through HOSTCFLAGS.
 
 - Drop requirement for libstdc++.so for libopencsd check
 
 libperf:
 
 - Make libperf adopt perf_counts_values__scale() from tools/perf/util/.
 
 - Add a stat multiplexing test to libperf.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYeQj6AAKCRCyPKLppCJ+
 JwyWAQCBmU8OJxhSJQnNCwTB9zNkPPBbihvIztepOJ7zsw7JcQD+KfAidHGQvI/Y
 EmXIYkmdNkWPYJafONllnKK5cckjxgI=
 =aj9V
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Add 'trace' subcommand for 'perf ftrace', setting the stage for
     more 'perf ftrace' subcommands. Not using a subcommand yields the
     previous behaviour of 'perf ftrace'.

   - Add 'latency' subcommand to 'perf ftrace', that can use the
     function graph tracer or a BPF optimized one, via the -b/--use-bpf
     option.

     E.g.:

	$ sudo perf ftrace latency -a -T mutex_lock sleep 1
	#   DURATION     |      COUNT | GRAPH                          |
	     0 - 1    us |       4596 | ########################       |
	     1 - 2    us |       1680 | #########                      |
	     2 - 4    us |       1106 | #####                          |
	     4 - 8    us |        546 | ##                             |
	     8 - 16   us |        562 | ###                            |
	    16 - 32   us |          1 |                                |
	    32 - 64   us |          0 |                                |
	    64 - 128  us |          0 |                                |
	   128 - 256  us |          0 |                                |
	   256 - 512  us |          0 |                                |
	   512 - 1024 us |          0 |                                |
	     1 - 2    ms |          0 |                                |
	     2 - 4    ms |          0 |                                |
	     4 - 8    ms |          0 |                                |
	     8 - 16   ms |          0 |                                |
	    16 - 32   ms |          0 |                                |
	    32 - 64   ms |          0 |                                |
	    64 - 128  ms |          0 |                                |
	   128 - 256  ms |          0 |                                |
	   256 - 512  ms |          0 |                                |
	   512 - 1024 ms |          0 |                                |
	     1 - ...   s |          0 |                                |

     The original implementation of this command was in the bcc tool.

   - Support --cputype option for hybrid events in 'perf stat'.

  Improvements:

   - Call chain improvements for ARM64.

   - No need to do any affinity setup when profiling pids.

   - Reduce multiplexing with duration_time in 'perf stat' metrics.

   - Improve error message for uncore events, stating that some event
     groups are can only be used in system wide (-a) mode.

   - perf stat metric group leader fixes/improvements, including arch
     specific changes to better support Intel topdown events.

   - Probe non-deprecated sysfs path first, i.e. try the path
     /sys/devices/system/cpu/cpuN/topology/thread_siblings first, then
     the old /sys/devices/system/cpu/cpuN/topology/core_cpus.

   - Disable debuginfod by default in 'perf record', to avoid stalls on
     distros such as Fedora 35.

   - Use unbuffered output in 'perf bench' when pipe/tee'ing to a file.

   - Enable ignore_missing_thread in 'perf trace'

  Fixes:

   - Avoid TUI crash when navigating in the annotation of recursive
     functions.

   - Fix hex dump character output in 'perf script'.

   - Fix JSON indentation to 4 spaces standard in the ARM vendor event
     files.

   - Fix use after free in metric__new().

   - Fix IS_ERR_OR_NULL() usage in the perf BPF loader.

   - Fix up cross-arch register support, i.e. when printing register
     names take into account the architecture where the perf.data file
     was collected.

   - Fix SMT fallback with large core counts.

   - Don't lower case MetricExpr when parsing JSON files so as not to
     lose info such as the ":G" event modifier in metrics.

  perf test:

   - Add basic stress test for sigtrap handling to 'perf test'.

   - Fix 'perf test' failures on s/390

   - Enable system wide for metricgroups test in 'perf test´.

   - Use 3 digits for test numbering now we can have more tests.

  Arch specific:

   - Add events for Arm Neoverse N2 in the ARM JSON vendor event files

   - Support PERF_MEM_LVLNUM encodings in powerpc, that came from a
     single patch series, where I incorrectly merged the kernel bits,
     that were then reverted after coordination with Michael Ellerman
     and Stephen Rothwell.

   - Add ARM SPE total latency as PERF_SAMPLE_WEIGHT.

   - Update AMD documentation, with info on raw event encoding.

   - Add support for global and local variants of the "p_stage_cyc" sort
     key, applicable to perf.data files collected on powerpc.

   - Remove duplicate and incorrect aux size checks in the ARM CoreSight
     ETM code.

  Refactorings:

   - Add a perf_cpu abstraction to disambiguate CPUs and CPU map
     indexes, fixing problems along the way.

   - Document CPU map methods.

  UAPI sync:

   - Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench
     mem memcpy'

   - Sync UAPI files with the kernel sources: drm, msr-index,
     cpufeatures.

  Build system

   - Enable warnings through HOSTCFLAGS.

   - Drop requirement for libstdc++.so for libopencsd check

  libperf:

   - Make libperf adopt perf_counts_values__scale() from tools/perf/util/.

   - Add a stat multiplexing test to libperf"

* tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (115 commits)
  perf record: Disable debuginfod by default
  perf evlist: No need to do any affinity setup when profiling pids
  perf cpumap: Add is_dummy() method
  perf metric: Fix metric_leader
  perf cputopo: Fix CPU topology reading on s/390
  perf metricgroup: Fix use after free in metric__new()
  libperf tests: Update a use of the new cpumap API
  perf arm: Fix off-by-one directory path
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools headers UAPI: Update tools's copy of drm.h header
  tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
  perf pmu-events: Don't lower case MetricExpr
  perf expr: Add debug logging for literals
  perf tools: Probe non-deprecated sysfs path 1st
  perf tools: Fix SMT fallback with large core counts
  perf cpumap: Give CPUs their own type
  perf stat: Correct first_shadow_cpu to return index
  perf script: Fix flipped index and cpu
  perf c2c: Use more intention revealing iterator
  ...
2022-01-18 06:32:11 +02:00
Linus Torvalds
79e06c4c49 RISCV:
- Use common KVM implementation of MMU memory caches
 
 - SBI v0.2 support for Guest
 
 - Initial KVM selftests support
 
 - Fix to avoid spurious virtual interrupts after clearing hideleg CSR
 
 - Update email address for Anup and Atish
 
 ARM:
 - Simplification of the 'vcpu first run' by integrating it into
   KVM's 'pid change' flow
 
 - Refactoring of the FP and SVE state tracking, also leading to
   a simpler state and less shared data between EL1 and EL2 in
   the nVHE case
 
 - Tidy up the header file usage for the nvhe hyp object
 
 - New HYP unsharing mechanism, finally allowing pages to be
   unmapped from the Stage-1 EL2 page-tables
 
 - Various pKVM cleanups around refcounting and sharing
 
 - A couple of vgic fixes for bugs that would trigger once
   the vcpu xarray rework is merged, but not sooner
 
 - Add minimal support for ARMv8.7's PMU extension
 
 - Rework kvm_pgtable initialisation ahead of the NV work
 
 - New selftest for IRQ injection
 
 - Teach selftests about the lack of default IPA space and
   page sizes
 
 - Expand sysreg selftest to deal with Pointer Authentication
 
 - The usual bunch of cleanups and doc update
 
 s390:
 - fix sigp sense/start/stop/inconsistency
 
 - cleanups
 
 x86:
 - Clean up some function prototypes more
 
 - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation
 
 - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery
 
 - completely remove potential TOC/TOU races in nested SVM consistency checks
 
 - update some PMCs on emulated instructions
 
 - Intel AMX support (joint work between Thomas and Intel)
 
 - large MMU cleanups
 
 - module parameter to disable PMU virtualization
 
 - cleanup register cache
 
 - first part of halt handling cleanups
 
 - Hyper-V enlightened MSR bitmap support for nested hypervisors
 
 Generic:
 - clean up Makefiles
 
 - introduce CONFIG_HAVE_KVM_DIRTY_RING
 
 - optimize memslot lookup using a tree
 
 - optimize vCPU array usage by converting to xarray
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHhxvsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPZkAf+Nz92UL/5nNGcdHtE4m7AToMmitE9
 bYkesf9BMQvAe5wjkABLuoHGi6ay4jabo4fiGzbdkiK7lO5YgfsWiMB3/MT5fl4E
 jRPzaVQabp3YZLM8UYCBmfUVuRj524S967SfSRe0AvYjDEH8y7klPf4+7sCsFT0/
 Px9Vf2KGuOlf0eM78yKg4rGaF0jS22eLgXm6FfNMY8/e29ZAo/jyUmqBY+Z2xxZG
 aWhceDtSheW1jwLHLj3nOlQJvHTn8LVGXBE/R8Gda3ZjrBV2rKaDi4Fh+HD+dz86
 2zVXwzQ7uck2CMW73GMoXMTWoKSHMyvlBOs1BdvBm4UsnGcXR+q8IFCeuQ==
 =s73m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "RISCV:

   - Use common KVM implementation of MMU memory caches

   - SBI v0.2 support for Guest

   - Initial KVM selftests support

   - Fix to avoid spurious virtual interrupts after clearing hideleg CSR

   - Update email address for Anup and Atish

  ARM:

   - Simplification of the 'vcpu first run' by integrating it into KVM's
     'pid change' flow

   - Refactoring of the FP and SVE state tracking, also leading to a
     simpler state and less shared data between EL1 and EL2 in the nVHE
     case

   - Tidy up the header file usage for the nvhe hyp object

   - New HYP unsharing mechanism, finally allowing pages to be unmapped
     from the Stage-1 EL2 page-tables

   - Various pKVM cleanups around refcounting and sharing

   - A couple of vgic fixes for bugs that would trigger once the vcpu
     xarray rework is merged, but not sooner

   - Add minimal support for ARMv8.7's PMU extension

   - Rework kvm_pgtable initialisation ahead of the NV work

   - New selftest for IRQ injection

   - Teach selftests about the lack of default IPA space and page sizes

   - Expand sysreg selftest to deal with Pointer Authentication

   - The usual bunch of cleanups and doc update

  s390:

   - fix sigp sense/start/stop/inconsistency

   - cleanups

  x86:

   - Clean up some function prototypes more

   - improved gfn_to_pfn_cache with proper invalidation, used by Xen
     emulation

   - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery

   - completely remove potential TOC/TOU races in nested SVM consistency
     checks

   - update some PMCs on emulated instructions

   - Intel AMX support (joint work between Thomas and Intel)

   - large MMU cleanups

   - module parameter to disable PMU virtualization

   - cleanup register cache

   - first part of halt handling cleanups

   - Hyper-V enlightened MSR bitmap support for nested hypervisors

  Generic:

   - clean up Makefiles

   - introduce CONFIG_HAVE_KVM_DIRTY_RING

   - optimize memslot lookup using a tree

   - optimize vCPU array usage by converting to xarray"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits)
  x86/fpu: Fix inline prefix warnings
  selftest: kvm: Add amx selftest
  selftest: kvm: Move struct kvm_x86_state to header
  selftest: kvm: Reorder vcpu_load_state steps for AMX
  kvm: x86: Disable interception for IA32_XFD on demand
  x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
  kvm: selftests: Add support for KVM_CAP_XSAVE2
  kvm: x86: Add support for getting/setting expanded xstate buffer
  x86/fpu: Add uabi_size to guest_fpu
  kvm: x86: Add CPUID support for Intel AMX
  kvm: x86: Add XCR0 support for Intel AMX
  kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
  kvm: x86: Emulate IA32_XFD_ERR for guest
  kvm: x86: Intercept #NM for saving IA32_XFD_ERR
  x86/fpu: Prepare xfd_err in struct fpu_guest
  kvm: x86: Add emulation for IA32_XFD
  x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
  kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2
  x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
  x86/fpu: Add guest support to xfd_enable_feature()
  ...
2022-01-16 16:15:14 +02:00
Yury Norov
4ade0818cf tools: sync tools/bitmap with mother linux
Remove tools/include/asm-generic/bitops/find.h and copy
include/linux/bitmap.h to tools. find_*_le() functions are not copied
because not needed in tools.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2022-01-15 08:47:31 -08:00
Wei Wang
415a3c33e8 kvm: selftests: Add support for KVM_CAP_XSAVE2
When KVM_CAP_XSAVE2 is supported, userspace is expected to allocate
buffer for KVM_GET_XSAVE2 and KVM_SET_XSAVE using the size returned
by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Guang Zeng <guang.zeng@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-20-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:42 -05:00
Arnaldo Carvalho de Melo
f1dcda0f79 tools headers UAPI: Update tools's copy of drm.h header
Picking the changes from:

  43d5ac7d07 ("drm: document DRM_IOCTL_MODE_GETFB2")

It is just a comment, so no changes and silences these perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h'
  diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h

Cc: Simon Ser <contact@emersion.fr>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-13 10:56:56 -03:00
Arnaldo Carvalho de Melo
1aa77e716c Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes and get in line with other trees, powerpc kernel
mostly this time, but BPF as well.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-13 10:20:59 -03:00
Linus Torvalds
e7d38f16c2 RCU pull request for v5.17
This pull request contains the following branches:
 
 doc.2021.11.30c: Documentation updates, perhaps most notably Neil Brown's
 	writeup of the reference-counting analogy to RCU.
 
 exp.2021.12.07a: Expedited grace-period cleanups.
 
 fastnohz.2021.11.30c: Remove CONFIG_RCU_FAST_NO_HZ due to lack of valid
 	users. I have asked around, posted a blog entry, and sent this
 	series to LKML without result.
 
 fixes.2021.11.30c: Miscellaneous fixes.
 
 nocb.2021.12.09a: RCU callback offloading updates, perhaps most notably
 	Frederic Weisbecker's updates allowing CPUs booted in the
 	de-offloaded state to be offloaded at runtime.
 
 nolibc.2021.11.30c: nolibc fixes from Willy Tarreau and Anmar Faizi, but
 	also including Mark Brown's addition of gettid().
 
 tasks.2021.12.09a: RCU Tasks Trace fixes, including changes that increase
 	the scalability of call_rcu_tasks_trace() for the BPF folks
 	(Martin Lau and KP Singh).
 
 torture.2021.12.07a: Various fixes including those from Wander Lairson
 	Costa and Li Zhijian.
 
 torturescript.2021.11.30c: Fixes plus addition of tests for the increased
 	call_rcu_tasks_trace() scalability.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmHbtukTHHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jAX3D/4mrDqAPhAWLWKp7klRhvwypDxj0cxd
 /TuGNcZN+YdvNfwozcrog+8yiPxcxhNW1pMESi7SolAhRwuk1JEjiclY+7ORYd6a
 /dmJB/lQBezGAdgVabRaJjfLKikpQ+/EnzKee3jjTS1XhJRJe/hDwlVP2B6IROfy
 iko5yi+hxfhQdPW6UcpTPCl/4Jn63d9+2SIlW16H0LhzlJeYYsWz4tqOEKYeiHeB
 Zxq90InCVmb3YYJzOtk/G7pGQ2RxKPR6/ilm87yzAfJD0Dawd2pgYeDoGvzx94S6
 CmhvA6GmwO3JOL6lH891AQVXskCODSJdosP/7otm9u36XJT+5lNOeLRsLbS0Sd9t
 BrJKfC7wBFuuIug8j5k3+QSXiKB7Q5JpXEhOjH4BIrkSL0Z0jSVsrZwCSbiUkjZZ
 CdF19bL+4h4x5ZL3pndsplX+9BDXsKEgGHWeuzzB4rmsUMtBg84HyfbPp8mLxm6B
 i7a1hNVQ5rFWYj6TpI1ZgOBIX07i21OyMAUbXn5JSWUmOyPp2V6D4Sp1zdlvRM0r
 hKkIg73NP6ah9QZQTp7T1rIjVmFc2KjbmNZQegjR2pHykPCChT6xnlFix4InV4Ma
 BDtigP6vhWz1YfKPjek5WESzHmMRoxdpFjqDY//Uj8/bKBccldO0osERKWtdDlDL
 bwMNjny3PPLRng==
 =K6AN
 -----END PGP SIGNATURE-----

Merge tag 'rcu.2022.01.09a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:

 - Documentation updates, perhaps most notably Neil Brown's writeup of
   the reference-counting analogy to RCU.

 - Expedited grace-period cleanups.

 - Remove CONFIG_RCU_FAST_NO_HZ due to lack of valid users. I have asked
   around, posted a blog entry, and sent this series to LKML without
   result.

 - Miscellaneous fixes.

 - RCU callback offloading updates, perhaps most notably Frederic
   Weisbecker's updates allowing CPUs booted in the de-offloaded state
   to be offloaded at runtime.

 - nolibc fixes from Willy Tarreau and Anmar Faizi, but also including
   Mark Brown's addition of gettid().

 - RCU Tasks Trace fixes, including changes that increase the
   scalability of call_rcu_tasks_trace() for the BPF folks (Martin Lau
   and KP Singh).

 - Various fixes including those from Wander Lairson Costa and Li
   Zhijian.

 - Fixes plus addition of tests for the increased call_rcu_tasks_trace()
   scalability.

* tag 'rcu.2022.01.09a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (87 commits)
  rcu/nocb: Merge rcu_spawn_cpu_nocb_kthread() and rcu_spawn_one_nocb_kthread()
  rcu/nocb: Allow empty "rcu_nocbs" kernel parameter
  rcu/nocb: Create kthreads on all CPUs if "rcu_nocbs=" or "nohz_full=" are passed
  rcu/nocb: Optimize kthreads and rdp initialization
  rcu/nocb: Prepare nocb_cb_wait() to start with a non-offloaded rdp
  rcu/nocb: Remove rcu_node structure from nocb list when de-offloaded
  rcu-tasks: Use fewer callbacks queues if callback flood ends
  rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing
  rcu-tasks: Use more callback queues if contention encountered
  rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic()
  rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention
  rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing
  rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback queues
  rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations
  rcu-tasks: Abstract invocations of callbacks
  rcu-tasks: Abstract checking of callback lists
  rcu-tasks: Add a ->percpu_enqueue_lim to the rcu_tasks structure
  rcu-tasks: Inspect stalled task's trc state in locked state
  rcu-tasks: Use spin_lock_rcu_node() and friends
  rcutorture: Combine n_max_cbs from all kthreads in a callback flood
  ...
2022-01-11 09:29:44 -08:00
Linus Torvalds
8efd0d9c31 Networking changes for 5.17.
Core
 ----
 
  - Defer freeing TCP skbs to the BH handler, whenever possible,
    or at least perform the freeing outside of the socket lock section
    to decrease cross-CPU allocator work and improve latency.
 
  - Add netdevice refcount tracking to locate sources of netdevice
    and net namespace refcount leaks.
 
  - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
    all queues from a single CPU removing latency spikes.
 
  - Various small optimizations throughout the stack from Eric Dumazet.
 
  - Make netdev->dev_addr[] constant, force modifications to go via
    appropriate helpers to allow us to keep addresses in ordered data
    structures.
 
  - Replace unix_table_lock with per-hash locks, improving performance
    of bind() calls.
 
  - Extend skb drop tracepoint with a drop reason.
 
  - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.
 
 BPF
 ---
 
  - New helpers:
    - bpf_find_vma(), find and inspect VMAs for profiling use cases
    - bpf_loop(), runtime-bounded loop helper trading some execution
      time for much faster (if at all converging) verification
    - bpf_strncmp(), improve performance, avoid compiler flakiness
    - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
      for tracing programs, all inlined by the verifier
 
  - Support BPF relocations (CO-RE) in the kernel loader.
 
  - Further the support for BTF_TYPE_TAG annotations.
 
  - Allow access to local storage in sleepable helpers.
 
  - Convert verifier argument types to a composable form with different
    attributes which can be shared across types (ro, maybe-null).
 
  - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
    creating new, extensible ones where missing and deprecating those
    to be removed.
 
 Protocols
 ---------
 
  - WiFi (mac80211/cfg80211):
    - notify user space about long "come back in N" AP responses,
      allow it to react to such temporary rejections
    - allow non-standard VHT MCS 10/11 rates
    - use coarse time in airtime fairness code to save CPU cycles
 
  - Bluetooth:
    - rework of HCI command execution serialization to use a common
      queue and work struct, and improve handling errors reported
      in the middle of a batch of commands
    - rework HCI event handling to use skb_pull_data, avoiding packet
      parsing pitfalls
    - support AOSP Bluetooth Quality Report
 
  - SMC:
    - support net namespaces, following the RDMA model
    - improve connection establishment latency by pre-clearing buffers
    - introduce TCP ULP for automatic redirection to SMC
 
  - Multi-Path TCP:
    - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
    - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
      IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
    - support cmsgs: TCP_INQ
    - improvements in the data scheduler (assigning data to subflows)
    - support fastclose option (quick shutdown of the full MPTCP
      connection, similar to TCP RST in regular TCP)
 
  - MCTP (Management Component Transport) over serial, as defined by
    DMTF spec DSP0253 - "MCTP Serial Transport Binding".
 
 Driver API
 ----------
 
  - Support timestamping on bond interfaces in active/passive mode.
 
  - Introduce generic phylink link mode validation for drivers which
    don't have any quirks and where MAC capability bits fully express
    what's supported. Allow PCS layer to participate in the validation.
    Convert a number of drivers.
 
  - Add support to set/get size of buffers on the Rx rings and size of
    the tx copybreak buffer via ethtool.
 
  - Support offloading TC actions as first-class citizens rather than
    only as attributes of filters, improve sharing and device resource
    utilization.
 
  - WiFi (mac80211/cfg80211):
    - support forwarding offload (ndo_fill_forward_path)
    - support for background radar detection hardware
    - SA Query Procedures offload on the AP side
 
 New hardware / drivers
 ----------------------
 
  - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
    real-time requirements for isochronous communication with protocols
    like OPC UA Pub/Sub.
 
  - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
    integrated into many older Qualcomm SoCs, e.g. MSM8916 or
    MSM8974 (qcom_bam_dmux).
 
  - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch
    driver with support for bridging, VLANs and multicast forwarding
    (lan966x).
 
  - iwlmei driver for co-operating between Intel's WiFi driver and
    Intel's Active Management Technology (AMT) devices.
 
  - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips
 
  - Bluetooth:
    - MediaTek MT7921 SDIO devices
    - Foxconn MT7922A
    - Realtek RTL8852AE
 
 Drivers
 -------
 
  - Significantly improve performance in the datapaths of:
    lan78xx, ax88179_178a, lantiq_xrx200, bnxt.
 
  - Intel Ethernet NICs:
    - igb: support PTP/time PEROUT and EXTTS SDP functions on
      82580/i354/i350 adapters
    - ixgbevf: new PF -> VF mailbox API which avoids the risk of
      mailbox corruption with ESXi
    - iavf: support configuration of VLAN features of finer granularity,
      stacked tags and filtering
    - ice: PTP support for new E822 devices with sub-ns precision
    - ice: support firmware activation without reboot
 
  - Mellanox Ethernet NICs (mlx5):
    - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
    - support TC forwarding when tunnel encap and decap happen between
      two ports of the same NIC
    - dynamically size and allow disabling various features to save
      resources for running in embedded / SmartNIC scenarios
 
  - Broadcom Ethernet NICs (bnxt):
    - use page frag allocator to improve Rx performance
    - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
 
  - Other Ethernet NICs:
    - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support
 
  - Microsoft cloud/virtual NIC (mana):
    - add XDP support (PASS, DROP, TX)
 
  - Mellanox Ethernet switches (mlxsw):
    - initial support for Spectrum-4 ASICs
    - VxLAN with IPv6 underlay
 
  - Marvell Ethernet switches (prestera):
    - support flower flow templates
    - add basic IP forwarding support
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - support Per-Stream Filtering and Policing (PSFP)
    - enable cut-through forwarding between ports by default
    - support FDMA to improve packet Rx/Tx to CPU
 
  - Other embedded switches:
    - hellcreek: improve trapping management (STP and PTP) packets
    - qca8k: support link aggregation and port mirroring
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - qca6390, wcn6855: enable 802.11 power save mode in station mode
    - BSS color change support
    - WCN6855 hw2.1 support
    - 11d scan offload support
    - scan MAC address randomization support
    - full monitor mode, only supported on QCN9074
    - qca6390/wcn6855: report signal and tx bitrate
    - qca6390: rfkill support
    - qca6390/wcn6855: regdb.bin support
 
  - Intel WiFi (iwlwifi):
    - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
      in cooperation with the BIOS
    - support for Optimized Connectivity Experience (OCE) scan
    - support firmware API version 68
    - lots of preparatory work for the upcoming Bz device family
 
  - MediaTek WiFi (mt76):
    - Specific Absorption Rate (SAR) support
    - mt7921: 160 MHz channel support
 
  - RealTek WiFi (rtw88):
    - Specific Absorption Rate (SAR) support
    - scan offload
 
  - Other WiFi NICs
    - ath10k: support fetching (pre-)calibration data from nvmem
    - brcmfmac: configure keep-alive packet on suspend
    - wcn36xx: beacon filter support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmHbkZAACgkQMUZtbf5S
 IruYkQ//XX7BggcwBfukPK83j0dONolClijqKcKR08g4vB5L8GXvv6OErKIWrh4k
 h8JanCH352ZkbCSw3MvFdm825UYQv8vPMd6Qks/LJ4aSKqCuy4MIlAo+yOw4Km3O
 i7++lRfma6DqHHI59wvLjWoxZSPu8lL+rI8UsZ5qMOlnNlGAOXsNrzRjaqQ3FddY
 AMxZeBUtrPqUCCQZFq3U8apkYzUp7CA/3XR9zRcja3uPbrtOV2G+4whRF90qGNWz
 Tm/QvJ9F/Ab292cbhxR4KuaQ3hUhaCQyDjbZk3+FZzZpAVhYTVqcNjny6+yXmbiP
 NXRtwemnl1NlWKMnJM8lEeY48u626tRIkxA/Wtd61uoO5uKUSxfGP+UpUi+DfXbF
 yIw50VQ7L2bpxXP/HjtmhVgZDaWKYyh22Zw4Hp/muMJz0hgUB0KODY3tf2jUWbjJ
 0oEgocWyzhhwMQKqupTDCIaRgIs2ewYr4ZrFDhI3HnHC/vv1VjoPRUPIyxwppD2N
 cXvZb3B1sWK8iX5gCbISGzyU4bB7I0rvJSTU42ueti7n6NqRFZ79qHQpYnnY+JdO
 z1qOwY/d/yWfBoXVKRtRg2qz6CdEt5BQklwAgVEBgrFpf58gp694EwGMb1htY14J
 r/k9bVpmyIFpUnBH2CPMRfBVA3tUTqzyzzFV4AMw40NYLKmhLdo=
 =KLm3
 -----END PGP SIGNATURE-----

Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core
  ----

   - Defer freeing TCP skbs to the BH handler, whenever possible, or at
     least perform the freeing outside of the socket lock section to
     decrease cross-CPU allocator work and improve latency.

   - Add netdevice refcount tracking to locate sources of netdevice and
     net namespace refcount leaks.

   - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
     all queues from a single CPU removing latency spikes.

   - Various small optimizations throughout the stack from Eric Dumazet.

   - Make netdev->dev_addr[] constant, force modifications to go via
     appropriate helpers to allow us to keep addresses in ordered data
     structures.

   - Replace unix_table_lock with per-hash locks, improving performance
     of bind() calls.

   - Extend skb drop tracepoint with a drop reason.

   - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.

  BPF
  ---

   - New helpers:
      - bpf_find_vma(), find and inspect VMAs for profiling use cases
      - bpf_loop(), runtime-bounded loop helper trading some execution
        time for much faster (if at all converging) verification
      - bpf_strncmp(), improve performance, avoid compiler flakiness
      - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
        for tracing programs, all inlined by the verifier

   - Support BPF relocations (CO-RE) in the kernel loader.

   - Further the support for BTF_TYPE_TAG annotations.

   - Allow access to local storage in sleepable helpers.

   - Convert verifier argument types to a composable form with different
     attributes which can be shared across types (ro, maybe-null).

   - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
     creating new, extensible ones where missing and deprecating those
     to be removed.

  Protocols
  ---------

   - WiFi (mac80211/cfg80211):
      - notify user space about long "come back in N" AP responses,
        allow it to react to such temporary rejections
      - allow non-standard VHT MCS 10/11 rates
      - use coarse time in airtime fairness code to save CPU cycles

   - Bluetooth:
      - rework of HCI command execution serialization to use a common
        queue and work struct, and improve handling errors reported in
        the middle of a batch of commands
      - rework HCI event handling to use skb_pull_data, avoiding packet
        parsing pitfalls
      - support AOSP Bluetooth Quality Report

   - SMC:
      - support net namespaces, following the RDMA model
      - improve connection establishment latency by pre-clearing buffers
      - introduce TCP ULP for automatic redirection to SMC

   - Multi-Path TCP:
      - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
      - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
        IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
      - support cmsgs: TCP_INQ
      - improvements in the data scheduler (assigning data to subflows)
      - support fastclose option (quick shutdown of the full MPTCP
        connection, similar to TCP RST in regular TCP)

   - MCTP (Management Component Transport) over serial, as defined by
     DMTF spec DSP0253 - "MCTP Serial Transport Binding".

  Driver API
  ----------

   - Support timestamping on bond interfaces in active/passive mode.

   - Introduce generic phylink link mode validation for drivers which
     don't have any quirks and where MAC capability bits fully express
     what's supported. Allow PCS layer to participate in the validation.
     Convert a number of drivers.

   - Add support to set/get size of buffers on the Rx rings and size of
     the tx copybreak buffer via ethtool.

   - Support offloading TC actions as first-class citizens rather than
     only as attributes of filters, improve sharing and device resource
     utilization.

   - WiFi (mac80211/cfg80211):
      - support forwarding offload (ndo_fill_forward_path)
      - support for background radar detection hardware
      - SA Query Procedures offload on the AP side

  New hardware / drivers
  ----------------------

   - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
     real-time requirements for isochronous communication with protocols
     like OPC UA Pub/Sub.

   - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
     integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974
     (qcom_bam_dmux).

   - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver
     with support for bridging, VLANs and multicast forwarding
     (lan966x).

   - iwlmei driver for co-operating between Intel's WiFi driver and
     Intel's Active Management Technology (AMT) devices.

   - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips

   - Bluetooth:
      - MediaTek MT7921 SDIO devices
      - Foxconn MT7922A
      - Realtek RTL8852AE

  Drivers
  -------

   - Significantly improve performance in the datapaths of: lan78xx,
     ax88179_178a, lantiq_xrx200, bnxt.

   - Intel Ethernet NICs:
      - igb: support PTP/time PEROUT and EXTTS SDP functions on
        82580/i354/i350 adapters
      - ixgbevf: new PF -> VF mailbox API which avoids the risk of
        mailbox corruption with ESXi
      - iavf: support configuration of VLAN features of finer
        granularity, stacked tags and filtering
      - ice: PTP support for new E822 devices with sub-ns precision
      - ice: support firmware activation without reboot

   - Mellanox Ethernet NICs (mlx5):
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
      - support TC forwarding when tunnel encap and decap happen between
        two ports of the same NIC
      - dynamically size and allow disabling various features to save
        resources for running in embedded / SmartNIC scenarios

   - Broadcom Ethernet NICs (bnxt):
      - use page frag allocator to improve Rx performance
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool

   - Other Ethernet NICs:
      - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support

   - Microsoft cloud/virtual NIC (mana):
      - add XDP support (PASS, DROP, TX)

   - Mellanox Ethernet switches (mlxsw):
      - initial support for Spectrum-4 ASICs
      - VxLAN with IPv6 underlay

   - Marvell Ethernet switches (prestera):
      - support flower flow templates
      - add basic IP forwarding support

   - NXP embedded Ethernet switches (ocelot & felix):
      - support Per-Stream Filtering and Policing (PSFP)
      - enable cut-through forwarding between ports by default
      - support FDMA to improve packet Rx/Tx to CPU

   - Other embedded switches:
      - hellcreek: improve trapping management (STP and PTP) packets
      - qca8k: support link aggregation and port mirroring

   - Qualcomm 802.11ax WiFi (ath11k):
      - qca6390, wcn6855: enable 802.11 power save mode in station mode
      - BSS color change support
      - WCN6855 hw2.1 support
      - 11d scan offload support
      - scan MAC address randomization support
      - full monitor mode, only supported on QCN9074
      - qca6390/wcn6855: report signal and tx bitrate
      - qca6390: rfkill support
      - qca6390/wcn6855: regdb.bin support

   - Intel WiFi (iwlwifi):
      - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
        in cooperation with the BIOS
      - support for Optimized Connectivity Experience (OCE) scan
      - support firmware API version 68
      - lots of preparatory work for the upcoming Bz device family

   - MediaTek WiFi (mt76):
      - Specific Absorption Rate (SAR) support
      - mt7921: 160 MHz channel support

   - RealTek WiFi (rtw88):
      - Specific Absorption Rate (SAR) support
      - scan offload

   - Other WiFi NICs
      - ath10k: support fetching (pre-)calibration data from nvmem
      - brcmfmac: configure keep-alive packet on suspend
      - wcn36xx: beacon filter support"

* tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits)
  tcp: tcp_send_challenge_ack delete useless param `skb`
  net/qla3xxx: Remove useless DMA-32 fallback configuration
  rocker: Remove useless DMA-32 fallback configuration
  hinic: Remove useless DMA-32 fallback configuration
  lan743x: Remove useless DMA-32 fallback configuration
  net: enetc: Remove useless DMA-32 fallback configuration
  cxgb4vf: Remove useless DMA-32 fallback configuration
  cxgb4: Remove useless DMA-32 fallback configuration
  cxgb3: Remove useless DMA-32 fallback configuration
  bnx2x: Remove useless DMA-32 fallback configuration
  et131x: Remove useless DMA-32 fallback configuration
  be2net: Remove useless DMA-32 fallback configuration
  vmxnet3: Remove useless DMA-32 fallback configuration
  bna: Simplify DMA setting
  net: alteon: Simplify DMA setting
  myri10ge: Simplify DMA setting
  qlcnic: Simplify DMA setting
  net: allwinner: Fix print format
  page_pool: remove spinlock in page_pool_refill_alloc_cache()
  amt: fix wrong return type of amt_send_membership_update()
  ...
2022-01-10 19:06:09 -08:00
Linus Torvalds
4369b3cec2 linux-kselftest-next-5.17-rc1
This Kselftest update for Linux 5.17-rc1 consists of fixes to build
 errors, false negatives, and several code cleanups, including the
 ARRAY_SIZE cleanup that removes 25+ duplicates ARRAY_SIZE defines
 from individual tests.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmHYu/cACgkQCwJExA0N
 Qxw6Qg//atuqFrmKzmlzces5pHmM88+RpkKDjA2e0Ltr8to8IWMg8gzr3o2qQ1aw
 /zKTCxWtd7PHQiJXmG9LV5z4UefT3rfV7TQfRTlH6XLe+T9wdBVREMUp4HaPeYq7
 Q+u9NIe570ylBUVYznh40SsgSV9YJtAztxDrU39YidmuYEwDCbIwmIFRHMgClRLF
 KgtE1tc0iuFE5Ru/kQwd+MBZBHnA/Zf1S8YiM5AIncRx/9f4EgoMAWcbUHQYFRjS
 W+mlyLR3/UjeB0LUCLiwoXjDvUoga3RA1A6Nf9/7CC7pqRAXHdGFMxCLOJfy1AGh
 n4MNwxQA4yJDXUCS9zhQLob0n75NKJDKbLJ9PlP42IaJXK8KbRM8UcXRRob+ywjT
 xL476F5D49TO2YRS00vMyBor06MUa0cT3HLQKDdv6IdKEbUhpiHUFL9h9nxISnAN
 rFC/y4vrYfpJrR7BF/dK3zs98d9u+lEDxbsKxB7dZ/gvzMIDr2QU0/RiAhurdzz6
 7kswA2ZTBfJkJ2o9+8W9E4jJIORV51b8ep+nmv/aiFfCjLwCrPF0/J63l5Pd/Bgr
 xDXlydtGVmVSGiM5Vq5CmuFlDJzPD+ZDZLbpxpRmLZPsDjAJS5Zo1wIdtbbZ1hI5
 vbvPf9jadsR9WiBjohC09u6J34GwTsWcTW9aLZBUyzjmo07hYV0=
 =r9GV
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-next-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest update from Shuah Khan:
 "Fixes to build errors, false negatives, and several code cleanups,
  including the ARRAY_SIZE cleanup that removes 25+ duplicates
  ARRAY_SIZE defines from individual tests"

* tag 'linux-kselftest-next-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/vm: remove ARRAY_SIZE define from individual tests
  selftests/timens: remove ARRAY_SIZE define from individual tests
  selftests/sparc64: remove ARRAY_SIZE define from adi-test
  selftests/seccomp: remove ARRAY_SIZE define from seccomp_benchmark
  selftests/rseq: remove ARRAY_SIZE define from individual tests
  selftests/net: remove ARRAY_SIZE define from individual tests
  selftests/landlock: remove ARRAY_SIZE define from common.h
  selftests/ir: remove ARRAY_SIZE define from ir_loopback.c
  selftests/core: remove ARRAY_SIZE define from close_range_test.c
  selftests/cgroup: remove ARRAY_SIZE define from cgroup_util.h
  selftests/arm64: remove ARRAY_SIZE define from vec-syscfg.c
  tools: fix ARRAY_SIZE defines in tools and selftests hdrs
  selftests: cgroup: build error multiple outpt files
  selftests/move_mount_set_group remove unneeded conversion to bool
  selftests/mount: remove unneeded conversion to bool
  selftests: harness: avoid false negatives if test has no ASSERTs
  selftests/ftrace: make kprobe profile testcase description unique
  selftests: clone3: clone3: add case CLONE3_ARGS_NO_TEST
  selftests: timers: Remove unneeded semicolon
  kselftests: timers:Remove unneeded semicolon
2022-01-10 12:08:12 -08:00
Coco Li
eac1b93c14 gro: add ability to control gro max packet size
Eric Dumazet suggested to allow users to modify max GRO packet size.

We have seen GRO being disabled by users of appliances (such as
wifi access points) because of claimed bufferbloat issues,
or some work arounds in sch_cake, to split GRO/GSO packets.

Instead of disabling GRO completely, one can chose to limit
the maximum packet size of GRO packets, depending on their
latency constraints.

This patch adds a per device gro_max_size attribute
that can be changed with ip link command.

ip link set dev eth0 gro_max_size 16000

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-06 12:27:05 +00:00
Kajol Jain
7fbddf40b8 tools headers UAPI: Add new macros for mem_hops field to perf_event.h
Add new macros for mem_hops field which can be used to represent
remote-node, socket and board level details.

Currently the code had macro for HOPS_0 which, corresponds to data
coming from another core but same node.  Add new macros for HOPS_1 to
HOPS_3 to represent remote-node, socket and board level data.

Also add corresponding strings in the mem_hops array to represent
mem_hop field data in perf_mem__lvl_scnprintf function

Incase mem_hops field is used, PERF_MEM_LVLNUM field also need to be set
inorder to represent the data source. Hence printing data source via
PERF_MEM_LVL field can be skip in that scenario.

For ex: Encodings for mem_hops fields with L2 cache:

  L2                      - local L2
  L2 | REMOTE | HOPS_0    - remote core, same node L2
  L2 | REMOTE | HOPS_1    - remote node, same socket L2
  L2 | REMOTE | HOPS_2    - remote socket, same board L2
  L2 | REMOTE | HOPS_3    - remote board L2

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20211206091749.87585-3-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-22 09:34:43 -03:00
Jiri Olsa
f92c1e1836 bpf: Add get_func_[arg|ret|arg_cnt] helpers
Adding following helpers for tracing programs:

Get n-th argument of the traced function:
  long bpf_get_func_arg(void *ctx, u32 n, u64 *value)

Get return value of the traced function:
  long bpf_get_func_ret(void *ctx, u64 *value)

Get arguments count of the traced function:
  long bpf_get_func_arg_cnt(void *ctx)

The trampoline now stores number of arguments on ctx-8
address, so it's easy to verify argument index and find
return value argument's position.

Moving function ip address on the trampoline stack behind
the number of functions arguments, so it's now stored on
ctx-16 address if it's needed.

All helpers above are inlined by verifier.

Also bit unrelated small change - using newly added function
bpf_prog_has_trampoline in check_get_func_ip.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211208193245.172141-5-jolsa@kernel.org
2021-12-13 09:25:59 -08:00
Hou Tao
c5fb199374 bpf: Add bpf_strncmp helper
The helper compares two strings: one string is a null-terminated
read-only string, and another string has const max storage size
but doesn't need to be null-terminated. It can be used to compare
file name in tracing or LSM program.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211210141652.877186-2-houtao1@huawei.com
2021-12-11 17:40:23 -08:00
Shuah Khan
066b34aa54 tools: fix ARRAY_SIZE defines in tools and selftests hdrs
tools/include/linux/kernel.h and kselftest_harness.h are missing
ifndef guard around ARRAY_SIZE define. Fix them to avoid duplicate
define errors during compile when another file defines it. This
problem was found when compiling selftests that include a header
with ARRAY_SIZE define.

ARRAY_SIZE is defined in several selftests. There are about 25+
duplicate defines in various selftests source and header files.
Add ARRAY_SIZE to kselftest.h in preparation for removing duplicate
ARRAY_SIZE defines from individual test files.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-12-10 17:50:17 -07:00
Jakub Kicinski
be3158290d Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
bpf-next 2021-12-10 v2

We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).

The main changes are:

1) Various samples fixes, from Alexander Lobakin.

2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.

3) A batch of new unified APIs for libbpf, logging improvements, version
   querying, etc. Also a batch of old deprecations for old APIs and various
   bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.

4) BPF documentation reorganization and improvements, from Christoph Hellwig
   and Dave Tucker.

5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
   libbpf, from Hengqi Chen.

6) Verifier log fixes, from Hou Tao.

7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.

8) Extend branch record capturing to all platforms that support it,
   from Kajol Jain.

9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.

10) bpftool doc-generating script improvements, from Quentin Monnet.

11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.

12) Deprecation warning fix for perf/bpf_counter, from Song Liu.

13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
    from Tiezhu Yang.

14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song.

15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
    Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
    and others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
  libbpf: Add "bool skipped" to struct bpf_map
  libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
  bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
  selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
  selftests/bpf: Add test for libbpf's custom log_buf behavior
  selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
  libbpf: Deprecate bpf_object__load_xattr()
  libbpf: Add per-program log buffer setter and getter
  libbpf: Preserve kernel error code and remove kprobe prog type guessing
  libbpf: Improve logging around BPF program loading
  libbpf: Allow passing user log setting through bpf_object_open_opts
  libbpf: Allow passing preallocated log_buf when loading BTF into kernel
  libbpf: Add OPTS-based bpf_btf_load() API
  libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
  samples/bpf: Remove unneeded variable
  bpf: Remove redundant assignment to pointer t
  selftests/bpf: Fix a compilation warning
  perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
  samples: bpf: Fix 'unknown warning group' build warning on Clang
  samples: bpf: Fix xdp_sample_user.o linking with Clang
  ...
====================

Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10 15:56:13 -08:00
Jakub Kicinski
3150a73366 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09 13:23:02 -08:00
Sasha Levin
3a49cc22d3 tools/lib/lockdep: drop leftover liblockdep headers
Clean up remaining headers that are specific to liblockdep but lived in
the shared header directory.  These are all unused after the liblockdep
code was removed in commit 7246f4dcac ("tools/lib/lockdep: drop
liblockdep").

Note that there are still headers that were originally created for
liblockdep, that still have liblockdep references, but they are used by
other tools/ code at this point.

Signed-off-by: Sasha Levin <sashal@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-09 09:37:49 -08:00
Jakub Kicinski
fc993be36f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-02 11:44:56 -08:00
Alexei Starovoitov
fbd94c7afc bpf: Pass a set of bpf_core_relo-s to prog_load command.
struct bpf_core_relo is generated by llvm and processed by libbpf.
It's a de-facto uapi.
With CO-RE in the kernel the struct bpf_core_relo becomes uapi de-jure.
Add an ability to pass a set of 'struct bpf_core_relo' to prog_load command
and let the kernel perform CO-RE relocations.

Note the struct bpf_line_info and struct bpf_func_info have the same
layout when passed from LLVM to libbpf and from libbpf to the kernel
except "insn_off" fields means "byte offset" when LLVM generates it.
Then libbpf converts it to "insn index" to pass to the kernel.
The struct bpf_core_relo's "insn_off" field is always "byte offset".

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-6-alexei.starovoitov@gmail.com
2021-12-02 11:18:35 -08:00
Alexei Starovoitov
46334a0cd2 bpf: Define enum bpf_core_relo_kind as uapi.
enum bpf_core_relo_kind is generated by llvm and processed by libbpf.
It's a de-facto uapi.
With CO-RE in the kernel the bpf_core_relo_kind values become uapi de-jure.
Also rename them with BPF_CORE_ prefix to distinguish from conflicting names in
bpf_core_read.h. The enums bpf_field_info_kind, bpf_type_id_kind,
bpf_type_info_kind, bpf_enum_value_kind are passing different values from bpf
program into llvm.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-5-alexei.starovoitov@gmail.com
2021-12-02 11:18:35 -08:00
Mark Brown
b0fe9dec66 tools/nolibc: Implement gettid()
Allow test programs to determine their thread ID.

Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Ammar Faizi
7bdc0e7a39 tools/nolibc: x86-64: Use mov $60,%eax instead of mov $60,%rax
Note that mov to 32-bit register will zero extend to 64-bit register.
Thus `mov $60,%eax` has the same effect with `mov $60,%rax`. Use the
shorter opcode to achieve the same thing.
```
  b8 3c 00 00 00       	mov    $60,%eax (5 bytes) [1]
  48 c7 c0 3c 00 00 00 	mov    $60,%rax (7 bytes) [2]
```
Currently, we use [2]. Change it to [1] for shorter code.

Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Ammar Faizi
bf91666959 tools/nolibc: x86: Remove r8, r9 and r10 from the clobber list
Linux x86-64 syscall only clobbers rax, rcx and r11 (and "memory").

  - rax for the return value.
  - rcx to save the return address.
  - r11 to save the rflags.

Other registers are preserved.

Having r8, r9 and r10 in the syscall clobber list is harmless, but this
results in a missed-optimization.

As the syscall doesn't clobber r8-r10, GCC should be allowed to reuse
their value after the syscall returns to userspace. But since they are
in the clobber list, GCC will always miss this opportunity.

Remove them from the x86-64 syscall clobber list to help GCC generate
better code and fix the comment.

See also the x86-64 ABI, section A.2 AMD64 Linux Kernel Conventions,
A.2.1 Calling Conventions [1].

Extra note:
Some people may think it does not really give a benefit to remove r8,
r9 and r10 from the syscall clobber list because the impression of
syscall is a C function call, and function call always clobbers those 3.

However, that is not the case for nolibc.h, because we have a potential
to inline the "syscall" instruction (which its opcode is "0f 05") to the
user functions.

All syscalls in the nolibc.h are written as a static function with inline
ASM and are likely always inline if we use optimization flag, so this is
a profit not to have r8, r9 and r10 in the clobber list.

Here is the example where this matters.

Consider the following C code:
```
  #include "tools/include/nolibc/nolibc.h"
  #define read_abc(a, b, c) __asm__ volatile("nop"::"r"(a),"r"(b),"r"(c))

  int main(void)
  {
  	int a = 0xaa;
  	int b = 0xbb;
  	int c = 0xcc;

  	read_abc(a, b, c);
  	write(1, "test\n", 5);
  	read_abc(a, b, c);

  	return 0;
  }
```

Compile with:
    gcc -Os test.c -o test -nostdlib

With r8, r9, r10 in the clobber list, GCC generates this:

0000000000001000 <main>:
    1000:	f3 0f 1e fa          	endbr64
    1004:	41 54                	push   %r12
    1006:	41 bc cc 00 00 00    	mov    $0xcc,%r12d
    100c:	55                   	push   %rbp
    100d:	bd bb 00 00 00       	mov    $0xbb,%ebp
    1012:	53                   	push   %rbx
    1013:	bb aa 00 00 00       	mov    $0xaa,%ebx
    1018:	90                   	nop
    1019:	b8 01 00 00 00       	mov    $0x1,%eax
    101e:	bf 01 00 00 00       	mov    $0x1,%edi
    1023:	ba 05 00 00 00       	mov    $0x5,%edx
    1028:	48 8d 35 d1 0f 00 00 	lea    0xfd1(%rip),%rsi
    102f:	0f 05                	syscall
    1031:	90                   	nop
    1032:	31 c0                	xor    %eax,%eax
    1034:	5b                   	pop    %rbx
    1035:	5d                   	pop    %rbp
    1036:	41 5c                	pop    %r12
    1038:	c3                   	ret

GCC thinks that syscall will clobber r8, r9, r10. So it spills 0xaa,
0xbb and 0xcc to callee saved registers (r12, rbp and rbx). This is
clearly extra memory access and extra stack size for preserving them.

But syscall does not actually clobber them, so this is a missed
optimization.

Now without r8, r9, r10 in the clobber list, GCC generates better code:

0000000000001000 <main>:
    1000:	f3 0f 1e fa          	endbr64
    1004:	41 b8 aa 00 00 00    	mov    $0xaa,%r8d
    100a:	41 b9 bb 00 00 00    	mov    $0xbb,%r9d
    1010:	41 ba cc 00 00 00    	mov    $0xcc,%r10d
    1016:	90                   	nop
    1017:	b8 01 00 00 00       	mov    $0x1,%eax
    101c:	bf 01 00 00 00       	mov    $0x1,%edi
    1021:	ba 05 00 00 00       	mov    $0x5,%edx
    1026:	48 8d 35 d3 0f 00 00 	lea    0xfd3(%rip),%rsi
    102d:	0f 05                	syscall
    102f:	90                   	nop
    1030:	31 c0                	xor    %eax,%eax
    1032:	c3                   	ret

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/wikis/x86-64-psABI [1]
Link: https://lore.kernel.org/lkml/20211011040344.437264-1-ammar.faizi@students.amikom.ac.id/
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Willy Tarreau
de0244ae40 tools/nolibc: fix incorrect truncation of exit code
Ammar Faizi reported that our exit code handling is wrong. We truncate
it to the lowest 8 bits but the syscall itself is expected to take a
regular 32-bit signed integer, not an unsigned char. It's the kernel
that later truncates it to the lowest 8 bits. The difference is visible
in strace, where the program below used to show exit(255) instead of
exit(-1):

  int main(void)
  {
        return -1;
  }

This patch applies the fix to all archs. x86_64, i386, arm64, armv7 and
mips were all tested and confirmed to work fine now. Risc-v was not
tested but the change is trivial and exactly the same as for other archs.

Reported-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Willy Tarreau
ebbe0d8a44 tools/nolibc: i386: fix initial stack alignment
After re-checking in the spec and comparing stack offsets with glibc,
The last pushed argument must be 16-byte aligned (i.e. aligned before the
call) so that in the callee esp+4 is multiple of 16, so the principle is
the 32-bit equivalent to what Ammar fixed for x86_64. It's possible that
32-bit code using SSE2 or MMX could have been affected. In addition the
frame pointer ought to be zero at the deepest level.

Link: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI
Cc: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Ammar Faizi
937ed91c71 tools/nolibc: x86-64: Fix startup code bug
Before this patch, the `_start` function looks like this:
```
0000000000001170 <_start>:
    1170:	pop    %rdi
    1171:	mov    %rsp,%rsi
    1174:	lea    0x8(%rsi,%rdi,8),%rdx
    1179:	and    $0xfffffffffffffff0,%rsp
    117d:	sub    $0x8,%rsp
    1181:	call   1000 <main>
    1186:	movzbq %al,%rdi
    118a:	mov    $0x3c,%rax
    1191:	syscall
    1193:	hlt
    1194:	data16 cs nopw 0x0(%rax,%rax,1)
    119f:	nop
```
Note the "and" to %rsp with $-16, it makes the %rsp be 16-byte aligned,
but then there is a "sub" with $0x8 which makes the %rsp no longer
16-byte aligned, then it calls main. That's the bug!

What actually the x86-64 System V ABI mandates is that right before the
"call", the %rsp must be 16-byte aligned, not after the "call". So the
"sub" with $0x8 here breaks the alignment. Remove it.

An example where this rule matters is when the callee needs to align
its stack at 16-byte for aligned move instruction, like `movdqa` and
`movaps`. If the callee can't align its stack properly, it will result
in segmentation fault.

x86-64 System V ABI also mandates the deepest stack frame should be
zero. Just to be safe, let's zero the %rbp on startup as the content
of %rbp may be unspecified when the program starts. Now it looks like
this:
```
0000000000001170 <_start>:
    1170:	pop    %rdi
    1171:	mov    %rsp,%rsi
    1174:	lea    0x8(%rsi,%rdi,8),%rdx
    1179:	xor    %ebp,%ebp                # zero the %rbp
    117b:	and    $0xfffffffffffffff0,%rsp # align the %rsp
    117f:	call   1000 <main>
    1184:	movzbq %al,%rdi
    1188:	mov    $0x3c,%rax
    118f:	syscall
    1191:	hlt
    1192:	data16 cs nopw 0x0(%rax,%rax,1)
    119d:	nopl   (%rax)
```

Cc: Bedirhan KURT <windowz414@gnuweeb.org>
Cc: Louvian Lyndal <louvianlyndal@gmail.com>
Reported-by: Peter Cordes <peter@cordes.ca>
Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
[wt: I did this on purpose due to a misunderstanding of the spec, other
     archs will thus have to be rechecked, particularly i386]
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:26:42 -08:00
Joanne Koong
e6f2dd0f80 bpf: Add bpf_loop helper
This patch adds the kernel-side and API changes for a new helper
function, bpf_loop:

long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx,
u64 flags);

where long (*callback_fn)(u32 index, void *ctx);

bpf_loop invokes the "callback_fn" **nr_loops** times or until the
callback_fn returns 1. The callback_fn can only return 0 or 1, and
this is enforced by the verifier. The callback_fn index is zero-indexed.

A few things to please note:
~ The "u64 flags" parameter is currently unused but is included in
case a future use case for it arises.
~ In the kernel-side implementation of bpf_loop (kernel/bpf/bpf_iter.c),
bpf_callback_t is used as the callback function cast.
~ A program can have nested bpf_loop calls but the program must
still adhere to the verifier constraint of its stack depth (the stack depth
cannot exceed MAX_BPF_STACK))
~ Recursive callback_fns do not pass the verifier, due to the call stack
for these being too deep.
~ The next patch will include the tests and benchmark

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211130030622.4131246-2-joannekoong@fb.com
2021-11-30 10:56:28 -08:00
Matthew Wilcox (Oracle)
d6e6a27d96 tools: Fix math.h breakage
Commit 98e1385ef2 ("include/linux/radix-tree.h: replace kernel.h with
the necessary inclusions") broke the radix tree test suite in two
different ways; first by including math.h which didn't exist in the
tools directory, and second by removing an implicit include of
spinlock.h before lockdep.h.  Fix both issues.

Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-30 09:14:42 -08:00
Hangbin Liu
5944b5abd8 Bonding: add arp_missed_max option
Currently, we use hard code number to verify if we are in the
arp_interval timeslice. But some user may want to reduce/extend
the verify timeslice. With the similar team option 'missed_max'
the uers could change that number based on their own environment.

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-30 12:15:58 +00:00
Jakub Kicinski
93d5404e89 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ipa/ipa_main.c
  8afc7e471a ("net: ipa: separate disabling setup from modem stop")
  76b5fbcd6b ("net: ipa: kill ipa_modem_init()")

Duplicated include, drop one.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26 13:45:19 -08:00
Linus Torvalds
c5c17547b7 Networking fixes for 5.16-rc3, including fixes from netfilter.
Current release - regressions:
 
  - r8169: fix incorrect mac address assignment
 
  - vlan: fix underflow for the real_dev refcnt when vlan creation fails
 
  - smc: avoid warning of possible recursive locking
 
 Current release - new code bugs:
 
  - vsock/virtio: suppress used length validation
 
  - neigh: fix crash in v6 module initialization error path
 
 Previous releases - regressions:
 
  - af_unix: fix change in behavior in read after shutdown
 
  - igb: fix netpoll exit with traffic, avoid warning
 
  - tls: fix splice_read() when starting mid-record
 
  - lan743x: fix deadlock in lan743x_phy_link_status_change()
 
  - marvell: prestera: fix bridge port operation
 
 Previous releases - always broken:
 
  - tcp_cubic: fix spurious Hystart ACK train detections for
    not-cwnd-limited flows
 
  - nexthop: fix refcount issues when replacing IPv6 groups
 
  - nexthop: fix null pointer dereference when IPv6 is not enabled
 
  - phylink: force link down and retrigger resolve on interface change
 
  - mptcp: fix delack timer length calculation and incorrect early
    clearing
 
  - ieee802154: handle iftypes as u32, prevent shift-out-of-bounds
 
  - nfc: virtual_ncidev: change default device permissions
 
  - netfilter: ctnetlink: fix error codes and flags used for kernel side
    filtering of dumps
 
  - netfilter: flowtable: fix IPv6 tunnel addr match
 
  - ncsi: align payload to 32-bit to fix dropped packets
 
  - iavf: fix deadlock and loss of config during VF interface reset
 
  - ice: avoid bpf_prog refcount underflow
 
  - ocelot: fix broken PTP over IP and PTP API violations
 
 Misc:
 
  - marvell: mvpp2: increase MTU limit when XDP enabled
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGhSCwACgkQMUZtbf5S
 IrvAdw//aR54mgc9rc0mkvS5sbDeKzDscmTzav5ANGjl+2ooKTOe8Qd07s59z6TJ
 H9IJlTu0Uc9Psbb2RvRo1T1HDohSpWy7SEN/Qlo6N+z1WzDHWbuXyC/KTQDM+8I1
 coMYBBTwBGkblBosuoMUi60GWLbBslLv9gR7HUZj7gbtxMfk36BrX5UYz1ONy+tx
 HiVshtOmzOgumBi+/j0tkI4lpI/ajf9eYaG6Vvd0A6F3idcbhWKNKfLPgw9qQF36
 sQrbz1SYwL5Ucgk47EG+Lpk7oSzbkdNoO6Ro9ncsebB8OMoLUhddclmG/fbgPG0o
 SWJ4kK3kmaRSTvSi6q4e5BM89oIhtFWhGRB6vURokrAQU1Ds+sq5F+8IwCaMqEYb
 GNyEZ8cdJhLc50RU+/Im3lN6IrRHvQiirE1BN+ZuCMjeSTrsqX18ZYMh1pSJhxkZ
 wRC03sSd2ZcaooFrSNJ5Scr3ndacrWNtVr78IQYCNrTjqJn1QUK7ZegTjP04FUfD
 JLB7+en8Hd6EKosJLKyoAPRwoFPZN6mDAPC6RfF45B3OoZAHbvXmJrOT6PatcqHe
 i0YwDkAJKPRijfcepN1IQYlY2Za5HwNWzCV6v0bf4tUCluDsSkczTKS02dZ1hegR
 oYW1Ra1BIyYK4cbG4H0lD7iBQGLGgwt38U1NlFpawbJa/fECUSs=
 =LtZ7
 -----END PGP SIGNATURE-----

Merge tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from netfilter.

  Current release - regressions:

   - r8169: fix incorrect mac address assignment

   - vlan: fix underflow for the real_dev refcnt when vlan creation
     fails

   - smc: avoid warning of possible recursive locking

  Current release - new code bugs:

   - vsock/virtio: suppress used length validation

   - neigh: fix crash in v6 module initialization error path

  Previous releases - regressions:

   - af_unix: fix change in behavior in read after shutdown

   - igb: fix netpoll exit with traffic, avoid warning

   - tls: fix splice_read() when starting mid-record

   - lan743x: fix deadlock in lan743x_phy_link_status_change()

   - marvell: prestera: fix bridge port operation

  Previous releases - always broken:

   - tcp_cubic: fix spurious Hystart ACK train detections for
     not-cwnd-limited flows

   - nexthop: fix refcount issues when replacing IPv6 groups

   - nexthop: fix null pointer dereference when IPv6 is not enabled

   - phylink: force link down and retrigger resolve on interface change

   - mptcp: fix delack timer length calculation and incorrect early
     clearing

   - ieee802154: handle iftypes as u32, prevent shift-out-of-bounds

   - nfc: virtual_ncidev: change default device permissions

   - netfilter: ctnetlink: fix error codes and flags used for kernel
     side filtering of dumps

   - netfilter: flowtable: fix IPv6 tunnel addr match

   - ncsi: align payload to 32-bit to fix dropped packets

   - iavf: fix deadlock and loss of config during VF interface reset

   - ice: avoid bpf_prog refcount underflow

   - ocelot: fix broken PTP over IP and PTP API violations

  Misc:

   - marvell: mvpp2: increase MTU limit when XDP enabled"

* tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
  net: dsa: microchip: implement multi-bridge support
  net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
  net: mscc: ocelot: set up traps for PTP packets
  net: ptp: add a definition for the UDP port for IEEE 1588 general messages
  net: mscc: ocelot: create a function that replaces an existing VCAP filter
  net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
  net: hns3: fix incorrect components info of ethtool --reset command
  net: hns3: fix one incorrect value of page pool info when queried by debugfs
  net: hns3: add check NULL address for page pool
  net: hns3: fix VF RSS failed problem after PF enable multi-TCs
  net: qed: fix the array may be out of bound
  net/smc: Don't call clcsock shutdown twice when smc shutdown
  net: vlan: fix underflow for the real_dev refcnt
  ptp: fix filter names in the documentation
  ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
  nfc: virtual_ncidev: change default device permissions
  net/sched: sch_ets: don't peek at classes beyond 'nbands'
  net: stmmac: Disable Tx queues when reconfiguring the interface
  selftests: tls: test for correct proto_ops
  tls: fix replacing proto_ops
  ...
2021-11-26 12:58:53 -08:00
Eric Dumazet
710d5835b7 tools: sync uapi/linux/if_link.h header
This file has not been updated for a while.

Sync it before BIG TCP patch series.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20211122184810.769159-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-23 19:39:28 -08:00
Jakub Kicinski
50fc24944a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-18 13:13:16 -08:00
Arnaldo Carvalho de Melo
346e91998c tools headers UAPI: Sync linux/kvm.h with the kernel sources
To pick the changes in:

  b56639318b ("KVM: SEV: Add support for SEV intra host migration")
  e615e35589 ("KVM: x86: On emulation failure, convey the exit reason, etc. to userspace")
  a9d496d8e0 ("KVM: x86: Clarify the kvm_run.emulation_failure structure layout")
  c68dc1b577 ("KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK")
  dea8ee31a0 ("RISC-V: KVM: Add SBI v0.1 support")

That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.

This is also by now used by tools/testing/selftests/kvm/, a simple test
build succeeded.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Anup Patel <anup@brainfault.org>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: David Edmondson <david.edmondson@oracle.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:08:07 -03:00
Tiezhu Yang
ebf7f6f0a6 bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33
In the current code, the actual max tail call count is 33 which is greater
than MAX_TAIL_CALL_CNT (defined as 32). The actual limit is not consistent
with the meaning of MAX_TAIL_CALL_CNT and thus confusing at first glance.
We can see the historical evolution from commit 04fd61ab36 ("bpf: allow
bpf programs to tail-call other bpf programs") and commit f9dabe016b
("bpf: Undo off-by-one in interpreter tail call count limit"). In order
to avoid changing existing behavior, the actual limit is 33 now, this is
reasonable.

After commit 874be05f52 ("bpf, tests: Add tail call test suite"), we can
see there exists failed testcase.

On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
 # echo 0 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf
 # dmesg | grep -w FAIL
 Tail call error path, max count reached jited:0 ret 34 != 33 FAIL

On some archs:
 # echo 1 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf
 # dmesg | grep -w FAIL
 Tail call error path, max count reached jited:1 ret 34 != 33 FAIL

Although the above failed testcase has been fixed in commit 18935a72eb
("bpf/tests: Fix error in tail call limit tests"), it would still be good
to change the value of MAX_TAIL_CALL_CNT from 32 to 33 to make the code
more readable.

The 32-bit x86 JIT was using a limit of 32, just fix the wrong comments and
limit to 33 tail calls as the constant MAX_TAIL_CALL_CNT updated. For the
mips64 JIT, use "ori" instead of "addiu" as suggested by Johan Almbladh.
For the riscv JIT, use RV_REG_TCC directly to save one register move as
suggested by Björn Töpel. For the other implementations, no function changes,
it does not change the current limit 33, the new value of MAX_TAIL_CALL_CNT
can reflect the actual max tail call count, the related tail call testcases
in test_bpf module and selftests can work well for the interpreter and the
JIT.

Here are the test results on x86_64:

 # uname -m
 x86_64
 # echo 0 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf test_suite=test_tail_calls
 # dmesg | tail -1
 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
 # rmmod test_bpf
 # echo 1 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf test_suite=test_tail_calls
 # dmesg | tail -1
 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
 # rmmod test_bpf
 # ./test_progs -t tailcalls
 #142 tailcalls:OK
 Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/1636075800-3264-1-git-send-email-yangtiezhu@loongson.cn
2021-11-16 14:03:15 +01:00
Jakub Kicinski
a5bdc36354 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-11-15

We've added 72 non-merge commits during the last 13 day(s) which contain
a total of 171 files changed, 2728 insertions(+), 1143 deletions(-).

The main changes are:

1) Add btf_type_tag attributes to bring kernel annotations like __user/__rcu to
   BTF such that BPF verifier will be able to detect misuse, from Yonghong Song.

2) Big batch of libbpf improvements including various fixes, future proofing APIs,
   and adding a unified, OPTS-based bpf_prog_load() low-level API, from Andrii Nakryiko.

3) Add ingress_ifindex to BPF_SK_LOOKUP program type for selectively applying the
   programmable socket lookup logic to packets from a given netdev, from Mark Pashmfouroush.

4) Remove the 128M upper JIT limit for BPF programs on arm64 and add selftest to
   ensure exception handling still works, from Russell King and Alan Maguire.

5) Add a new bpf_find_vma() helper for tracing to map an address to the backing
   file such as shared library, from Song Liu.

6) Batch of various misc fixes to bpftool, fixing a memory leak in BPF program dump,
   updating documentation and bash-completion among others, from Quentin Monnet.

7) Deprecate libbpf bpf_program__get_prog_info_linear() API and migrate its users as
   the API is heavily tailored around perf and is non-generic, from Dave Marchevsky.

8) Enable libbpf's strict mode by default in bpftool and add a --legacy option as an
   opt-out for more relaxed BPF program requirements, from Stanislav Fomichev.

9) Fix bpftool to use libbpf_get_error() to check for errors, from Hengqi Chen.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
  bpftool: Use libbpf_get_error() to check error
  bpftool: Fix mixed indentation in documentation
  bpftool: Update the lists of names for maps and prog-attach types
  bpftool: Fix indent in option lists in the documentation
  bpftool: Remove inclusion of utilities.mak from Makefiles
  bpftool: Fix memory leak in prog_dump()
  selftests/bpf: Fix a tautological-constant-out-of-range-compare compiler warning
  selftests/bpf: Fix an unused-but-set-variable compiler warning
  bpf: Introduce btf_tracing_ids
  bpf: Extend BTF_ID_LIST_GLOBAL with parameter for number of IDs
  bpftool: Enable libbpf's strict mode by default
  docs/bpf: Update documentation for BTF_KIND_TYPE_TAG support
  selftests/bpf: Clarify llvm dependency with btf_tag selftest
  selftests/bpf: Add a C test for btf_type_tag
  selftests/bpf: Rename progs/tag.c to progs/btf_decl_tag.c
  selftests/bpf: Test BTF_KIND_DECL_TAG for deduplication
  selftests/bpf: Add BTF_KIND_TYPE_TAG unit tests
  selftests/bpf: Test libbpf API function btf__add_type_tag()
  bpftool: Support BTF_KIND_TYPE_TAG
  libbpf: Support BTF_KIND_TYPE_TAG
  ...
====================

Link: https://lore.kernel.org/r/20211115162008.25916-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-15 08:49:23 -08:00
Arnaldo Carvalho de Melo
06cf00c48f tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
To pick up the changes in:

  e5e32171a2 ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
  9409eb3594 ("drm/i915: Expose logical engine instance to user")
  ea673f17ab ("drm/i915/uapi: Add comment clarifying purpose of I915_TILING_* values")
  d3ac8d4216 ("drm/i915/pxp: interfaces for using protected objects")
  cbbd3764b2 ("drm/i915/pxp: Create the arbitrary session after boot")

That don't add any new ioctl, so no changes in tooling.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
  diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Huang, Sean Z <sean.z.huang@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Arnaldo Carvalho de Melo
37057e743c tools headers UAPI: Sync sound/asound.h with the kernel sources
To pick up the changes in:

  5aec579e08 ("ALSA: uapi: Fix a C++ style comment in asound.h")

That is just changing a // style comment to /* */.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h'
  diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h

Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Arnaldo Carvalho de Melo
4902420432 tools headers UAPI: Sync linux/prctl.h with the kernel sources
To pick the changes in:

  61bc346ce6 ("uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument")

That don't result in any changes in tooling:

  $ tools/perf/trace/beauty/prctl_option.sh > before
  $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h
  $ tools/perf/trace/beauty/prctl_option.sh > after
  $ diff -u before after
  $

Just silences this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
  diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Arnaldo Carvalho de Melo
7380aa8990 tools headers UAPI: Sync files changed by new futex_waitv syscall
To pick the changes in these csets:

  039c0ec9bb ("futex,x86: Wire up sys_futex_waitv()")
  bf69bad38c ("futex: Implement sys_futex_waitv()")

That add support for this new syscall in tools such as 'perf trace'.

For instance, this is now possible:

  # perf trace -e futex_waitv
  ^C#
  # perf trace -v -e futex_waitv
  Using CPUID AuthenticAMD-25-21-0
  event qualifier tracepoint filter: (common_pid != 807333 && common_pid != 3564) && (id == 449)
  mmap size 528384B
  ^C#
  # perf trace -v -e futex* --max-events 10
  Using CPUID AuthenticAMD-25-21-0
  event qualifier tracepoint filter: (common_pid != 812168 && common_pid != 3564) && (id == 202 || id == 449)
  mmap size 528384B
           ? (         ): Timer/219310  ... [continued]: futex())                                            = -1 ETIMEDOUT (Connection timed out)
       0.012 ( 0.002 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.024 ( 0.060 ms): Timer/219310 futex(uaddr: 0x7fd0b152d420, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) = 0
       0.086 ( 0.001 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.088 (         ): Timer/219310 futex(uaddr: 0x7fd0b152d424, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) ...
       0.075 ( 0.005 ms): Web Content/219299 futex(uaddr: 0x7fd0b152d420, op: WAKE|PRIVATE_FLAG, val: 1)     = 1
       0.169 ( 0.004 ms): Web Content/219299 futex(uaddr: 0x7fd0b152d424, op: WAKE|PRIVATE_FLAG, val: 1)     = 1
       0.088 ( 0.089 ms): Timer/219310  ... [continued]: futex())                                            = 0
       0.179 ( 0.001 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.181 (         ): Timer/219310 futex(uaddr: 0x7fd0b152d420, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) ...
  #

That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.

  $ grep futex_waitv tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
  449	common	futex_waitv		sys_futex_waitv
  $

This addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
  diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
  Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl

Cc: André Almeida <andrealmeid@collabora.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Yonghong Song
8c42d2fa4e bpf: Support BTF_KIND_TYPE_TAG for btf_type_tag attributes
LLVM patches ([1] for clang, [2] and [3] for BPF backend)
added support for btf_type_tag attributes. This patch
added support for the kernel.

The main motivation for btf_type_tag is to bring kernel
annotations __user, __rcu etc. to btf. With such information
available in btf, bpf verifier can detect mis-usages
and reject the program. For example, for __user tagged pointer,
developers can then use proper helper like bpf_probe_read_user()
etc. to read the data.

BTF_KIND_TYPE_TAG may also useful for other tracing
facility where instead of to require user to specify
kernel/user address type, the kernel can detect it
by itself with btf.

  [1] https://reviews.llvm.org/D111199
  [2] https://reviews.llvm.org/D113222
  [3] https://reviews.llvm.org/D113496

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012609.1505032-1-yhs@fb.com
2021-11-11 17:41:11 -08:00
Mark Pashmfouroush
f89315650b bpf: Add ingress_ifindex to bpf_sk_lookup
It may be helpful to have access to the ifindex during bpf socket
lookup. An example may be to scope certain socket lookup logic to
specific interfaces, i.e. an interface may be made exempt from custom
lookup code.

Add the ifindex of the arriving connection to the bpf_sk_lookup API.

Signed-off-by: Mark Pashmfouroush <markpash@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211110111016.5670-2-markpash@cloudflare.com
2021-11-10 16:29:58 -08:00