Commit Graph

854 Commits

Author SHA1 Message Date
Yonghong Song
74fc097de3 tools/bpf: Support new uapi for map element bpf iterator
Previous commit adjusted kernel uapi for map
element bpf iterator. This patch adjusted libbpf API
due to uapi change. bpftool and bpf_iter selftests
are also changed accordingly.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200805055058.1457623-1-yhs@fb.com
2020-08-06 16:39:14 -07:00
Linus Torvalds
47ec5303d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

 2) Support UDP segmentation in code TSO code, from Eric Dumazet.

 3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

 4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

 6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

 8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

10) Switch qca8k driver over to phylink, from Jonathan McDowell.

11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

12) Several conversions over to generic power management, from Vaibhav
    Gupta.

13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

14) Various https url conversions, from Alexander A. Klimov.

15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

20) XDP support for xen-netfront, from Denis Kirjanov.

21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

22) Support EF100 chip in sfc driver, from Edward Cree.

23) Add XDP support to mvpp2 driver, from Matteo Croce.

24) Support MPTCP in sock_diag, from Paolo Abeni.

25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

30) Add rfc4884 support to icmp code, from Willem de Bruijn.

31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

34) Support TCP syncookies in MPTCP, from Flowian Westphal.

35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
    Brivio.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
  net: thunderx: initialize VF's mailbox mutex before first usage
  usb: hso: remove bogus check for EINPROGRESS
  usb: hso: no complaint about kmalloc failure
  hso: fix bailout in error case of probe
  ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
  selftests/net: relax cpu affinity requirement in msg_zerocopy test
  mptcp: be careful on subflow creation
  selftests: rtnetlink: make kci_test_encap() return sub-test result
  selftests: rtnetlink: correct the final return value for the test
  net: dsa: sja1105: use detected device id instead of DT one on mismatch
  tipc: set ub->ifindex for local ipv6 address
  ipv6: add ipv6_dev_find()
  net: openvswitch: silence suspicious RCU usage warning
  Revert "vxlan: fix tos value before xmit"
  ptp: only allow phase values lower than 1 period
  farsync: switch from 'pci_' to 'dma_' API
  wan: wanxl: switch from 'pci_' to 'dma_' API
  hv_netvsc: do not use VF device if link is down
  dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
  net: macb: Properly handle phylink on at91sam9x
  ...
2020-08-05 20:13:21 -07:00
Linus Torvalds
99ea1521a0 Remove uninitialized_var() macro for v5.9-rc1
- Clean up non-trivial uses of uninitialized_var()
 - Update documentation and checkpatch for uninitialized_var() removal
 - Treewide removal of uninitialized_var()
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oYLQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsfjEACvf0D3WL3H7sLHtZ2HeMwOgAzq
 il08t6vUscINQwiIIK3Be43ok3uQ1Q+bj8sr2gSYTwunV2IYHFferzgzhyMMno3o
 XBIGd1E+v1E4DGBOiRXJvacBivKrfvrdZ7AWiGlVBKfg2E0fL1aQbe9AYJ6eJSbp
 UGqkBkE207dugS5SQcwrlk1tWKUL089lhDAPd7iy/5RK76OsLRCJFzIerLHF2ZK2
 BwvA+NWXVQI6pNZ0aRtEtbbxwEU4X+2J/uaXH5kJDszMwRrgBT2qoedVu5LXFPi8
 +B84IzM2lii1HAFbrFlRyL/EMueVFzieN40EOB6O8wt60Y4iCy5wOUzAdZwFuSTI
 h0xT3JI8BWtpB3W+ryas9cl9GoOHHtPA8dShuV+Y+Q2bWe1Fs6kTl2Z4m4zKq56z
 63wQCdveFOkqiCLZb8s6FhnS11wKtAX4czvXRXaUPgdVQS1Ibyba851CRHIEY+9I
 AbtogoPN8FXzLsJn7pIxHR4ADz+eZ0dQ18f2hhQpP6/co65bYizNP5H3h+t9hGHG
 k3r2k8T+jpFPaddpZMvRvIVD8O2HvJZQTyY6Vvneuv6pnQWtr2DqPFn2YooRnzoa
 dbBMtpon+vYz6OWokC5QNWLqHWqvY9TmMfcVFUXE4AFse8vh4wJ8jJCNOFVp8On+
 drhmmImUr1YylrtVOw==
 =xHmk
 -----END PGP SIGNATURE-----

Merge tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull uninitialized_var() macro removal from Kees Cook:
 "This is long overdue, and has hidden too many bugs over the years. The
  series has several "by hand" fixes, and then a trivial treewide
  replacement.

   - Clean up non-trivial uses of uninitialized_var()

   - Update documentation and checkpatch for uninitialized_var() removal

   - Treewide removal of uninitialized_var()"

* tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  compiler: Remove uninitialized_var() macro
  treewide: Remove uninitialized_var() usage
  checkpatch: Remove awareness of uninitialized_var() macro
  mm/debug_vm_pgtable: Remove uninitialized_var() usage
  f2fs: Eliminate usage of uninitialized_var() macro
  media: sur40: Remove uninitialized_var() usage
  KVM: PPC: Book3S PR: Remove uninitialized_var() usage
  clk: spear: Remove uninitialized_var() usage
  clk: st: Remove uninitialized_var() usage
  spi: davinci: Remove uninitialized_var() usage
  ide: Remove uninitialized_var() usage
  rtlwifi: rtl8192cu: Remove uninitialized_var() usage
  b43: Remove uninitialized_var() usage
  drbd: Remove uninitialized_var() usage
  x86/mm/numa: Remove uninitialized_var() usage
  docs: deprecated.rst: Add uninitialized_var()
2020-08-04 13:49:43 -07:00
Linus Torvalds
9ba19ccd2d These were the main changes in this cycle:
- LKMM updates: mostly documentation changes, but also some new litmus tests for atomic ops.
 
  - KCSAN updates: the most important change is that GCC 11 now has all fixes in place
                   to support KCSAN, so GCC support can be enabled again. Also more annotations.
 
  - futex updates: minor cleanups and simplifications
 
  - seqlock updates: merge preparatory changes/cleanups for the 'associated locks' facilities.
 
  - lockdep updates:
     - simplify IRQ trace event handling
     - add various new debug checks
     - simplify header dependencies, split out <linux/lockdep_types.h>, decouple
       lockdep from other low level headers some more
     - fix NMI handling
 
  - misc cleanups and smaller fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8n9/wRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hZFQ//dD+AKw9Nym+WbylovmeD0qxWxPyeN/jG
 vBVDTOJIJLtZTkZf6YHcYOJlPwaMDYUQluqTPQhsaQZy/NoEb5NM2cFAj2R9gjyT
 O8665T1dvhW9Sh353mBpuwviqdrnvCeHTBEcglSlFY7hxToYAflUN0+DXGVtNys8
 PFNf3L9SHT0GLVC8+di/eJzQaRqxiB0Pq7kvh2RvPJM/dcQNA9Ho3CCNO5j6qGoY
 u7OnMT8xJXkgbdjjUO4RO0v9VjMuNthZ2JiONDgvgKtJfIL2wt5YXIv1EYX0GuWp
 WZgIzE4o1G7GJOOzKpFfZFyK8grHu2fWgK1plvodWjlLkBmltJZ1qyOM+wngd/m2
 TgtPo73/YFbxFUbbBpkb0eiIaH2t99kMvfCWd05+GiPCtzn9UL9GfFRWd42vonwc
 sQWjFrHKlnuzifUfNcLmKg7R2nUtF3Dm/SydiTJ+9NtH/QA17YJKWnlE1moulNtQ
 p7H7+8UdcvSQ7F38A74v2IYNIyDsv5qcE8ar4QHdaanBBX/LCyD0UlfgsgxEReXf
 GDKkpx7LFQlI6Y2YB+dZgkCwhNBl3/OQ3v6hC95B37fA67dAIQyPIWHiHbaM+029
 gghqU4GcUcbjSnHPzl9PPL+hi9MyXrMjpb7CBXytg4NI4EE1waHR+0kX14V8ndRj
 MkWQOKPUgB0=
 =3MTT
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - LKMM updates: mostly documentation changes, but also some new litmus
   tests for atomic ops.

 - KCSAN updates: the most important change is that GCC 11 now has all
   fixes in place to support KCSAN, so GCC support can be enabled again.
   Also more annotations.

 - futex updates: minor cleanups and simplifications

 - seqlock updates: merge preparatory changes/cleanups for the
   'associated locks' facilities.

 - lockdep updates:
    - simplify IRQ trace event handling
    - add various new debug checks
    - simplify header dependencies, split out <linux/lockdep_types.h>,
      decouple lockdep from other low level headers some more
    - fix NMI handling

 - misc cleanups and smaller fixes

* tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  kcsan: Improve IRQ state trace reporting
  lockdep: Refactor IRQ trace events fields into struct
  seqlock: lockdep assert non-preemptibility on seqcount_t write
  lockdep: Add preemption enabled/disabled assertion APIs
  seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
  seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
  seqlock: Reorder seqcount_t and seqlock_t API definitions
  seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
  seqlock: Properly format kernel-doc code samples
  Documentation: locking: Describe seqlock design and usage
  locking/qspinlock: Do not include atomic.h from qspinlock_types.h
  locking/atomic: Move ATOMIC_INIT into linux/types.h
  lockdep: Move list.h inclusion into lockdep.h
  locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs
  futex: Remove unused or redundant includes
  futex: Consistently use fshared as boolean
  futex: Remove needless goto's
  futex: Remove put_futex_key()
  rwsem: fix commas in initialisation
  docs: locking: Replace HTTP links with HTTPS ones
  ...
2020-08-03 14:39:35 -07:00
Linus Torvalds
145ff1ec09 arm64 and cross-arch updates for 5.9:
- Removal of the tremendously unpopular read_barrier_depends() barrier,
   which is a NOP on all architectures apart from Alpha, in favour of
   allowing architectures to override READ_ONCE() and do whatever dance
   they need to do to ensure address dependencies provide LOAD ->
   LOAD/STORE ordering. This work also offers a potential solution if
   compilers are shown to convert LOAD -> LOAD address dependencies into
   control dependencies (e.g. under LTO), as weakly ordered architectures
   will effectively be able to upgrade READ_ONCE() to smp_load_acquire().
   The latter case is not used yet, but will be discussed further at LPC.
 
 - Make the MSI/IOMMU input/output ID translation PCI agnostic, augment
   the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID
   bus-specific parameter and apply the resulting changes to the device
   ID space provided by the Freescale FSL bus.
 
 - arm64 support for TLBI range operations and translation table level
   hints (part of the ARMv8.4 architecture version).
 
 - Time namespace support for arm64.
 
 - Export the virtual and physical address sizes in vmcoreinfo for
   makedumpfile and crash utilities.
 
 - CPU feature handling cleanups and checks for programmer errors
   (overlapping bit-fields).
 
 - ACPI updates for arm64: disallow AML accesses to EFI code regions and
   kernel memory.
 
 - perf updates for arm64.
 
 - Miscellaneous fixes and cleanups, most notably PLT counting
   optimisation for module loading, recordmcount fix to ignore
   relocations other than R_AARCH64_CALL26, CMA areas reserved for
   gigantic pages on 16K and 64K configurations.
 
 - Trivial typos, duplicate words.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl8oTcsACgkQa9axLQDI
 XvEj6hAAkn39mO5xrR/Vhpg3DyFPk63ZlMSX9SsOeVyaLbovT6stTs1XAZXPpnkt
 rV3gwACyGSrqH6+uey9pHgHJuPF2TdrGEVK08yVKo9KGW/6yXSIncdKFE4jUJ/WJ
 wF5j7eMET2aGzcpm5AlzMmq6HOrKB8nZac9H8/x6H+Ox2WdgJkEjOkDvyqACUyum
 N3FsTZkWj2pIkTXHNgDZ8KjxVLO8HlFaB2hkxFDl9NPlX2UTCQJ8Tg1KiPLafKaK
 gUvH4usQDFdb5RU/UWogre37J4emO0ZTApZOyju+U+PMMWlWVHjZ4isUIS9zz/AE
 JNZ23dnKZX2HrYa5p8HZx175zwj/vXUqUHCZPLvQXaAudCEhF8BVljPiG0e80FV5
 GHFUgUbylKspp01I/9L+2JvsG96Mr0e+P3Sx7L2HTI42cmtoSa14+MpoSRj7zlft
 Qcl8hfrVOjCjUnFRHa/1y1cGvnD9GbgnKJR7zgVxl9bD/Jd48r1HUtwRORZCzWFr
 mRPVbPS72fWxMzMV9DZYJm02jJY9kLX2BMl49njbB8MhAhzOvrMVzoVVtMMeRFLR
 XHeJpmg36W09FiRGe7LRXlkXIhCQzQG2bJfiphuupCfhjRAitPoq8I925G6Pig60
 c8RWaXGU7PrEsdMNrL83vekvGKgqrkoFkRVtsCoQ2X6Hvu/XdYI=
 =mh79
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 and cross-arch updates from Catalin Marinas:
 "Here's a slightly wider-spread set of updates for 5.9.

  Going outside the usual arch/arm64/ area is the removal of
  read_barrier_depends() series from Will and the MSI/IOMMU ID
  translation series from Lorenzo.

  The notable arm64 updates include ARMv8.4 TLBI range operations and
  translation level hint, time namespace support, and perf.

  Summary:

   - Removal of the tremendously unpopular read_barrier_depends()
     barrier, which is a NOP on all architectures apart from Alpha, in
     favour of allowing architectures to override READ_ONCE() and do
     whatever dance they need to do to ensure address dependencies
     provide LOAD -> LOAD/STORE ordering.

     This work also offers a potential solution if compilers are shown
     to convert LOAD -> LOAD address dependencies into control
     dependencies (e.g. under LTO), as weakly ordered architectures will
     effectively be able to upgrade READ_ONCE() to smp_load_acquire().
     The latter case is not used yet, but will be discussed further at
     LPC.

   - Make the MSI/IOMMU input/output ID translation PCI agnostic,
     augment the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID
     bus-specific parameter and apply the resulting changes to the
     device ID space provided by the Freescale FSL bus.

   - arm64 support for TLBI range operations and translation table level
     hints (part of the ARMv8.4 architecture version).

   - Time namespace support for arm64.

   - Export the virtual and physical address sizes in vmcoreinfo for
     makedumpfile and crash utilities.

   - CPU feature handling cleanups and checks for programmer errors
     (overlapping bit-fields).

   - ACPI updates for arm64: disallow AML accesses to EFI code regions
     and kernel memory.

   - perf updates for arm64.

   - Miscellaneous fixes and cleanups, most notably PLT counting
     optimisation for module loading, recordmcount fix to ignore
     relocations other than R_AARCH64_CALL26, CMA areas reserved for
     gigantic pages on 16K and 64K configurations.

   - Trivial typos, duplicate words"

Link: http://lkml.kernel.org/r/20200710165203.31284-1-will@kernel.org
Link: http://lkml.kernel.org/r/20200619082013.13661-1-lorenzo.pieralisi@arm.com

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (82 commits)
  arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
  arm64/mm: save memory access in check_and_switch_context() fast switch path
  arm64: sigcontext.h: delete duplicated word
  arm64: ptrace.h: delete duplicated word
  arm64: pgtable-hwdef.h: delete duplicated words
  bus: fsl-mc: Add ACPI support for fsl-mc
  bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
  of/irq: Make of_msi_map_rid() PCI bus agnostic
  of/irq: make of_msi_map_get_device_domain() bus agnostic
  dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
  of/device: Add input id to of_dma_configure()
  of/iommu: Make of_map_rid() PCI agnostic
  ACPI/IORT: Add an input ID to acpi_dma_configure()
  ACPI/IORT: Remove useless PCI bus walk
  ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
  ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
  ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
  arm64: enable time namespace support
  arm64/vdso: Restrict splitting VVAR VMA
  arm64/vdso: Handle faults on timens page
  ...
2020-08-03 14:11:08 -07:00
Andrii Nakryiko
2e49527e52 libbpf: Add bpf_link detach APIs
Add low-level bpf_link_detach() API. Also add higher-level bpf_link__detach()
one.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731182830.286260-3-andriin@fb.com
2020-08-01 20:38:28 -07:00
Catalin Marinas
0e4cd9f265 Merge branch 'for-next/read-barrier-depends' into for-next/core
* for-next/read-barrier-depends:
  : Allow architectures to override __READ_ONCE()
  arm64: Reduce the number of header files pulled into vmlinux.lds.S
  compiler.h: Move compiletime_assert() macros into compiler_types.h
  checkpatch: Remove checks relating to [smp_]read_barrier_depends()
  include/linux: Remove smp_read_barrier_depends() from comments
  tools/memory-model: Remove smp_read_barrier_depends() from informal doc
  Documentation/barriers/kokr: Remove references to [smp_]read_barrier_depends()
  Documentation/barriers: Remove references to [smp_]read_barrier_depends()
  locking/barriers: Remove definitions for [smp_]read_barrier_depends()
  alpha: Replace smp_read_barrier_depends() usage with smp_[r]mb()
  vhost: Remove redundant use of read_barrier_depends() barrier
  asm/rwonce: Don't pull <asm/barrier.h> into 'asm-generic/rwonce.h'
  asm/rwonce: Remove smp_read_barrier_depends() invocation
  alpha: Override READ_ONCE() with barriered implementation
  asm/rwonce: Allow __READ_ONCE to be overridden by the architecture
  compiler.h: Split {READ,WRITE}_ONCE definitions out into rwonce.h
  tools: bpf: Use local copy of headers including uapi/linux/filter.h
2020-07-31 18:09:57 +01:00
Andrii Nakryiko
e1613b5714 bpf: Fix bpf_ringbuf_output() signature to return long
Due to bpf tree fix merge, bpf_ringbuf_output() signature ended up with int as
a return type, while all other helpers got converted to returning long. So fix
it in bpf-next now.

Fixes: b0659d8a95 ("bpf: Fix definition of bpf_ringbuf_output() helper in UAPI comments")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200727224715.652037-1-andriin@fb.com
2020-07-28 12:20:44 +02:00
Andrii Nakryiko
dc8698cac7 libbpf: Add support for BPF XDP link
Sync UAPI header and add support for using bpf_link-based XDP attachment.
Make xdp/ prog type set expected attach type. Kernel didn't enforce
attach_type for XDP programs before, so there is no backwards compatiblity
issues there.

Also fix section_names selftest to recognize that xdp prog types now have
expected attach type.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-8-andriin@fb.com
2020-07-25 20:37:02 -07:00
Yonghong Song
a5cbe05a66 bpf: Implement bpf iterator for map elements
The bpf iterator for map elements are implemented.
The bpf program will receive four parameters:
  bpf_iter_meta *meta: the meta data
  bpf_map *map:        the bpf_map whose elements are traversed
  void *key:           the key of one element
  void *value:         the value of the same element

Here, meta and map pointers are always valid, and
key has register type PTR_TO_RDONLY_BUF_OR_NULL and
value has register type PTR_TO_RDWR_BUF_OR_NULL.
The kernel will track the access range of key and value
during verification time. Later, these values will be compared
against the values in the actual map to ensure all accesses
are within range.

A new field iter_seq_info is added to bpf_map_ops which
is used to add map type specific information, i.e., seq_ops,
init/fini seq_file func and seq_file private data size.
Subsequent patches will have actual implementation
for bpf_map_ops->iter_seq_info.

In user space, BPF_ITER_LINK_MAP_FD needs to be
specified in prog attr->link_create.flags, which indicates
that attr->link_create.target_fd is a map_fd.
The reason for such an explicit flag is for possible
future cases where one bpf iterator may allow more than
one possible customization, e.g., pid and cgroup id for
task_file.

Current kernel internal implementation only allows
the target to register at most one required bpf_iter_link_info.
To support the above case, optional bpf_iter_link_info's
are needed, the target can be extended to register such link
infos, and user provided link_info needs to match one of
target supported ones.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184112.590360-1-yhs@fb.com
2020-07-25 20:16:32 -07:00
Ingo Molnar
c84d53051f Linux 5.8-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl8UzA4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGQ7cH/3v+Gv+SmHJCvaT2
 CSu0+7okVnYbY3UTb3hykk7/aOqb6284KjxR03r0CWFzsEsZVhC5pvvruASSiMQg
 Pi04sLqv6CsGLHd1n+pl4AUYEaxq6k4KS3uU3HHSWxrahDDApQoRUx2F8lpOxyj8
 RiwnoO60IMPA7IFJqzcZuFqsgdxqiiYvnzT461KX8Mrw6fyMXeR2KAj2NwMX8dZN
 At21Sf8+LSoh6q2HnugfiUd/jR10XbfxIIx2lXgIinb15GXgWydEQVrDJ7cUV7ix
 Jd0S+dtOtp+lWtFHDoyjjqqsMV7+G8i/rFNZoxSkyZqsUTaKzaR6JD3moSyoYZgG
 0+eXO4A=
 =9EpR
 -----END PGP SIGNATURE-----

Merge tag 'v5.8-rc6' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-25 21:49:36 +02:00
David S. Miller
dee72f8a0c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-07-21

The following pull-request contains BPF updates for your *net-next* tree.

We've added 46 non-merge commits during the last 6 day(s) which contain
a total of 68 files changed, 4929 insertions(+), 526 deletions(-).

The main changes are:

1) Run BPF program on socket lookup, from Jakub.

2) Introduce cpumap, from Lorenzo.

3) s390 JIT fixes, from Ilya.

4) teach riscv JIT to emit compressed insns, from Luke.

5) use build time computed BTF ids in bpf iter, from Yonghong.
====================

Purely independent overlapping changes in both filter.h and xdp.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 12:35:33 -07:00
Yonghong Song
fce557bcef bpf: Make btf_sock_ids global
tcp and udp bpf_iter can reuse some socket ids in
btf_sock_ids, so make it global.

I put the extern definition in btf_ids.h as a central
place so it can be easily discovered by developers.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163402.1393427-1-yhs@fb.com
2020-07-21 13:26:26 -07:00
Yonghong Song
0f12e584b2 bpf: Add BTF_ID_LIST_GLOBAL in btf_ids.h
Existing BTF_ID_LIST used a local static variable
to store btf_ids. This patch provided a new macro
BTF_ID_LIST_GLOBAL to store btf_ids in a global
variable which can be shared among multiple files.

The existing BTF_ID_LIST is still retained.
Two reasons. First, BTF_ID_LIST is also used to build
btf_ids for helper arguments which typically
is an array of 5. Since typically different
helpers have different signature, it makes
little sense to share them. Second, some
current computed btf_ids are indeed local.
If later those btf_ids are shared between
different files, they can use BTF_ID_LIST_GLOBAL then.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/bpf/20200720163401.1393159-1-yhs@fb.com
2020-07-21 13:26:26 -07:00
Yonghong Song
d8dfe5bfe8 tools/bpf: Sync btf_ids.h to tools
Sync kernel header btf_ids.h to tools directory.
Also define macro CONFIG_DEBUG_INFO_BTF before
including btf_ids.h in prog_tests/resolve_btfids.c
since non-stub definitions for BTF_ID_LIST etc. macros
are defined under CONFIG_DEBUG_INFO_BTF. This
prevented test_progs from failing.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163359.1393079-1-yhs@fb.com
2020-07-21 13:26:26 -07:00
Will Deacon
f143c11bb7 tools: bpf: Use local copy of headers including uapi/linux/filter.h
Pulling header files directly out of the kernel sources for inclusion in
userspace programs is highly error prone, not least because it bypasses
the kbuild infrastructure entirely and so may end up referencing other
header files that have not been generated.

Subsequent patches will cause compiler.h to pull in the ungenerated
asm/rwonce.h file via filter.h, breaking the build for tools/bpf:

  | $ make -C tools/bpf
  | make: Entering directory '/linux/tools/bpf'
  |   CC       bpf_jit_disasm.o
  |   LINK     bpf_jit_disasm
  |   CC       bpf_dbg.o
  | In file included from /linux/include/uapi/linux/filter.h:9,
  |                  from /linux/tools/bpf/bpf_dbg.c:41:
  | /linux/include/linux/compiler.h:247:10: fatal error: asm/rwonce.h: No such file or directory
  |  #include <asm/rwonce.h>
  |           ^~~~~~~~~~~~~~
  | compilation terminated.
  | make: *** [Makefile:61: bpf_dbg.o] Error 1
  | make: Leaving directory '/linux/tools/bpf'

Take a copy of the installed version of linux/filter.h  (i.e. the one
created by the 'headers_install' target) into tools/include/uapi/linux/
and adjust the BPF tool Makefile to reference the local include
directories instead of those in the main source tree.

Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Reported-by: Xiao Yang <ice_yangxiao@163.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-07-21 10:50:35 +01:00
Leo Yan
5271d915a9 tools headers UAPI: Update tools's copy of linux/perf_event.h
To get the changes in the commit:

  "perf: Add perf_event_mmap_page::cap_user_time_short ABI"

This update is a prerequisite to add support for short clock counters
related ABI extension.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-8-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-07-20 11:50:47 +01:00
Christoph Hellwig
55db9c0e85 net: remove compat_sys_{get,set}sockopt
Now that the ->compat_{get,set}sockopt proto_ops methods are gone
there is no good reason left to keep the compat syscalls separate.

This fixes the odd use of unsigned int for the compat_setsockopt
optlen and the missing sock_use_custom_sol_socket.

It would also easily allow running the eBPF hooks for the compat
syscalls, but such a large change in behavior does not belong into
a consolidation patch like this one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:16:40 -07:00
Jakub Sitnicki
a352b32ae9 bpf: Sync linux/bpf.h to tools/
Newly added program, context type and helper is used by tests in a
subsequent patch. Synchronize the header file.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-12-jakub@cloudflare.com
2020-07-17 20:18:17 -07:00
Kees Cook
63a0895d96 compiler: Remove uninitialized_var() macro
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

As recommended[2] by[3] Linus[4], remove the macro. With the recent
change to disable -Wmaybe-uninitialized in v5.7 in commit 78a5255ffb
("Stop the ad-hoc games with -Wno-maybe-initialized"), this is likely
the best time to make this treewide change.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Bart van Assche <bvanassche@acm.org>
Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16 12:35:31 -07:00
Randy Dunlap
bfdfa51702 bpf: Drop duplicated words in uapi helper comments
Drop doubled words "will" and "attach".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/6b9f71ae-4f8e-0259-2c5d-187ddaefe6eb@infradead.org
2020-07-16 21:00:09 +02:00
Lorenzo Bianconi
9216477449 bpf: cpumap: Add the possibility to attach an eBPF program to cpumap
Introduce the capability to attach an eBPF program to cpumap entries.
The idea behind this feature is to add the possibility to define on
which CPU run the eBPF program if the underlying hw does not support
RSS. Current supported verdicts are XDP_DROP and XDP_PASS.

This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu
sample available in the kernel tree to identify possible performance
regressions. Results show there are no observable differences in
packet-per-second:

$./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1
rx: 354.8 Kpps
rx: 356.0 Kpps
rx: 356.8 Kpps
rx: 356.3 Kpps
rx: 356.6 Kpps
rx: 356.6 Kpps
rx: 356.7 Kpps
rx: 355.8 Kpps
rx: 356.8 Kpps
rx: 356.8 Kpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org
2020-07-16 17:00:32 +02:00
Lorenzo Bianconi
644bfe51fa cpumap: Formalize map value as a named struct
As it has been already done for devmap, introduce 'struct bpf_cpumap_val'
to formalize the expected values that can be passed in for a CPUMAP.
Update cpumap code to use the struct.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/754f950674665dae6139c061d28c1d982aaf4170.1594734381.git.lorenzo@kernel.org
2020-07-16 17:00:32 +02:00
Horatiu Vultur
ffb3adba64 net: bridge: Add port attribute IFLA_BRPORT_MRP_IN_OPEN
This patch adds a new port attribute, IFLA_BRPORT_MRP_IN_OPEN, which
allows to notify the userspace when the node lost the contiuity of
MRP_InTest frames.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-14 13:46:43 -07:00
David S. Miller
07dd1b7e68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-07-13

The following pull-request contains BPF updates for your *net-next* tree.

We've added 36 non-merge commits during the last 7 day(s) which contain
a total of 62 files changed, 2242 insertions(+), 468 deletions(-).

The main changes are:

1) Avoid trace_printk warning banner by switching bpf_trace_printk to use
   its own tracing event, from Alan.

2) Better libbpf support on older kernels, from Andrii.

3) Additional AF_XDP stats, from Ciara.

4) build time resolution of BTF IDs, from Jiri.

5) BPF_CGROUP_INET_SOCK_RELEASE hook, from Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 18:04:05 -07:00
Ciara Loftus
8aa5a33578 xsk: Add new statistics
It can be useful for the user to know the reason behind a dropped packet.
Introduce new counters which track drops on the receive path caused by:
1. rx ring being full
2. fill ring being empty

Also, on the tx path introduce a counter which tracks the number of times
we attempt pull from the tx ring when it is empty.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200708072835.4427-2-ciara.loftus@intel.com
2020-07-13 15:32:56 -07:00
Jiri Olsa
e5a0516ec9 tools headers: Adopt verbatim copy of btf_ids.h from kernel sources
It will be needed by bpf selftest for resolve_btfids tool.

Also adding __PASTE macro as btf_ids.h dependency, which is
defined in:

  include/linux/compiler_types.h

but because tools/include do not have this header, I'm putting
the macro into linux/compiler.h header.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200711215329.41165-9-jolsa@kernel.org
2020-07-13 10:42:03 -07:00
David S. Miller
71930d6102 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
All conflicts seemed rather trivial, with some guidance from
Saeed Mameed on the tc_ct.c one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-11 00:46:00 -07:00
Linus Torvalds
5a764898af Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Restore previous behavior of CAP_SYS_ADMIN wrt loading networking
    BPF programs, from Maciej Żenczykowski.

 2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu
    Mariappan.

 3) Slay memory leak in nl80211 bss color attribute parsing code, from
    Luca Coelho.

 4) Get route from skb properly in ip_route_use_hint(), from Miaohe Lin.

 5) Don't allow anything other than ARPHRD_ETHER in llc code, from Eric
    Dumazet.

 6) xsk code dips too deeply into DMA mapping implementation internals.
    Add dma_need_sync and use it. From Christoph Hellwig

 7) Enforce power-of-2 for BPF ringbuf sizes. From Andrii Nakryiko.

 8) Check for disallowed attributes when loading flow dissector BPF
    programs. From Lorenz Bauer.

 9) Correct packet injection to L3 tunnel devices via AF_PACKET, from
    Jason A. Donenfeld.

10) Don't advertise checksum offload on ipa devices that don't support
    it. From Alex Elder.

11) Resolve several issues in TCP MD5 signature support. Missing memory
    barriers, bogus options emitted when using syncookies, and failure
    to allow md5 key changes in established states. All from Eric
    Dumazet.

12) Fix interface leak in hsr code, from Taehee Yoo.

13) VF reset fixes in hns3 driver, from Huazhong Tan.

14) Make loopback work again with ipv6 anycast, from David Ahern.

15) Fix TX starvation under high load in fec driver, from Tobias
    Waldekranz.

16) MLD2 payload lengths not checked properly in bridge multicast code,
    from Linus Lüssing.

17) Packet scheduler code that wants to find the inner protocol
    currently only works for one level of VLAN encapsulation. Allow
    Q-in-Q situations to work properly here, from Toke
    Høiland-Jørgensen.

18) Fix route leak in l2tp, from Xin Long.

19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport
    support and various protocols. From Martin KaFai Lau.

20) Fix socket cgroup v2 reference counting in some situations, from
    Cong Wang.

21) Cure memory leak in mlx5 connection tracking offload support, from
    Eli Britstein.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
  mlxsw: pci: Fix use-after-free in case of failed devlink reload
  mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON()
  net: macb: fix call to pm_runtime in the suspend/resume functions
  net: macb: fix macb_suspend() by removing call to netif_carrier_off()
  net: macb: fix macb_get/set_wol() when moving to phylink
  net: macb: mark device wake capable when "magic-packet" property present
  net: macb: fix wakeup test in runtime suspend/resume routines
  bnxt_en: fix NULL dereference in case SR-IOV configuration fails
  libbpf: Fix libbpf hashmap on (I)LP32 architectures
  net/mlx5e: CT: Fix memory leak in cleanup
  net/mlx5e: Fix port buffers cell size value
  net/mlx5e: Fix 50G per lane indication
  net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash
  net/mlx5e: Fix VXLAN configuration restore after function reload
  net/mlx5e: Fix usage of rcu-protected pointer
  net/mxl5e: Verify that rpriv is not NULL
  net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode
  net/mlx5: Fix eeprom support for SFP module
  cgroup: Fix sock_cgroup_data on big-endian.
  selftests: bpf: Fix detach from sockmap tests
  ...
2020-07-10 18:16:22 -07:00
Peter Zijlstra
f9ad4a5f3f lockdep: Remove lockdep_hardirq{s_enabled,_context}() argument
Now that the macros use per-cpu data, we no longer need the argument.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200623083721.571835311@infradead.org
2020-07-10 12:00:02 +02:00
Linus Torvalds
6ec4476ac8 Raise gcc version requirement to 4.9
I realize that we fairly recently raised it to 4.8, but the fact is, 4.9
is a much better minimum version to target.

We have a number of workarounds for actual bugs in pre-4.9 gcc versions
(including things like internal compiler errors on ARM), but we also
have some syntactic workarounds for lacking features.

In particular, raising the minimum to 4.9 means that we can now just
assume _Generic() exists, which is likely the much better replacement
for a lot of very convoluted built-time magic with conditionals on
sizeof and/or __builtin_choose_expr() with same_type() etc.

Using _Generic also means that you will need to have a very recent
version of 'sparse', but thats easy to build yourself, and much less of
a hassle than some old gcc version can be.

The latest (in a long string) of reasons for minimum compiler version
upgrades was commit 5435f73d5c ("efi/x86: Fix build with gcc 4").

Ard points out that RHEL 7 uses gcc-4.8, but the people who stay back on
old RHEL versions persumably also don't build their own kernels anyway.
And maybe they should cross-built or just have a little side affair with
a newer compiler?

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-08 10:48:35 -07:00
Stanislav Fomichev
e8b012e9fa libbpf: Add support for BPF_CGROUP_INET_SOCK_RELEASE
Add auto-detection for the cgroup/sock_release programs.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200706230128.4073544-3-sdf@google.com
2020-07-08 01:07:35 +02:00
David S. Miller
f91c031e65 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-07-04

The following pull-request contains BPF updates for your *net-next* tree.

We've added 73 non-merge commits during the last 17 day(s) which contain
a total of 106 files changed, 5233 insertions(+), 1283 deletions(-).

The main changes are:

1) bpftool ability to show PIDs of processes having open file descriptors
   for BPF map/program/link/BTF objects, relying on BPF iterator progs
   to extract this info efficiently, from Andrii Nakryiko.

2) Addition of BPF iterator progs for dumping TCP and UDP sockets to
   seq_files, from Yonghong Song.

3) Support access to BPF map fields in struct bpf_map from programs
   through BTF struct access, from Andrey Ignatov.

4) Add a bpf_get_task_stack() helper to be able to dump /proc/*/stack
   via seq_file from BPF iterator progs, from Song Liu.

5) Make SO_KEEPALIVE and related options available to bpf_setsockopt()
   helper, from Dmitry Yakunin.

6) Optimize BPF sk_storage selection of its caching index, from Martin
   KaFai Lau.

7) Removal of redundant synchronize_rcu()s from BPF map destruction which
   has been a historic leftover, from Alexei Starovoitov.

8) Several improvements to test_progs to make it easier to create a shell
   loop that invokes each test individually which is useful for some CIs,
   from Jesper Dangaard Brouer.

9) Fix bpftool prog dump segfault when compiled without skeleton code on
   older clang versions, from John Fastabend.

10) Bunch of cleanups and minor improvements, from various others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-04 17:48:34 -07:00
Song Liu
fa28dcb82a bpf: Introduce helper bpf_get_task_stack()
Introduce helper bpf_get_task_stack(), which dumps stack trace of given
task. This is different to bpf_get_stack(), which gets stack track of
current task. One potential use case of bpf_get_task_stack() is to call
it from bpf_iter__task and dump all /proc/<pid>/stack to a seq_file.

bpf_get_task_stack() uses stack_trace_save_tsk() instead of
get_perf_callchain() for kernel stack. The benefit of this choice is that
stack_trace_save_tsk() doesn't require changes in arch/. The downside of
using stack_trace_save_tsk() is that stack_trace_save_tsk() dumps the
stack trace to unsigned long array. For 32-bit systems, we need to
translate it to u64 array.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200630062846.664389-3-songliubraving@fb.com
2020-07-01 08:23:19 -07:00
David S. Miller
e708e2bd55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2020-06-30

The following pull-request contains BPF updates for your *net* tree.

We've added 28 non-merge commits during the last 9 day(s) which contain
a total of 35 files changed, 486 insertions(+), 232 deletions(-).

The main changes are:

1) Fix an incorrect verifier branch elimination for PTR_TO_BTF_ID pointer
   types, from Yonghong Song.

2) Fix UAPI for sockmap and flow_dissector progs that were ignoring various
   arguments passed to BPF_PROG_{ATTACH,DETACH}, from Lorenz Bauer & Jakub Sitnicki.

3) Fix broken AF_XDP DMA hacks that are poking into dma-direct and swiotlb
   internals and integrate it properly into DMA core, from Christoph Hellwig.

4) Fix RCU splat from recent changes to avoid skipping ingress policy when
   kTLS is enabled, from John Fastabend.

5) Fix BPF ringbuf map to enforce size to be the power of 2 in order for its
   position masking to work, from Andrii Nakryiko.

6) Fix regression from CAP_BPF work to re-allow CAP_SYS_ADMIN for loading
   of network programs, from Maciej Żenczykowski.

7) Fix libbpf section name prefix for devmap progs, from Jesper Dangaard Brouer.

8) Fix formatting in UAPI documentation for BPF helpers, from Quentin Monnet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-30 14:20:45 -07:00
Linus Torvalds
4a21185cda Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Don't insert ESP trailer twice in IPSEC code, from Huy Nguyen.

 2) The default crypto algorithm selection in Kconfig for IPSEC is out
    of touch with modern reality, fix this up. From Eric Biggers.

 3) bpftool is missing an entry for BPF_MAP_TYPE_RINGBUF, from Andrii
    Nakryiko.

 4) Missing init of ->frame_sz in xdp_convert_zc_to_xdp_frame(), from
    Hangbin Liu.

 5) Adjust packet alignment handling in ax88179_178a driver to match
    what the hardware actually does. From Jeremy Kerr.

 6) register_netdevice can leak in the case one of the notifiers fail,
    from Yang Yingliang.

 7) Use after free in ip_tunnel_lookup(), from Taehee Yoo.

 8) VLAN checks in sja1105 DSA driver need adjustments, from Vladimir
    Oltean.

 9) tg3 driver can sleep forever when we get enough EEH errors, fix from
    David Christensen.

10) Missing {READ,WRITE}_ONCE() annotations in various Intel ethernet
    drivers, from Ciara Loftus.

11) Fix scanning loop break condition in of_mdiobus_register(), from
    Florian Fainelli.

12) MTU limit is incorrect in ibmveth driver, from Thomas Falcon.

13) Endianness fix in mlxsw, from Ido Schimmel.

14) Use after free in smsc95xx usbnet driver, from Tuomas Tynkkynen.

15) Missing bridge mrp configuration validation, from Horatiu Vultur.

16) Fix circular netns references in wireguard, from Jason A. Donenfeld.

17) PTP initialization on recovery is not done properly in qed driver,
    from Alexander Lobakin.

18) Endian conversion of L4 ports in filters of cxgb4 driver is wrong,
    from Rahul Lakkireddy.

19) Don't clear bound device TX queue of socket prematurely otherwise we
    get problems with ktls hw offloading, from Tariq Toukan.

20) ipset can do atomics on unaligned memory, fix from Russell King.

21) Align ethernet addresses properly in bridging code, from Thomas
    Martitz.

22) Don't advertise ipv4 addresses on SCTP sockets having ipv6only set,
    from Marcelo Ricardo Leitner.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (149 commits)
  rds: transport module should be auto loaded when transport is set
  sch_cake: fix a few style nits
  sch_cake: don't call diffserv parsing code when it is not needed
  sch_cake: don't try to reallocate or unshare skb unconditionally
  ethtool: fix error handling in linkstate_prepare_data()
  wil6210: account for napi_gro_receive never returning GRO_DROP
  hns: do not cast return value of napi_gro_receive to null
  socionext: account for napi_gro_receive never returning GRO_DROP
  wireguard: receive: account for napi_gro_receive never returning GRO_DROP
  vxlan: fix last fdb index during dump of fdb with nhid
  sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket
  tc-testing: avoid action cookies with odd length.
  bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
  tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
  net: dsa: sja1105: fix tc-gate schedule with single element
  net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules
  net: dsa: sja1105: unconditionally free old gating config
  net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top
  net: macb: free resources on failure path of at91ether_open()
  net: macb: call pm_runtime_put_sync on failure path
  ...
2020-06-25 18:27:40 -07:00
Yonghong Song
0d4fad3e57 bpf: Add bpf_skc_to_udp6_sock() helper
The helper is used in tracing programs to cast a socket
pointer to a udp6_sock pointer.
The return value could be NULL if the casting is illegal.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20200623230815.3988481-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Yonghong Song
478cfbdf5f bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers
Three more helpers are added to cast a sock_common pointer to
an tcp_sock, tcp_timewait_sock or a tcp_request_sock for
tracing programs.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230811.3988277-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Yonghong Song
af7ec13833 bpf: Add bpf_skc_to_tcp6_sock() helper
The helper is used in tracing programs to cast a socket
pointer to a tcp6_sock pointer.
The return value could be NULL if the casting is illegal.

A new helper return type RET_PTR_TO_BTF_ID_OR_NULL is added
so the verifier is able to deduce proper return types for the helper.

Different from the previous BTF_ID based helpers,
the bpf_skc_to_tcp6_sock() argument can be several possible
btf_ids. More specifically, all possible socket data structures
with sock_common appearing in the first in the memory layout.
This patch only added socket types related to tcp and udp.

All possible argument btf_id and return value btf_id
for helper bpf_skc_to_tcp6_sock() are pre-calculcated and
cached. In the future, it is even possible to precompute
these btf_id's at kernel build time.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230809.3988195-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Dmitry Yakunin
f9bcf96837 bpf: Add SO_KEEPALIVE and related options to bpf_setsockopt
This patch adds support of SO_KEEPALIVE flag and TCP related options
to bpf_setsockopt() routine. This is helpful if we want to enable or tune
TCP keepalive for applications which don't do it in the userspace code.

v3:
  - update kernel-doc in uapi (Nikita Vetoshkin <nekto0n@yandex-team.ru>)

v4:
  - update kernel-doc in tools too (Alexei Starovoitov)
  - add test to selftests (Alexei Starovoitov)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200620153052.9439-3-zeil@yandex-team.ru
2020-06-24 11:21:03 -07:00
Quentin Monnet
bcc7f554cf bpf: Fix formatting in documentation for BPF helpers
When producing the bpf-helpers.7 man page from the documentation from
the BPF user space header file, rst2man complains:

    <stdin>:2636: (ERROR/3) Unexpected indentation.
    <stdin>:2640: (WARNING/2) Block quote ends without a blank line; unexpected unindent.

Let's fix formatting for the relevant chunk (item list in
bpf_ringbuf_query()'s description), and for a couple other functions.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200623153935.6215-1-quentin@isovalent.com
2020-06-23 17:57:02 -07:00
Alexei Starovoitov
4e608675e7 Merge up to bpf_probe_read_kernel_str() fix into bpf-next 2020-06-23 15:33:41 -07:00
Andrii Nakryiko
bdb7b79b4c bpf: Switch most helper return values from 32-bit int to 64-bit long
Switch most of BPF helper definitions from returning int to long. These
definitions are coming from comments in BPF UAPI header and are used to
generate bpf_helper_defs.h (under libbpf) to be later included and used from
BPF programs.

In actual in-kernel implementation, all the helpers are defined as returning
u64, but due to some historical reasons, most of them are actually defined as
returning int in UAPI (usually, to return 0 on success, and negative value on
error).

This actually causes Clang to quite often generate sub-optimal code, because
compiler believes that return value is 32-bit, and in a lot of cases has to be
up-converted (usually with a pair of 32-bit bit shifts) to 64-bit values,
before they can be used further in BPF code.

Besides just "polluting" the code, these 32-bit shifts quite often cause
problems for cases in which return value matters. This is especially the case
for the family of bpf_probe_read_str() functions. There are few other similar
helpers (e.g., bpf_read_branch_records()), in which return value is used by
BPF program logic to record variable-length data and process it. For such
cases, BPF program logic carefully manages offsets within some array or map to
read variable-length data. For such uses, it's crucial for BPF verifier to
track possible range of register values to prove that all the accesses happen
within given memory bounds. Those extraneous zero-extending bit shifts,
inserted by Clang (and quite often interleaved with other code, which makes
the issues even more challenging and sometimes requires employing extra
per-variable compiler barriers), throws off verifier logic and makes it mark
registers as having unknown variable offset. We'll study this pattern a bit
later below.

Another common pattern is to check return of BPF helper for non-zero state to
detect error conditions and attempt alternative actions in such case. Even in
this simple and straightforward case, this 32-bit vs BPF's native 64-bit mode
quite often leads to sub-optimal and unnecessary extra code. We'll look at
this pattern as well.

Clang's BPF target supports two modes of code generation: ALU32, in which it
is capable of using lower 32-bit parts of registers, and no-ALU32, in which
only full 64-bit registers are being used. ALU32 mode somewhat mitigates the
above described problems, but not in all cases.

This patch switches all the cases in which BPF helpers return 0 or negative
error from returning int to returning long. It is shown below that such change
in definition leads to equivalent or better code. No-ALU32 mode benefits more,
but ALU32 mode doesn't degrade or still gets improved code generation.

Another class of cases switched from int to long are bpf_probe_read_str()-like
helpers, which encode successful case as non-negative values, while still
returning negative value for errors.

In all of such cases, correctness is preserved due to two's complement
encoding of negative values and the fact that all helpers return values with
32-bit absolute value. Two's complement ensures that for negative values
higher 32 bits are all ones and when truncated, leave valid negative 32-bit
value with the same value. Non-negative values have upper 32 bits set to zero
and similarly preserve value when high 32 bits are truncated. This means that
just casting to int/u32 is correct and efficient (and in ALU32 mode doesn't
require any extra shifts).

To minimize the chances of regressions, two code patterns were investigated,
as mentioned above. For both patterns, BPF assembly was analyzed in
ALU32/NO-ALU32 compiler modes, both with current 32-bit int return type and
new 64-bit long return type.

Case 1. Variable-length data reading and concatenation. This is quite
ubiquitous pattern in tracing/monitoring applications, reading data like
process's environment variables, file path, etc. In such case, many pieces of
string-like variable-length data are read into a single big buffer, and at the
end of the process, only a part of array containing actual data is sent to
user-space for further processing. This case is tested in test_varlen.c
selftest (in the next patch). Code flow is roughly as follows:

  void *payload = &sample->payload;
  u64 len;

  len = bpf_probe_read_kernel_str(payload, MAX_SZ1, &source_data1);
  if (len <= MAX_SZ1) {
      payload += len;
      sample->len1 = len;
  }
  len = bpf_probe_read_kernel_str(payload, MAX_SZ2, &source_data2);
  if (len <= MAX_SZ2) {
      payload += len;
      sample->len2 = len;
  }
  /* and so on */
  sample->total_len = payload - &sample->payload;
  /* send over, e.g., perf buffer */

There could be two variations with slightly different code generated: when len
is 64-bit integer and when it is 32-bit integer. Both variations were analysed.
BPF assembly instructions between two successive invocations of
bpf_probe_read_kernel_str() were used to check code regressions. Results are
below, followed by short analysis. Left side is using helpers with int return
type, the right one is after the switch to long.

ALU32 + INT                                ALU32 + LONG
===========                                ============

64-BIT (13 insns):                         64-BIT (10 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   if w0 > 256 goto +9 <LBB0_4>         18:   if r0 > 256 goto +6 <LBB0_4>
  19:   w1 = w0                              19:   r1 = 0 ll
  20:   r1 <<= 32                            21:   *(u64 *)(r1 + 0) = r0
  21:   r1 s>>= 32                           22:   r6 = 0 ll
  22:   r2 = 0 ll                            24:   r6 += r0
  24:   *(u64 *)(r2 + 0) = r1              00000000000000c8 <LBB0_4>:
  25:   r6 = 0 ll                            25:   r1 = r6
  27:   r6 += r1                             26:   w2 = 256
00000000000000e0 <LBB0_4>:                   27:   r3 = 0 ll
  28:   r1 = r6                              29:   call 115
  29:   w2 = 256
  30:   r3 = 0 ll
  32:   call 115

32-BIT (11 insns):                         32-BIT (12 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   if w0 > 256 goto +7 <LBB1_4>         18:   if w0 > 256 goto +8 <LBB1_4>
  19:   r1 = 0 ll                            19:   r1 = 0 ll
  21:   *(u32 *)(r1 + 0) = r0                21:   *(u32 *)(r1 + 0) = r0
  22:   w1 = w0                              22:   r0 <<= 32
  23:   r6 = 0 ll                            23:   r0 >>= 32
  25:   r6 += r1                             24:   r6 = 0 ll
00000000000000d0 <LBB1_4>:                   26:   r6 += r0
  26:   r1 = r6                            00000000000000d8 <LBB1_4>:
  27:   w2 = 256                             27:   r1 = r6
  28:   r3 = 0 ll                            28:   w2 = 256
  30:   call 115                             29:   r3 = 0 ll
                                             31:   call 115

In ALU32 mode, the variant using 64-bit length variable clearly wins and
avoids unnecessary zero-extension bit shifts. In practice, this is even more
important and good, because BPF code won't need to do extra checks to "prove"
that payload/len are within good bounds.

32-bit len is one instruction longer. Clang decided to do 64-to-32 casting
with two bit shifts, instead of equivalent `w1 = w0` assignment. The former
uses extra register. The latter might potentially lose some range information,
but not for 32-bit value. So in this case, verifier infers that r0 is [0, 256]
after check at 18:, and shifting 32 bits left/right keeps that range intact.
We should probably look into Clang's logic and see why it chooses bitshifts
over sub-register assignments for this.

NO-ALU32 + INT                             NO-ALU32 + LONG
==============                             ===============

64-BIT (14 insns):                         64-BIT (10 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   r0 <<= 32                            18:   if r0 > 256 goto +6 <LBB0_4>
  19:   r1 = r0                              19:   r1 = 0 ll
  20:   r1 >>= 32                            21:   *(u64 *)(r1 + 0) = r0
  21:   if r1 > 256 goto +7 <LBB0_4>         22:   r6 = 0 ll
  22:   r0 s>>= 32                           24:   r6 += r0
  23:   r1 = 0 ll                          00000000000000c8 <LBB0_4>:
  25:   *(u64 *)(r1 + 0) = r0                25:   r1 = r6
  26:   r6 = 0 ll                            26:   r2 = 256
  28:   r6 += r0                             27:   r3 = 0 ll
00000000000000e8 <LBB0_4>:                   29:   call 115
  29:   r1 = r6
  30:   r2 = 256
  31:   r3 = 0 ll
  33:   call 115

32-BIT (13 insns):                         32-BIT (13 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   r1 = r0                              18:   r1 = r0
  19:   r1 <<= 32                            19:   r1 <<= 32
  20:   r1 >>= 32                            20:   r1 >>= 32
  21:   if r1 > 256 goto +6 <LBB1_4>         21:   if r1 > 256 goto +6 <LBB1_4>
  22:   r2 = 0 ll                            22:   r2 = 0 ll
  24:   *(u32 *)(r2 + 0) = r0                24:   *(u32 *)(r2 + 0) = r0
  25:   r6 = 0 ll                            25:   r6 = 0 ll
  27:   r6 += r1                             27:   r6 += r1
00000000000000e0 <LBB1_4>:                 00000000000000e0 <LBB1_4>:
  28:   r1 = r6                              28:   r1 = r6
  29:   r2 = 256                             29:   r2 = 256
  30:   r3 = 0 ll                            30:   r3 = 0 ll
  32:   call 115                             32:   call 115

In NO-ALU32 mode, for the case of 64-bit len variable, Clang generates much
superior code, as expected, eliminating unnecessary bit shifts. For 32-bit
len, code is identical.

So overall, only ALU-32 32-bit len case is more-or-less equivalent and the
difference stems from internal Clang decision, rather than compiler lacking
enough information about types.

Case 2. Let's look at the simpler case of checking return result of BPF helper
for errors. The code is very simple:

  long bla;
  if (bpf_probe_read_kenerl(&bla, sizeof(bla), 0))
      return 1;
  else
      return 0;

ALU32 + CHECK (9 insns)                    ALU32 + CHECK (9 insns)
====================================       ====================================
  0:    r1 = r10                             0:    r1 = r10
  1:    r1 += -8                             1:    r1 += -8
  2:    w2 = 8                               2:    w2 = 8
  3:    r3 = 0                               3:    r3 = 0
  4:    call 113                             4:    call 113
  5:    w1 = w0                              5:    r1 = r0
  6:    w0 = 1                               6:    w0 = 1
  7:    if w1 != 0 goto +1 <LBB2_2>          7:    if r1 != 0 goto +1 <LBB2_2>
  8:    w0 = 0                               8:    w0 = 0
0000000000000048 <LBB2_2>:                 0000000000000048 <LBB2_2>:
  9:    exit                                 9:    exit

Almost identical code, the only difference is the use of full register
assignment (r1 = r0) vs half-registers (w1 = w0) in instruction #5. On 32-bit
architectures, new BPF assembly might be slightly less optimal, in theory. But
one can argue that's not a big issue, given that use of full registers is
still prevalent (e.g., for parameter passing).

NO-ALU32 + CHECK (11 insns)                NO-ALU32 + CHECK (9 insns)
====================================       ====================================
  0:    r1 = r10                             0:    r1 = r10
  1:    r1 += -8                             1:    r1 += -8
  2:    r2 = 8                               2:    r2 = 8
  3:    r3 = 0                               3:    r3 = 0
  4:    call 113                             4:    call 113
  5:    r1 = r0                              5:    r1 = r0
  6:    r1 <<= 32                            6:    r0 = 1
  7:    r1 >>= 32                            7:    if r1 != 0 goto +1 <LBB2_2>
  8:    r0 = 1                               8:    r0 = 0
  9:    if r1 != 0 goto +1 <LBB2_2>        0000000000000048 <LBB2_2>:
 10:    r0 = 0                               9:    exit
0000000000000058 <LBB2_2>:
 11:    exit

NO-ALU32 is a clear improvement, getting rid of unnecessary zero-extension bit
shifts.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-1-andriin@fb.com
2020-06-24 00:04:36 +02:00
Arnaldo Carvalho de Melo
0e093c77c5 tools headers UAPI: Sync linux/fs.h with the kernel sources
To pick the changes from:

  b383a73f2b ("fs/ext4: Introduce DAX inode flag")

And silence this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h'
  diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h

It causes various beautifiers for things like fspick, fsmount, etc (see
below) to get rebuilt, but this specific change doesn't make 'perf
trace' be capable of decoding anything new, as we still don't decode
what comes from ioctls, just its cmds.

Details about the update:

  $ cp include/uapi/linux/fs.h tools/include/uapi/linux/fs.h
  $ git diff
  diff --git a/tools/include/uapi/linux/fs.h b/tools/include/uapi/linux/fs.h
  index 379a612f8f1d..f44eb0a04afd 100644
  --- a/tools/include/uapi/linux/fs.h
  +++ b/tools/include/uapi/linux/fs.h
  @@ -262,6 +262,7 @@ struct fsxattr {
   #define FS_EA_INODE_FL                 0x00200000 /* Inode used for large EA */
   #define FS_EOFBLOCKS_FL                        0x00400000 /* Reserved for ext4 */
   #define FS_NOCOW_FL                    0x00800000 /* Do not cow file */
  +#define FS_DAX_FL                      0x02000000 /* Inode is DAX */
   #define FS_INLINE_DATA_FL              0x10000000 /* Reserved for ext4 */
   #define FS_PROJINHERIT_FL              0x20000000 /* Create with parents projid */
   #define FS_CASEFOLD_FL                 0x40000000 /* Folder is case insensitive */
  $ m
  make: Entering directory '/home/acme/git/perf/tools/perf'
    BUILD:   Doing 'make -j8' parallel build
    INSTALL  GTK UI
    CC       /tmp/build/perf/builtin-trace.o
    DESCEND  plugins
    CC       /tmp/build/perf/trace/beauty/fsmount.o
    CC       /tmp/build/perf/trace/beauty/fspick.o
    CC       /tmp/build/perf/trace/beauty/mount_flags.o
    CC       /tmp/build/perf/trace/beauty/move_mount.o
    CC       /tmp/build/perf/trace/beauty/renameat.o
    CC       /tmp/build/perf/trace/beauty/sync_file_range.o
    INSTALL  trace_plugins
    LD       /tmp/build/perf/trace/beauty/perf-in.o
    LD       /tmp/build/perf/perf-in.o
    LINK     /tmp/build/perf/perf
  <SNIP>

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17 13:23:02 -03:00
Arnaldo Carvalho de Melo
f64925c1eb tools include UAPI: Sync linux/vhost.h with the kernel sources
To get the changes in:

  776f395004 ("vhost_vdpa: Support config interrupt in vdpa")

Silencing this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
  diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h

This automatically picks the new ioctl introduced in the above patch,
making tools such as 'perf trace' aware of them and possibly allowing to
use the strings in filters, etc:

  # perf trace -e ioctl --pid 7951
  <SNIP>
     0.178 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
     0.194 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
     0.209 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
     0.224 (249.413 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.660 ( 0.011 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.675 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.686 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.697 ( 0.008 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.709 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.720 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.730 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.740 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.752 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.762 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.772 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   249.782 (120.138 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   370.201 ( 0.039 ms): CPU 0/KVM/8023 ioctl(fd: 12, cmd: KVM_IRQ_LINE_STATUS, arg: 0x7f744f9e1420) = 0
   370.254 ( 0.052 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   370.575 ( 0.365 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   370.973 ( 0.028 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   371.015 ( 0.037 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
   371.071 ( 0.009 ms): CPU 0/KVM/8023 ioctl(fd: 12, cmd: KVM_IRQ_LINE_STATUS, arg: 0x7f744f9e14b0) = 0
  <SNIP>
  #

Details about the update:

  $ diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h
  --- tools/include/uapi/linux/vhost.h	2020-04-16 13:19:12.056763843 -0300
  +++ include/uapi/linux/vhost.h	2020-06-17 10:04:20.532056428 -0300
  @@ -15,6 +15,8 @@
   #include <linux/types.h>
   #include <linux/ioctl.h>

  +#define VHOST_FILE_UNBIND -1
  +
   /* ioctls */

   #define VHOST_VIRTIO 0xAF
  @@ -140,4 +142,6 @@
   /* Get the max ring size. */
   #define VHOST_VDPA_GET_VRING_NUM	_IOR(VHOST_VIRTIO, 0x76, __u16)

  +/* Set event fd for config interrupt*/
  +#define VHOST_VDPA_SET_CONFIG_CALL	_IOW(VHOST_VIRTIO, 0x77, int)
   #endif
  $
  $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before
  $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h
  $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after
  $ diff -u before after
  --- before	2020-06-17 10:15:35.123275966 -0300
  +++ after	2020-06-17 10:15:51.812482117 -0300
  @@ -27,6 +27,7 @@
   	[0x72] = "VDPA_SET_STATUS",
   	[0x74] = "VDPA_SET_CONFIG",
   	[0x75] = "VDPA_SET_VRING_ENABLE",
  +	[0x77] = "VDPA_SET_CONFIG_CALL",
   };
   static const char *vhost_virtio_ioctl_read_cmds[] = {
   	[0x00] = "GET_FEATURES",
  $

This causes these parts to get rebuilt:

  CC       /tmp/build/perf/trace/beauty/ioctl.o
  INSTALL  trace_plugins
  LD       /tmp/build/perf/trace/beauty/perf-in.o
  LD       /tmp/build/perf/perf-in.o
  LINK     /tmp/build/perf/perf

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17 13:22:41 -03:00
Arnaldo Carvalho de Melo
08a7c7772b Merge remote-tracking branch 'torvalds/master' into perf/urgent
To get some newer headers that got out of sync with the copies in tools/
so that we can try to have the tools/perf/ build clean for v5.8 with
fewer pull requests.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17 13:20:14 -03:00
Andrii Nakryiko
b0659d8a95 bpf: Fix definition of bpf_ringbuf_output() helper in UAPI comments
Fix definition of bpf_ringbuf_output() in UAPI header comments, which is used
to generate libbpf's bpf_helper_defs.h header. Return value is a number (error
code), not a pointer.

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200615214926.3638836-1-andriin@fb.com
2020-06-16 02:17:01 +02:00
Linus Torvalds
96144c58ab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix cfg80211 deadlock, from Johannes Berg.

 2) RXRPC fails to send norigications, from David Howells.

 3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from
    Geliang Tang.

 4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu.

 5) The ucc_geth driver needs __netdev_watchdog_up exported, from
    Valentin Longchamp.

 6) Fix hashtable memory leak in dccp, from Wang Hai.

 7) Fix how nexthops are marked as FDB nexthops, from David Ahern.

 8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni.

 9) Fix crashes in tipc_disc_rcv(), from Tuong Lien.

10) Fix link speed reporting in iavf driver, from Brett Creeley.

11) When a channel is used for XSK and then reused again later for XSK,
    we forget to clear out the relevant data structures in mlx5 which
    causes all kinds of problems. Fix from Maxim Mikityanskiy.

12) Fix memory leak in genetlink, from Cong Wang.

13) Disallow sockmap attachments to UDP sockets, it simply won't work.
    From Lorenz Bauer.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  net: ethernet: ti: ale: fix allmulti for nu type ale
  net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init
  net: atm: Remove the error message according to the atomic context
  bpf: Undo internal BPF_PROBE_MEM in BPF insns dump
  libbpf: Support pre-initializing .bss global variables
  tools/bpftool: Fix skeleton codegen
  bpf: Fix memlock accounting for sock_hash
  bpf: sockmap: Don't attach programs to UDP sockets
  bpf: tcp: Recv() should return 0 when the peer socket is closed
  ibmvnic: Flush existing work items before device removal
  genetlink: clean up family attributes allocations
  net: ipa: header pad field only valid for AP->modem endpoint
  net: ipa: program upper nibbles of sequencer type
  net: ipa: fix modem LAN RX endpoint id
  net: ipa: program metadata mask differently
  ionic: add pcie_print_link_status
  rxrpc: Fix race between incoming ACK parser and retransmitter
  net/mlx5: E-Switch, Fix some error pointer dereferences
  net/mlx5: Don't fail driver on failure to create debugfs
  net/mlx5e: CT: Fix ipv6 nat header rewrite actions
  ...
2020-06-13 16:27:13 -07:00
David S. Miller
fa7566a0d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-06-12

The following pull-request contains BPF updates for your *net* tree.

We've added 26 non-merge commits during the last 10 day(s) which contain
a total of 27 files changed, 348 insertions(+), 93 deletions(-).

The main changes are:

1) sock_hash accounting fix, from Andrey.

2) libbpf fix and probe_mem sanitizing, from Andrii.

3) sock_hash fixes, from Jakub.

4) devmap_val fix, from Jesper.

5) load_bytes_relative fix, from YiFei.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-13 15:28:08 -07:00
Jesper Dangaard Brouer
042b1545fe bpf: Selftests and tools use struct bpf_devmap_val from uapi
Sync tools uapi bpf.h header file and update selftests that use
struct bpf_devmap_val.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/159170951195.2102545.1833108712124273987.stgit@firesoul
2020-06-09 11:36:19 -07:00